Though computer scientists agree that conference publications enjoy greater status in computer science than in other disciplines, there is little quantitative evidence to support this view. The importance of journal publication in academic promotion makes it a highly personal issue, since focusing exclusively on journal papers misses many significant papers published by CS conferences.
Here, we aim to quantify the relative importance of CS journal and conference papers, showing that CS papers in leading conferences match the impact of papers in mid-ranking journals and surpass the impact of papers in journals in the bottom half of the Thomson Reuters rankings (http://www.isiknowledge.com), with impact measured in terms of citations in Google Scholar. We also show that the poor correlation between this measure and conference acceptance rates indicates that conference publication is an inefficient market, where venues that are equally challenging in terms of rejection rates offer quite different returns in terms of citations.
How to measure the quality of academic research and performance of particular researchers has always involved debate. Many CS researchers feel that performance assessment is an exercise in futility, in part because academic research cannot be boiled down to a set of simple performance metrics, and any attempt to introduce them would expose the entire research enterprise to manipulation and gaming. On the other hand, many researchers want some reasonable way to evaluate academic performance, arguing that even an imperfect system sheds light on research quality, helping funding agencies and tenure committees make more informed decisions.
One long-standing way of evaluating academic performance is through publication output. Best practice for academics is to write up key research contributions as scholarly articles for submission to relevant journals and conferences; the peer-review model has stood the test of time in determining the quality of accepted articles. However, today’s culture of academic publication accommodates a range of publication opportunities yielding a continuum of quality, with a significant gap between the lower and upper reaches of the continuum; for example, journal papers are routinely viewed as superior to conference papers, which are generally considered superior to papers at workshops and local symposia. Several techniques are used for evaluating publications and publication outlets, mostly targeting journals. For example, Thomson Reuters (the Institute for Scientific Information) and other such organizations record and assess the number of citations accumulated by leading journals (and some high-ranking conferences) in the ISI Web of Knowledge (http://www.isiknowledge.com) to compute the impact factor of a journal as a measure of its ability to attract citations. Less-reliable indicators of publication quality are also available for judging conference quality; for example, a conference’s rejection rate is often cited as a quality indicator2,8 on the grounds that a high rejection rate means a more selective review process able to generate higher-quality papers. However, as the devil is in the details, the details in this case vary among academic disciplines and subdisciplines.
Here, we examine the issue of publication quality from a CS/engineering perspective, describing how related publication practices differ from those of other disciplines, in that CS/engineering research is mainly published in conferences rather than in journals. This culture presents an important challenge when evaluating CS research because traditional impact metrics are better suited to evaluating journal rather than conference publications.
In order to legitimize the role of conference papers to the wider scientific community, we offer an impact measure based on an analysis of Google Scholar citation data suited to CS conferences. We validate this new measure with a large-scale experiment covering 8,764 conference and journal papers to demonstrate a strong correlation between traditional journal impact and our new citation score. The results highlight how leading conferences compare favorably to mid-ranking journals, surpassing the impact of journals in the bottom half of the traditional ISI Web of Knowledge ranking. We also discuss a number of interesting anomalies in the CS conference circuit, highlighting how conferences with similar rejection rates (the traditional way of evaluating conferences) can attract quite different citation counts. We also note interesting geographical distinctions in this regard, particularly with respect to European and U.S. conferences.
Publication Practice
CS is a relatively new field of study (the first schools of CS emerged as recently as the 1980s) and can be differentiated from many other academic disciplines in important ways. The quick pace of innovation in the field has led to unusual publication practices, at least by the standards of other more-traditional disciplines. CS conference papers are considered a more important form of publication than is generally the case in other scientific disciplines.14 When computer scientists have interesting or significant research results to report, they prepare a conference paper for the community’s international conference; for example, the International Joint Conferences on Artificial Intelligence (http://www.ijcai.org/) and the conference of the Association for the Advancement of Artificial Intelligence (http://www.aaai.org) for researchers in artificial intelligence, and the International Conference on Computational Linguistics (http://www.coling-2010.org) for researchers in computational linguistics. Research accepted for publication at a conference normally counts as “archival” for that research.
The research might also be published in extended form in a journal; for example, the journal Artificial Intelligence in Medicine publishes extended conference papers from related medical conferences. CS conference papers are usually submitted as full papers and undergo a comprehensive peer-review evaluation involving three to five external reviewers. As a result, the leading CS conferences can have very high rejection rates, resulting in high-quality published papers that attract considerable attention (and citations) from other researchers. This contrasts with the role of conferences in other disciplines, where conference papers are usually extended abstracts not subject to the full rigor of peer review, rarely attracting the same level of critical attention from other researchers. In these other disciplines, journal papers are the only archival publications.
This matters due to the important role publications play in academic promotion and other forms of research assessment. When such assessment spans multiple disciplines, it is not unusual for conference papers to be excluded, with only journal papers (and perhaps books and book chapters) judged to be eligible. Given the preponderance of conference papers in CS, computer scientists are at a disadvantage relative to their peers in other disciplines, a problem exacerbated by the fact that each side often has little understanding of the other’s point of view. Many computer scientists feel conferences are a timely and appropriate means of disseminating research results, with some viewing journal publication as somewhat superfluous, as evidenced by the relatively low ISI Web of Knowledge impact factors for CS journals.1
In 2007, the median impact factor reported by the ISI Web of Knowledge for CS journals was 0.83, considerably lower than the factors reported in physics (1.28), medicine (1.46), and biology (1.36). The tradition of publishing work in a conference venue rather than a journal is advantageous to CS researchers for several reasons: First, the research tends to be fast-paced, and the “publish or perish” culture prevails. The timeframe from submission to publication for a conference is often less than half that of a journal, allowing the latest findings to become public knowledge more quickly. Another CS trend is the sharing of findings with other researchers,13 facilitated by the conference model, in which feedback is provided through oral presentations and the physical colocation of experts in the same place at the same time. Researchers in other fields feel that if the research had merit it would have been published in a journal, arguing that the solution for computer scientists should be journal publication.
The CS viewpoint is greatly weakened by the obvious variability among CS conferences and lack of any real objective measure comparable to journal impact as a way to evaluate conferences. Some leading conferences have rejection rates as high as 90%, while others reject significantly fewer papers than they accept. However, the rejection rate of a conference, the argument usually goes, does not adequately measure quality. A more objective measure of conference quality is needed, one that is readily computed and approximates a measure of conference impact.
Methodology
Until recently, the ability to offer large-scale bibliographic database services (such as the citation analysis used in journal-impact assessment) was limited to organizations like Thomson Reuters that maintain citation databases covering thousands of academic journals. This service is available online via the ISI Web of Knowledge Web site (http://www.isiknowledge.com), giving researchers and other interested parties access to a range of bibliographic and citation-analysis services. However, service coverage is limited, focusing mainly on journals while excluding many common publication outlets targeted by CS researchers.
CiteSeer (http://citeseer.ist.psu.edu), Scopus (http://www.scopus.com), and Google Scholar (http://scholar.google.com/), as well as other services, have addressed this gap by maintaining online citation databases providing better coverage of both conference proceedings and journals11; for example, Google Scholar automatically extracts citation information from a range of online publication sources, from large-scale professional services (such as SpringerLink) to publication lists maintained by individual researchers on personal homepages. As a result, large-scale citation analyses can be performed by mining the Google Scholar database. Here, we describe one such study (conducted in 2008) covering more than 8,000 articles from a range of conferences and journals in the interests of developing a measure of impact based solely on Google Scholar data.
ISI Web of Knowledge impact. By far the most popular single score used to assess the impact of research journals is the ISI Web of Knowledge impact factor, which is based on the seminal work on citation analysis.6 In any given year, a journal’s ISI Web of Knowledge impact factor is the average number of citations received in the previous year by the articles the journal published in the two years before that; for example, in 2008, the ISI Web of Knowledge impact factor for a journal was the average number of citations in 2007 publications to papers published in the same journal in 2005 and 2006. We used the average of three recently published ISI Web of Knowledge scores: the impact factors from 2005, 2006, and 2007.
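Written out explicitly (the notation here is ours, following the standard Thomson Reuters definition), the impact factor in this example for a journal J is:

\[
\mathrm{IF}_{2008}(J) \;=\; \frac{\text{citations received in 2007 by articles that } J \text{ published in 2005 and 2006}}{\text{number of articles } J \text{ published in 2005 and 2006}}
\]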
The ISI Web of Knowledge impact factor, intended to assign a numeric score to the quality of scientific publications, has also drawn criticism over the years,3,4,5,9,16 including:
- Poor correlation between impact factor and citation count for individual articles;
- No correction for self-citation, either at author or journal level;
- Incomplete coverage of the database;
- A score based only on a short time period; and
- Long articles with long lists of references (such as review papers) that bias the score.
Our aim is not to criticize the ISI Web of Knowledge impact factor but to show that our impact factor (based on Google Scholar) correlates well with it. Table 1 includes a sample of the ISI Web of Knowledge impact factors for a subset of CS journals, along with the Google Scholar impact factors for the same journals. Note, too, that the journals span a range of ISI Web of Knowledge impact factors, with the top-ranking International Journal of Computer Vision earning an impact factor of 4.37, while Artificial Intelligence for Engineering Design, Analysis and Manufacturing has a more modest impact factor of 0.27. In the period we considered (2005 to 2007), the median ISI Web of Knowledge impact factor for CS journals varied from 0.80 to 0.84. Given this median, we divided the journals into three tiers according to their impact factors: high-ranking (A*) journals with impact factors ≥ 2; medium-ranking (A) journals with impact factors ≥ 0.9 but < 2; and low-ranking (B) journals with impact factors < 0.9. So our A and A* categories represent journals in the top half of the ISI Web of Knowledge ranking. CS journals not in the ISI Web of Knowledge ranking (there are many) are not considered.
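As a minimal illustration of this tiering rule (the two impact factors below are the ones quoted above; Table 1 lists the full set), the thresholds translate directly into code:

```python
def journal_tier(impact_factor: float) -> str:
    """Tier a journal by its ISI Web of Knowledge impact factor,
    using the thresholds described above."""
    if impact_factor >= 2.0:
        return "A*"  # high-ranking
    if impact_factor >= 0.9:
        return "A"   # medium-ranking
    return "B"       # low-ranking

print(journal_tier(4.37))  # International Journal of Computer Vision -> A*
print(journal_tier(0.27))  # AI for Engineering Design, Analysis and Manufacturing -> B
```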
Google Scholar and DBLP. Google Scholar aims to bring Google-like search to bibliographic data, providing a familiar Google-search-style interface allowing users to locate research articles based on a range of feature filters, including for “article” (such as author and title), “publication” (such as type, year, and name), and “subject.” Google Scholar responds to queries with a result list in which each result corresponds to a particular article. For example, Figure 1 includes results for a search for articles by Marvin Minsky, each including information about article type, title, author(s), and publication, as well as citations. For example, the first result, corresponding to Minsky’s famous “A Framework for Representing Knowledge” article,12 indicates Google Scholar found 3,350 citations to the article in its database. Moreover, following the citation link leads directly to these 3,350 citations.
Google Scholar provides access to the bibliographic data needed to perform a large-scale citation analysis of a range of CS conferences and journals,11 requiring only a list of articles to seed the analysis; this is where a service like the Digital Bibliography & Library Project7 (DBLP, http://www.informatik.uni-trier.de/~ley/db/) fits in (see Figure 2). DBLP is a database of publication records, documenting papers published in a range of conferences and journals, as well as some workshops and symposia. DBLP does not provide citation data but can be used to provide a suitable list of seed articles for analysis.
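As a rough sketch of how such a seed list can be drawn from the DBLP records (the key prefixes, field names, and streaming details are our assumptions; the full dblp.xml dump is large and relies on a DTD for accented characters, which a production version would need to handle):

```python
import xml.etree.ElementTree as ET

def seed_articles(dblp_xml_path, venue_key_prefixes):
    """Collect (title, authors, year) seed records for the given DBLP key
    prefixes, e.g. 'conf/icml/' for a conference series or 'journals/ml/'
    for a journal."""
    seeds = []
    for _, elem in ET.iterparse(dblp_xml_path, events=("end",)):
        if elem.tag not in ("inproceedings", "article"):
            continue
        key = elem.get("key", "")
        if any(key.startswith(p) for p in venue_key_prefixes):
            title = "".join(elem.find("title").itertext()).strip()
            authors = ["".join(a.itertext()) for a in elem.findall("author")]
            seeds.append({"title": title, "authors": authors,
                          "year": elem.findtext("year")})
        elem.clear()  # keep memory use bounded while streaming
    return seeds
```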
Measure of impact based on Google Scholar. Our 2008 study focused on articles from conferences and journals in artificial intelligence and machine learning for two reasons: Both are mature, active areas of CS research, and both are familiar to the authors, proving useful when selecting appropriate seed conferences and journals and independently verifying the results of the study. We focused on 15 conferences and 15 journals circa 2000–2003 (see Tables 1 and 2), including first-, second-, and third-tier venues roughly in line with ISI rankings. All listed conferences and journals were covered by DBLP: The 15 conferences provided access to 3,258 articles; the 15 journals provided access to 5,506 articles.
We extracted each article from the DBLP XML records (http://dblp.uni-trier.de/xml/) and submitted them to Google Scholar. Using Google Scholar’s advanced-search options, we constructed queries to return the Google Scholar entry for each seed article. We removed all punctuation and nonstandard characters from titles and converted the year of publication into a two-year window, rather than a fixed year. Each query included title, author name(s), and year of publication; Google Scholar returned 50 potential matches per query. An article was identified when the title, year, and at least one author name matched; once identified, we extracted the citation data for each article by iterating through the list of citing articles to produce our Google Scholar impact factor. The “Cited by x” figure reported by Google for each article was often inaccurate, so we retrieved each citing article, checking and counting it in turn; the number reported was usually one or two fewer than the number in the list. We took care not to overload the system with requests as we gathered the data over a few weeks in 2008. Because Google Scholar compiles its repository from professional services and researcher-maintained pages, its coverage is incomplete; it successfully returned citation lists for 89.5% of our seed set, with the missing articles evenly distributed between the journal and conference seed article sets. The result was that each article located through Google Scholar was associated with a total number of citations, which could then be aggregated for individual conferences (or conference series) and journals.
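Since Google Scholar offers no official API, the sketch below leaves the search itself behind a hypothetical search_scholar function (in practice a scraper or third-party wrapper) and concentrates on the matching and counting rules just described; the field names are our assumptions:

```python
import re

def normalise(text):
    """Lower-case and strip punctuation and nonstandard characters."""
    return re.sub(r"[^a-z0-9 ]+", " ", text.lower()).strip()

def matches(seed, result):
    """A candidate matches a seed article when the normalised titles agree,
    the year falls within the two-year window, and at least one author
    name matches."""
    return (
        normalise(seed["title"]) == normalise(result["title"])
        and abs(int(seed["year"]) - int(result["year"])) <= 1
        and any(normalise(a) in {normalise(r) for r in result["authors"]}
                for a in seed["authors"])
    )

def citation_count(seed, search_scholar):
    """search_scholar is a placeholder for whatever Google Scholar client is
    used; it should return candidate records with 'title', 'year', 'authors'
    and a list of citing articles."""
    for result in search_scholar(seed)[:50]:   # first 50 potential matches
        if matches(seed, result):
            # Count the citing articles one by one; the "Cited by x" figure
            # on the results page is often slightly off.
            return len(list(result["citing_articles"]))
    return None  # article not located in Google Scholar
```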
Table 1 lists both the ISI impact factors and our Google Scholar impact factors for the 15 test journals for which we had reliable impact factors from ISI. A key question concerned the strength of the correlation between the ISI and Google Scholar impact factors. If we found no strong correlation, we could not be confident that the Google Scholar impact factor captured the essence of impact. Figure 3 plots the impact factors, showing they were well correlated; the Pearson correlation coefficient between them is 0.86, a result supported by similar work comparing h-index scores from Google Scholar and ISI Web of Knowledge by Meho and Rogers.10 Given that we were interested in ranking publications based on these factors, with ranking as the primary objective, the Spearman rank correlation may be a more relevant statistic; we calculated a rank correlation of 0.88 between the impact factors for all journals in the study.
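Both statistics can be computed directly from the paired impact factors; a minimal sketch (the values here are illustrative placeholders, not the study’s data, which appear in Table 1):

```python
from scipy.stats import pearsonr, spearmanr

# Illustrative (ISI IF, Google Scholar IF) pairs for a set of journals.
isi_if = [4.4, 2.9, 1.9, 1.1, 0.6, 0.3]
gs_if = [38.0, 30.0, 17.0, 11.0, 6.0, 3.0]

r, _ = pearsonr(isi_if, gs_if)     # linear correlation (0.86 in the study)
rho, _ = spearmanr(isi_if, gs_if)  # rank correlation (0.88 in the study)
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```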
The strength of the correlation suggests the Google Scholar impact factor indeed captured the essence of ISI impact, the gold standard for journal ranking. The added benefit of our Google Scholar impact factor is that it can be computed for both conferences and journals (based on Google Scholar data), allowing a systematic comparison of journal and conference paper impact.
Journal and Conference Citations Compared
Earlier we proposed a straightforward metric for assessing the impact of scientific publications based on an aggregate citation count using Google Scholar citation data, validating it through a strong correlation between it and the more standard ISI Web of Knowledge impact factor across a validation set of journals. Here, we describe the Google Scholar results over the larger set of conferences and journals used as the basis of our more comprehensive analysis.
We divided our validation set of journals into three basic classes (A*, A, and B) according to their ISI Web of Knowledge impact factors. Having computed the Google Scholar impact factors, we were able to place the journals on a continuum; Figure 4 shows the positions of the 15 test conference series on the same continuum for a more direct comparison between journals and conferences, showing that many conferences under evaluation performed well compared to the benchmark journals. The A* journals stood out in their ability to attract citations, and the leading journals, including the Journal of the ACM, Pattern Analysis and Machine Intelligence, and Machine Learning, achieved Google Scholar impact factors of 30 and above. Also interesting is how well many of the conferences performed, particularly in relation to the A journals.
Atop the conference ranking was the conference of the Association for the Advancement of Artificial Intelligence (AAAI), with a Google Scholar impact factor of 20, placing it in the lower part of the A* journals and comparing favorably with the Journal of Artificial Intelligence. The A journals corresponded to Google Scholar impact factors of 8 to 19, accommodating a range of CS conferences, from the International Joint Conference on Artificial Intelligence and the International Conference on Adaptive Hypermedia and Adaptive Web-based Systems, with Google Scholar impact factors of 17, down to the European Conference on Case-Based Reasoning and the Text Retrieval Conference, with Google Scholar impact factors of 8.
Conference Impact and Rejection Rates
One commonly held view in the academic community is that conference rejection rates are a useful proxy for future impact.15 Indeed, rejection rates are sometimes accepted for this purpose in various research-assessment exercises, including for academic promotion. It is useful, then, to consider the relationship between conference rejection rates and expected citation count (based on our Google Scholar impact factor) to see whether this view holds up. Figure 5 is a scatter plot of Google Scholar impact factor against conference rejection rates for 23 conferences across the 15 conference series being evaluated; note the data points reflect a subset of the full set of conferences, namely those for which we obtained reliable rejection rates; for more on the rejection rates, see http://lorcancoyle.org/research/citationanalysis.
While our study results indicate some correlation between Google Scholar impact factor and rejection rates, the related Pearson correlation score of 0.54 was not very convincing, reflecting considerable variation in the relationship between conference rejection rates and a conference paper’s ability to attract citations. Conferences with similar rejection rates often achieve very different Google Scholar impact factors, and some conferences with very different rejection rates still manage to achieve similar impact factors. For instance, the Association for the Advancement of Artificial Intelligence conference achieved a Google Scholar impact factor of 20, with a rejection rate of 65% to 75%. Its European counterpart, the European Conference on Artificial Intelligence (ECAI), is equally selective but achieves a median Google Scholar impact factor of only 7 (see Table 3).
Another example of the lack of correlation between paper acceptance rate and citation count was the 2002 European Conference on Case-Based Reasoning, which had a rejection rate of approximately 33% and a Google Scholar impact factor of 7, achieving a citation rate better than the European Conference on Machine Learning (ECML) 2000 and the European Conference on Artificial Intelligence 2002, which had twice its rejection rate.
Note the apparent bias toward well-cited U.S. conferences over their similarly selective though less-well-cited European counterparts. The AAAI-ECAI example is a case in point, with both conferences targeting the same research area, attracting submissions from a similar community of researchers, and being equally selective. Yet the U.S.-centric AAAI enjoys an expected citation count (computed as the product of the median citation count and the rejection rate of the conference) more than twice that of ECAI.
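As a rough worked example (assuming, purely for illustration, a common rejection rate of about 70% for both conferences, within the 65% to 75% band quoted above):

\[
20 \times 0.70 = 14 \ \text{(AAAI)}, \qquad 7 \times 0.70 = 4.9 \ \text{(ECAI)},
\]

so with equal selectivity the ratio of expected citation counts reduces to the ratio of the median citation counts, roughly 20/7, or just under 3.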
This apparent regional bias is also evident in another pair of related conferences: the International Conference on Machine Learning (ICML) and ECML (see Table 4). Once again, the more U.S.-centric conference series, ICML, attracts twice as many citations as the similarly selective Euro-centric ECML series. Possible explanations include: non-European researchers are likely to miss publications at European venues (such as ECAI and ECML), so papers at these conferences pick up references only from European researchers; and a preselection effect, whereby researchers hold back their best work for the more international conference.
To further test the strength of the correlation between Google Scholar impact factor and rejection rate, we examined the available data for three conference series that took place in each of the four years covered by our study; the Conference on Uncertainty in Artificial Intelligence (UAI), ICML, and ECML were the only conferences with published rejection rates available for every year of the study. Table 5 outlines the Google Scholar impact factors and rejection rates for each year in the study. There was no significant correlation between rejection rate and Google Scholar impact factor; the Pearson score for UAI was only 0.27, for ICML 0.24, and for ECML 0.42, suggesting that, at least for these conferences, the annual change in rejection rates had little bearing on expected citation count.
These results highlight the fact that any assumed relationship between conference rejection rates and a conference’s ability to attract citations is weak, at best, so other factors play a more important role in attracting future citations.
Conclusion
Evaluating research output is complex and contentious. Funding agencies need comprehensive research evaluation metrics to be able to objectively assess the return on their research investment, while academic institutions increasingly rely on such metrics to guide their academic promotions. In terms of publication output there is general consensus among CS researchers as to the importance of citations when evaluating research papers. For example, the ISI Web of Knowledge maintains comprehensive records of the citations attracted by leading academic journals, providing the raw material for tried-and-true metrics like impact factor.
Unfortunately, this approach does not serve all disciplines equally, given its traditional bias toward journal articles. Here, we’ve highlighted how CS research traditionally places greater emphasis on conference publications and how, as a result, CS researchers can suffer when it comes to ISI Web of Knowledge-based research assessment, from which conference papers are generally excluded.
We examined publication quality from a computer science/engineering perspective, justifying common publication practices in these disciplines by demonstrating how the leading conferences attract significant citation scores in line with leading journals. We performed a citation analysis on almost 9,000 conference and journal papers, drawing on citation data from Google Scholar and aligning the citations with ISI Web of Knowledge journal rankings. The results highlight several points:
Citation correlation. There is a strong correlation between citation scores computed from Google Scholar data and comparable data from the ISI Web of Knowledge index, validating the use of our new Google Scholar score as an alternative citation-based evaluation metric applicable to both journals and conferences;
Conferences vs. journals. The conferences in the analysis performed well compared to journals; a significant number of them achieved median citation rates comparable to A (ISI Web of Knowledge) journals;
Rejection proxy. The view that conference rejection rates are a good proxy for conference quality did not hold up to scrutiny, reflecting a low coefficient of correlation between the rejection rate of conferences and their Google Scholar scores; and
Regional bias. There is a strong regional bias between similar conferences, with U.S.-centric conferences attracting much higher citation scores than their non-U.S. counterparts.
Acknowledgment
This work was supported by Science Foundation Ireland through grants 07/CE/I1147, 04/RPI/1544, 03/CE2/I303 1, and 05/IN.1/I24.
Figures
Figure 1. Results of a Google Scholar query for documents by Marvin Minsky.
Figure 2. Screenshot of DBLP page for ICCBR 2003, along with conference details and a list of all its papers; included are three invited papers and the first three of 51 full papers.
Figure 3. Scatter plot of the correlation of Google Scholar impact factor and ISI Web of Knowledge impact factor.
Figure 4. Unified view of CS journals and conferences ranked according to Google Scholar impact factor.
Figure 5. Scatterplot of Google Scholar impact factors vs. rejection rates for a subset of conferences for which reliable rejection rates were available.
Tables
Table 1. Impact factors of journals included in the evaluation. The column labeled “ISI IF” lists the mean 2005–2007 ISI Web of Knowledge impact factor; the column “GS IF” lists the impact factor we calculated based on Google Scholar.
Table 2. Overall Google Scholar impact factor and years covered (when each conference was held) for each conference proceedings published 2000–2003.
Table 3. Google Scholar impact factors of AAAI vs. ECAI in 2000 and 2002, with corresponding rejection rates; AAAI and ECAI are each held every two years.
Table 4. Google Scholar impact factors for ICML vs. ECML in 2000 and 2003, with corresponding rejection rates.
Table 5. Median Google Scholar impact factors for UAI, ICML, and ECML in 2000 and 2003, with corresponding rejection rates.