Scholars know that diversity of thought produces better and faster solutions to complex problems.20 For this reason, as well as other practical and ethical reasons, the computing disciplines strive to improve women's representation in the field.5 These efforts often concentrate on piquing the interest of young girls and college students but less often on women's engagement as leaders and problem solvers. Here, we address an aspect of women's leadership by measuring trends and influences on women's authorship of computing-conference papers. Our findings contribute to knowledge about the conditions that promote gender diversity in this important aspect of the intellectual life of the field.
Analyzing data from ACM-affiliated conferences reveals several trends:
We collected and mined data from more than 3,000 ACM-affiliated conferences, workshops, symposia, and forums held from 1966 to 2009, providing evidence of women's increased contribution to this form of professional engagement in computing. Using custom software called Genderyzera,15 we identified gender for 90% of the 356,703 authors who published papers for ACM events from 1966 to 2009.b
Summary results outlined in Figure 1 show that over the past 43 years, women comprised 22% of all authors whose gender was ascertainable. From 1966 to 2009, women went from 7% to 27% of ACM conference paper authors; this growth averaged about 0.44 percentage points per year, in established and new conferences alike.
The persistent gender imbalance in computing19 challenges researchers and practitioners in terms of both explanations and effective intervention. Insights from social psychology, sociology, and feminist studies of technology suggest deep cultural beliefs about gender and technology affect all of us, men and women, when it comes to women's participation in stereotypically male activities. These stereotypes steer men into and women away from fields like computing, where the alignment of "feminine" and "technical" is less than obvious.
People assess women's (and men's) characteristics and accomplishments in light of gender stereotypes that classify technical activities like computing as fitting masculine interests and abilities.7,9,22 As a result, women and those who assess their performance tend to downgrade women's technical abilities relative to those of men with the same computing skills and accomplishments. At the same time, men and the managers who assess their performance could inflate their perceptions of men's technical abilities. Both genders unconsciously shift their appraisal of others and themselves to better align with the gender stereotypes.7 For women, this process often results in low confidence and avoidance or departure from computing.6,18
Mounting evidence supports the stereotype explanation for occupational gender segregation4,22; for example, cross-national comparisons have found that even in societies where men and women are believed to be equally competent and able to achieve at the highest levels, they segregate into different occupations when given the opportunity.4 Deeply held cultural beliefs that men and women are fundamentally different come into play when social structures fail to inhibit the influence of these stereotypes, according to Charles's and Bradley's empirical study of 21 countries.4 They argued that in countries where the educational system keeps able girls on the path toward math-based careers through their teens, after which the particularly strong influence of gender stereotypes lessened, women were better represented in computing. This observation implies that social structures (such as educational requirements) can override stereotypes about gender and technology and offer a path toward gender balance in computing.
Theories about the influence of gender and technology stereotypes suggest at least three possible hypotheses about the general characteristics of women's contributions as authors of computing-conference papers: One is it might be more difficult for women than for men in computing to contribute to the intellectual life of the field; for example, this difficulty could result if reviewers unconsciously downgrade ratings of women's submissions3 or if low confidence inhibits women's willingness to publicly exhibit their technical thoughts and advancements in practice.
A second hypothesis is that women in computing might find publishing conference papers among their more easily accomplished professional activities. This ease might relate to alignment of verbal skills with stereotypically feminine abilities. Such alignment is so strong that "textbooks routinely cite sex differences in language competence... as established fact,"23 with women believed to be more verbally proficient than men. Stereotypes about women's verbal abilities might offer a comfortable "gender-authentic" role for women in computing. If this is true, it suggests that women might be relatively well represented among authors of computing-conference papers and also cluster in the subfields best aligned with feminine stereotypes.
A third hypothesis is that although stereotypes could hinder entry for most women, those who do go into computing perceive little influence and proceed on equal footing with their male colleagues. Potential support for this hypothesis comes from studies showing that women scientists have more egalitarian unconscious associations about gender and science than do women who are not scientists.21
Finally, other contemporary social structures might work through stereotypes or, independently, influence women's ACM conference authorship; for example, women's opportunities to write could be unduly affected if teaching and service expectations are heavier for women than for men. These expectation differences could stem from men and women being employed in different types of academic institutions, industry labs, and corporations that are more or less hierarchical in structure.24 Likewise, access to professional networks that afford collaboration might differ for men and for women. Care-related responsibilities, falling disproportionately on women, also appear to have complex influence on paper productivity.10 All these potential influences warrant investigation but are beyond our scope here.
Here, we report on our initial investigations into the extent to which women contribute to advancing thought through computing-conference papers and how these contributions are distributed across computing subfields.
We obtained the data for the study on April 26–30, 2009, by screen-scraping the proceedings for every conference in the ACM Digital Library. We began with the ACM proceedings page, followed by each list of proceedings and each year's proceedings pagec for each conference. We then processed proceedings pages, using a custom script Jofish Kaye developed to extract author names and paper titles. The data comprised 432 ACM-affiliated conferences, workshops, and symposiad held from 1966 to 2009. The resulting data set represents approximately 86,000 papers in approximately 3,100 proceedings. The average conference had 117 authors, though there was considerable variation; the standard deviation in number of authors was 134. There were also many small conference-like events; of the 432 conferences, the average number of authors was fewer than 100 at 316 conferences and fewer than 50 at 164 conferences.
Large data sets of names offer both opportunities and challenges for quantifying women's participation. Because most non-Chinese names are gender-specific, they can be used to identify the male and female representation among people listed in the data. We automated this process with a program called the Genderyzer developed by Kaye and freely available for public use at http://genderyzer.com. It compares first names to a data set drawn from various sources, including census data, official national name lists, baby-naming Web sites, and crowd-sourced data. Genderyzer labels each name as one of the following categories: male, female, unknown, initials, and ambiguous15; unknown names are displayed to both the current and the following user for crowdsourcing.
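The labeling step described above can be illustrated with a minimal sketch. This is not the Genderyzer implementation; the tiny lookup table, the function name, and the rule for detecting initials are all assumptions made only to show how a first name might be mapped to one of the five categories.

```python
from typing import Dict

# Hypothetical toy lookup table (an assumption for illustration). The real
# data set is far larger and is drawn from census data, national name lists,
# baby-naming sites, and crowd-sourced data.
NAME_TABLE: Dict[str, str] = {
    "mary": "female",
    "john": "male",
    "kim": "ambiguous",
}


def classify_first_name(full_name: str, table: Dict[str, str] = NAME_TABLE) -> str:
    """Return one of: male, female, ambiguous, initials, unknown."""
    parts = full_name.strip().split()
    if not parts:
        return "unknown"
    first = parts[0]
    stripped = first.replace(".", "")
    # Assumed rule: a single letter, or a short dotted upper-case token such
    # as "J.M.", is treated as initials and carries no gender information.
    dotted_initials = "." in first and stripped.isupper() and len(stripped) <= 3
    if len(stripped) <= 1 or dotted_initials:
        return "initials"
    # Names missing from the table fall into the "unknown" category.
    return table.get(stripped.lower(), "unknown")
```

Under these assumptions, `classify_first_name("J. Cohoon")` falls into the "initials" category, while a name absent from the table is labeled "unknown".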
To determine Genderyzer's accuracy before employing it for the study, we conducted tests involving creating independent large data sets of names with a priori known gender and geographic origin. We created six separate data sets with a total of 4,700 observations, including U.S., international, and Asian names drawn from the records of international sports competitions. Applying Genderyzer to this data produced measures of accuracy for the software's performance. We then used it to combine the data sets into two files that mimicked the gender and country-of-origin characteristics of Ph.D. degree recipients in computer and information science in the U.S. over the previous five years.e Using this data set, we employed a random-sampling algorithm to gauge how well the software performs on a sample with an arbitrary number of observations from a population we believed was very close to the population of ACM conference participants in terms of ethnic origin and sex as determined by Genderyzer. This technique allowed us to formulate 95% confidence intervals for the sex composition of the names the software identified as "ambiguous" and "unknown."f
The following example clarifies the method we used for determining the gender composition in a sample of names. Assume we have a population of names with a known 2:2:1 American-Asian-international ratio and 3:1 male-female ratio out of a random sample of 100 names. Based on our testing, we know that Genderyzer on average correctly identifies sex for 70% of the names and places the rest in one of three categories: "unknown," "ambiguous," and "initials" (rather than full names). We repeatedly drew 50 random samples of size 100 from the combined data sets to calculate the average number and 95% confidence intervals for the number of female names in the "ambiguous" and "unknown" categories. Since we had a priori knowledge of sex and country-of-origin variables, we could identify that on average in such a sample 40% and 45% of the names in "unknown" and "ambiguous" categories, respectively, belonged to females. We used the results from our random trials to make well-supported assumptions about the distribution in the "ambiguous" and "unknown" categories to augment the percentage of identifiable names.
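A repeated-sampling procedure of this kind might be sketched as follows. The function name, the (true sex, assigned label) data layout, and the normal-approximation interval are assumptions; the article does not state how its 95% intervals were computed, and it reports intervals for the number of female names rather than the share used here.

```python
import math
import random
import statistics
from typing import List, Tuple


def female_share_in_category(
    population: List[Tuple[str, str]],  # (true_sex, assigned_label) pairs
    category: str,
    n_samples: int = 50,
    sample_size: int = 100,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) for the female share within `category`.

    Draws `n_samples` random samples without replacement, computes the share
    of truly female names among those labeled `category` in each sample, and
    forms a normal-approximation 95% interval around the mean share.
    """
    rng = random.Random(seed)
    shares = []
    for _ in range(n_samples):
        sample = rng.sample(population, sample_size)
        sexes = [sex for sex, label in sample if label == category]
        if sexes:  # skip samples in which the category happens to be empty
            shares.append(sexes.count("F") / len(sexes))
    if len(shares) < 2:
        raise ValueError("too few non-empty samples to estimate an interval")
    mean = statistics.mean(shares)
    half_width = 1.96 * statistics.stdev(shares) / math.sqrt(len(shares))
    return mean, mean - half_width, mean + half_width
```

Because the true sex of every name in the test population is known in advance, the mean share returned for "unknown" or "ambiguous" can then serve as the assumed female proportion for those categories in the real data.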
We had no knowledge of the actual distribution of "initials" so assumed that names in that category had the same gender composition as the weighted average of our four other categories in the same year. It is possible that this assumption could lead to underestimation of women's representation if women use initials more than men to avoid possible gender bias. We chose to be conservative in our estimation.
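The arithmetic behind this adjustment can be written out as a short sketch. The function and parameter names are hypothetical, and the default proportions of 0.40 and 0.45 are taken from the worked example above only as illustrative values, not as the figures applied to every year of data.

```python
def estimated_female_share(
    male: int,
    female: int,
    unknown: int,
    ambiguous: int,
    initials: int,
    p_unknown: float = 0.40,    # assumed female share among "unknown" names
    p_ambiguous: float = 0.45,  # assumed female share among "ambiguous" names
) -> float:
    """Estimate the overall female share of authors in one year."""
    classified = male + female + unknown + ambiguous
    if classified == 0:
        raise ValueError("no names outside the initials category")
    # Women among identified names plus the expected women in the
    # "unknown" and "ambiguous" categories.
    est_female = female + p_unknown * unknown + p_ambiguous * ambiguous
    share_without_initials = est_female / classified
    # Initials are assumed to have the same composition as the weighted
    # average of the other four categories, so they leave the share unchanged.
    est_female_total = est_female + share_without_initials * initials
    return est_female_total / (classified + initials)
```

For example, 60 male, 20 female, 10 unknown, and 10 ambiguous names give (20 + 4 + 4.5) / 100 = 28.5% women, and adding any number of initials-only names keeps the estimate at 28.5%.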
The annual number of conference papers published by ACM as represented in our data set grew from 149 in 1966 to 12,222 in 2008. This increase is hardly surprising, given the phenomenal growth and differentiation of computing, as well as the general growth in academic publishing.13 The number of authors grew even more dramatically, from 389 to 37,944 during the same period. This difference in growth rates of papers and authors is explained by the increasing prevalence of collaborative authorship. In 1966, papers had on average 2.6 authors, but, by 2008, papers had on average 3.1 authors. Over those 43 years, most authors were men, though women authors were increasingly prevalent in recent years. In 2008, there were approximately 2.3 male authors and 0.8 woman authors per published ACM conference paper.
Women's authorship increased as they garnered Ph.D. degrees. Figure 2 reflects the substantial increases in women's share of authorship of ACM conference papers. The increase averaged 0.44 percentage points annually from 1966 to 2009, with 10-year intervals finding women's share of authorship at 8% in 1968, 15% in 1978, 18% in 1988, 21% in 1998, and 25% in 2008.
Figure 2 also shows this rise in women's participation was not an artifact of newly created conferences catering to women. Tracking a set of 64 longstanding ACM conferencesg resulted in the same trend as in the full data set, confirming that women's authorship grew about 18–20 percentage points from 1966 to 2009. We explore two possible explanations for this trend. One may be the increase in women's representation among potential authors, as women earned more computing Ph.D. degrees. As women's representation in the community increased, one would expect a concomitant increase in their contribution to the intellectual life of the field. The second is that women may have benefited disproportionately from collaboration. Other explanations are also possible, some of which we hope to explore in future articles.
Research and publication are important activities for most professionals with Ph.D. degrees. Therefore, it should come as little surprise to learn that the proportion of women Ph.D. recipients in computer scienceh strongly correlates with women's conference authorship.i A substantial portion of the upward trend is accounted for by increased women's share of Ph.D. degrees in computing. There is a moderately strong positive association between absolute growth in women Ph.D. graduates and paper authorship (B = 0.76 significant at 1%).
Visually comparing the trends in women's Ph.D.s and authorship makes possible two observations, as reflected in Figure 3:
Parallel rates. Growth in women's publishing rates paralleled growth in women's Ph.D. degree rates; the average annual growth in women's share of ACM conference authorship was 0.44 percentage points, compared with 0.45 points for computing Ph.D. degrees; and
Publishing rate. Women publish at higher rates than one might expect from their representation among Ph.D. holders. In 1967, women's representation among authors was about four points greater than among Ph.D. degree recipients. This overrepresentation persisted in most years, holding for both annual and cumulative percentage of women Ph.D. holders.
We also analyzed the relationship between growth in the cumulative number of Ph.D.s and number of author credits, while accounting for autocorrelation, by running regression in first differences. The results indicated that for every additional woman with a computing Ph.D., women's author creditsj grew by 3.6 papers. Growth was less for men; additional Ph.D.s corresponded to only 2.6 more author credits. We found no correlation between being first or subsequent author and gender. Women and men were equally likely to be first authors on their papers.
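For readers unfamiliar with regression in first differences, the idea can be sketched briefly: each series is replaced by its year-to-year changes before fitting, which removes the shared upward trend that would otherwise inflate the association. The function names, the single-predictor form, and the inclusion of an intercept are assumptions; the article does not give the exact specification.

```python
from typing import List, Sequence


def first_differences(series: Sequence[float]) -> List[float]:
    """Year-to-year changes: series[t] - series[t-1]."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Slope of a simple least-squares fit of y on x (with an intercept)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def first_difference_slope(
    cumulative_phds: Sequence[float], author_credits: Sequence[float]
) -> float:
    """Change in author credits associated with one additional Ph.D."""
    return ols_slope(
        first_differences(cumulative_phds), first_differences(author_credits)
    )
```

With made-up yearly series such as `[100, 110, 125, 145, 170]` cumulative Ph.D.s and `[200, 225, 260, 305, 360]` author credits, the function returns a slope of 2.0, that is, two additional author credits per additional Ph.D. in this invented example.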
These results appear to contradict well-established findings that academic men publish more than academic women, so it is important to recognize that several potential unknown factors might affect men and women authors differently, some of which we discuss later in the section on implications.
Women's sole authorship and collaborative authorship both increased. We investigated the possibility that more collaboration by women could contribute to women's apparent productivity and help explain the upward trend in women's representation among authors. Our analysis suggests that collaboration alone explains little about the increase in women's share of authorship.
Figure 4 outlines both the exponential increase in the number of multiple-authored papers presented at ACM-sponsored conferences and the small increase in number of papers published by lone authors. By 2008, collaboration was most common, with 97% of all ACM conference papers written collaboratively.
From 1966 to 2009, individual men published more ACM papers than did individual women but at about the same gender representation seen for authors overall. The number of papers published by individual men peaked in 2006 at a little over 1,300. Individual women also contributed the most papers that year, about 415, or 24% of all individually authored papers that year.
The trend toward increased co-authorship appears to have begun in the mid-1980s and accelerated in the late 1990s. By contrast, the trend in women's representation among ACM conference authors increased at about the same pace since the early 1970s. This observation suggests that while women may have benefited from the increasingly common practice of co-authorship, collaboration probably does not explain much of the trend toward gender parity among computing-conference-paper authors.
Women's share of paper authorship varies across ACM conferences; Figure 5 plots women's percentage of authors in 64 longstanding conferences, each averaged over the 10-year interval 1998–2008. The conference means range from 10% to 44% women authors, with most conferences having 17% to 29% women authors. The average percentage of women authors among the 64 conferences was 23%, with a standard deviation of 6%. Tables showing all the largek ACM conferences with especially high or low average participation of women authors in 1998–2008 are available online. We focused on relatively large conferences, dropping those with fewer than 100 authors.l Looking at the extremes might hint at alignment with gender stereotypes as a factor in the distribution of women authors across conferences; at the high end, the conference topics are children, education, and human-computer interaction. Any potential misalignment with feminine stereotypes is less obvious at the low end.
The trend over time (not shown in Figure 5) for most (40) of the 64 conferences was a clear upward slope in women's authorship. A number of conferences (18) had neutral slopes for the trend in women's authorship, while six had negative slopes, indicating declines in women's share of authorship over time, though these included conferences with high female participation to begin with.m There was no obvious pattern reflecting which of the conferences had positive, neutral, or negative slopes.
Conference topic relates to authorship overall, and to women's authorship. Thus far, the data appears to support the hypothesis that female authors might be more prevalent in conferences focused on specific topics. Investigating further, we coded each conference according to its ACM-designated general-topic classifications: Algorithms, Design, Documentation, Economics, Experimentation, Human Factors, Languages, Legal Aspects, Management, Measurement, Performance, Reliability, Security, Standardization, Theory, and Verification. At face value alone, it seems reasonable to expect the topic classifications most closely aligned with feminine stereotypes would be Human Factors, Design, and Documentation, and those most closely aligned with masculine stereotypes would be Algorithms, Theory, and Security.
We analyzed a subset of the full data set (n = 391 conferences) for which we were able to obtain additional information. The data contained all the cases with publicly available information on paper-acceptance rate (used as a proxy for conference prestige), along with conference location and ACM general classification terms for each conference. Most conferences in this subset were held between 1998 and 2008, but we included 91 conferences dating earlier than 1998 to maximize the number of observations. The earliest conference we included in the set was held in 1981.
Coding by conference topic shows variance in the prevalence of authors publishing on certain topics. The descriptive results show that, like men, women were most likely to publish papers in ACM conferences on Design and on Theory. Human Factors and Algorithms were the next most popular conference topics, with women much more likely than men to publish in conferences on Human Factors and men more likely than women to publish in conferences on Algorithms. The greatest gender differences were evident in conferences focusing on Human Factors, Languages, Algorithms, and Performance, in decreasing order.
Following up on the descriptive evidence, our final statistical analysis used ordinary-least-squares regression to measure factors contributing to variation in women's percentage of authors published in a conference. The results show that, controlling for year, conference topic substantially predicts author gender composition for a conference. Conference acceptance rate is also weakly associated with women's authorship (B = 0.07, Beta = 0.14, significant at the 0.001 level). Conferences that accepted a larger share of submitted papers were slightly more likely to have a greater share of women authors. The "Human Factors" topic had the strongest relationship with women's share of authorship (B = 0.049, Beta = 0.314, significant at the 0.001 level). Other topics significantly and positively correlated with the percentage of women authors were Documentation, Management, and Measurement (with respective values of B = 0.040, 0.038, 0.030 and Beta = 0.122, 0.256, 0.192, all significant at the 0.001 level). Algorithms was the topic with the strongest negative association with women's share of authorship (B = -0.033, Beta = -0.209, significant at the 0.001 level). Other topics with significant negative correlations with percentage of women authors were Performance and Reliability (with respective values of B = -0.021, -0.036 and Beta = -0.144, -0.180).
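The two coefficients reported for each predictor are related by a standard identity: the standardized coefficient (Beta) equals the unstandardized coefficient (B) multiplied by the ratio of the predictor's standard deviation to the outcome's standard deviation. The sketch below illustrates that identity only; the function name and the sample values are hypothetical and are not drawn from the study's data.

```python
import statistics
from typing import Sequence


def standardized_beta(b: float, x: Sequence[float], y: Sequence[float]) -> float:
    """Convert an unstandardized coefficient b into a standardized Beta.

    x holds the predictor values (for example, a 0/1 topic indicator) and
    y holds the outcome values (for example, women's share of authors).
    """
    sd_y = statistics.stdev(y)
    if sd_y == 0:
        raise ValueError("outcome has no variation")
    return b * statistics.stdev(x) / sd_y
```

Read this way, B describes the change in women's share of authors associated with a conference carrying a topic classification, while Beta expresses the same association in standard-deviation units, which makes predictors on different scales comparable.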
These findings lend mild support to the hypothesis that alignment with gender stereotypes predicts the extent of women's authorship. As expected, conferences focused on Human Factors and on Documentation were associated with a greater proportion of women authors, while conferences on Algorithms were associated with a greater proportion of men authors. We found no evidence supporting our hypothesis that Design, Theory, and Security would also be gender skewed.
The descriptive evidence presented here documents two facts:
Substantial growth. Women's contribution to computing's intellectual life, measured by their share of ACM conference-paper authorship, grew substantially between the late 1960s and 2009, though they remain little more than a quarter of all conference authors; and
Variation across conferences. The extent of women's authorship varies across conferences.
In addition to describing trends, we've taken the first steps toward identifying factors that affect trends in women as authors. We tested whether women's Ph.D.s are related to their authorship and found them to be strongly related, and that the relationship between Ph.D.s and authorship is even stronger for women than for men. This result suggests that women in the pool of eligible authors generally overcome challenges they may face in publishing conference papers.
More interesting, our findings indicate that for each computing Ph.D. holder, there are more women author credits than men author credits in ACM conferences. This finding appears to contradict well-established research results about academic productivity. Many studies have shown that academic men publish more than their women colleagues.1,10,11
The apparent contradiction in productivity results could have several causes. The current study measured the relationship between author credits and likely authors. In contrast, other studies averaged papers published per male or female faculty member.1 Our per-Ph.D. measure would inflate women's productivity if women authors were more likely than men authors to be without Ph.D.s in a computing discipline or to hold degrees from non-U.S. institutions. Alternatively, men and women might tend to publish in different venues, with women overrepresented at ACM conferences compared to journals, IEEE, and other non-ACM computing conferences. It might also be the case that men more than women hold positions in industry, where publishing is less career-critical than in academe. Finally, computing may be a special case, with conference-paper authorship patterns different from those in other academic disciplines. Each potential explanation calls for more in-depth study and postponing any celebration of women's success in this arena.
Whether or not our finding about disproportionate productivity holds up to further scrutiny, the results still show a clear benefit to the discipline from increasing women's number and representation among Ph.D. degree holders. The benefit is evident in the strong positive association between women Ph.D.s and women's contributions as authors. Every additional woman Ph.D. yields an additional contribution to the intellectual life, as well as to the diversity, of computing. Gender balance among the thought leaders in computing remains a distant goal, but women's educational achievement is moving us all toward that end.
We also considered whether conference topic and prestige predict the variation in women's authorship across conferences and found weak supporting evidence. The data showed that, as the stereotypes-hypothesis predicted, some gender differences persist in topics and prestige; for example, women were especially evident in conferences devoted to human factors and management and scarce among conferences devoted to algorithms and reliability. They were also slightly less likely to publish in the more prestigious conferences (measured by paper-acceptance rates).
The dissimilarity in topics on which men and women publish probably reflects gender differences in their thesis topics. Another possible cause is that reviewers are influenced by gender stereotypes if blind review processes are not used. Additional research is needed to investigate potential bias when not using a blind review process. However, regardless of the timing and mechanism, the observed pattern of gender difference is somewhat consistent with gender stereotypes, offering mild support for the hypothesis that gender stereotypes contribute to the segregation of men and women into different computing subfields.
The evidence may have offered only a limited explanation for cross-conference variation due to measurement problems and missing explanatory factors; for example, ACM conference-topic classifications might be a poor measure because it is unclear they accurately represent the dominant themes in a conference. Likewise, our method of categorizing topics as more or less stereotypically masculine or feminine was ad hoc. A more rigorous approach could better test whether an association exists between the conference topic and gender stereotypes. Further investigation is needed to find accurate topic descriptions, stereotypes related to those topics, and empirical links between the two.
Author data documents that the number of ACM conference papers grew from 1966 to 2009, and while women remain severely underrepresented in computing, we found a substantial increase in their share of papers published, a trend that came about as women earned more Ph.D. degrees in computing. Furthermore, the ratio of Ph.D. holders to papers published indicates that women were relatively more productive authors of ACM conference papers than were men over the same years. Women's contributions, while increasing in most ACM conferences, were greater in some conferences than in others. Gender stereotypes may contribute to this clustering.
This research into women's contributions to computing continues with the support of the National Center for Women & Information Technology (http://www.ncwit.org/) and ACM. With ACM journal data and more detailed conference data, investigations are exploring the gender composition of journal authorship and influence. Comparisons of computing with other disciplines where women are well- or underrepresented are also under way. Like this first attempt to track and explain trends and variation, these planned investigations will shed light on the conditions that promote women's participation in computing, and the common benefits derived from their contributions.
Thanks to the ACM and the National Center for Women & Information Technology for their support of this project.
2. American Federation of Information Processing Societies. AFIPS Records 1960–1990; http://www.cbi.umn.edu/collections/inv/cbi00044.html
3. Budden, A., Tregenza, T., Aarssen, L., Koricheva, J., Leimu, R., and Lortie, C. Double-blind review favours increased representation of female authors. Trends in Ecology and Evolution 23, 1 (2007), 4–6.
4. Charles, M. and Bradley, K. A matter of degrees: Female underrepresentation in computer science programs cross-nationally. In Women and Information Technology: Research on Underrepresentation, J.M. Cohoon and W. Aspray, Eds. MIT Press, Cambridge, MA, 2006.
11. Fox, M.F. and Mohapatra, S. Social-organizational characteristics of work and publication productivity among academic scientists in doctoral-granting departments. Journal of Higher Education 78, 5 (2007), 543–571.
15. Kaye, J. Some statistical analyses of CHI. In Proceedings of the 27th International Conference Extended Abstracts on Human Factors in Computing Systems (Boston, Apr. 4–9). ACM Press, New York, 2009, 2585–2594.
16. Kochan, T., Bezrukova, K., Ely, R., Jackson, S., Joshi, A., Jehn, K., Leonard, J., Levine, D., and Thomas, D. The effects of diversity on business performance: Report of the Diversity Research Network. Human Resource Management 42 (2003), 3–21.
19. National Center for Women & Information Technology. Women in IT: The Facts. Boulder, CO; http://www.ncwit.org/pdf/NCWIT_WomenInItFacts_FINAL.pdf
25. Williams, K. and O'Reilly, C.A. Demography and diversity in organizations: A review of 40 years of research. In Research in Organizational Behavior, B.M. Staw and L.L. Cummings, Eds. JAI, Greenwich, CT, 1998, 77–140.
a. Created by author Kaye while a student at Cornell University and an employee of Nokia; http://genderyzer.com
e. Source: National Center for Education Statistics; http://webcaspar.nsf.gov/. The data set provides information on number of Ph.D. graduates in computer science in U.S. academic institutions.
h. Source: National Center for Education Statistics, accessed through http://webcaspar.nsf.gov/. The data set (from the National Science Foundation) provides information on number of Ph.D. graduates in computer science in U.S. academic institutions. "Computer science" includes computer and information science, general; information sciences and systems; computer science; management information systems; and management science.
j. "Author credits" count authors each time they publish, so we are not comparing degree recipients to papers published or to author names; we instead compare degree recipients with instances of authorship.
l. Averaging across all years for which we have data produced little difference compared with averaging across only the 10 most recent years; 83% of the listed conferences stayed in the same categories.
m. Women's share of authorship increased in ACSC, ADAC, ASPLOS, CCS, CHI, CODES+ISSS, COMM, CPR, DAC, FPCA, GH, HDPC, HT, ICAAD, ICSE, ISLPED, ICSA, ISPD, KDD, LCTES, MICRO, OOPSLA, PACT, PADS, PODS, JCDL-DL, SAC, SCG, SIGCSE, SIGGRAPH, SIGIR, SIGMETRICS, SIGMOD, SIGSOFT, SIGUCCS, SPAA, SPM, STOC, UIST, and WSC; was neutral in VRST, SIGDOC, PLDI, PODC, POPL, SBCCI, SC, MSWIM, ISSAC, ITCSE, IUI, MM, ICS, DOLAP, APL, DATE, DIAL_M, and CIKM; and fell in GIS, DIS, AGENTS, ICFP, SIGAda, and SODA. Full conference names for each acronym are available from the authors or online from http://www.acm.org.
©2011 ACM 0001-0782/11/0800 $10.00
The following letter was published in the Letters to the Editor of the March 2012 CACM (http://cacm.acm.org/magazines/2012/3/146236).
The article "Gender and Computing Conference Papers" by J. McGrath Cohoon et al. (Aug. 2011) explored how women are increasingly publishing in ACM conference proceedings, with the percentage of women authors going from 7% in 1966 to 27% in 2009. In 2011, we carried out a similar study of women's participation in the years 1960–2010 and found a significant difference from Cohoon et al. In 1966, less than 3% of the authors of published computing papers were women, as opposed to about 16.4% in 2009 and 16.3% in 2010. We have since sought to identify possible reasons for that difference: For one, Cohoon et al. sought to identify the gender of ACM conference-paper authors based on author names, referring to a database of names. They analyzed 86,000 papers from more than 3,000 conferences, identifying the gender of 90% of 356,703 authors. To identify the gender of "unknown" or "ambiguous" names, they assessed the probability of a name being either male or female.
Our study considered computing conferences and journals in the DBLP database of 1.5 million papers (most from ACM conferences), including more than four million authors (more than 900,000 different people). Like Cohoon et al., we also identified gender based on author names from a database of names, using two methods to address ambiguity: The first (similar to Cohoon et al.) used the U.S. census distribution to predict the gender of a name; the second assumed ambiguous names reflect the same gender distribution as the general population. Our most accurate results were obtained through the latter method (unambiguous names).
We identified the gender of more than 2.6 million authors, leaving out almost 300,000 ambiguous and 1.1 million unknown names (with 220,000 limited to just initials). We performed two tests to validate the method, comparing our results to a manual gender identification of two subsets (thousands of authors) of the total population of authors. In each, the results were similar to those obtained through our automated method (17.26% and 16.19%).
Finally, Cohoon et al. compared their results with statistics on the gender of Ph.D. holders, concluding that the productivity of women is greater than that of men. Recognizing this result contradicts established conclusions, they proposed possible explanations, including "Men and women might tend to publish in different venues, with women over-represented at ACM conferences compared to journals, IEEE, and other non-ACM computing conferences."
However, the contradiction was due to the fact that their estimation of "unknown" or "ambiguous" names overestimated the number of women publishing in ACM conference proceedings.
José María Cavero, Belén Vela, and Paloma Cáceres
The difference in our results from those of Cavero et al. likely stems from our use of a different dataset. We conducted our analyses exclusively on ACM conference publications. Cavero et al. included journals, which have fewer women authors. We carefully tested and verified our approach to unknown and ambiguous names on data where gender was known, showing that women were overrepresented among unknown and ambiguous names. We could therefore construct a more accurate dataset that corrected for the miscounting of women.
J. McGrath Cohoon
Joseph "Jofish" Kaye
Palo Alto, CA