First Authorship Gender Gap in the Geosciences

Although gender parity has been reached at the graduate level in the geosciences, women remain a minority in faculty positions. First authorship of peer‐reviewed scholarship is a measure of academic success and is often used to project potential in the hiring process. Given the importance of first author publications for hiring and advancement, we sought to quantify whether women are underrepresented as first authors relative to their representation in the field of geoscience. We compiled first author names across 13 leading geoscience journals from January 2013 to April 2019 (n = 35,183). Using a database of 216,286 names from 79 countries, across 89 languages, we classified the likely gender associated with each author's given (first) name. We also estimated the gender distribution of authors who publish using only initials, which may itself be a strategy employed by some women to preempt perceived (and actual) gender bias in the publication process. Female names represent 13–30% of all first authors in our database and are substantially underrepresented relative to the proportion of women in early career positions (30–50%). The proportion of female‐name first authors varies substantially by subfield, reflecting variation in representation of women across geoscience subdisciplines. In geoscience, the quantification of this first authorship gender gap supports the hypothesis that the publication process—namely, achievement or allocation of first authorship—is biased by social factors, which may modulate career success of women in the sciences.


Introduction
First authorship of papers in peer-reviewed journals is crucial to academic success, promotion, and competitive research funding (Evans & Houston, 2011; Way et al., 2019). Authorship is key to moving up the career ladder from graduate school to postdoctoral positions to faculty appointments (Lerchenmueller & Sorenson, 2018). In the natural sciences, women are underrepresented at the highest academic tiers (Bendels et al., 2018; Filardo et al., 2014; Glass, 2015; Macphee & Canetto, 2015). Representation of women in academic geoscience drops off substantially at every successive tier, with the greatest discrepancy at the highest ranks. This representation varies by career stage and subfield: women make up 40-50% of Ocean, Atmospheric, and Earth Sciences graduate students (Bernard & Cooperdock, 2018), 30-36% of assistant professors, and only 11.5-13% of full professors (Glass, 2015; Macphee & Canetto, 2015).
A critical contributor to this gender gap is the transition from postdoc to the first faculty position (Dutt et al., 2016), and studies suggest this discrepancy results from differences in academic productivity and perceived potential (Lerchenmueller & Sorenson, 2018). While academic productivity, measured by publication record, is often assumed to represent inherent scientific talent (Heesen, 2017), the strongest predictor of scholarly productivity is work environment, which highlights the importance of social factors in determining academic success (Way et al., 2019). For decades, publication analyses have revealed a significant gender gap in authorship (Cole & Zuckerman, 1984), publication in high impact journals (Brooks et al., 2014), and citation rates (Caplar et al., 2017; King et al., 2017). While recent assessments document the persistence of a gender discrepancy in first authorship of peer-reviewed publications in the sciences (Bendels et al., 2018; Filardo et al., 2014; Holman et al., 2018; Shen et al., 2018; West et al., 2013), an in-depth study focused on the geosciences has yet to be done. Analysis of authorship imbalances contributes to a stream of recent scholarship quantifying gender inequities in the geosciences at research conferences (Ford et al., 2018; King et al., 2018), in peer review (Lerback & Hanson, 2017), and in recommendation letters (Dutt et al., 2016).
Given the importance of first authorship for career advancement (Lerchenmueller & Sorenson, 2018), we sought to assess the extent to which female first authors are underrepresented among 13 of the major geoscience journals. In this field, it is first authors who conventionally perform the majority of the research and the writing. We used data mining to quantify the representation of women as first authors from January 2013 to April 2019 in leading geoscience journals (Nature Geoscience, Geology, Geological Society of America Bulletin, Journal of Geophysical Research (JGR)-all fields, Geophysical Research Letters, Quaternary Science Reviews, and Geochimica et Cosmochimica Acta). Sixty-two percent of first author names were categorizable by gender (Table 1). We compared our results to the representation of women in early career stages (from 30% of assistant professors to 50% of granted PhDs) and assume that first authorship is dominated by early career scientists (see section 4).
©2020 The Authors. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
One factor that potentially confounds any analysis of women's representation in science is that women may be more likely to initialize their given name in order to mask their gender as a preemptive defense against implicit bias (as substantiated by studies showing that a name's gender influences competence assessments; Moss-Racusin et al., 2012a). In this study, we compared initialed author names to all authors in the complete mined database and identified the likely given name based on coauthorship overlap. We then assigned a likely gender to these first authors with initialized given names in order to assess the extent to which the practice of initializing names impacts measures of women's representation in the geosciences. We include open-access code in a GitHub repository to reproduce this approach in future studies, because quantifying authorship gender ratios will be useful to repeat for specific subdisciplines as well as to test for change over time (see section 4).
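The coauthorship-overlap matching outlined above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the code from the study's repository: the function names, the heuristic for detecting an initialed name, and the exact string comparison of coauthor names are all hypothetical choices made here. The use of the last token as the family name and the minimum overlap of one follow the description in section 4.

```python
from typing import List, Optional


def family_name(author: str) -> str:
    """Last whitespace-delimited token of an author string."""
    tokens = author.split()
    return tokens[-1] if tokens else ""


def is_initialed(author: str) -> bool:
    """Assumed heuristic: the first token is one letter or ends with a period."""
    tokens = author.split()
    return bool(tokens) and (len(tokens[0]) == 1 or tokens[0].endswith("."))


def match_given_name(article: List[str], corpus: List[List[str]]) -> Optional[str]:
    """Return the best full-name match for an initialed first author, or None.

    `article` is an author list whose first entry is the initialed first author;
    `corpus` is the list of author lists for all scraped articles.
    """
    first_author, coauthors = article[0], set(article[1:])
    surname = family_name(first_author)
    best_name, best_overlap = None, 0  # a match needs an overlap of at least one
    for other in corpus:
        if other is article:
            continue
        # Candidates: non-initialed authors who share the family name.
        candidates = [a for a in other
                      if family_name(a) == surname and not is_initialed(a)]
        if not candidates:
            continue
        # Overlap: coauthor strings shared by both articles (exact comparison,
        # a simplification assumed for this sketch).
        overlap = len(coauthors & set(other))
        if overlap > best_overlap:
            best_name, best_overlap = candidates[0], overlap
    return best_name


# Illustrative, made-up author lists.
corpus = [
    ["J. Smith", "Ana Ruiz", "Wei Chen"],
    ["Jane Smith", "Ana Ruiz", "Wei Chen", "Tom Berg"],
    ["John Smith", "Ana Ruiz"],
    ["Jack Smith", "Lena Koch"],
]
best = match_given_name(corpus[0], corpus)
```

In this made-up example, the second article shares two coauthors with the initialed author's article, so its same-surname author is selected over the candidates with one or zero shared coauthors. A fuller implementation might also require the candidate's given name to begin with the same letter as the initial; that check is omitted here because the source does not describe it.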

Results
In the majority of journals analyzed (10 of the 13), female names made up fewer than 30% of gender-categorizable first author names. The proportion of female name first authors varies substantially by subfield and likely reflects the representation of women across subdisciplines. Female names represented as few as 23% of categorizable first author names in Journal of Geophysical Research Space Physics (where representation of women in student or early career positions is close to 20%; Porter & Ivie, 2019) and as many as 36% of categorizable first author names in Journal of Geophysical Research Biogeosciences (where women represent ~40% of submitted conference abstract authors at the American Geophysical Union (AGU); Ford et al., 2018). These proportions are for 26,623 categorizable first author names, excluding unmatched initials (see section 4), across the journals analyzed (Figures 1 and 2).
Of the full database, including noncategorizable names, the percentage of female names ranges from 13-30% across all journals. Male names (green; Figure 1) represent 25-61% of all names, while uncategorized names (black; Figure 1) and unmatched initials (purple; Figure 1) represent 5-16% and 11-48% of all names, respectively. Early career scientists, defined as those who received their highest degree within the last 10 years, constitute the majority (~70%; see section 4) of first authors in AGU geoscience journals. We do not know what proportion of these early career authors are women. Thus, we simply assume that first authorship generally represents early career scientists, and women should be represented in the same proportion as documented in early career stages. Nevertheless, the percentage of female names (13-30%) is substantially below the representation of women at this career level (30-50%; translucent purple bars; Figures 1 and 2).
Of the matched initialed given names, we found that 29% were categorizable as female names (n = 417, out of 1,434). This percentage varied by journal from 14% (Geological Society of America Bulletin) to 41% (Quaternary Science Reviews). In Geological Society of America Bulletin, female names represent 25% of all first authors, suggesting that men are more likely to publish using initials in this journal, whereas in Quaternary Science Reviews female names are slightly overrepresented in initialed names (female names represent 36% of all first authors). Although we are unable to match all initialed first author names, the percentage of female names in matched initialed given names (29%) is proportional to the overall representation of female names across all 13 journals (28%), indicating there is not a significant gender bias in authors' decision to publish in the geosciences using only initials.

Discussion
Geoscience is not the only field with a first author gender gap. In other disciplines, a similar first authorship gender gap was quantified by Bendels et al., including in the biological sciences (female names represent 35% of first authors) and chemistry (female names represent 23% of first authors); this gap persists across communities internationally (Bendels et al., 2018).
The results from this study are limited by the range of journals selected for analysis and the specific subfields of geoscience these journals represent. Future studies could reproduce this analysis with other subdiscipline-specific journals using the open-access code provided from this study in a GitHub repository (see section 4). One limitation to our approach is the choice of gender-categorizing method and database, which is constrained to a binary and does not represent ultimate gender identity. For example, genderize.io will not be able to identify the gender for names pertaining to cultures where given names are not gendered (e.g., some East Asian cultures). Furthermore, in this study, we compared our results to the representation of women in geoscience within the United States, even though the author names included in this study come from a range of international institutions, and the proportion of women in different geoscience career stages varies by country. We also assume that first authorship is dominated by early career scientists and the proportion of women in early career stages is represented in publications. Our analysis would be improved with more detailed data sets on the representation of women in different career stages and subdisciplines within the geosciences.
We cannot draw a firm conclusion about what drives the identified disparity in first authorship, but we can speculate based on the existing literature. Biases may exist at many different stages of the publication process. At the graduate school level, women may receive less mentoring or encouragement to write and submit first author research articles (Moss-Racusin et al., 2012b; Unkovic et al., 2016). A study analyzing authorship in political science journals found a gender bias in the perception of likely acceptance in journals and, therefore, in the ultimate decision to submit articles (Breuning et al., 2018).
Double-blind review, which is not widely used in the geosciences, has been shown to reduce gender gaps in publication acceptance rates (Budden et al., 2008), although a study on peer review in ecology suggests that reviewers do not rate papers differently based on first author gender in that field (Borsuk et al., 2009). First authors may respond differently to a paper's rejection, as studies on confidence suggest that men's self-assessment of competence is substantially higher than that of women (Bench et al., 2015; Sarsons & Xu, 2015). Because of this higher level of confidence, men may be more likely to resubmit a paper following a rejection, contributing to a higher rate of male first authorship in top journals. To understand what causes the gender disparity we find in first author publication rates, it would be helpful to understand disparities at different stages in the publication process in the geosciences. Are women submitting fewer papers, are women's papers being rejected at higher rates, or do women resubmit at lower rates compared to male counterparts? Answering these questions would require journals to track gender (in addition to other social metrics such as career stage) in submitted and accepted manuscripts.
As with gender, journals could consider other demographic sources of inequities such as race. However, it is more tractable to infer the gender of given names than to identify race. For many journals, first author demographics are not tracked at submission, and therefore self-reported gender or race data are not available. Improved data sets documenting representation by gender, race/ethnicity, sexuality, and nationality, across different career stages in a range of disciplines, may help identify where biases exist in the publication pipeline (Morgan et al., 2018).
Our findings support efforts to implement journal practices, such as double-blind review, which reduce the impact of perceptions of first author gender and have been shown to increase the success of women in publishing articles (Budden et al., 2008). In addition, mentoring is an important element in academic productivity for early career scientists, and gender has been shown to influence the degree of mentorship provided (Moss-Racusin et al., 2012b). The gender-pairing of faculty mentors with students can result in different scholarly productivities (Pezzoni et al., 2016), and links between gender, mentoring, and publication might be highlighted by institutional leaders to raise awareness around social bias in mentoring. Scientific communities might also consider other ways to recognize the various contributions of authors, reevaluating the weight placed on first author publications (Larivière et al., 2016).
Data documenting gender biases in the publication process, in addition to studies identifying the impact of social factors on productivity (Breuning et al., 2018; Evans & Houston, 2011; Sarsons & Xu, 2015; Way et al., 2019), challenge the view that science careers advance solely on merit. Underrepresentation of female first authors relative to their presence in the geosciences contributes to a growing body of evidence that suggests success in science is strongly modulated by social factors (Way et al., 2019) and that these factors influence tangible products such as first-authored publications. Efforts by journals, funders, and professional societies to understand what practices produce gender disparities in scholarly achievement will be required to reduce bias in and out of the publication pipeline.

Materials and Methods
The code used to produce the results included in this study can be found online (https://github.com/kevindoyle/geoscience-first-authorship). We compiled author names from January 2013 to April 2019 across a range of 13 geoscience journals (Nature Geoscience, Geology, Geological Society of America Bulletin, Journal of Geophysical Research (JGR)-all fields, Geophysical Research Letters, Quaternary Science Reviews, and Geochimica et Cosmochimica Acta). We selected these journals to include a range of general geoscience journals as well as discipline-specific journals across Earth, Ocean, and Atmospheric Sciences. We web-scraped author and article names from each journal website (n = 35,183) by iteratively changing query parameters in the websites' search page URLs. The search result pages were rendered and downloaded using the Python package selenium (Muthukadan, n.d.). Author names and article titles were parsed from the downloaded pages by navigating the HTML tree using the Python package BeautifulSoup (Richardson, 2020). An author's given name was identified as the first token of an author's name string. Tokens were created using whitespace as a delimiter. Of these author names, 24,525 are unique full names and 7,157 are unique given names. We classified the gender of authors' given names using the genderize.io API (2020), accessed through a Python client (getGenders, n.d.). The genderize.io database contains 216,286 distinct given names across 79 countries and 89 languages. This library categorizes names as "female", "male", or "uncategorized" and returns the probability that the given name is classified as a specific gender. In running the scraped author names through this database, we assigned the category "female" or "male" if the probability was above 50% for the given gender. This approach relies on given names being gendered, which does not hold across all cultures. Furthermore, it assumes that the first token of a name is the given name, which is not true in cultures where the family name comes first.
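The tokenization and thresholding steps described above can be sketched in Python as follows. This is an illustrative sketch, not the study's code: it does not call the genderize.io API, and the lookup table, its probabilities, the function names, and the heuristic for recognizing initials are all assumptions introduced here. Only the first-token rule, the whitespace delimiter, and the 50% threshold come from the description above.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for a name-to-gender service: name -> (gender, probability).
# The entries are made up for illustration and are not real API output.
EXAMPLE_LOOKUP: Dict[str, Tuple[Optional[str], float]] = {
    "maria": ("female", 0.98),
    "john": ("male", 0.99),
    "robin": ("male", 0.50),
}


def given_name(author: str) -> str:
    """First whitespace-delimited token of an author string."""
    tokens = author.split()
    return tokens[0] if tokens else ""


def is_initial(token: str) -> bool:
    """Assumed heuristic: one or two uppercase letters, e.g. 'J', 'J.', 'J.K.'."""
    letters = token.replace(".", "").replace("-", "")
    return 0 < len(letters) <= 2 and letters.isalpha() and letters.isupper()


def classify(author: str,
             lookup: Dict[str, Tuple[Optional[str], float]] = EXAMPLE_LOOKUP,
             threshold: float = 0.5) -> str:
    """Return 'female', 'male', 'initials', or 'uncategorized' for one author."""
    name = given_name(author)
    if not name:
        return "uncategorized"
    if is_initial(name):
        return "initials"
    gender, probability = lookup.get(name.lower(), (None, 0.0))
    # Assign a gender only when the probability is strictly above the threshold.
    if gender in ("female", "male") and probability > threshold:
        return gender
    return "uncategorized"


labels = [classify(a) for a in ["Maria Lopez", "J. Smith", "Robin Gray", "Xiu Wang"]]
```

Keeping initialed names in a separate category, rather than folding them into "uncategorized", mirrors the distinction the analysis draws between uncategorized names and unmatched initials.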
Of 35,183 first author names, 9,994 (28%) used initials in place of a given name. To improve the accuracy of our results, we attempted to identify the noninitialed given name of initialed authors. This was done by comparing initialed names to all authors in the complete database of publications across these 13 journals. For a given initialed name, we used the associated family name (identified as the last token in the name string) to find all articles that included a coauthor with that family name. We then compared the extent of overlap in coauthor names between the initialed author's article and each article in this list. The article with the greatest overlap in coauthorship (minimum overlap of one) was selected to identify the given name of the initialed first author. We were able to match 1,434 of the 9,994 initialed first authors (14.3%). In calculating the overall representation of female or