From 7a0fd104208c281f5ce32b9605aeca5182198e9f Mon Sep 17 00:00:00 2001
From: Trang Le
This version of the manuscript contains changes subsequent to the version 1.0 release.
This manuscript
-(permalink)
+(permalink)
was automatically generated
-from greenelab/iscb-diversity-manuscript@5dc6311
+from greenelab/iscb-diversity-manuscript@2001465
on March 19, 2020.
Analysis of ISCB honorees and keynotes reveals disparities
Countries of Affiliations
This approach returns a single country for an affiliation when successful.
When labeling affiliations with countries, we used these values only when geotext returned no results, or when it returned several candidate countries and none of them was matched more than once.
For more details on this approach, consult the accompanying notebook and label dataset.
During curation of ISCB honorees, if an honoree was listed with their affiliation at the time, we recorded that affiliation for analysis.
+For ISCB Fellows, we used the affiliation listed on the ISCB page.
+Because we could not find affiliations for the 1997 and 1998 RECOMB keynote speakers, their affiliations were left blank.
+If an author or speaker had more than one affiliation, each affiliation was weighted by the inverse of the number of affiliations that individual had.
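The inverse weighting can be illustrated with a minimal Python sketch. It is not the code from the greenelab/iscb-diversity repository. The helper name `weighted_country_counts` and the input layout (one list of country labels per person) are hypothetical. Each person contributes a total weight of 1, split evenly across their affiliations. This is why observed counts such as 237.5 can be fractional.

```python
from collections import Counter


def weighted_country_counts(people_affiliations):
    """Hypothetical helper: sum per-country weights, splitting each
    person's unit weight evenly across their listed affiliations."""
    counts = Counter()
    for countries in people_affiliations:
        if not countries:
            # No affiliation found (e.g., the 1997 and 1998 RECOMB keynotes): skip
            continue
        weight = 1 / len(countries)  # inverse of the number of affiliations
        for country in countries:
            counts[country] += weight
    return counts


# One speaker in the US, one split between the US and Israel, one unknown
print(weighted_country_counts([["US"], ["US", "Israel"], []]))
# Counter({'US': 1.5, 'Israel': 0.5})
```

Under this scheme, people without a recorded affiliation contribute nothing, and every other person contributes exactly 1 to the total.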
We predicted the gender of honorees and authors using the https://genderize.io API, which produces predictions trained on over 100 million name-gender pairings collected from the web. We used author and honoree first names to retrieve predictions from genderize.io.
@@ -373,11 +377,10 @@
Along with the corresponding author names, we collected their affiliations recorded in each publication for this analysis.
-During the honoree curation process, if an honoree was listed with their affiliation at the time, we recorded this affiliation for analysis.
-For ISCB Fellows, we used the affiliation listed on the ISCB page.
-Because we could not find affiliations for the 1997 and 1998 RECOMB keynote speakers’ listed for these years, they were left blank.
-If an author or speaker has more than one affiliation, each is inversely weighted by the number of affiliations that individual has.
+For each country, we computed the expected number of honorees by multiplying the proportion of authors whose affiliations were in that country by the total number of honorees.
+We then performed an enrichment analysis to examine the difference in country affiliation proportions between ISCB honorees and field-specific corresponding authors.
+We calculated each country’s enrichment by dividing the observed proportion of honorees by the expected proportion of honorees.
+The variance of the log2 enrichment was estimated using the delta method, with a small continuity correction to avoid dividing by 0 [20].
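A minimal Python sketch of the per-country calculation follows. The expected count and the enrichment follow the definitions above. The variance formula is our assumption: it is the usual delta-method variance of the log of a ratio of two proportions (as for a risk ratio), with a hypothetical continuity correction of 0.5 added to both counts inside the variance only. The exact form and correction used in the analysis may differ. The function name and its arguments are illustrative.

```python
import math

Z_95 = 1.959964  # two-sided 95% standard normal quantile


def country_enrichment(observed, total_honorees, authors_in_country,
                       total_authors, correction=0.5):
    """Hypothetical helper: log2 enrichment of honorees over authors for one country.

    `observed` and `authors_in_country` may be fractional because people with
    several affiliations are split across countries. `correction` is an assumed
    continuity correction applied only inside the variance terms.
    """
    author_proportion = authors_in_country / total_authors
    expected = author_proportion * total_honorees  # expected number of honorees
    enrichment = observed / expected  # observed proportion / author proportion
    log2_enrichment = math.log2(enrichment) if enrichment > 0 else -math.inf

    # Assumed delta-method variance of ln(p_honorees / p_authors), risk-ratio style
    var_ln = (1 / (observed + correction) - 1 / total_honorees
              + 1 / (authors_in_country + correction) - 1 / total_authors)
    se_log2 = math.sqrt(max(var_ln, 0.0)) / math.log(2)  # rescale ln to log2
    ci95 = (log2_enrichment - Z_95 * se_log2, log2_enrichment + Z_95 * se_log2)
    return {
        "expected": expected,
        "observed_minus_expected": observed - expected,
        "enrichment": enrichment,
        "log2_enrichment": log2_enrichment,
        "ci95": ci95,
    }


print(country_enrichment(20, 100, 100, 1000)["log2_enrichment"])  # 1.0
```

Because the correction enters only the variance, a country with zero observed honorees still gets a point estimate of -inf. Its interval is then (-inf, -inf), which is the same pattern as the (-Inf, -Inf) intervals reported for Taiwan, India, and Finland in Table 2.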
We curated a dataset of ISCB honorees that included 411 honorees who were keynote speakers at international ISCB-associated conferences (ISMB, RECOMB, and PSB) as well as ISCB Fellows.
@@ -398,7 +401,7 @@
We observed a slow increase in the proportion of predicted female authors, which reached just over 20% in 2019 (Fig. 2, left).
We observed a very similar trend within each journal, although the estimated female proportion increased the least in PLOS Computational Biology (see notebook).
ISCB Fellows and keynote speakers appear to be more evenly split between men and women than the population of authors publishing in computational biology and bioinformatics journals (Fig. 2, right); however, this split has not yet reached parity.
-Further, taking all the years together, a Welch two-sample t-test does not reveal any statistically significant difference in the mean probability of ISCB speakers predicted to be female compared to that of authors (\(t_{418} = 0.753\), \(p = 0.226\)).
+Further, taking all the years together, a Welch two-sample t-test did not reveal any statistically significant difference in the mean probability of ISCB honorees predicted to be female compared to that of authors (\(t_{418} = 0.753\), \(p = 0.226\)).
We observed an increasing proportion of women among honorees in each honor category, especially among ISCB Fellows (see notebook), where it increased markedly after 2015.
Through 2019, there were a number of meetings or ISCB Fellow classes with a high probability of recognizing only male honorees, and none that appeared to have exclusively female honorees.
However, the 2020 PSB keynotes, though outside the primary range of our analyses, had nearly all of the probability ascribed to female speakers.
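The Welch two-sample t-test does not assume equal variances in the two groups. The sketch below computes its statistic and Welch-Satterthwaite degrees of freedom with only the Python standard library. The function name `welch_t` is hypothetical. Its inputs would be per-person predicted probabilities, for example the probability of being female for each honoree and for each author. In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same statistic together with a p-value.

```python
import math
import statistics


def welch_t(a, b):
    """Hypothetical helper: Welch's t statistic and Welch-Satterthwaite df."""
    n_a, n_b = len(a), len(b)
    # Squared standard error of each group mean, using sample (n - 1) variances
    se2_a = statistics.variance(a) / n_a
    se2_b = statistics.variance(b) / n_b
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation; generally not an integer
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


t, df = welch_t([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(t, df)  # t is -sqrt(3), about -1.732; df is about 4.41
```

Reported degrees of freedom for Welch tests are usually the rounded value of this approximation.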
@@ -429,8 +432,8 @@
We directly compared honoree and author results from 1997 to 2020 for the predicted proportion of white, Asian, and other categories (Fig. 3E).
-We found that, over the years, white honorees have been significantly overrepresented (\(t_{348} = 15.0\), \(p < 10^{-16}\)) and Asian honorees have been significantly underrepresented (\(t_{368} = -21.8\), \(p < 10^{-16}\)).
-We also observed a higher mean probability of ISCB speakers predicted to be in Other categories compared to authors (\(t_{336} = 2.18\), \(p = 0.0296\)).
+We found that, over the years, white honorees have been significantly overrepresented (\(t_{348} = 15.0\), \(p < 10^{-16}\)) and Asian honorees have been significantly underrepresented (\(t_{368} = -21.8\), \(p < 10^{-16}\)).
+We also observed a higher mean probability of ISCB honorees predicted to be in other categories compared to authors (\(t_{336} = 2.18\), \(p = 0.0296\)).
We next aimed to predict the name origin groups of honorees and authors.
We constructed a training dataset with more than 700,000 name-nationality pairs by parsing the English-language Wikipedia.
@@ -471,6 +474,254 @@
We compared the countries of affiliation of corresponding authors and ISCB honorees.
+For each country, we report the log2 enrichment (LOE) and its 95% confidence interval (Table 2).
+A positive LOE indicates a higher proportion of honorees affiliated with that country than among authors.
+An LOE of 1 represents a twofold enrichment (i.e., the observed number of honorees is twice the expected number).
+Among the 20 countries with the most publications, we found an overrepresentation of honorees affiliated with institutions and companies in the US (85 more honorees than expected, LOE = 0.6, 95% CI (0.5, 0.8)) and Israel (12 more honorees than expected, LOE = 1.6, 95% CI (0.9, 2.3)), and an underrepresentation of honorees affiliated with those in China, France, Italy, the Netherlands, Taiwan, and India (Fig. 6).
+| Country | Author proportion | Observed honorees | Expected honorees | Observed - Expected | Enrichment | Log2(Enrichment) | 95% Confidence Interval |
+|---|---|---|---|---|---|---|---|
+| United States | 38.76% | 237.5 | 152.7 | 84.8 | 1.6 | 0.6 | (0.5, 0.8) |
+| United Kingdom | 8.36% | 36.0 | 32.9 | 3.1 | 1.1 | 0.1 | (-0.3, 0.6) |
+| Germany | 7.55% | 27.0 | 29.7 | -2.7 | 0.9 | -0.1 | (-0.7, 0.4) |
+| China | 5.82% | 3.0 | 22.9 | -19.9 | 0.1 | -2.9 | (-4.5, -1.3) |
+| France | 3.86% | 4.0 | 15.2 | -11.2 | 0.3 | -1.9 | (-3.3, -0.5) |
+| Italy | 3.04% | 2.0 | 12.0 | -10.0 | 0.2 | -2.6 | (-4.5, -0.6) |
+| Canada | 3.03% | 12.0 | 11.9 | 0.1 | 1.0 | 0.0 | (-0.8, 0.8) |
+| Japan | 2.44% | 9.0 | 9.6 | -0.6 | 0.9 | -0.1 | (-1.0, 0.8) |
+| Spain | 2.39% | 6.0 | 9.4 | -3.4 | 0.6 | -0.7 | (-1.8, 0.5) |
+| Australia | 2.33% | 5.0 | 9.2 | -4.2 | 0.5 | -0.9 | (-2.1, 0.4) |
+| Netherlands | 1.91% | 1.0 | 7.5 | -6.5 | 0.1 | -2.9 | (-5.6, -0.2) |
+| Switzerland | 1.81% | 7.0 | 7.1 | -0.1 | 1.0 | -0.0 | (-1.1, 1.0) |
+| Israel | 1.46% | 17.5 | 5.8 | 11.7 | 3.0 | 1.6 | (0.9, 2.3) |
+| Sweden | 1.34% | 6.0 | 5.3 | 0.7 | 1.1 | 0.2 | (-1.0, 1.3) |
+| Korea | 1.30% | 1.0 | 5.1 | -4.1 | 0.2 | -2.4 | (-5.1, 0.3) |
+| Taiwan | 1.25% | 0.0 | 4.9 | -4.9 | 0.0 | -Inf | (-Inf, -Inf) |
+| India | 1.20% | 0.0 | 4.7 | -4.7 | 0.0 | -Inf | (-Inf, -Inf) |
+| Belgium | 1.04% | 1.0 | 4.1 | -3.1 | 0.2 | -2.0 | (-4.7, 0.7) |
+| Singapore | 0.88% | 1.0 | 3.5 | -2.5 | 0.3 | -1.8 | (-4.5, 0.9) |
+| Finland | 0.85% | 0.0 | 3.4 | -3.4 | 0.0 | -Inf | (-Inf, -Inf) |
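As a worked check of the LOE definition, we can plug the Observed (\(O\)) and Expected (\(E\)) values from Table 2 into \(\mathrm{LOE} = \log_2(O/E)\). This recovers the reported values. The results are approximate because the table entries are rounded.

```latex
\begin{align*}
\mathrm{LOE} &= \log_2\!\left(\frac{O}{E}\right), \qquad \mathrm{LOE} = 1 \iff O = 2E\\
\text{United States: } \log_2\!\left(\frac{237.5}{152.7}\right) &\approx \log_2(1.56) \approx 0.64\\
\text{Israel: } \log_2\!\left(\frac{17.5}{5.8}\right) &\approx \log_2(3.02) \approx 1.59\\
\text{Taiwan: } \log_2\!\left(\frac{0}{4.9}\right) &= \log_2 0 = -\infty
\end{align*}
```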
A major challenge that we faced in carrying out this work was narrowing down the geographic origins of some groups of names.
Some groupings, such as Group D, are geographically disparate.
@@ -479,7 +730,7 @@
Biases in authorship practices may also lead us to underestimate the proportion of minoritized scientists in the field.
-We estimated the composition of the field using corresponding author status, but in neuroscience [20] and other disciplines [21] women are underrepresented among such authors.
+We estimated the composition of the field using corresponding author status, but in neuroscience [21] and other disciplines [22], women are underrepresented among such authors.
Such an effect would cause us to underestimate the number of women in the field.
Though this effect has been studied with respect to gender, we are not aware of similar work examining race, ethnicity, or name origins.
We acknowledge that our supervised learning approaches are neither error-free nor bias-free.
@@ -491,9 +742,9 @@
An important question to ask when measuring representation is what the right level of representation is.
We suggest that considering equity may be more appropriate than considering diversity alone.
-In addition to holding fewer corresponding authorship positions, on average, female scientists of different disciplines are cited less often [22], invited by journals to submit papers less often [21], suggested as reviewers less often [24], and receive significantly worse review scores [23].
-Societies, both through their honorees and the individuals who deliver keynotes at their meetings, can play a positive role in improving the presence of female STEM role models, which, for example, may lead to higher persistence for undergraduate women in geoscience [25].
-Efforts are underway to create Wikipedia entries for more female [26] and black, Asian, and minority scientists [27], which can help early-career scientists identify role models.
+In addition to holding fewer corresponding authorship positions, female scientists across disciplines are, on average, cited less often [23], invited by journals to submit papers less often [22], and suggested as reviewers less often [25], and they receive significantly worse review scores [24].
+Societies, both through their honorees and through the individuals who deliver keynotes at their meetings, can play a positive role in increasing the presence of female STEM role models, which may, for example, lead to higher persistence among undergraduate women in the geosciences [26].
+Efforts are underway to create Wikipedia entries for more female [27] and Black, Asian, and minority ethnic scientists [28], which can help early-career scientists identify role models.
We find that ISCB’s honorees and keynote speakers, though not yet at gender parity, appear to be more evenly split between men and women than the field as a whole.
On the other hand, honorees include significantly fewer people of color than the field as a whole, and Asian scientists are dramatically underrepresented among honorees.
Although we estimate the fraction of non-white and non-Asian authors to be relatively similar to the estimated honoree rate, we note that both are represented at levels substantially lower than in the US population.
@@ -503,7 +754,7 @@
-This manuscript was written openly on GitHub using Manubot [28].
+This manuscript was written openly on GitHub using Manubot [29].
The Manubot HTML version is available under a Creative Commons Attribution (CC BY 4.0) License at https://greenelab.github.io/iscb-diversity-manuscript/.
Our analysis of authors and ISCB-associated honorees is available under CC BY 4.0 at https://github.com/greenelab/iscb-diversity, with source code also distributed under a BSD 3-Clause License.
Rendered Python and R notebooks from this repository are browsable at https://greenelab.github.io/iscb-diversity/.
@@ -618,55 +869,61 @@
+20. Statistics in epidemiology: methods, techniques, and applications
+Hardeo Sahai, Anwer Khurshid
+CRC Press (1996)
+ISBN: 9780849394447
-20. Persistent Underrepresentation of Women’s Science in High Profile Journals
+21. Persistent Underrepresentation of Women’s Science in High Profile Journals
Yiqin Alicia Shen, Jason M. Webster, Yuichi Shoda, Ione Fine
bioRxiv (2018-03-08) https://doi.org/cmh5
DOI: 10.1101/275362
-21. The gender gap in science: How long until women are equally represented?
+22. The gender gap in science: How long until women are equally represented?
Luke Holman, Devi Stuart-Fox, Cindy E. Hauser
PLOS Biology (2018-04-19) https://doi.org/gdb9db
DOI: 10.1371/journal.pbio.2004956 · PMID: 29672508 · PMCID: PMC5908072
-22. The extent and drivers of gender imbalance in neuroscience reference lists
+23. The extent and drivers of gender imbalance in neuroscience reference lists
Jordan D. Dworkin, Kristin A. Linn, Erin G. Teich, Perry Zurn, Russell T. Shinohara, Danielle S. Bassett
arXiv (2020-01-07) https://arxiv.org/abs/2001.01002
-23. Gender differences in peer review outcomes and manuscript impact at six journals of ecology and evolution
+24. Gender differences in peer review outcomes and manuscript impact at six journals of ecology and evolution
Charles W. Fox, C. E. Timothy Paine
Ecology and Evolution (2019-03-04) https://doi.org/gfwjjb
DOI: 10.1002/ece3.4993 · PMID: 30962913 · PMCID: PMC6434606
-24. Journals invite too few women to referee
+25. Journals invite too few women to referee
Jory Lerback, Brooks Hanson
Nature (2017-01-26) https://doi.org/gf4jjz
DOI: 10.1038/541455a · PMID: 28128272
-25. Role modeling is a viable retention strategy for undergraduate women in the geosciences
+26. Role modeling is a viable retention strategy for undergraduate women in the geosciences
Paul R. Hernandez, Brittany Bloodhart, Amanda S. Adams, Rebecca T. Barnes, Melissa Burt, Sandra M. Clinton, Wenyi Du, Elaine Godfrey, Heather Henderson, Ilana B. Pollack, Emily V. Fischer
Geosphere (2018-10-31) https://doi.org/gghp9d
DOI: 10.1130/ges01659.1
-26. Why we’re editing women scientists onto Wikipedia
+27. Why we’re editing women scientists onto Wikipedia
Jess Wade, Maryam Zaringhalam
Nature (2018-08-14) https://doi.org/gdz52z
DOI: 10.1038/d41586-018-05947-8
-27. Why we’re creating Wikipedia profiles for BAME scientists
+28. Why we’re creating Wikipedia profiles for BAME scientists
Nicola O’Reilly
Nature (2019-03-07) https://doi.org/gfwhcr
DOI: 10.1038/d41586-019-00812-8
-28. Open collaborative writing with Manubot
+29. Open collaborative writing with Manubot
Daniel S. Himmelstein, Vincent Rubinetti, David R. Slochower, Dongbo Hu, Venkat S. Malladi, Casey S. Greene, Anthony Gitter
PLOS Computational Biology (2019-06-24) https://doi.org/c7np
DOI: 10.1371/journal.pcbi.1007128 · PMID: 31233491 · PMCID: PMC6611653