From 7a0fd104208c281f5ce32b9605aeca5182198e9f Mon Sep 17 00:00:00 2001 From: Trang Le Date: Thu, 19 Mar 2020 15:01:44 +0000 Subject: [PATCH] Merge pull request #87 from greenelab/affi-results [ci skip] This build is based on https://github.com/greenelab/iscb-diversity-manuscript/commit/2001465edf8ac343a7d4b0026fd72f4ad0518ede. This commit was created by the following CI build and job: https://github.com/greenelab/iscb-diversity-manuscript/commit/2001465edf8ac343a7d4b0026fd72f4ad0518ede/checks https://github.com/greenelab/iscb-diversity-manuscript/runs/59003639 --- citations.tsv | 1 + manuscript.html | 317 +++++++++++++++++++++++++++++++++++++++++++----- manuscript.md | 81 ++++++++++--- manuscript.pdf | Bin 1295402 -> 1429702 bytes references.json | 31 +++++ variables.json | 16 +-- 6 files changed, 392 insertions(+), 54 deletions(-) diff --git a/citations.tsv b/citations.tsv index 03f93d5..e5fdd01 100644 --- a/citations.tsv +++ b/citations.tsv @@ -25,5 +25,6 @@ doi:10.1145/3132847.3133008 doi:10.1145/3132847.3133008 doi:10.1145/3132847.3133 doi:10.1371/journal.pbio.2004956 doi:10.1371/journal.pbio.2004956 doi:10.1371/journal.pbio.2004956 17aOPYsbT doi:10.1371/journal.pcbi.1003903 doi:10.1371/journal.pcbi.1003903 doi:10.1371/journal.pcbi.1003903 JHzGjxks doi:10.1371/journal.pcbi.1007128 doi:10.1371/journal.pcbi.1007128 doi:10.1371/journal.pcbi.1007128 YuJbg3zO +isbn:978-0849394447 isbn:978-0849394447 isbn:9780849394447 wH0Hk1gE url:https://factfinder.census.gov/help/en/race.htm url:https://factfinder.census.gov/help/en/race.htm url:https://factfinder.census.gov/help/en/race.htm Hhiridja url:https://web.archive.org/web/20010405061504/http://www.census.gov/Press-Release/www/2001/raceqandas.html url:https://web.archive.org/web/20010405061504/http://www.census.gov/Press-Release/www/2001/raceqandas.html url:https://web.archive.org/web/20010405061504/http://www.census.gov/Press-Release/www/2001/raceqandas.html KcIUCUC7 diff --git a/manuscript.html b/manuscript.html index 
00e101e..db6c5a5 100644 --- a/manuscript.html +++ b/manuscript.html @@ -67,13 +67,13 @@ - - - + + + - - + + @@ -83,12 +83,12 @@

Analysis of ISCB honorees and keynotes reveals disparities

-

This version of the manuscript contains changes subsequent to the version 1.0 release.

+

This version of the manuscript contains changes subsequent to the version 1.0 release.

This manuscript -(permalink) +(permalink) was automatically generated -from greenelab/iscb-diversity-manuscript@5dc6311 +from greenelab/iscb-diversity-manuscript@2001465 on March 19, 2020.

@@ -251,6 +251,10 @@

Countries of Affiliations

This approach returns a single country for an affiliation when successful. When labeling affiliations with countries, we only used these values when geotext did not return results or had ambiguity amongst countries without multiple matches. For more details on this approach, consult the accompanying notebook and label dataset.

+

For ISCB honorees, during the curation process, if an honoree was listed with their affiliation at the time, we recorded this affiliation for analysis. +For ISCB Fellows, we used the affiliation listed on the ISCB page. +Because we could not find affiliations for the 1997 and 1998 RECOMB keynote speakers, these were left blank. +If an author or speaker had more than one affiliation, each was inversely weighted by the number of affiliations that individual had.
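The inverse weighting of multiple affiliations can be sketched as fractional counting. This is a minimal illustration, not the code from the analysis notebooks; the function name `weighted_country_counts` and the input layout (one list of countries per person) are assumptions made for the example.

```python
from collections import defaultdict


def weighted_country_counts(people):
    """Sum fractional country counts.

    `people` is a list with one entry per person; each entry is the list
    of countries of that person's affiliations (hypothetical layout).
    A person with k affiliations contributes 1/k to each country.
    """
    counts = defaultdict(float)
    for countries in people:
        if not countries:
            # A blank affiliation contributes nothing to any country.
            continue
        weight = 1.0 / len(countries)
        for country in countries:
            counts[country] += weight
    return dict(counts)


example = [
    ["United States"],            # single affiliation: weight 1
    ["United States", "Israel"],  # two affiliations: weight 1/2 each
    [],                           # affiliation left blank
]
print(weighted_country_counts(example))
```

Under this scheme each person contributes a total weight of one, which is why country totals can be fractional.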

Estimation of Gender

We predicted the gender of honorees and authors using the https://genderize.io API, which produces predictions trained on over 100 million name-gender pairings collected from the web. We used author and honoree first names to retrieve predictions from genderize.io. @@ -373,11 +377,10 @@

Estimation of Name Origin Groups

Affiliation Analysis

-

Along with the corresponding author names, we collected their affiliations recorded in each publication for this analysis. -During the honoree curation process, if an honoree was listed with their affiliation at the time, we recorded this affiliation for analysis. -For ISCB Fellows, we used the affiliation listed on the ISCB page. -Because we could not find affiliations for the 1997 and 1998 RECOMB keynote speakers’ listed for these years, they were left blank. -If an author or speaker has more than one affiliation, each is inversely weighted by the number of affiliations that individual has.

+

For each country, we computed the expected number of honorees by multiplying the proportion of authors whose affiliations were in that country by the total number of honorees. +We then performed an enrichment analysis to examine the difference in country affiliation proportions between ISCB honorees and field-specific corresponding authors. +We calculated each country’s enrichment by dividing the observed proportion of honorees by the expected proportion of honorees. +The variance of the log2 enrichment was estimated using the delta method with a small continuity correction to avoid dividing by 0 [20].
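A minimal sketch of the enrichment calculation described above, assuming a simple binomial delta-method variance for the honoree proportion and treating the author proportion as fixed. The exact variance formula and continuity correction used in the analysis are not spelled out here, so the correction is omitted and zero counts are returned as negative infinity. The function name `log2_enrichment` and the inputs in the usage line (including a total of 394 honorees) are illustrative assumptions.

```python
import math


def log2_enrichment(observed, author_prop, total_honorees, z=1.96):
    """Return (expected, enrichment, log2 enrichment, (ci_low, ci_high)).

    observed       : (possibly fractional) honoree count for one country
    author_prop    : proportion of corresponding authors in that country
    total_honorees : total honoree count across all countries
    """
    expected = author_prop * total_honorees
    enrichment = observed / expected
    if observed == 0:
        # log2(0) is undefined; report -inf rather than guess a correction.
        return expected, enrichment, -math.inf, (-math.inf, -math.inf)
    log2_enr = math.log2(enrichment)
    # Delta method (assumed form): Var(ln p_hat) ~ (1 - p_hat) / (n * p_hat)
    # which equals 1/observed - 1/total_honorees.
    var_ln = 1.0 / observed - 1.0 / total_honorees
    se_log2 = math.sqrt(var_ln) / math.log(2)  # convert ln scale to log2
    return expected, enrichment, log2_enr, (log2_enr - z * se_log2,
                                            log2_enr + z * se_log2)


# Illustrative inputs only.
print(log2_enrichment(observed=237.5, author_prop=0.3876, total_honorees=394))
```

Treating the author proportion as fixed is a simplification that is reasonable only when the number of authors is much larger than the number of honorees.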

Results

Curated Honorees and Literature-derived Potential Honorees

We curated a dataset of ISCB honorees that included 411 honorees who were keynote speakers at international ISCB-associated conferences (ISMB, RECOMB, and PSB) as well as ISCB Fellows. @@ -398,7 +401,7 @@

Assessing Gender Div

We observed a slow increase of the proportion of predicted female authors, arriving at just over 20% in 2019 (Fig. 2, left). We observed a very similar trend within each journal, but the estimated female proportion has increased the least in PLOS Computational Biology (see notebook). ISCB Fellows and keynote speakers appear to be more evenly split between men and women compared to the population of authors published in computational biology and bioinformatics journals (Fig. 2, right); however, the split has not yet reached parity. -Further, taking all the years together, a Welch two-sample t-test does not reveal any statistically significant difference in the mean probability of ISCB speakers predicted to be female compared to that of authors (\(t_{418} = 0.753\), \(p = 0.226\)). +Further, taking all the years together, a Welch two-sample t-test did not reveal any statistically significant difference in the mean probability of ISCB honorees predicted to be female compared to that of authors (\(t_{418} = 0.753\), \(p = 0.226\)). We observed an increasing trend of honorees who were women in each honor category, especially in the group of ISCB Fellows (see notebook), which markedly increased after 2015. Through 2019, there were a number of examples of meetings or ISCB Fellow classes with a high probability of recognizing only male honorees and none that appeared to have exclusively female honorees. However, the 2020 PSB keynotes, though outside of the primary range of our analyses, had nearly all the probability ascribed to female speakers.
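For readers unfamiliar with the Welch two-sample t-test mentioned above, the statistic and its degrees of freedom can be sketched in plain Python. This is a minimal illustration on toy probabilities, not the analysis code from the notebooks; the function name `welch_t` and the sample values are hypothetical, and the p-value is left out because it needs a t-distribution CDF that the standard library does not provide.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Unlike Student's t-test, the two samples are not assumed to share
    a common variance.
    """
    na, nb = len(a), len(b)
    # Squared standard errors of the two sample means.
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Toy predicted probabilities for two groups (made-up values).
group_a = [0.1, 0.2, 0.3, 0.4, 0.5]
group_b = [0.2, 0.4, 0.6, 0.8, 1.0]
print(welch_t(group_a, group_b))
```

The non-integer degrees of freedom returned here are typical of the Welch test; reported values such as the subscript in \(t_{418}\) are usually rounded.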

@@ -429,8 +432,8 @@

Asses

We directly compared honoree and author results from 1997 to 2020 for the predicted proportion of white, Asian, and other categories (Fig. 3E). -We found that, over the years, white honorees have been significantly overrepresented (\(t_{348} = 15.0\), \(p < 10^{-16}\)) and Asian honorees have been significantly underrepresented (\(t_{368} = -21.8\), \(p < 10^{-16}\)). -We also observed a higher mean probability of ISCB speakers predicted to be in Other categories compared to authors (\(t_{336} = 2.18\), \(p = 0.0296\)).

+We found that, over the years, white honorees have been significantly overrepresented (\(t_{348} = 15.0\), \(p < 10^{-16}\)) and Asian honorees have been significantly underrepresented (\(t_{368} = -21.8\), \(p < 10^{-16}\)). +We also observed a higher mean probability of ISCB speakers predicted to be in Other categories compared to authors (\(t_{336} = 2.18\), \(p = 0.0296\)).

Predicting Name Origin Groups with LSTM Neural Networks and Wikipedia

We next aimed to predict the name origin groups of honorees and authors. We constructed a training dataset with more than 700,000 name-nationality pairs by parsing the English-language Wikipedia. @@ -471,6 +474,254 @@

Assessing t (B) For each region, the mean predicted probability of PubMed articles is shown as a teal LOESS curve, and the mean probability and 95% confidence interval of the ISCB honoree predictions are shown as dark circles and vertical lines. +

Affiliation Analysis

+

We compared the countries of affiliation of corresponding authors and ISCB honorees. +For each country, we report a value of log2 enrichment (LOE) and its 95% confidence interval (Table 2). +A positive value of LOE indicates a higher proportion of honorees affiliated with that country compared to authors. +An LOE value of 1 represents a one-fold enrichment (i.e., the observed number of honorees is twice the expected number). +In the 20 countries with the most publications, we found an overrepresentation of honorees affiliated with institutions and companies in the US (97 speakers more than expected, LOE = 0.6, 95% CI (0.5, 0.8)) and Israel (12 speakers more than expected, LOE = 1.6, 95% CI (0.9, 2.3)) and an underrepresentation of honorees affiliated with those in China, France, Italy, the Netherlands, Taiwan, and India (Fig. 6).

+
Table 2: Enrichment and depletion in the proportion of ISCB honorees compared to PubMed corresponding authors for the 20 countries with the most publications. +The table lists the countries and their corresponding enrichment, which we computed by dividing the observed proportion of honorees by the expected proportion of honorees. +The expected proportion was calculated using corresponding author proportions. +A positive Log2(Enrichment) indicates a higher proportion of honorees than corresponding authors affiliated with that country. +The full table with all countries can be browsed interactively in the corresponding analysis notebook.

| Country | Author proportion | Observed | Expected | Observed - Expected | Enrichment | Log2(Enrichment) | 95% Confidence Interval |
|---|---|---|---|---|---|---|---|
| United States | 38.76% | 237.5 | 152.7 | 84.8 | 1.6 | 0.6 | (0.5, 0.8) |
| United Kingdom | 8.36% | 36.0 | 32.9 | 3.1 | 1.1 | 0.1 | (-0.3, 0.6) |
| Germany | 7.55% | 27.0 | 29.7 | -2.7 | 0.9 | -0.1 | (-0.7, 0.4) |
| China | 5.82% | 3.0 | 22.9 | -19.9 | 0.1 | -2.9 | (-4.5, -1.3) |
| France | 3.86% | 4.0 | 15.2 | -11.2 | 0.3 | -1.9 | (-3.3, -0.5) |
| Italy | 3.04% | 2.0 | 12.0 | -10.0 | 0.2 | -2.6 | (-4.5, -0.6) |
| Canada | 3.03% | 12.0 | 11.9 | 0.1 | 1.0 | 0.0 | (-0.8, 0.8) |
| Japan | 2.44% | 9.0 | 9.6 | -0.6 | 0.9 | -0.1 | (-1, 0.8) |
| Spain | 2.39% | 6.0 | 9.4 | -3.4 | 0.6 | -0.7 | (-1.8, 0.5) |
| Australia | 2.33% | 5.0 | 9.2 | -4.2 | 0.5 | -0.9 | (-2.1, 0.4) |
| Netherlands | 1.91% | 1.0 | 7.5 | -6.5 | 0.1 | -2.9 | (-5.6, -0.2) |
| Switzerland | 1.81% | 7.0 | 7.1 | -0.1 | 1.0 | -0.0 | (-1.1, 1) |
| Israel | 1.46% | 17.5 | 5.8 | 11.7 | 3.0 | 1.6 | (0.9, 2.3) |
| Sweden | 1.34% | 6.0 | 5.3 | 0.7 | 1.1 | 0.2 | (-1, 1.3) |
| Korea | 1.30% | 1.0 | 5.1 | -4.1 | 0.2 | -2.4 | (-5.1, 0.3) |
| Taiwan | 1.25% | 0.0 | 4.9 | -4.9 | 0.0 |  | (-Inf, -Inf) |
| India | 1.20% | 0.0 | 4.7 | -4.7 | 0.0 |  | (-Inf, -Inf) |
| Belgium | 1.04% | 1.0 | 4.1 | -3.1 | 0.2 | -2.0 | (-4.7, 0.7) |
| Singapore | 0.88% | 1.0 | 3.5 | -2.5 | 0.3 | -1.8 | (-4.5, 0.9) |
| Finland | 0.85% | 0.0 | 3.4 | -3.4 | 0.0 |  | (-Inf, -Inf) |
+
+
+
+
Figure 6: The overrepresentation of honorees affiliated with institutions and companies in the US and Israel contrasts with the underrepresentation of honorees affiliated with those in China, France, Italy, the Netherlands, Taiwan, and India. +For each country, enrichment is computed by dividing the observed proportion of honorees by the expected proportion of honorees whose affiliations are in that country, and the 95% confidence interval of the log2 enrichment is estimated with the delta method (left). +Observed (triangle) and expected (circle) numbers of honorees and their differences (observed - expected) are shown in square-root scale on the right. +Countries are ordered based on the proportion of authors in the field.
+
+

Conclusions

A major challenge that we faced in carrying out this work was to narrow down geographic origins for some groups of names. Some groupings, such as Group D, are geographically disparate. @@ -479,7 +730,7 @@

Conclusions

Group D honoree counts are influenced by Spain as well as Latin America. In such cases, our analyses may substantially understate the extent to which minoritized scientists are underrepresented among honorees and authors.

Biases in authorship practices may also result in our underestimation of the composition of minoritized scientists within the field. -We estimated the composition of the field using corresponding author status, but in neuroscience [20] and other disciplines [21] women are underrepresented among such authors. +We estimated the composition of the field using corresponding author status, but in neuroscience [21] and other disciplines [22] women are underrepresented among such authors. Such an effect would cause us to underestimate the number of women in the field. Though this effect has been studied with respect to gender, we are not aware of similar work examining race, ethnicity, or name origins.

We acknowledge that our supervised learning approaches are neither error free nor bias free. @@ -491,9 +742,9 @@

Conclusions

Because invitation and honor patterns could be driven by biases associated with name groups, geography, or other factors, cross-referencing name group predictions with author affiliations could help to disentangle the relationship between geographic regions, name groups, and invitation probabilities.

An important question to ask when measuring representation is what the right level of representation is. We suggest that considering equity may be more appropriate than strictly diversity. -In addition to holding fewer corresponding authorship positions, on average, female scientists of different disciplines are cited less often [22], invited by journals to submit papers less often [21], suggested as reviewers less often [24], and receive significantly worse review scores [23]. -Societies, both through their honorees and the individuals who deliver keynotes at their meetings, can play a positive role in improving the presence of female STEM role models, which, for example, may lead to higher persistence for undergraduate women in geoscience [25]. -Efforts are underway to create Wikipedia entries for more female [26] and black, Asian, and minority scientists [27], which can help early-career scientists identify role models. +In addition to holding fewer corresponding authorship positions, on average, female scientists of different disciplines are cited less often [23], invited by journals to submit papers less often [22], suggested as reviewers less often [25], and receive significantly worse review scores [24]. +Societies, both through their honorees and the individuals who deliver keynotes at their meetings, can play a positive role in improving the presence of female STEM role models, which, for example, may lead to higher persistence for undergraduate women in geoscience [26]. +Efforts are underway to create Wikipedia entries for more female [27] and black, Asian, and minority scientists [28], which can help early-career scientists identify role models. We find that ISCB’s honorees and keynote speakers, though not yet reaching gender parity, appear to be more evenly split between men and women than the field as a whole. 
On the other hand, honorees include significantly fewer people of color than the field as a whole, and Asian scientists are dramatically underrepresented among honorees. Although we estimate the fraction of non-white and non-Asian authors to be relatively similar to the estimated honoree rate, we note that both are represented at levels substantially lower than in the US population. @@ -503,7 +754,7 @@

Conclusions

These could be affected by explicit biases, implicit biases, or pernicious biases in which a reviewer might consider a path of inquiry, as opposed to an individual, to be more or less meritorious based on the reviewer’s own background [1]. Our efforts to measure the diversity of honorees in an international society suggest that, while a focus on gender parity may be improving some aspects of diversity among honorees, contributions from scientists of color are underrecognized.

Data and Resource Availability

-

This manuscript was written openly on GitHub using Manubot [28]. +

This manuscript was written openly on GitHub using Manubot [29]. The Manubot HTML version is available under a Creative Commons Attribution (CC BY 4.0) License at https://greenelab.github.io/iscb-diversity-manuscript/. Our analysis of authors and ISCB-associated honorees is available under CC BY 4.0 at https://github.com/greenelab/iscb-diversity, with source code also distributed under a BSD 3-Clause License. Rendered Python and R notebooks from this repository are browsable at https://greenelab.github.io/iscb-diversity/. @@ -618,55 +869,61 @@

References

Association for Computing Machinery (ACM) (2017) https://doi.org/ggjc78
DOI: 10.1145/3132847.3133008

+
+

20. Statistics in epidemiology: methods, techniques, and applications
+Hardeo Sahai, Anwer Khurshid
+CRC Press (1996)
+ISBN: 9780849394447

+
-

20. Persistent Underrepresentation of Women’s Science in High Profile Journals
+

21. Persistent Underrepresentation of Women’s Science in High Profile Journals
Yiqin Alicia Shen, Jason M. Webster, Yuichi Shoda, Ione Fine
bioRxiv (2018-03-08) https://doi.org/cmh5
DOI: 10.1101/275362

-

21. The gender gap in science: How long until women are equally represented?
+

22. The gender gap in science: How long until women are equally represented?
Luke Holman, Devi Stuart-Fox, Cindy E. Hauser
PLOS Biology (2018-04-19) https://doi.org/gdb9db
DOI: 10.1371/journal.pbio.2004956 · PMID: 29672508 · PMCID: PMC5908072

-

22. The extent and drivers of gender imbalance in neuroscience reference lists
+

23. The extent and drivers of gender imbalance in neuroscience reference lists
Jordan D. Dworkin, Kristin A. Linn, Erin G. Teich, Perry Zurn, Russell T. Shinohara, Danielle S. Bassett
arXiv (2020-01-07) https://arxiv.org/abs/2001.01002

-

23. Gender differences in peer review outcomes and manuscript impact at six journals of ecology and evolution
+

24. Gender differences in peer review outcomes and manuscript impact at six journals of ecology and evolution
Charles W. Fox, C. E. Timothy Paine
Ecology and Evolution (2019-03-04) https://doi.org/gfwjjb
DOI: 10.1002/ece3.4993 · PMID: 30962913 · PMCID: PMC6434606

-

24. Journals invite too few women to referee
+

25. Journals invite too few women to referee
Jory Lerback, Brooks Hanson
Nature (2017-01-26) https://doi.org/gf4jjz
DOI: 10.1038/541455a · PMID: 28128272

-

25. Role modeling is a viable retention strategy for undergraduate women in the geosciences
+

26. Role modeling is a viable retention strategy for undergraduate women in the geosciences
Paul R. Hernandez, Brittany Bloodhart, Amanda S. Adams, Rebecca T. Barnes, Melissa Burt, Sandra M. Clinton, Wenyi Du, Elaine Godfrey, Heather Henderson, Ilana B. Pollack, Emily V. Fischer
Geosphere (2018-10-31) https://doi.org/gghp9d
DOI: 10.1130/ges01659.1

-

26. Why we’re editing women scientists onto Wikipedia
+

27. Why we’re editing women scientists onto Wikipedia
Jess Wade, Maryam Zaringhalam
Nature (2018-08-14) https://doi.org/gdz52z
DOI: 10.1038/d41586-018-05947-8

-

27. Why we’re creating Wikipedia profiles for BAME scientists
+

28. Why we’re creating Wikipedia profiles for BAME scientists
Nicola O’Reilly
Nature (2019-03-07) https://doi.org/gfwhcr
DOI: 10.1038/d41586-019-00812-8

-

28. Open collaborative writing with Manubot
+

29. Open collaborative writing with Manubot
Daniel S. Himmelstein, Vincent Rubinetti, David R. Slochower, Dongbo Hu, Venkat S. Malladi, Casey S. Greene, Anthony Gitter
PLOS Computational Biology (2019-06-24) https://doi.org/c7np
DOI: 10.1371/journal.pcbi.1007128 · PMID: 31233491 · PMCID: PMC6611653

diff --git a/manuscript.md b/manuscript.md index 0d0aa7f..db4d454 100644 --- a/manuscript.md +++ b/manuscript.md @@ -96,19 +96,19 @@ header-includes: '\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n", + "header-includes": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n", "bibliography": [ "content/manual-references.json" ], @@ -124,20 +124,20 @@ "repo_slug": "greenelab/iscb-diversity-manuscript", "repo_owner": "greenelab", "repo_name": "iscb-diversity-manuscript", - "commit": "5dc6311c9519b47e4aa088c2b776aeab3d8a3e80", - "triggering_commit": "5dc6311c9519b47e4aa088c2b776aeab3d8a3e80", - "build_url": "https://github.com/greenelab/iscb-diversity-manuscript/commit/5dc6311c9519b47e4aa088c2b776aeab3d8a3e80/checks", + "commit": "2001465edf8ac343a7d4b0026fd72f4ad0518ede", + "triggering_commit": "2001465edf8ac343a7d4b0026fd72f4ad0518ede", + "build_url": "https://github.com/greenelab/iscb-diversity-manuscript/commit/2001465edf8ac343a7d4b0026fd72f4ad0518ede/checks", "job_url": "https://github.com/greenelab/iscb-diversity-manuscript/runs/run2" }, "html_url": "https://greenelab.github.io/iscb-diversity-manuscript/", "pdf_url": "https://greenelab.github.io/iscb-diversity-manuscript/manuscript.pdf", - "html_url_versioned": "https://greenelab.github.io/iscb-diversity-manuscript/v/5dc6311c9519b47e4aa088c2b776aeab3d8a3e80/", - "pdf_url_versioned": "https://greenelab.github.io/iscb-diversity-manuscript/v/5dc6311c9519b47e4aa088c2b776aeab3d8a3e80/manuscript.pdf", + "html_url_versioned": "https://greenelab.github.io/iscb-diversity-manuscript/v/2001465edf8ac343a7d4b0026fd72f4ad0518ede/", + "pdf_url_versioned": "https://greenelab.github.io/iscb-diversity-manuscript/v/2001465edf8ac343a7d4b0026fd72f4ad0518ede/manuscript.pdf", "manubot_version": "0.3.1", "rootstock_commit": "1780fac0ac6bba1260a9da3886061730fa5d2765", - "thumbnail_url": 
"https://github.com/greenelab/iscb-diversity-manuscript/raw/5dc6311c9519b47e4aa088c2b776aeab3d8a3e80/build/assets/thumbnail.png", + "thumbnail_url": "https://github.com/greenelab/iscb-diversity-manuscript/raw/2001465edf8ac343a7d4b0026fd72f4ad0518ede/build/assets/thumbnail.png", "manuscript_stats": { - "word_count": 6157 + "word_count": 6942 } } }