Map where clinical trials are recruiting #569

rando2 · 2020-08-05T18:54:04Z

Description of the proposed additions or changes

This is a proposed addition to the EBM Data Lab COVID-19 TrialsTracker analysis that is used to generate some of the existing figures. Here, the countries represented in the EBM dataset are tabulated, with single-country clinical trials and multi-country clinical trials counted separately. Tabulation is done based on ISO codes, which are easier to match than country names.
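The tabulation described above could be sketched roughly as follows. This is a minimal illustration, not the code in generate-ebmdatalab-stats.py: the record layout (one list of ISO alpha-3 codes per trial), the function name `tabulate_countries`, and the example records are all assumptions made for this sketch.

```python
from collections import Counter


def tabulate_countries(trials):
    """Count trials per ISO code, split into single- and multi-country trials.

    `trials` is an iterable of lists of ISO 3166-1 alpha-3 codes, one list
    per trial (a hypothetical layout chosen for this sketch).
    """
    single, multi = Counter(), Counter()
    for codes in trials:
        unique = set(codes)  # ignore a country repeated within one trial
        if len(unique) == 1:
            single.update(unique)
        elif len(unique) > 1:
            multi.update(unique)
    return single, multi


# Made-up records, for illustration only.
example_trials = [["FRA"], ["FRA", "DEU"], ["USA"], ["USA", "FRA", "DEU"], ["USA"]]
single_counts, multi_counts = tabulate_countries(example_trials)
```

Keeping the two counters separate mirrors the two maps: one value per country for trials recruiting only there, and one for trials recruiting there and elsewhere.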

Here is what it creates:

Questions:

Is modifying the environment.yml file sufficient to install the new packages on the virtual environment? This seems too simple, but I haven't figured out where else I need to add them.

Concerns:

Some data cleaning is needed to match the EBM dataset to ISO codes, and I'm not sure of the best way to represent it for reproducibility. For example, if you search the code for South Korea, you'll see where I hard-coded the ISO code based on the official name. I did the matching automatically with pycountry except where I could not find any workaround ("South Korea" and "Democratic Republic of [the] Congo").
For whatever reason, it's very hard to handle side-by-side choropleths where the scales are quite different. I'd like to know what you think of these two color scales and the off-white background. A value of 70 seems to be approximately equal to purple in the top figure...
I did run a linter, but some of its advice seemed a little counterintuitive, so I'm happy to hear if there's anything formatted in a confusing way.
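One way to reason about the color-scale concern is to compare where a count lands on each panel's own scale versus on a shared one. The sketch below uses only the standard library. The helper name `scale_position` and the panel maxima are assumptions for illustration and are not taken from the figure-generating script.

```python
def scale_position(value, vmin, vmax):
    """Return where `value` falls on a linear color scale, as a fraction in [0, 1]."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    fraction = (value - vmin) / (vmax - vmin)
    return min(max(fraction, 0.0), 1.0)  # clamp values outside the scale


# Hypothetical maxima for the two panels.
TOP_MAX = 280
BOTTOM_MAX = 70

count = 70
on_bottom_scale = scale_position(count, 0, BOTTOM_MAX)  # darkest color on the bottom map
on_top_scale = scale_position(count, 0, TOP_MAX)  # a much lighter color on the top map

# With a shared scale, the same count gets the same color in both panels.
shared_max = max(TOP_MAX, BOTTOM_MAX)
on_shared_scale = scale_position(count, 0, shared_max)
```

If the two maps are drawn with geopandas, passing the same `vmin` and `vmax` to both `plot` calls would give a shared scale, at the cost of washing out the panel with the smaller range. Keeping separate scales, as in the current figure, preserves contrast but needs the note that the scales differ.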

I'm happy to hear suggestions either on the code or visualization side!

Related issues

#552

Suggested reviewers (optional)

@tlukan @agitter @cgreene @rdvelazquez

ebmdatalab/generate-ebmdatalab-stats.py

cgreene · 2020-08-05T19:17:20Z

🇫🇷 is missing?

agitter · 2020-08-05T20:00:32Z

> Is modifying the environment.yml file sufficient to install the new packages on the virtual environment?

Yes, that's all you need to do. This GitHub Actions step magically takes care of everything else:

covid19-review/.github/workflows/update-external-resources.yaml

Lines 17 to 23 in e3bd878

    - name: Install environment
      uses: goanpeca/setup-miniconda@v1
      with:
        activate-environment: covid19
        environment-file: environment.yml
        auto-activate-base: false
        miniconda-version: 'latest'

rando2 · 2020-08-05T22:00:22Z

Per @cgreene's observation, something is very wrong with France (and a few other places) in this geopandas dataset: geopandas/geopandas#1041
I will add manual fixes for these and update!

rando2 · 2020-08-05T22:10:16Z

Vive La France!

RLordan · 2020-08-06T03:56:15Z

I have zero technical knowledge to help... but looks really cool folks, thank you

rando2 · 2020-08-06T14:49:42Z

Per @tlukan's suggestion, here is a description for what these figures show (possible draft for a legend):
The EBM Data Lab COVID-19 TrialsTracker dataset was used to identify where, at the country level, each clinical trial is or was enrolling participants. The number of clinical trials operating in each country was tabulated and is shown here through the color overlay on the map. The top figure shows the number of clinical trials specific to each country (i.e., trials recruiting only in that country). The bottom figure shows the same but for trials recruiting in multiple countries. Note that the scale differs between the top and bottom plots.

Here is a link to play with the data: http://covid19.trialstracker.net/

rando2 · 2020-08-06T19:07:33Z

Currently this is missing the following ISO codes: 'LIE', 'MLT', 'GUF', 'SMR', 'HKG', 'GIB', 'BHR', 'MCO', 'UMI', 'IMN', 'SGP', 'MTQ'
In most cases, I think the geopandas dataset lumps these in with other countries for various political reasons (e.g., HKG is Hong Kong, GUF is French Guiana). I'm working on splitting the maps to handle them separately because culturally they seem worth analyzing separately.
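A check like the one that produced the list above can be sketched with a set difference. The function name `missing_codes` and both example sets are hypothetical. In the real script the two inputs would come from the trials data and from the map dataset's ISO column.

```python
def missing_codes(dataset_codes, map_codes):
    """Return ISO codes present in the trials data but absent from the map."""
    return sorted(set(dataset_codes) - set(map_codes))


# Made-up inputs, for illustration only.
trial_codes = {"FRA", "HKG", "SGP", "USA"}
map_codes = {"CHN", "FRA", "USA"}

missing = missing_codes(trial_codes, map_codes)
```

Running such a check each time the figure is regenerated would flag newly appearing codes that the map cannot display.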

cgreene

Just one quick question!

cgreene · 2020-08-13T11:54:16Z

ebmdatalab/generate-ebmdatalab-stats.py

+                        hit = pycountry.countries.search_fuzzy(country + ",")
+                    elif isinstance(hit, type(None)):
+                        hit = pycountry.countries.get(official_name=country)
+            except LookupError:


There are two get calls here - does it matter which one raises the lookup error? I'm imagining that search_fuzzy wouldn't raise one, judging from its name.

rando2 · 2020-08-13T13:46:26Z

It is only the pycountry.countries.get() command that throws this error; I believe .search_fuzzy() returns an empty list. Should this section be reformatted to better reflect that? I wasn't quite sure of the best way to handle this many levels of possible alternative strategies, although in retrospect I could just make a function that returns a value as soon as one strategy succeeds.

cgreene · 2020-08-13

I noticed there are two .get calls, and if the first one hits this error you will never do the fuzzy search. We also won't know which .get call failed. I'm not sure that either of these matters, so I wanted to raise them to see if it would make a difference.

rando2

Thank you so much for the review @cgreene!

rando2 · 2020-08-13T23:12:10Z

Thank you so much @cgreene! I believe this handles exceptions a little more consistently. I checked with @mprobson and we couldn't brainstorm a more elegant structure, though there could be something we're not thinking of!
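The "return a value whenever it first succeeds" idea raised in the review thread could look something like the sketch below. It is not the code that was merged. The lookup functions are hypothetical stand-ins for the pycountry calls, and because which call raises and which returns an empty result may depend on the pycountry version, the sketch treats both an exception and an empty result as a miss.

```python
def first_match(name, strategies):
    """Return the first truthy result from `strategies`, or None if all miss.

    A strategy signals a miss either by returning None or an empty result,
    or by raising LookupError; both cases are handled the same way.
    """
    for strategy in strategies:
        try:
            hit = strategy(name)
        except LookupError:
            continue  # this strategy missed; try the next one
        if hit:
            return hit
    return None


# Hypothetical stand-ins for the real lookups (e.g. pycountry calls).
NAMES = {"France": "FRA", "Germany": "DEU"}
MANUAL_OVERRIDES = {"South Korea": "KOR", "Democratic Republic of Congo": "COD"}


def by_name(name):
    return NAMES[name]  # raises KeyError, a LookupError subclass, on a miss


def by_override(name):
    return MANUAL_OVERRIDES.get(name)  # returns None on a miss


STRATEGIES = [by_name, by_override]
```

Calling `first_match("South Korea", STRATEGIES)` falls through the name lookup and picks up the manual override. Because each strategy is tried independently, a failure in an early lookup no longer prevents the later ones from running, and the hard-coded exceptions sit in one visible table.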

cgreene

Looks good!

agitter

This looks good to me. Very cool addition!

@cgreene already approved so I only reviewed the logic lightly. My main suggestion is on how to refer to this figure later in the manuscript.

ebmdatalab/generate-ebmdatalab-stats.py

agitter · 2020-08-14T13:14:44Z

ebmdatalab/generate-ebmdatalab-stats.py

+    plt.savefig(args.output_map + '.svg', bbox_inches="tight")
+
+    print(f'Wrote {args.output_map}.png and {args.output_map}.svg')
+
    # The placeholder will be replaced by the actual SHA-1 hash in separate


I suggest moving this block to follow line 208 (the Wrote {args.output_figure}.png and {args.output_figure}.svg message) to keep the trials figure code grouped together.

ebmdatalab/generate-ebmdatalab-stats.py

Co-authored-by: Anthony Gitter <agitter@users.noreply.github.com>

rando2 · 2020-08-14T15:56:08Z

Thank you both so much for the review! Excited to see if this works overnight :)

HM Rando added 10 commits August 4, 2020 15:05

wrangle country data

37b4796

test pycountry

df5e59c

data cleaning

76ff8b8

attempt to merge df, still buggy

e7df1d8

generate side-by-side choropleths

1019feb

tried to use geoplot, switching back

72834c4

generate choropleth with geopandas

1d15259

clean up code and fig

dc38c1a

update environment.yml

fd73842

linted

f7db92c

rando2 changed the title ~~Countries~~ Map where the clinical trials we take are recruiting Aug 5, 2020

rando2 changed the title ~~Map where the clinical trials we take are recruiting~~ Map where clinical trials are recruiting Aug 5, 2020

remove extra newline

8e034e5

HM Rando added 2 commits August 5, 2020 18:08

fix issue with geopandas world dataset

a626b18

Merge branch 'countries' of github.com:rando2/covid19-review into countries because formatting was done on remote

42707cb

linter

559e2c3

rando2 added the Technical label (Technical concerns, enhancements, etc. for the GitHub enthusiasts) Aug 5, 2020

cgreene reviewed Aug 13, 2020

rando2 commented Aug 13, 2020

handle exceptions

22440a8

cgreene approved these changes Aug 14, 2020

agitter requested changes Aug 14, 2020

HM Rando and others added 2 commits August 14, 2020 10:13

Apply @agitter's suggestions

a47021c

Co-authored-by: Anthony Gitter <agitter@users.noreply.github.com>

rearrange blocks to keep plots together

9bd7423

agitter approved these changes Aug 14, 2020

rando2 merged commit b6fd368 into greenelab:external-resources Aug 14, 2020

rando2 deleted the countries branch August 14, 2020 15:56

rando2 mentioned this pull request Aug 14, 2020

Need to fix mappings for 12 ISO codes in choropleth #603

Open

agitter mentioned this pull request Aug 15, 2020

Update external resources environment #607

Merged

2 tasks
