Map where clinical trials are recruiting #569

Merged: 17 commits merged on Aug 14, 2020

Conversation

rando2
Contributor

@rando2 rando2 commented Aug 5, 2020

Description of the proposed additions or changes

This is a proposed addition to the EBM Data Lab COVID-19 TrialsTracker analysis that is used to generate some of the existing figures. Here, the countries represented in the EBM dataset are tabulated, with single-country clinical trials and multi-country clinical trials counted separately. Tabulation is done based on ISO codes, which are easier to match than country names.
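
The tabulation described above might look roughly like the following. This is only a minimal sketch: the `trials` records, their field names, and the `tabulate_by_country` helper are hypothetical stand-ins, not the structure of the EBM dataset or the code in this pull request.

```python
from collections import Counter

# Hypothetical records: each trial lists the ISO 3166-1 alpha-3 codes of
# the countries where it recruits. These are made up for illustration.
trials = [
    {"id": "trial-1", "countries": ["FRA"]},
    {"id": "trial-2", "countries": ["FRA", "USA"]},
    {"id": "trial-3", "countries": ["USA"]},
    {"id": "trial-4", "countries": ["USA", "KOR", "FRA"]},
]


def tabulate_by_country(trials):
    """Count single-country and multi-country trials per ISO code."""
    single, multi = Counter(), Counter()
    for trial in trials:
        codes = set(trial["countries"])  # ignore duplicate listings
        target = single if len(codes) == 1 else multi
        target.update(codes)  # add one to the count of each listed country
    return single, multi


single_counts, multi_counts = tabulate_by_country(trials)
print(sorted(single_counts.items()))  # [('FRA', 1), ('USA', 1)]
print(sorted(multi_counts.items()))   # [('FRA', 2), ('KOR', 1), ('USA', 2)]
```

Keeping the two counters separate mirrors the two maps: one for trials recruiting in a single country, one for trials recruiting in several.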

Here is what it creates:
[Image: ebmdatalab-map]

Questions:

  • Is modifying the environment.yml file sufficient to install the new packages in the virtual environment? This seems too simple, but I haven't figured out where else I would need to add them.

Concerns:

  • Some data cleaning is needed to match the EBM dataset to ISO codes, and I'm not sure of the best way to represent this information for reproducibility (for example, if you search for South Korea, you'll see where I hard-coded the ISO code based on the official name). I resolved the matches automatically with pycountry, except where I could not find a workaround ("South Korea" and "Democratic Republic of [the] Congo").
  • For whatever reason it's very hard to handle side-by-side choropleths where the scales are quite different. I don't know what you think of these two color scales and the off-white background. 70 seems to be approximately equal to purple in the top figure...
  • I did run a linter, but some of its advice seemed a little counterintuitive, so I'm happy to hear if there's anything formatted in a confusing way.
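
One possible way to make the hard-coded matches easier to audit is to collect them in a single table that is checked before any automatic lookup. The sketch below is an assumption about how that could be organized, not the code in this pull request: `MANUAL_ISO3`, `resolve_iso3`, and `toy_lookup` are hypothetical names, and `toy_lookup` merely stands in for a pycountry-based lookup.

```python
# Hypothetical override table: names from the dataset that an automatic
# lookup could not resolve, mapped to ISO 3166-1 alpha-3 codes.
MANUAL_ISO3 = {
    "South Korea": "KOR",
    "Democratic Republic of Congo": "COD",
    "Democratic Republic of the Congo": "COD",
}


def resolve_iso3(name, automatic_lookup):
    """Return an ISO code, preferring the documented manual override."""
    cleaned = name.strip()
    if cleaned in MANUAL_ISO3:
        return MANUAL_ISO3[cleaned]
    return automatic_lookup(cleaned)


def toy_lookup(name):
    """Stand-in for an automatic lookup; returns None when nothing matches."""
    return {"France": "FRA"}.get(name)


print(resolve_iso3("South Korea", toy_lookup))  # KOR
print(resolve_iso3("France", toy_lookup))       # FRA
```

With this layout, every manual decision lives in one place and can be reviewed or extended without reading the matching logic.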

I'm happy to hear suggestions either on the code or visualization side!

Related issues

#552

Suggested reviewers (optional)

@tlukan @agitter @cgreene @rdvelazquez

@rando2 rando2 changed the title from "Countries" to "Map where the clinical trials we take are recruiting" Aug 5, 2020
@rando2 rando2 changed the title from "Map where the clinical trials we take are recruiting" to "Map where clinical trials are recruiting" Aug 5, 2020
@cgreene
Member

cgreene commented Aug 5, 2020

🇫🇷 is missing?

@agitter
Collaborator

agitter commented Aug 5, 2020

"Is modifying the environment.yml file sufficient to install the new packages on the virtual environment?"

Yes, that's all you need to do. This GitHub Actions step magically takes care of everything else:

- name: Install environment
  uses: goanpeca/setup-miniconda@v1
  with:
    activate-environment: covid19
    environment-file: environment.yml
    auto-activate-base: false
    miniconda-version: 'latest'

@rando2
Contributor Author

rando2 commented Aug 5, 2020

Per @cgreene's observation, something is very wrong with France (and a few other places) in this geopandas dataset: geopandas/geopandas#1041
I will add manual fixes for these and update!

@rando2
Contributor Author

rando2 commented Aug 5, 2020

Vive La France!
[Image: ebmdatalab-map]

@rando2 rando2 added the "Technical" label (Technical concerns, enhancements, etc. for the GitHub enthusiasts) Aug 5, 2020
@RLordan
Collaborator

RLordan commented Aug 6, 2020

I have zero technical knowledge to help... but this looks really cool, folks. Thank you!

@rando2
Contributor Author

rando2 commented Aug 6, 2020

Per @tlukan's suggestion, here is a description for what these figures show (possible draft for a legend):
The EBM Data Lab COVID-19 TrialsTracker dataset was used to identify where, at the country level, each clinical trial is or was enrolling participants. The number of clinical trials operating in each country was tabulated and is shown here through the color overlay on the map. The top figure shows the number of clinical trials specific to each country (i.e., trials recruiting only in that country). The bottom figure shows the same for trials recruiting in multiple countries. Note that the scale differs between the top and bottom plots.

Here is a link to play with the data: http://covid19.trialstracker.net/

@rando2
Contributor Author

rando2 commented Aug 6, 2020

Currently this is missing the following ISO codes: 'LIE', 'MLT', 'GUF', 'SMR', 'HKG', 'GIB', 'BHR', 'MCO', 'UMI', 'IMN', 'SGP', 'MTQ'
In most cases, I think the geopandas dataset lumps these in with other countries for various political reasons (e.g., HKG is Hong Kong, GUF is French Guiana). I'm working on splitting the maps to handle them separately because culturally they seem worth analyzing separately.
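
A check like the one that produced this list can be sketched as a set difference between the codes found in the trial data and the codes available in the map data. The inputs and the `find_unmapped` helper below are hypothetical examples, not the actual dataset contents or the code in this pull request.

```python
def find_unmapped(trial_codes, map_codes):
    """Return ISO codes that appear in the trial data but not in the map data."""
    return sorted(set(trial_codes) - set(map_codes))


# Made-up inputs for illustration only.
trial_codes = ["FRA", "HKG", "SGP", "USA", "HKG"]
map_codes = ["FRA", "USA", "CHN"]

print(find_unmapped(trial_codes, map_codes))  # ['HKG', 'SGP']
```

Running such a check whenever the data are refreshed would flag newly appearing territories before they silently drop off the map.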

Member

@cgreene cgreene left a comment

Just one quick question!

hit = pycountry.countries.search_fuzzy(country + ",")
elif isinstance(hit, type(None)):
hit = pycountry.countries.get(official_name=country)
except LookupError:
Member

There are two get calls here. Does it matter which one raises the lookup error? I'm imagining that search_fuzzy wouldn't raise one, judging from its name.

Contributor Author

Only the pycountry.countries.get() command throws this error; I believe .search_fuzzy() returns an empty list. Should this section be reformatted to better reflect that? I wasn't quite sure of the best way to handle this many levels of possible alternative strategies, although in retrospect I could just make a function that returns a value as soon as one strategy succeeds.

Member

I noticed there are two .get calls, and if you hit this error on the first one, you will never do the fuzzy search. We also won't know which .get call failed. I'm not sure that either of these matters, so I wanted to raise them to see if it would make a difference.
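
The idea raised in this thread, a function that returns as soon as one strategy succeeds while recording which strategies failed, could be sketched as follows. This is an illustration under stated assumptions, not the code that was merged: `first_success`, `by_name`, and `by_official_name` are hypothetical, the lookups use toy dictionaries instead of pycountry, and nothing here asserts how pycountry itself signals a failed lookup.

```python
def first_success(value, strategies):
    """Try each (label, function) pair in order.

    Returns (label, result, failures). A LookupError or an empty/None
    result counts as a failure and is recorded with the strategy label.
    """
    failures = []
    for label, func in strategies:
        try:
            result = func(value)
        except LookupError as err:
            failures.append((label, repr(err)))
            continue  # an error in one strategy does not block the next
        if result:
            return label, result, failures
        failures.append((label, "no result"))
    return None, None, failures


# Toy lookup tables standing in for a real country database.
NAMES = {"France": "FRA"}
OFFICIAL = {"Republic of Korea": "KOR"}


def by_name(country):
    return NAMES.get(country)  # returns None when missing


def by_official_name(country):
    return OFFICIAL[country]  # raises KeyError (a LookupError) when missing


strategies = [("name", by_name), ("official_name", by_official_name)]
print(first_success("Republic of Korea", strategies))
# ('official_name', 'KOR', [('name', 'no result')])
```

Because each strategy is wrapped separately, a failure in an early lookup never skips the later ones, and the returned `failures` list shows which call failed.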

Contributor Author

@rando2 rando2 left a comment

Thank you so much for the review @cgreene!

@rando2
Contributor Author

rando2 commented Aug 13, 2020

Thank you so much @cgreene! I believe this handles exceptions a little more consistently. I checked with @mprobson and we couldn't brainstorm a more elegant structure, though there could be something we're not thinking of!

Member

@cgreene cgreene left a comment

Looks good!

Collaborator

@agitter agitter left a comment

This looks good to me. Very cool addition!

@cgreene already approved so I only reviewed the logic lightly. My main suggestion is on how to refer to this figure later in the manuscript.

plt.savefig(args.output_map + '.svg', bbox_inches="tight")

print(f'Wrote {args.output_map}.png and {args.output_map}.svg')

# The placeholder will be replaced by the actual SHA-1 hash in separate
Collaborator

I suggest moving this block to follow line 208 (the Wrote {args.output_figure}.png and {args.output_figure}.svg message) to keep the trials figure code grouped together.

HM Rando and others added 2 commits August 14, 2020 10:13
Co-authored-by: Anthony Gitter <agitter@users.noreply.github.com>
@rando2
Contributor Author

rando2 commented Aug 14, 2020

Thank you both so much for the review! Excited to see if this works overnight :)

@rando2 rando2 merged commit b6fd368 into greenelab:external-resources Aug 14, 2020
@rando2 rando2 deleted the countries branch August 14, 2020 15:56