COVID-19 Public Data

Tracks COVID-19 cases.

This project consolidates and standardizes two of the major COVID-19 data repositories, Johns Hopkins (jhu) and Data Scraper (cds), into a single combined dataset.

Dependencies:

Intention

The intention of this repo is to understand and analyze the consolidated data using Spark DataFrames.

Frequency

Update frequency depends on how often the extract-and-load pipeline runs. All data is sourced from S3, from the objects that the data pipeline extracts.

Output

Refer to Readme.md

The S3 prefix is public, so individual source files can be downloaded along with the ORC sources.

All objects are compressed in GZIP format.
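
Because the objects are gzip-compressed, a downloaded file has to be decompressed before it can be inspected. The following is a minimal sketch using only Python's standard library. It assumes the decompressed payload is UTF-8 text, which this README does not confirm (the combined object sits under an ORC prefix and may not be plain text). The helper name `head_gzip_text` is hypothetical.

```python
import gzip
from itertools import islice


def head_gzip_text(path, n=5, encoding="utf-8"):
    """Return the first n decoded lines of a gzip-compressed text file.

    Assumption: the decompressed content is line-oriented text.
    """
    # mode="rt" makes gzip.open decompress and decode in one step
    with gzip.open(path, mode="rt", encoding=encoding) as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]
```

For example, `head_gzip_text("jhu_2020-04-17.gz")` would preview a locally downloaded copy of the transformed Johns Hopkins file, provided the text assumption holds.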

Download Consolidated Dataset

  - Johns Hopkins
  aws s3 ls s3://poly-testing/covid/jhu --recursive
  2020-04-01 08:43:11          0 covid/jhu/
  2020-04-02 05:30:58     329761 covid/jhu/UID_ISO_FIPS_LookUp_Table.csv
  2020-04-01 08:43:19          0 covid/jhu/raw/
  2020-04-17 05:13:43     314337 covid/jhu/raw/04-16-2020.csv
  2020-04-17 05:14:48    1223240 covid/jhu/transformed/2020-04-17/jhu_2020-04-17.gz
  
  - Data Scraper
  aws s3 ls s3://poly-testing/covid/cds --recursive
  2020-04-17 05:14:49     819222 covid/cds/2020-04-17/cds_2020-04-17.gz
  
  - Combined
  aws s3 ls s3://poly-testing/covid/orc --recursive
  2020-04-17 05:25:51          0 covid/orc/_SUCCESS
  2020-04-17 05:25:49    3834451 covid/orc/covid19_combined.gz
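
The listings above suggest a date-based key layout for the per-source objects. The sketch below builds the S3 URIs for a given date so they can be passed to a download command. The patterns are inferred from the few keys shown (a single transformed date and a single raw date), so whether other dates follow them is an assumption. The function names are hypothetical.

```python
from datetime import date

BUCKET = "poly-testing"  # bucket name as it appears in the listings


def jhu_transformed_uri(day: date) -> str:
    """URI of the transformed Johns Hopkins object (assumed YYYY-MM-DD layout)."""
    stamp = day.isoformat()
    return f"s3://{BUCKET}/covid/jhu/transformed/{stamp}/jhu_{stamp}.gz"


def jhu_raw_uri(day: date) -> str:
    """URI of the raw Johns Hopkins CSV (assumed MM-DD-YYYY file name)."""
    return f"s3://{BUCKET}/covid/jhu/raw/{day.strftime('%m-%d-%Y')}.csv"


def cds_uri(day: date) -> str:
    """URI of the Data Scraper object (assumed YYYY-MM-DD layout)."""
    stamp = day.isoformat()
    return f"s3://{BUCKET}/covid/cds/{stamp}/cds_{stamp}.gz"
```

A URI built this way, such as `jhu_transformed_uri(date(2020, 4, 17))`, can then be handed to `aws s3 cp` to fetch the object.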

polyglotDataNerd/poly-spark-covid
