User Guide for ACCOuNT Data Commons
Hosted Data Overview
ACCOuNT currently hosts genotyping, sequencing, and clinical data from the following programs:
- Open Access (Open-GEO)
- NU African American Hepatocytes
- The Discovery Project
- MESA (Multi-Ethnic Study of Atherosclerosis)
Each program and its associated studies are described below.
Open-GEO contains genotype data for more than 300 participant samples associated with lung cancer cell lines, together with the associated phenotype information. The genotype data consists of mRNA microarrays and unaligned reads in the form of CEL and FASTQ files. Phenotype data includes information on the demographics, exposure, diagnosis, treatment, follow-up, and medical history of a subject. For more information on GEO, please visit the NCBI homepage.
NU African American Hepatocytes: This study contains genotypes and the associated transcriptome profiles of hepatocytes obtained from African American livers. Transcriptomes were measured under the following seven treatments: Phenobarbital, Dexamethasone, Carbamazepine, Phenytoin, Rifampicin, DMSO, and Omeprazole. The treatment is encoded in the last digit of the submitter_id, which has the form XX-YY-Z, where Z is a number from 1 to 7:
- 1: DMSO
- 2: Omeprazole
- 3: Phenobarbital
- 4: Dexamethasone
- 5: Carbamazepine
- 6: Phenytoin
- 7: Rifampicin
Genotype data includes submitted aligned reads and aligned read index files in the form of BAM files. Phenotype data consists of basic demographics when available. All data is also available at GSE147628, 124074, and 123995.
Note: “NU” stands for “Northwestern University”.
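The treatment encoding described above can be decoded programmatically. The following is a minimal sketch, assuming that the final character of the submitter_id is the treatment digit Z; the function name and the example IDs are hypothetical and are not taken from the commons.

```python
# Treatment codes as listed in this guide: the last digit (Z) of a
# submitter_id of the form XX-YY-Z identifies the treatment.
TREATMENT_BY_CODE = {
    "1": "DMSO",  # listed under code 1 in this guide
    "2": "Omeprazole",
    "3": "Phenobarbital",
    "4": "Dexamethasone",
    "5": "Carbamazepine",
    "6": "Phenytoin",
    "7": "Rifampicin",
}


def treatment_from_submitter_id(submitter_id: str) -> str:
    """Return the treatment name encoded in the last character of the ID."""
    code = submitter_id.strip()[-1:]
    if code not in TREATMENT_BY_CODE:
        raise ValueError(f"No treatment code (1-7) at the end of {submitter_id!r}")
    return TREATMENT_BY_CODE[code]


if __name__ == "__main__":
    # "AA-01-3" is a made-up ID used only to illustrate the XX-YY-Z pattern.
    print(treatment_from_submitter_id("AA-01-3"))
```

Raising an error for an unexpected final character, rather than returning a default, makes malformed IDs visible early when processing a downloaded table.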
The Discovery Project identifies genetic biomarkers (SNPs) associated with cardiovascular drug response and disease susceptibility. It currently stores phenotype and genotype data across 4 projects, or research “arms.” All arms have basic demographic data, and all arms except healthy_platelet also contain drug phenotype and clinical data. The data available for each arm is described below:
- Healthy_platelet: Genotype files consist of unaligned reads, SNP arrays, genotyping arrays, simple germline variations, and germline variation indices in the form of FASTQ, CEL, IDAT, VCF, and CSI files. Transcriptome data is available as FASTQ and BAM files. The participants are healthy African American volunteers; age and sex are available.
- Warfarin: Genotype files include simple germline variations, germline variation indices, and genotyping arrays in the form of IDAT, VCF, and CSI files. In addition, this study contains a CSV file with a comprehensive clinical dataset. Example clinical properties in this file include, but are not limited to, a list of comorbidities, warfarin dose, target INR, reason for warfarin treatment, and adverse bleeding events that occurred while on treatment. All subjects were followed for 6 months after enrollment in the study.
- Novel Oral Anticoagulants (NOAC): Genotype files include simple germline variations, germline variation indices, and genotyping arrays in the form of IDAT, VCF, and CSI files. In addition, this study contains a CSV file with a comprehensive clinical dataset. Example clinical properties in this file include, but are not limited to, a list of comorbidities, which NOAC drug was used, therapeutic dose, a list of active medications (RX and OTC), adverse bleeding and clotting events that occurred while on treatment, and a clinical Anti-Xa lab measure taken while on treatment. All subjects were followed for 6 months after enrollment in the study.
- Clopidogrel: Genotype files include simple germline variations, germline variation indices, genotyping arrays, and SNP arrays in the form of IDAT, CEL, VCF, and CSI files. In addition, this study contains a CSV file with a comprehensive clinical dataset. Example clinical properties in this file include, but are not limited to, a list of comorbidities, clopidogrel dose, aspirin/NSAIDs dose, proton pump inhibitors, other medications, adverse bleeding and clotting events that occurred while on treatment, and a clinical PRU lab measure taken while on treatment. All subjects were followed for 6 months after enrollment in the study.
MESA (Multi-Ethnic Study of Atherosclerosis) contains -omics data, such as genomic, transcriptomic, and proteomic data. This program consists of 2 studies with data files including ZIP, TAR, LOG, and TXT. The 2 available studies are:
- MESA_AACAC
- SHARe

How to get started
1. Login to the ACCOuNT Data Commons
a. Homepage
To navigate and access data available on the Gen3 platform, start by visiting the login page (https://acct.bionimbus.org). Use the “Login to Google” button on the homepage to authenticate:
After you have logged in, your username appears in the top-right corner of the page. You will also see summary statistics for the total number of projects, subjects, and files available within the ACCOuNT Data Commons.
b. How to check your data access
To check your access, go to the website (https://acct.bionimbus.org) and click on “Login from Google”. In the “Exploration” tab located at the right-hand side of the page, the data you have access to is listed under Project_Id in the Studies sub-tab.
Similarly, you can visit the “Profile” page and view a list of your approved studies under the “You have access to the following project(s)” section, as shown in the figure below.
Currently, the ACCOuNT Data Commons hosts only 1 open access dataset, open-GEO. Users can request access to controlled data by visiting the dbGaP homepage or by contacting the ACCOuNT consortium (https://precisionmedicine4all.com).
2. Data Explorer
The Exploration Page lets users search through data and create cohorts via faceted search fields. By combining the different faceted search categories, users can create virtual cohorts not only within individual studies, but also across any of the available studies (with proper user authorization). Below, we explore the core sections of the Exploration page.
a. The Subject Tab
Under the “Subject” Tab, users can use the ACCOuNT data model and a list of available phenotypic properties to create virtual cohorts. Selecting search facets dynamically updates the virtual cohort at the subject level. If no facets have been selected, all of the data accessible to the user is displayed. At this time, users can filter on properties grouped into 3 categories displayed as sub-tabs: Studies, Demographics, and Clinical (image below).
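To make the idea of a faceted cohort more concrete, the sketch below filters a small list of subject records. It is an illustration only, not the portal's implementation: it assumes the usual faceted-search convention that values selected within one facet are alternatives while different facets must all match, and the field names and records are hypothetical.

```python
from typing import Dict, Iterable, List, Set

Subject = Dict[str, str]


def filter_cohort(subjects: Iterable[Subject],
                  facets: Dict[str, Set[str]]) -> List[Subject]:
    """Keep the subjects that match every selected facet.

    Values inside one facet are alternatives (OR); different facets must
    all match (AND). With no facets selected, every subject is returned.
    """
    return [
        subject for subject in subjects
        if all(subject.get(field) in allowed
               for field, allowed in facets.items())
    ]


# Hypothetical records; real property names come from the data dictionary.
SUBJECTS = [
    {"submitter_id": "S1", "project_id": "discovery-warfarin", "sex": "female"},
    {"submitter_id": "S2", "project_id": "discovery-noac", "sex": "male"},
    {"submitter_id": "S3", "project_id": "discovery-warfarin", "sex": "male"},
]

if __name__ == "__main__":
    cohort = filter_cohort(SUBJECTS, {"project_id": {"discovery-warfarin"},
                                      "sex": {"male"}})
    print([subject["submitter_id"] for subject in cohort])
```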

Open Access Data
A user can view all of the summary data and associated study information they have access to, including but not limited to Project ID, file types, and clinical variables. Currently, the open access data hosted by the ACCOuNT Data Commons is in the project open-GEO.
Users can request access to other data by visiting the dbGaP webpage or by contacting the ACCOuNT consortium.
b. The File Tab
The File tab provides users with search facets (“Projects”, “File Category”, “File Type”, etc.) to dynamically generate lists of desired files. The complete list of selected files and associated metadata is displayed in the table at the bottom right-hand corner of the page.
3. Downloading Data and Data Files
Users can download clinical metadata directly from the Data Explorer (Section a below) or download files to their local environment using 2 different processes. The first process allows single-file downloads directly through the portal and is explained in Section b below. The second process allows multi-file downloads using the Gen3-client and a Python SDK, as detailed in Section c.
a. Clinical Data and File Metadata Downloads via the Subject and File Tab
Users can download clinical data and file metadata from the Explorer Page. On the Subject Tab, users can download dynamically updated virtual cohorts at the subject level in table format, for example demographics, clinical data, and the submitter_ids. On the File Tab, users can download metadata associated with the data files, such as size, GUIDs, data type, and submitter_ids.
Click “Download Table” to save the table to the local machine.
b. Single-file Downloads via the File Tab
Users can download files individually by visiting the summary table in the File tab (refer to Section 2b for more information). The table provides a column titled GUID (Globally Unique Identifier) with a link to a download page.
Once a GUID is selected, the user will be redirected to the download page as displayed below:
Example: Finding and Downloading Individual Phenotypic Data Files
- Under the File Tab, select “CSV” beneath the “File Format” facet.
- Select one of the GUIDs (Globally Unique Identifier) stored for the phenotypic data files.
- Click “Download” on the next page.
c. Downloading Multiple Data Files
To download multiple files, you will need the gen3-client command line tool developed by the University of Chicago’s Center for Translational Data Science. For detailed information, visit https://gen3.org/resources/user/gen3-client/. The process consists of the following 4 steps:
1. Create and download an API key.
2. Prepare and download a file manifest of your selected cohort.
3. Download and configure the gen3-client.
4. Download the files.
1. Create and save an API key from the “Profile” page:

2. Prepare and download a file manifest
- Under the File Tab, select all data files of interest. Users can filter by project, file type, file category, or file format, or find file names with the free text search bar.
- Click on the “Download Manifest” button. Note the directory where it was saved.
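Before starting a download, it can be useful to check what the manifest contains. The sketch below assumes that the manifest is a JSON array of objects that each carry an `object_id` (the GUID) and a `file_name`; the exact fields depend on how the commons is configured, so inspect your own manifest first. The GUIDs in the sample are placeholders.

```python
import json
from typing import List

# Placeholder manifest content; the GUID values below are not real.
SAMPLE_MANIFEST = """
[
  {"object_id": "guid-placeholder-1", "file_name": "account_warfarin_processed.csv"},
  {"object_id": "guid-placeholder-2", "file_name": "account_noac_processed.csv"}
]
"""


def guids_from_manifest(manifest_text: str) -> List[str]:
    """Return the GUID of every entry in a manifest given as JSON text."""
    entries = json.loads(manifest_text)
    # Assumption: each entry stores its GUID under the "object_id" key.
    return [entry["object_id"] for entry in entries]


if __name__ == "__main__":
    guids = guids_from_manifest(SAMPLE_MANIFEST)
    print(f"{len(guids)} file(s) selected")
```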

3. Download and configure the Gen3-client
- Follow the download instructions for the Gen3-client at https://gen3.org/resources/user/gen3-client/.
- In your terminal, configure your profile using the following command:
    gen3-client configure --profile=<profile_name> --cred=<credentials.json> --apiendpoint=<api_endpoint_url>
Example on Mac/Linux:
    gen3-client configure --profile=demo --cred=~/Downloads/demo-credentials.json --apiendpoint=https://acct.bionimbus.org/
Example on Windows:
    gen3-client configure --profile=demo --cred=C:\Users\demo\Downloads\demo-credentials.json --apiendpoint=https://acct.bionimbus.org/
Output:
    2021/02/23 10:08:20 Profile 'demo' has been configured successfully.
If successfully executed, a configuration file will be stored under the directory the user specified under “cred”.
Confirm that the profile was configured successfully using the following command, which lists your access privileges for each project in the commons you have access to:
    gen3-client auth --profile=demo
Output:
    2021/02/23 10:12:10
    You have access to the following project(s) at https://acct.bionimbus.org:
    2021/02/23 10:12:10 discovery [create delete read read-storage update upload]
    2021/02/23 10:12:10 mesa [create delete read read-storage update upload]
    2021/02/23 10:12:10 nu [create delete read read-storage update upload]
    2021/02/23 10:12:10 open [create delete read read-storage update upload]
For troubleshooting, refer to the gen3-client documentation (https://gen3.org/resources/user/gen3-client/).
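If you want to check access from a script, the output of the `auth` command can be parsed into a mapping from project to permissions. This is a minimal sketch, assuming the output keeps the line layout of the example in this guide (date, time, project name, bracketed permission list); other client versions may print a different layout.

```python
import re
from typing import Dict, List

# Matches lines such as:
# 2021/02/23 10:12:10 discovery [create delete read read-storage update upload]
_PROJECT_LINE = re.compile(r"^\S+ \S+ (?P<project>\S+) \[(?P<perms>[^\]]*)\]$")


def parse_auth_output(output: str) -> Dict[str, List[str]]:
    """Map each project name to its list of permissions."""
    access: Dict[str, List[str]] = {}
    for line in output.splitlines():
        match = _PROJECT_LINE.match(line.strip())
        if match:  # header and blank lines do not match and are skipped
            access[match.group("project")] = match.group("perms").split()
    return access


EXAMPLE_OUTPUT = """2021/02/23 10:12:10
You have access to the following project(s) at https://acct.bionimbus.org:
2021/02/23 10:12:10 discovery [create delete read read-storage update upload]
2021/02/23 10:12:10 open [create delete read read-storage update upload]
"""

if __name__ == "__main__":
    for project, permissions in parse_auth_output(EXAMPLE_OUTPUT).items():
        print(project, "read" in permissions)
```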
4. Download the data with the file manifest generated in Step 2
Run the following command in the terminal:
    gen3-client download-multiple --profile=<profile_name> --manifest=<manifest_file> --download-path=<path_for_files>
Example:
    gen3-client download-multiple --profile=demo --manifest=manifest.json --download-path=downloads
Output:
    2021/02/23 10:26:58 Reading manifest...
    598 B / 598 B [======] 100.00% 0s
    2021/02/23 10:27:00 Total number of GUIDs: 3
    2021/02/23 10:27:00 Preparing file info for each file, please wait...
    3 / 3 [======] 100.00% 0s
    2021/02/23 10:27:00 File info prepared successfully
    account_warfarin_processed.csv 243.71 KiB / 243.71 KiB [======] 100.00%
    account_clopidogrel_processed.csv 197.93 KiB / 197.93 KiB [======] 100.00%
    account_noac_processed.csv 184.89 KiB / 184.89 KiB [======] 100.00%
    2021/02/23 10:27:02 3 files downloaded.
For more information, see the gen3-client User Guide (https://gen3.org/resources/user/gen3-client/).
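After a multi-file download, you may want to confirm that every file in the manifest arrived. The sketch below rests on two assumptions that you should verify against your own setup: each manifest entry has a `file_name` field, and the client saved each file under that name directly inside the download directory. The function name is hypothetical.

```python
import json
from pathlib import Path
from typing import List


def missing_downloads(manifest_path: str, download_dir: str) -> List[str]:
    """Return manifest file names that are not present in download_dir."""
    entries = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    folder = Path(download_dir)
    # Assumption: files are stored as <download_dir>/<file_name>.
    return [
        entry["file_name"]
        for entry in entries
        if not (folder / entry["file_name"]).is_file()
    ]
```

For the example above, `missing_downloads("manifest.json", "downloads")` returns an empty list when all three CSV files are present under those assumptions.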
4. Understanding the Gen3 data model
Gen3-powered Data Commons employ a data model that describes, organizes, and establishes relationships for data across different datasets. The data model organizes and connects different categories, “nodes”, with experimental metadata variables, “properties”. The data dictionary describes all nodes and lists the properties in each node.
On ACCOuNT, the dictionary page contains an interactive representation of the Gen3 data model in two forms, “graph model” and “table view”, which are described below.
Note: the Gen3 Data Dictionary encompasses all study metadata displayed in the Gen3 ACCOuNT portal. Not all studies will contain data associated with the different nodes and properties available.
Node types, categories, and relationships between nodes are displayed in a hierarchical view. Links and node categories are highlighted in the legend (top right-hand side). Click on nodes to see properties and download available templates (“JSON”, “TSV”). Use the dictionary search bar (top left-hand side) to search through property names and descriptions. For more information on the Gen3 data model, see https://gen3.org/resources/user/dictionary/.
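The relationship between nodes and properties can be pictured as a nested mapping. The sketch below is a toy illustration of that structure and of a search over property names and descriptions; the two nodes and their properties are hypothetical and do not reproduce the ACCOuNT dictionary.

```python
from typing import Dict, List, Tuple

# Hypothetical excerpt: node name -> {property name: description}.
DICTIONARY: Dict[str, Dict[str, str]] = {
    "subject": {
        "submitter_id": "Unique identifier of the subject.",
    },
    "demographic": {
        "sex": "Sex of the subject.",
        "age": "Age of the subject at enrollment.",
    },
}


def search_properties(dictionary: Dict[str, Dict[str, str]],
                      term: str) -> List[Tuple[str, str]]:
    """Return (node, property) pairs whose name or description contains term."""
    needle = term.lower()
    return [
        (node, prop)
        for node, props in dictionary.items()
        for prop, description in props.items()
        if needle in prop.lower() or needle in description.lower()
    ]


if __name__ == "__main__":
    print(search_properties(DICTIONARY, "age"))
```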