Data sharing efforts and readily available computing resources are making bioinformatics over the Web possible. In the past, siloed data stores and obscure file formats made it difficult to synthesize and reproduce results between institutions. Here we present two biomedicine APIs, currently under development, and provide example usage. Some familiarity with python is expected.
Get started!
pip install -r requirements.txt
python hello_ga4gh.py
Building on the existing ExAC application we opened up direct data access through straight forward web services. These services enable a user to integrate ExAC services into their own tools, querying the variant information and returning the data in an easy to programmatically use JSON format. https://github.com/hms-dbmi/exac_browser
The Exome Aggregation Consortium (ExAC) is a coalition of investigators seeking to aggregate and harmonize exome sequencing data from a wide variety of large-scale sequencing projects, and to make summary data available for the wider scientific community.
The data set provided on this website spans 60706 unrelated individuals sequenced as part of various disease-specific and population genetic studies. http://exac.broadinstitute.org
The REST API for ExAC has been developed as part of Harvard’s Patient-centered Information Commons: Standardized Unification of Research Elements (PIC-SURE http://www.pic-sure.org/software).
GA4GH aims to standardize how bioinformatics data are shared over the web. A reference server with a subset of publicly available test data from 1000 genomes has been made available for these examples.
The GA4GH reference server hosts bioinformatics data using an HTTP API. These data are backed by BAM and VCF files. For these examples we will only be accessing a GA4GH server, but it is open source and eager individuals can create their own server instance using these instructions.
HTTP APIs allow web browsers and command line clients to use the same communication layer to transmit data to a server. A client can GET
a resource from a server, POST
a resource on a server, or DELETE
amongst other things.
The documents that servers and clients pass back and forth are often in JavaScript Object Notation (JSON), which can flexibly describe complex data structures. For example, a variant in GA4GH is returned as a document with the form:
{
"alternateBases": ["T"],
"calls": [],
"created": 1455236057000,
"end": 4530,
"id": "YnJjYTE6MWtnUGhhc2UzOnJlZl9icmNhMTo0NTI5OjllNjRkMDIzOTc5NzQ3M2MyNjk2NzFiNzczMjg1MWNj",
"info": {},
"referenceBases": "C",
"referenceName": "ref_brca1",
"start": 4529,
"updated": 1455236057000,
"variantSetId": "YnJjYTE6MWtnUGhhc2Uz"
}
JSON uses strings as keys for values that could be strings, numbers, or arrays and maps of more complex objects.
Please view each file's source for more details on usage.
Access a GA4GH reference server hosting bioinformatics data and see the basics of building a query.
Access an API hosting population genomics data and a query service for finding variants in a gene.
A demonstration of using the IRCT PIC-SURE API for NHANES data. It gathers patient blood pressure data by demographic.
Access a GA4GH reference server using a (provided) client, making some operations easier.
Get data from a remote web service and visualize it using matplotlib.
TODO
TODO
Use data from two web services to produce synthetic results.
Make the results of combining two APIs available as its own web service.
Visual demonstration of the variant calls matrix represented by a set of GA4GH variants.
Displays GA4GH sequence annotations as a force directed graph.