Describe datasets
dhimmel committed Feb 4, 2016
1 parent 54eea52 commit cbf170c
Showing 1 changed file with 15 additions and 4 deletions.
19 changes: 15 additions & 4 deletions README.md
@@ -1,17 +1,28 @@
# User-friendly extensions to the Disease Ontology

Code and data for the [Disease Ontology](http://disease-ontology.org/) (DO) [[1](https://doi.org/10.1093/nar/gkr972)].
This repository creates user-friendly extensions to the [Disease Ontology](http://disease-ontology.org "Disease Ontology Homepage") (DO) [[1](https://doi.org/10.1093/nar/gkr972 "Disease Ontology: a backbone for disease semantic integration")]. Simple TSV files are extracted from the OBO-formatted ontology, including datasets for term names, cross-references, and subsumption relationships. Additionally, a slim term set is extracted, which we use for our [drug repurposing research](https://doi.org/10.15363/thinklab.4 "Thinklab · Repurposing drugs on a hetnet").
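
As a rough illustration of the OBO-to-TSV extraction described above, the following is a minimal sketch and not the code used in this repository's notebooks. The function names, the sample stanzas, and the `doid` and `name` column headers are all assumptions made for the example.

```python
import csv
import io

# Made-up OBO snippet for illustration; the IDs and xrefs are placeholders.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: EX:0001
name: parent disease
xref: VOCAB:123

[Term]
id: EX:0002
name: child disease
is_a: EX:0001 ! parent disease

[Typedef]
id: part_of
name: part of
"""


def parse_obo_terms(lines):
    """Yield one dict per [Term] stanza with id, name, xrefs, and is_a parents."""
    term = None
    for raw in lines:
        line = raw.strip()
        if line.startswith('['):
            # A new stanza starts; emit the previous term, if there was one.
            if term is not None:
                yield term
            if line == '[Term]':
                term = {'id': None, 'name': None, 'xrefs': [], 'is_a': []}
            else:
                term = None  # ignore non-term stanzas such as [Typedef]
            continue
        if term is None or ': ' not in line:
            continue  # header lines, blank lines, or lines in ignored stanzas
        tag, value = line.split(': ', 1)
        if tag in ('id', 'name'):
            term[tag] = value
        elif tag == 'xref':
            term['xrefs'].append(value)
        elif tag == 'is_a':
            # Drop the trailing "! parent name" comment.
            term['is_a'].append(value.split(' ! ', 1)[0])
    if term is not None:
        yield term


def terms_to_tsv(terms):
    """Serialize term ids and names as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter='\t', lineterminator='\n')
    writer.writerow(['doid', 'name'])  # hypothetical column names
    for term in terms:
        writer.writerow([term['id'], term['name']])
    return buffer.getvalue()


terms = list(parse_obo_terms(SAMPLE_OBO.splitlines()))
tsv_text = terms_to_tsv(terms)
print(tsv_text)
```

A real OBO file contains more tag types (synonyms, definitions, obsolete flags) than this sketch handles, so a dedicated OBO parser is likely a better choice for anything beyond a quick look.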

## Notebooks

[`DO-xrefs.ipynb`](DO-xrefs.ipynb) extracts cross-references from `download/HumanDO.obo` and produces easy-to-read mapping files. `data/xrefs-prop.tsv` contains propagated cross-references, so that, for example, xrefs to *relapsing remitting multiple sclerosis* would be transmitted to *multiple sclerosis*.
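
The propagation idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the notebook's implementation: the function names are hypothetical, terms are keyed by name for readability, and the `EX:` cross-reference identifiers and the `parent term` node are placeholders.

```python
from collections import defaultdict


def ancestors(term, parents):
    """Return every term that subsumes `term`, following parent links transitively."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def propagate_xrefs(direct_xrefs, parents):
    """Give each term its own xrefs plus the xrefs of every term it subsumes."""
    propagated = defaultdict(set)
    for term, xrefs in direct_xrefs.items():
        # A term's xrefs apply to the term itself and to all of its ancestors.
        for target in {term} | ancestors(term, parents):
            propagated[target].update(xrefs)
    return dict(propagated)


# Child -> parents mapping (placeholder hierarchy).
parents = {
    'relapsing remitting multiple sclerosis': ['multiple sclerosis'],
    'multiple sclerosis': ['parent term'],
}
# Direct cross-references (placeholder identifiers).
direct_xrefs = {
    'relapsing remitting multiple sclerosis': {'EX:1'},
    'multiple sclerosis': {'EX:2'},
}

propagated = propagate_xrefs(direct_xrefs, parents)
for term in sorted(propagated):
    print(term, sorted(propagated[term]))
```

Here *multiple sclerosis* ends up with both `EX:1` and `EX:2`, while the more specific term keeps only its own `EX:1`.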

[`slim.ipynb`](slim.ipynb) reads [DO Slim](https://doi.org/10.15363/thinklab.d44#144 "Creating a slim DO") terms and generates slim-specific datasets.

## Directories

`IGS_scripts` contains the [scripts](https://github.com/IGS/disease-ontology/tree/master/scripts) from the `IGS/disease-ontology` [repo](https://github.com/IGS/disease-ontology). These scripts were converted to Python 3, and a few conversion errors were manually fixed.

[`download`](download) contains a subversion checkout of the master DO.

[`data`](data) contains files created by us.
[`data`](data) contains created datasets which include:

See our project on ThinkLab for more information:
http://thinklab.com/p/rephetio
+ [`term-names.tsv`](data/term-names.tsv) — names including synonyms for DO terms
+ [`xrefs.tsv`](data/xrefs.tsv) — cross-references to external disease vocabularies
+ [`xrefs-prop.tsv`](data/xrefs-prop.tsv) — cross-references where diseases inherit all cross-references of the diseases they subsume
+ [`slim-terms.tsv`](data/slim-terms.tsv) — a ([semi-manually created](http://doi.org/10.15363/thinklab.d44#144 "Creating a slim DO")) slim term set referred to as DO Slim
+ [`slim-terms-prop.tsv`](data/slim-terms-prop.tsv) — all subsumption relationships for DO Slim
+ [`xrefs-slim.tsv`](data/xrefs-slim.tsv) — cross-references to external disease vocabularies for slim terms
+ [`xrefs-prop-slim.tsv`](data/xrefs-prop-slim.tsv) — cross-references for slim terms where diseases inherit all cross-references of the diseases they subsume

## License
