This repository has been archived by the owner on Oct 28, 2022. It is now read-only.

Coordinate between MatchMaker Patient and GA Individual? #234

Closed
buske opened this issue Jan 25, 2015 · 15 comments

Comments

@buske
Member

buske commented Jan 25, 2015

Dear Metadata Task Team,

I wanted to extend an offer to coordinate between your representation of an Individual and our representation of a Patient:
https://github.com/MatchMakerExchange/mme-apis/blob/refactor/search-api.md#example

We're hoping to finalize the first major version of our API very soon, so now would be the best time to make any adjustments.

Here's a summary of the differences I spotted:

  • Currently, we have contact information within the patient object, but I'm not sure if you have this information represented anywhere. We could potentially move it outside of the patient object into a metadata object alongside the patient.
  • Our equivalents of the phenotypes and diseases fields use objects with an id field instead of just a list of terms to allow metadata (e.g. ageOfOnset, presence) to be easily added.
  • We also have fields for candidate genes, ageOfOnset, and inheritanceMode.
@mbaudis
Member

mbaudis commented Jan 26, 2015

Dear Orion,

thanks for getting back to us with this. Some comments (which may or may not represent the consensus opinion of the MTT):

  • Is there a strong reason to use "patient"? This term implies a treated medical condition; it would not necessarily be the appropriate representation for genetic variations/syndromes (it implies a "disease", and individuals with this genofixed variation are then lifelong "patients"). Semantics, but I've learned to be careful here.
  • "contact": regarding metadata compatibility, as an additional object this wouldn't be a problem. However, it is a bit ambiguous (being inside the "patient" object) and should be labeled as "institutional" or similar.
  • "disorders": While I actually prefer the name over our current "diseases", the structure of the single attributes looks insufficient. I would strongly suggest using richer objects which specify ontology source, version, accession ... This would also allow for dropping in additional items/attributes. In principle, you could use only self-referencing IDs like in your example, e.g.
"disorders" : [
  {
      "id" : "MIM:######"
  },
  {
      "id" : "Orphanet:#####"
  },
  …
],

There is an example in the metadata-team area https://github.com/ga4gh/metadata-team/blob/master/usecases/diseasesStages.json , and a simplification in a new PR:
https://github.com/mbaudis/metadata-team/commit/4255326cd8ff60ea7236889f497ee6da390a52bd

  • "features" => "phenotypes"? Also, better specify e.g. "ontologySource": "HPO", "ontologyCode": ...
  • In your current object hierarchy, you have "ageOfOnset" and "inheritanceMode" as top-level attributes, outside specific "disorders" or "features" objects (but then, appropriately, "ageOfOnset" again as part of a "features" object).
  • I'm not sure what the "genes" object is for if it contains no other information. Either it should hold more, or the gene ID(s) could be part of the variants? But that is nothing we have discussed so far on the MTT side ...

Especially regarding the "disorders" annotation, I would be very glad if we could coordinate this. We are about to refactor things, and having this done in an application-ready way with Matchmaker as use case would be great!

Best,

Michael.

@mbaudis mbaudis closed this as completed Jan 26, 2015
@mbaudis mbaudis reopened this Jan 26, 2015
@buske
Member Author

buske commented Jan 29, 2015

Dear Michael,

Thank you very much for your feedback. I'll submit a pull request to the MME to see if we can get some of these suggestions integrated into the upcoming version.

The term "patient" is not necessarily the best one, but it is used frequently in our discourse, since at the end of the day we're almost always trying to match patients.

I agree that "contact" is ambiguous; I'll try to get that renamed along with "features".

I completely agree that we should have a more complete representation of each ontology term, but we had settled into using prefixed IDs everywhere. We should probably break out of this and start using "ontologySource"/"id" objects (with optional "name" and "version"). Do you think that would be adequate?

I'm sorry if this is a naive question, but I'm a little confused about your use of "ontologySource", "id", and "ontologyCode" (which you just mentioned). In your examples, "ontologySource" is sometimes a descriptive name ("Ontology for Biomedical Investigation"), sometimes a code ("WHO:ICD"), and sometimes a purl ("http://purl.obolibrary.org/obo/NCBITaxon_9606"). Similarly, the "id" is sometimes an ID within that ontologySource ("C18.5"), sometimes a purl ("http://purl.obolibrary.org/obo/OBI_0001271"), and sometimes a prefixed ID ("NCBITaxon:9606"). Just trying to decide what we should do. Thus far, we've been using prefixed IDs for everything, so I'm tempted to keep doing that. Similarly, what's the difference between "name", "text", and "ontologyText"? They seem to be used interchangeably. I'd like to try to synchronize our efforts with the GA4GH, but it looks like the discussion of this stalled a while ago: #165
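To make the mismatch concrete, here is a minimal sketch of mapping the PURL form onto the prefixed-ID form. It assumes OBO-style PURLs always end in `<PREFIX>_<LOCAL>`, as in the two examples quoted above; the function name `to_prefixed_id` is hypothetical and not part of either API.

```python
import re

# Assumption: OBO-style PURLs end in "<PREFIX>_<LOCAL>", e.g. ".../obo/OBI_0001271".
_OBO_PURL = re.compile(r"^http://purl\.obolibrary\.org/obo/([A-Za-z]+)_(\w+)$")


def to_prefixed_id(identifier: str) -> str:
    """Return 'PREFIX:LOCAL' for an OBO PURL; pass any other string through unchanged."""
    identifier = identifier.strip()
    match = _OBO_PURL.match(identifier)
    if match:
        prefix, local = match.groups()
        return f"{prefix}:{local}"
    return identifier


print(to_prefixed_id("http://purl.obolibrary.org/obo/NCBITaxon_9606"))
```

Bare codes such as "C18.5" pass through untouched, because nothing in the string itself says which ontology they belong to; that is the case where a separate source field would actually be needed.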

The "genes" object is intended to store candidate genes, a compromise from our earlier attempt at storing all the varying levels of genotypic information in a single object. I'm still not sure what's best, but the idea is that occasionally all that can be specified is that a certain gene is a likely cause. In other cases, you might want to specify the causal variant, or at least that the causal variant is a stopgain mutation in a particular gene. We wanted to be able to capture this variable level of specificity, but we're still working on the best way to do this. We separated the candidate genes from the precise variants to make it easier to integrate with the GA4GH variant API eventually, but that might not have been the best approach. We should really get the Variant Annotation Team's input on this.

The placement of ageOfOnset and inheritanceMode has been the subject of quite a bit of discussion. The idea is that ageOfOnset is the overall age of onset of the patient's symptoms, though we allow specifying it per-term as well for additional resolution. We could use inheritanceMode in a similar fashion, and we have discussed potentially adding it per-variant, but this seemed like overkill so we left it at the root.

As one suggestion for the metadata team, we've found it useful to think about phenotypes that explicitly aren't present, and so have a separate true/false/na field for representing this.
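A rough sketch of what that three-valued field could look like, purely for illustration. The field name `observed`, the helper names, and the HPO IDs are placeholders, and `None` stands in for "na".

```python
from typing import List, Optional


def make_feature(term_id: str, observed: Optional[bool] = None) -> dict:
    """Build a feature object; observed is True, False, or None (not assessed)."""
    return {"id": term_id, "observed": observed}


def explicitly_absent(features: List[dict]) -> List[str]:
    """Return IDs of features that were assessed and found absent."""
    # 'is False' keeps "not assessed" (None) distinct from "absent" (False).
    return [f["id"] for f in features if f["observed"] is False]


features = [
    make_feature("HP:0000252", True),   # present
    make_feature("HP:0001250", False),  # assessed, explicitly absent
    make_feature("HP:0000118"),         # not assessed
]
print(explicitly_absent(features))
```

Keeping "absent" separate from "not assessed" is the whole point: a plain list of terms can only express the first of the three states.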

Best,
Orion

@buske
Member Author

buske commented Jan 29, 2015

Also, if we wanted to synchronize on our disorder representations, it sounds like just splitting the "id" field (e.g. "id":"MIM:123456") out into "ontologySource" and "id" fields (e.g. "ontologySource":"MIM", "id":"123456") would be the easiest way to unify our representations. Is that correct?
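For reference, the split is mechanical. A minimal sketch, assuming every ID has the form `PREFIX:LOCAL` with the prefix ending at the first colon; the function names are hypothetical.

```python
def split_prefixed_id(prefixed: str) -> dict:
    """Split 'MIM:123456' into {'ontologySource': 'MIM', 'id': '123456'}."""
    source, sep, local = prefixed.partition(":")
    if not sep or not source or not local:
        raise ValueError(f"not a prefixed ID: {prefixed!r}")
    return {"ontologySource": source, "id": local}


def join_prefixed_id(term: dict) -> str:
    """Inverse of split_prefixed_id: rebuild the 'PREFIX:LOCAL' string."""
    return f"{term['ontologySource']}:{term['id']}"


print(split_prefixed_id("MIM:123456"))
```

Since the round trip is lossless, either side could convert at the boundary. Note that this naive split would mangle a URL-style ID, which also contains a colon.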

@diekhans
Contributor

There is an OntologyTerm record in metadata.avdl. It would be
good to converge on a single way to reference ontologies.

@buske
Member Author

buske commented Jan 29, 2015

@diekhans Agreed, but it looked like the metadata representation for these is in flux: #165. The examples are inconsistent and confuse me, though that might be because of how flexible the fields are. We've been using prefixed IDs since the beginning, so I'm a bit hesitant to propose drastic changes in this version.

Also, just realized that it seemed like 'MIM' was the preferred ontology prefix for OMIM, but 'OMIM' is used by purl. Seems like we should switch? @cmungall?

@diekhans
Contributor

Oh, yes, don't derail anything making forward progress to
converge on something that is behind. A comment about needing
to converge would be appreciated.

@fschiettecatte

It should be MIM.

@buske
Member Author

buske commented Jan 29, 2015

@diekhans :) Okay, I commented on that PR asking for an update. It seems like things are already compatible at a very rudimentary level since we're using prefixed IDs in an "id" field, and we'll see if we can synchronize a bit more before the next version.

@fschiettecatte 10-4

@Relequestual
Member

@diekhans I agree we should converge on a single way to reference ontologies! I see that an ontology version field is currently present in the linked metadata commit. Is that a mandatory field? It's not especially useful in the case of HPO, where the version is the date of the last change (there's no long-term or stable release).

@mbaudis
Member

mbaudis commented Jan 29, 2015

No; the only required attribute would be id, which could be rich. All others are in the form of

 union { null, string } version = null;

So the only point to adhere to is really using "id". And given the probable heterogeneity of use cases, there is no way to make this specific to a certain format.
The current draft (which may change substantially ...) specifies quite a number of possible attributes for ontologies - which themselves are used for "phenotypes", "disorders", "species" ...

record OntologyTerm {

  /**
  The ID defined by the external ontology source.
  (e.g. `http://purl.obolibrary.org/obo/OBI_0001271`)
  */
  string id;

  /**
  The source of the ontology term. This is optional if using self-referencing IDs.
  (e.g. `Ontology for Biomedical Investigation`)
  */
  union { null, string } source = null;

  /**
  The name of the ontology term. (e.g. `RNA-seq assay`)
  */
  union { null, string } name = null;

  /**
  The version of the ontology (e.g. `ICD10`)
  */
  union { null, string } version = null;

  /**
  Date the observation was made/assigned (e.g. date of diagnosis, observation of phenotype...).
  Suitable e.g. for health related purposes, epidemiology, experimental setups (time series) ...
  TODO: Format for this?
  */
  union { null, string } dateTimeObserved = null;

  /**
  Age at time of the observation (e.g. diagnosis, observation of phenotype...).
  This is highly relevant in the human context and is usually the primary
  time-related parameter available.
  */
  union { null, string } ageObserved = null;

}
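As a sketch of how a client might mirror this draft record, assuming the field names above stay as they are (the draft may change). The helper `to_json_dict` and the choice to drop unset fields are illustrative assumptions, not something the Avro schema prescribes.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class OntologyTerm:
    """Mirror of the draft record: only 'id' is required, everything else defaults to None."""
    id: str
    source: Optional[str] = None
    name: Optional[str] = None
    version: Optional[str] = None
    dateTimeObserved: Optional[str] = None
    ageObserved: Optional[str] = None


def to_json_dict(term: OntologyTerm) -> dict:
    """Drop unset optional fields so a bare prefixed ID stays a one-key object."""
    return {key: value for key, value in asdict(term).items() if value is not None}


print(to_json_dict(OntologyTerm(id="MIM:123456")))
print(to_json_dict(OntologyTerm(id="http://purl.obolibrary.org/obo/OBI_0001271",
                                name="RNA-seq assay")))
```

With only `id` set, the serialized object has the same shape as the `{"id": "MIM:######"}` entries discussed earlier in the thread, which is why the two representations are already compatible at a basic level.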

@mbaudis
Member

mbaudis commented Jan 29, 2015

See also thread on #165 (comment)

@buske
Member Author

buske commented Jan 29, 2015

After discussion, it seems that phenotypes may not be an appropriate term, since the singular phenotype is formally used to refer to the collection (ga4gh/mme-apis#68 (comment)).

@mbaudis
Member

mbaudis commented Jan 30, 2015

Is there a general agreement about how to name lists (plural vs. singular)? "phenotype(s)" clearly needs list context, since in common use a phenotype => interpreted/codified observation (and not the individual's overall manifestation).

@Relequestual
Member

It looks like the agreement in the linked MME issue comments is to stick with "features" for now and revisit the naming later.

@buske
Member Author

buske commented Feb 18, 2015

Closing due to lack of activity. Thank you all for your input!

@buske buske closed this as completed Feb 18, 2015