Individual Phenotype Representation #254
Comments
Thanks for pointing this out. We'll have to look into this, but at first glance the AssociationType, for example, is far too rigid and overspecified. Also, we're getting rid of enums where possible ...
The enum is an attempt to get a data structure that 'subclasses'. The association structure represents multiple possible associations, but rather than put each of them into a different data structure (like a VariantPhenotypeAssociation and a SamplePhenotypeAssociation structure), a single 'super' structure is used. The enumeration is a quick way for the user to determine the association type, rather than having to scan each possible permutation of non-null connectors.
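To make the trade-off described above concrete, here is a minimal Python sketch of a single 'super' structure with optional connectors. All field names and enum values are assumptions made for illustration and are not taken from the actual genotypephenotype.avdl schema. The `infer_type` helper shows the scan over non-null connectors that a declared enum lets the user skip.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AssociationType(Enum):
    # Hypothetical values; the real schema's enum may differ.
    VARIANT_PHENOTYPE = "VARIANT_PHENOTYPE"
    SAMPLE_PHENOTYPE = "SAMPLE_PHENOTYPE"
    INDIVIDUAL_PHENOTYPE = "INDIVIDUAL_PHENOTYPE"


@dataclass
class Association:
    # One 'super' structure: exactly one connector is expected to be set.
    phenotype_id: str
    type: Optional[AssociationType] = None
    variant_id: Optional[str] = None
    sample_id: Optional[str] = None
    individual_id: Optional[str] = None


def infer_type(assoc: Association) -> AssociationType:
    """Work out the association type by scanning the non-null connectors."""
    connectors = {
        AssociationType.VARIANT_PHENOTYPE: assoc.variant_id,
        AssociationType.SAMPLE_PHENOTYPE: assoc.sample_id,
        AssociationType.INDIVIDUAL_PHENOTYPE: assoc.individual_id,
    }
    present = [kind for kind, value in connectors.items() if value is not None]
    if len(present) != 1:
        raise ValueError(f"expected exactly one connector, found {len(present)}")
    return present[0]
```

With the enum present, a reader can check `assoc.type` directly; without it, every consumer needs something like `infer_type`.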
Samples don't have to be children of individuals; they can be pooled, environmental ... There is a larger scope, not only specific to human disease etc. But my comment is more about the technical aspect: an enum is a very rigid structure, which can only be modified with schema updates. We just got rid of the one for GeneticSex (not solving it, though ...). I'm actually quite positive about having it now at a level where we can discuss and modify it here ;-)
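The rigidity point can be illustrated with a small Python sketch. The names `SubjectKind`, `Term`, and the `EX:` identifiers are hypothetical placeholders, not part of any GA4GH schema, and the open-term alternative is only one possible replacement for an enum. The closed enum rejects any value that was not compiled into it, while the open term accepts new values without a schema change.

```python
from dataclasses import dataclass
from enum import Enum


class SubjectKind(Enum):
    # Hypothetical closed enum: adding a member requires a schema update.
    INDIVIDUAL = "INDIVIDUAL"
    SAMPLE = "SAMPLE"


@dataclass(frozen=True)
class Term:
    # Hypothetical open alternative: an identifier plus a human-readable label.
    id: str
    label: str


def parse_closed(value: str) -> SubjectKind:
    # Raises ValueError for anything outside the fixed member list.
    return SubjectKind(value)


def parse_open(term_id: str, label: str) -> Term:
    # Accepts any identifier; checking it against a vocabulary is a separate step.
    return Term(id=term_id, label=label)


pooled = parse_open("EX:0001", "pooled sample")  # placeholder identifier
try:
    parse_closed("POOLED_SAMPLE")
except ValueError as err:
    print(f"closed enum rejected the value: {err}")
```

The open form trades compile-time checking for flexibility, so it usually needs a separate validation step against whatever vocabulary the community agrees on.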
Thank you for your comments. One of the questions about this data structure is how much should we try to 'protect' users from creating bad data. As it stands, they could fill in the wrong optional fields for the declared type. So there are three ways to go:
I'm curious what the GA4GH community thinks about this.
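As an illustration of the 'bad data' concern raised in this comment, the following Python sketch checks that the populated optional fields match the declared type. It is only one possible approach and is not claimed to be one of the options considered in this thread; the type names and field names are assumptions made for the example.

```python
from typing import Dict, List, Optional

# Hypothetical mapping from declared association type to the one connector
# field that is expected to be filled in for that type.
REQUIRED_CONNECTOR = {
    "VARIANT_PHENOTYPE": "variant_id",
    "SAMPLE_PHENOTYPE": "sample_id",
    "INDIVIDUAL_PHENOTYPE": "individual_id",
}


def validate(record: Dict[str, Optional[str]]) -> List[str]:
    """Return a list of problems; an empty list means the record is consistent."""
    declared = record.get("type")
    expected = REQUIRED_CONNECTOR.get(declared)
    if expected is None:
        return [f"unknown association type: {declared!r}"]

    errors = []
    for field in REQUIRED_CONNECTOR.values():
        value = record.get(field)
        if field == expected and value is None:
            errors.append(f"{declared} requires {field}")
        elif field != expected and value is not None:
            errors.append(f"{field} must be empty for {declared}")
    return errors
```

For example, a record declared as `SAMPLE_PHENOTYPE` that fills in `variant_id` instead of `sample_id` would yield two problems: the missing required connector and the unexpected one.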
I am in favor of a controlled vocabulary, since it will make searches more robust. This will allow for the possibility of finding something similar to your phenotype search. The only thing to maybe take into account, which I've seen in the past, is cases where there was a huge number of choices under some categories, which caused people to select the most general one and made later integrative searches comparing datasets almost impossible. Most folks prefer to keep filtering their searches with new searches on the search result, rather than writing one long, detailed search. So keeping these categories as clean as possible will make them more user- and scientist-friendly.
I would like to question the way phenotypes are currently embedded in the 'Individual' structure ( https://github.com/ga4gh/schemas/blob/master/src/main/resources/avro/metadata.avdl#L96 ).
The comment in the schema asks 'Is this the right representation?', and I would point out that the Genotype2Phenotype group is currently working to create the 'Association' data structure ( https://github.com/kellrott/schemas/blob/g2p/src/main/resources/avro/genotypephenotype.avdl#L97 ).
Under this schema, the phenotype would be linked to the Individual via an 'Association', which would make it possible to attach evidence for the association. The same association data structure can also be used to link samples, phenotypes and genomic features to phenotypes.