what is the appropriate Biolink category for Sequence Ontology concepts? #706

Closed
bill-baumgartner opened this issue Apr 1, 2021 · 4 comments
Labels
kp/textmining text mining and explaining question

Comments

@bill-baumgartner

Question:
We have a text-mining pipeline that extracts mentions of Sequence Ontology concepts from text. The KGX format to which we serialize requires that a Biolink category accompany each SO id. What Biolink concept do you recommend be used as the category for SO concepts?

Is the biolink:GenomicEntity class appropriate for this use-case? Or should a more abstract SequenceFeature class be used? Or maybe something else?
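For concreteness, here is a minimal sketch of the kind of node record in question. It assumes a tab-separated KGX nodes file with `id`, `category`, and `name` columns; the helper `nodes_to_tsv` and the example row are hypothetical and are not taken from the actual pipeline. The category value shown is the candidate being asked about, not a settled answer.

```python
import csv
import io

# Assumed column layout for a KGX-style nodes TSV (illustrative only).
NODE_COLUMNS = ["id", "category", "name"]


def nodes_to_tsv(nodes):
    """Serialize node dicts to tab-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=NODE_COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(nodes)
    return buf.getvalue()


# Hypothetical example node: SO:0000704 is the Sequence Ontology id for "gene".
example_nodes = [
    {"id": "SO:0000704", "category": "biolink:GenomicEntity", "name": "gene"},
]
tsv_text = nodes_to_tsv(example_nodes)
print(tsv_text)
```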

Thanks!

@nlharris nlharris added the kp/textmining text mining and explaining label Apr 1, 2021
@sierra-moxon
Member

I think we should add the SO concepts needed for your KP - yep, biolink:GenomicEntity is meant to encompass "SequenceFeature." Do you need more granular objects?

@bill-baumgartner
Author

Thanks, Sierra. I was looking for a catch-all for any Sequence Ontology concept, because any concept in the ontology could potentially be recognized in text. It looks like there are four roots in the Sequence Ontology. If we can map each of the roots to an appropriate Biolink concept, then I think that should work.

It seems like biolink:GenomicEntity covers SO:sequence_collection (e.g. genome), SO:sequence_feature, and SO:sequence_variant.

Do you agree with that statement, and do you think it's appropriate to also bin SO:sequence_attribute under biolink:GenomicEntity, or should that map to a different Biolink class?
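The root-based mapping proposed here could be sketched roughly as follows. This is only an illustration of the idea, not an official Biolink mapping: the dictionary, the function `category_for`, and the ancestor list passed to it are hypothetical, and `sequence_attribute` is left unmapped because its category is the open question.

```python
from typing import Optional, Sequence

# Proposed mapping from Sequence Ontology root names to Biolink categories.
# None marks a root whose category is still undecided.
ROOT_TO_CATEGORY = {
    "sequence_collection": "biolink:GenomicEntity",
    "sequence_feature": "biolink:GenomicEntity",
    "sequence_variant": "biolink:GenomicEntity",
    "sequence_attribute": None,
}


def category_for(ancestors: Sequence[str]) -> Optional[str]:
    """Return the category of the first ancestor that is a mapped SO root.

    `ancestors` is assumed to be the concept's is_a chain, nearest first.
    Returns None if no root is found or the root has no category yet.
    """
    for name in ancestors:
        if name in ROOT_TO_CATEGORY:
            return ROOT_TO_CATEGORY[name]
    return None


# Illustrative is_a chain for the SO concept "gene".
gene_category = category_for(
    ["biological_region", "region", "sequence_feature"]
)
print(gene_category)
```

A concept-level override table would be needed on top of this if some descendants of a root should not inherit the root's category.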

@mikebada
Collaborator

mikebada commented Apr 2, 2021

I'd say that biolink:GenomicEntity covers most but not all of SO:sequence_collection, SO:sequence_feature, and SO:sequence_variant. Things that I don't think quite fit include sequence assembly concepts, junctions, and sequence constructs such as vectors and primers. I don't think SO:sequence_attribute fits into biolink:GenomicEntity at all. However, I think what we're going to export in associations for Translator is probably limited to genomic entities, but let's talk about it.

@bill-baumgartner
Author

Thanks, all, for the answers. I think I'm good to go for now. I will follow up with @mikebada about things that shouldn't be considered biolink:GenomicEntity. Closing this question for now. Thanks!
