Some Reactome Identifiers are not resolving #91
Comments
And this very instance no longer appears in the latest version of the pathway annotated in Reactome. Could an early-2022 change in a Reactome instance propagate back into a 2021 GO-CAM model?
Yes. I suspect that REO was updated to reflect the latest Reactome data as part of the NEO rebuild, but the models weren't updated. The result was not finding the entities in REO and reverting to the IRIs in the existing out-of-date models.
Not sure how this relates to the work that happened here: #82
@ukemi I think you're right, and that REACTO is built from the current Reactome data each time, so that if an identifier is removed from Reactome it will be removed from REACTO, rather than kept and obsoleted. Should we add something to the REACTO build that retains any missing IDs?
At the level of grand strategy, I suspect the answer is "no": we should be dropping old sets of Reactome-derived GO-CAMs and reloading new sets regularly (e.g., every 3 months, in synchrony with new Reactome releases), and the function of a checking tool would be to flag any discrepancies and report them back to Reactome to be fixed there, not patched on the fly in the Reactome-derived GO-CAMs. Anyway, that's how I understood our discussions.
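A minimal sketch of what such a checking tool might do, purely for illustration. It assumes the identifiers used in a model and the labels in the rebuilt ontology have already been extracted into plain Python collections; the function name `find_unresolved` and the placeholder ID `R-HSA-0000001` are hypothetical, and only `R-HSA-3000295` comes from this thread.

```python
def find_unresolved(model_ids, ontology_labels):
    """Return, sorted, the IDs used in a model that have no label in the ontology."""
    return sorted(i for i in set(model_ids) if i not in ontology_labels)


# Hypothetical inputs: IDs referenced by an older model, and the labels
# available after the ontology was rebuilt from current Reactome data.
model_ids = ["R-HSA-3000295", "R-HSA-0000001"]
ontology_labels = {"R-HSA-0000001": "placeholder entity"}

# Anything listed here would be reported back to Reactome, not patched locally.
unresolved = find_unresolved(model_ids, ontology_labels)
print(unresolved)
```

The report is deliberately read-only: it lists discrepancies and leaves the models untouched, in line with fixing the data at the source.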
Okay, so the way this seems to currently stand, the issue is "no label" and the fix is either
Practically speaking though, right now, I'm not seeing an action here to be taken as part of this project. While users coming in to view the model might be a little confused by the lack of a label (there do not seem to me to be too many of those at the moment), the data is "correct" as it currently stands? I'm not sure what the implications of identifier destruction are for us; I usually assume that doesn't happen. I've added this to the agenda for this week's technical call.
I think that @balhoff brings up an interesting point here. We are creating an ontology from something that is not an ontology. Good practice dictates, I think, that classes never just go missing. They should be obsoleted. I suspect this will extend beyond Reactome entities to other gene and protein objects as well. Since the entities used to build the ontology are all imported from either GPIs or, in this case, the Reactome BioPAX, do we want to take on the job at the NEO end of 'obsoleting' a class if it is no longer present in an import? What if they come back in a future load? Can we resurrect them?
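To make the obsolete-and-resurrect question concrete, here is a rough sketch under stated assumptions, not a description of how NEO or REACTO is actually built. It assumes the previous build's classes are available as a dict keyed by ID and the new import as a set of IDs; `reconcile` and the example IDs are hypothetical.

```python
def reconcile(previous, imported_ids):
    """Carry every previously seen class forward, flagging dropped ones as obsolete.

    previous: dict mapping class ID -> {"obsolete": bool} from the last build.
    imported_ids: set of class IDs present in the new import.
    """
    result = {}
    for class_id in previous:
        # A class never disappears: it is obsolete if the import no longer has it,
        # and it is resurrected automatically if the import has it again.
        result[class_id] = {"obsolete": class_id not in imported_ids}
    for class_id in imported_ids:
        # Classes seen for the first time start out active.
        result.setdefault(class_id, {"obsolete": False})
    return result


# Hypothetical example: "A" was dropped, "B" returns after being obsolete, "C" is new.
previous = {"A": {"obsolete": False}, "B": {"obsolete": True}}
current = reconcile(previous, {"B", "C"})
print(current)
```

Keeping the previous build as an input is the design choice that matters here: without it, a class missing from the import cannot be distinguished from one that never existed.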
@ukemi @balhoff Despite what I said above about obsolete instances simply disappearing, within the Reactome data structure we track obsoletions of instances of the event and entity classes, so when one is obsoleted a "deleted" record is created to record the fact of deletion, a one-word reason (obsoleted, merged, replaced, ...) and where appropriate the dbID of the replacement instance. I don't know how much of this information gets into the BioPAX export, but that would be something to investigate. But the whole list of every instance whose deletion has been annotated in this way is visible here. For each instance, its "(deletedInstance)" attribute points to its replacement, if any. This list has gaps where deletions and obsoletions were done without proper annotation. Current practice is better.
What about extending the GPI2.0 file format to capture things like gene model merges, e.g. a new column 'replaced by' or 'merged into'?
Can PRO help here? I'm wondering how many of the Reactome identifiers used in the current set of GO-CAMs are already represented in PRO. Can someone send a list of these? I'll return that list with a mapping to PRO so we can get a handle on where we're at.
@nataled Right now, probably not, because the problem appears to be that some recent edits in Reactome instances put them out of sync with the June 2021 versions of those instances that are in the GO-CAM models, and that disconnect is messing things up. Once we get frequent rebuilds of the GO-CAMs and with PRO IDs in use, opportunities for this kind of disconnect should mostly be eliminated.
I noticed this morning that in the Reactome models, some identifiers are being displayed as IRIs instead of strings. This is a new issue, and I assume that it was introduced with last night's NEO update?
For example: In model http://noctua.geneontology.org/editor/graph/gomodel:R-HSA-196741
The input for 'Endosomal GIF:Cbl translocates to lysosome' is 'obo:go/extensions/reacto.owl#REACTO_R-HSA-3000295'.
I am certain that this used to have a label. This is also a pathway that has been substantially modified in Reactome with the new release. Is it possible that the update of NEO took information about entities that have changed recently in Reactome, while the rest of the model wasn't updated, and this is causing problems?
If this is the case, then it points to us needing an SOP for large-scale data changes to models imported from external resources into the GOC framework. Perhaps these kinds of changes should be coordinated with a complete refresh of the import data.
ping @deustp01