Some Reactome Identifiers are not resolving #91

Open
ukemi opened this issue Apr 15, 2022 · 11 comments

@ukemi

ukemi commented Apr 15, 2022

I noticed this morning that in the Reactome models, some identifiers are being displayed as IRIs instead of strings. This is a new issue and I assume that it was introduced with last night's NEO update?
For example: In model http://noctua.geneontology.org/editor/graph/gomodel:R-HSA-196741
The input for 'Endosomal GIF:Cbl translocates to lysosome' is 'obo:go/extensions/reacto.owl#REACTO_R-HSA-3000295'.
I am certain that this used to have a label. This is also a pathway that has been substantially modified in Reactome with the new release. Is it possible that the update of NEO took information about entities that have changed recently in Reactome, while the rest of the model wasn't updated and this is causing problems?
If this is the case, then it points to us needing an SOP for large-scale data changes to models imported from external resources into the GOC framework. Perhaps these kinds of changes should be coordinated with a complete refresh of the import data.
ping @deustp01

@deustp01

This is also a pathway that has been substantially modified in Reactome with the new release.

And this very instance no longer appears in the latest version of the pathway annotated in Reactome. Could an early-2022 change in a Reactome instance propagate back into a 2021 GO-CAM model?

@ukemi
Author

ukemi commented Apr 15, 2022

Yes. I suspect that REO was updated to reflect the latest Reactome data as part of the NEO rebuild, but the models weren't updated. The result was not finding the entities in REO and reverting to the IRIs in the existing out-of-date models.
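
The suspected failure mode can be sketched in a few lines. This is only an illustration, assuming the display layer looks up a label by IRI and falls back to the raw IRI when the ontology no longer contains the entity. The function name `display_label`, the label index, and the identifier `R-HSA-0000001` with its label are hypothetical and are not taken from the Noctua or NEO code.

```python
from typing import Dict

REACTO = "obo:go/extensions/reacto.owl#"


def display_label(iri: str, labels: Dict[str, str]) -> str:
    """Return the ontology label for an IRI, or the IRI itself if no label is known."""
    return labels.get(iri, iri)


# Hypothetical label index after the ontology rebuild: the old entity is absent.
labels_after_rebuild = {REACTO + "REACTO_R-HSA-0000001": "example entity"}

# The out-of-date model still points at the removed entity.
stale_iri = REACTO + "REACTO_R-HSA-3000295"
shown = display_label(stale_iri, labels_after_rebuild)  # falls back to the IRI
```

Under that assumption, the model data itself is unchanged; only the lookup fails, which would match what is seen in the editor.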

@vanaukenk

I'm not sure how this relates to the work that happened in #82, but we need to understand better how NEO and REACTO interact, how this happened, and how soon we can get feedback or a report on missing entities.

@balhoff
Member

balhoff commented Apr 18, 2022

Yes. I suspect that REO was updated to reflect the latest Reactome data as part of the NEO rebuild, but the models weren't updated. The result was not finding the entities in REO and reverting to the IRIs in the existing out-of-date models.

@ukemi I think you're right, and that REACTO is built from the current Reactome data each time, so that if an identifier is removed from Reactome it will be removed from REACTO, rather than kept and obsoleted. Should we add something to the REACTO build that retains any missing IDs?
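
As a rough sketch of what "retaining missing IDs" could look like, assuming the build has access to the term set from the previous release: any ID present before but absent now is carried forward and flagged obsolete instead of being dropped. `TermRecord` and `carry_forward` are hypothetical names for illustration, not part of the actual REACTO build.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TermRecord:
    label: str
    obsolete: bool = False


def carry_forward(
    previous: Dict[str, TermRecord], current: Dict[str, TermRecord]
) -> Dict[str, TermRecord]:
    """Merge two builds, keeping IDs that vanished from `current` as obsolete terms."""
    merged = dict(current)
    for term_id, record in previous.items():
        if term_id not in merged:
            # Keep the old label so existing models can still display something.
            label = record.label
            if not label.startswith("obsolete "):
                label = "obsolete " + label
            merged[term_id] = TermRecord(label=label, obsolete=True)
    return merged
```

The label prefix is an assumption borrowed from common ontology practice; the point is only that the ID and a readable label survive the rebuild.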

@deustp01

Should we add something to the REACTO build that retains any missing IDs?

At the level of grand strategy, I suspect the answer is "no" - we should be dropping old sets of Reactome-derived GO-CAMs and reloading new sets regularly (e.g., every 3 months in synchrony with new Reactome releases), and the function of a checking tool would be to flag any discrepancies and report them back to Reactome to be fixed there, not patched on the fly in the Reactome-derived GO-CAMs. Anyway that's how I understood our discussions.
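
A checking tool of this kind could be as simple as a set difference between the identifiers each model uses and the identifiers the current ontology build provides. The sketch below assumes both have already been extracted into plain Python sets; `find_unresolved` and its inputs are hypothetical, and it only reports discrepancies without patching anything.

```python
from typing import Dict, List, Set


def find_unresolved(
    model_ids: Dict[str, Set[str]], ontology_ids: Set[str]
) -> Dict[str, List[str]]:
    """Map each model to the sorted identifiers it uses that the ontology lacks."""
    report: Dict[str, List[str]] = {}
    for model, ids in model_ids.items():
        missing = sorted(ids - ontology_ids)
        if missing:
            report[model] = missing
    return report
```

Models with nothing missing are left out of the report, so an empty result means every identifier resolves.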

@kltm
Member

kltm commented Apr 18, 2022

Okay, so the way this seems to currently stand, the issue is "no label" and the fix is one of

  • initiate synchronized updates through ontology build and model refresh at https://github.com/geneontology/reactome-go-cams
  • avoid updating reactome models/identifiers in the future (I'm not quite sure what the mechanics of this are)
  • avoid "destroying" identifiers in the future

Practically speaking though, right now, I'm not seeing an action here to be taken as part of this project. While users coming in to view the model might be a little confused as to the lack of a label (there do not seem to me to be too many of those at the moment), the data is "correct" as it currently stands? I'm not sure what the implications are for identifier destruction for us--I usually assume that doesn't happen. I've added this to the agenda for this week's technical call.

@ukemi
Author

ukemi commented Apr 19, 2022

I think that @balhoff brings up an interesting point here. We are creating an ontology from something that is not an ontology. Good practice dictates, I think, that classes never just go missing. They should be obsoleted. I suspect this will extend beyond Reactome entities to other gene and protein objects as well. Since the entities used to build the ontology are all imported from either GPIs or in this case the Reactome BioPax, do we want to take the job on at the NEO end to 'obsolete' a class if it is no longer present in an import? What if they come back in a future load? Can we resurrect them?
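
One way to frame the obsolete/resurrect question is as a status transition computed at each load. The sketch below assumes the build remembers, for every ID it has ever emitted, whether it was obsolete in the previous build, and compares that against the IDs present in the new import. `classify` and the status strings are hypothetical and do not describe how NEO currently works.

```python
from typing import Dict, Set


def classify(known: Dict[str, bool], imported: Set[str]) -> Dict[str, str]:
    """Assign a status to every ID.

    `known` maps each previously emitted ID to True if it was obsolete;
    `imported` is the set of IDs present in the new import.
    """
    status: Dict[str, str] = {}
    for term_id, was_obsolete in known.items():
        if term_id in imported:
            # An ID that comes back after being obsoleted is flagged for review.
            status[term_id] = "resurrected" if was_obsolete else "active"
        else:
            status[term_id] = "obsolete"
    for term_id in imported - set(known):
        status[term_id] = "new"
    return status
```

Whether a "resurrected" ID should be silently reactivated or reviewed by a curator is a policy decision the sketch leaves open.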

@deustp01

deustp01 commented Apr 19, 2022

@ukemi @balhoff Despite what I said above about obsolete instances simply disappearing, within the Reactome data structure we track obsoletions of instances of the event and entity classes, so when one is obsoleted a "deleted" record is created to record the fact of deletion, a one-word reason (obsoleted, merged, replaced, ...) and where appropriate the dbID of the replacement instance. I don't know how much of this information gets into the BioPAX export, but that would be something to investigate.

But the whole list of every instance whose deletion has been annotated in this way is visible here. For each instance, its "(deletedInstance)" attribute points to its replacement, if any. This list has gaps where deletions and obsoletions were done without proper annotation. Current practice is better.
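
If those "deleted" records were available to the import, following them to a current instance might look roughly like this. The sketch assumes each record carries a one-word reason and an optional replacement dbID, as described above; the `DeletedRecord` type, the `resolve` function, and the integer IDs used with them are hypothetical, and the real Reactome schema and BioPAX export may differ.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class DeletedRecord:
    reason: str  # e.g. "obsoleted", "merged", "replaced"
    replacement: Optional[int] = None  # dbID of the replacement, if any


def resolve(db_id: int, deleted: Dict[int, DeletedRecord]) -> Optional[int]:
    """Follow replacement links until reaching an ID with no deletion record.

    Returns None if the chain ends without a replacement or loops back on itself.
    """
    seen = set()
    current = db_id
    while current in deleted:
        if current in seen:
            return None  # defensive: a cycle in the replacement links
        seen.add(current)
        replacement = deleted[current].replacement
        if replacement is None:
            return None  # deleted with no successor
        current = replacement
    return current
```

An ID with no deletion record resolves to itself, so the same call works for both current and deleted instances. Gaps in the annotated deletions would surface as IDs that are missing from the data but have no record at all, which this function cannot distinguish from live instances.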

@vanaukenk

What about extending the GPI2.0 file format to capture things like gene model merges, e.g. a new column 'replaced by' or 'merged into'?
We would have that information in WB, as I suspect other groups do as well, so in theory that could be included in the GPI2.0 file and used to update models, if desired.

@nataled

nataled commented Apr 20, 2022

Can PRO help here? I'm wondering how many of the Reactome identifiers used in the current set of GO-CAMs are already represented in PRO. Can someone send a list of these? I'll return that list with a mapping to PRO so we can get a handle on where we're at.

@deustp01

@nataled Right now, probably not, because the problem appears to be that some recent edits in Reactome instances put them out of synch with the June 2021 versions of those instances that are in the GO-CAM models, and that disconnect is messing things up. Once we get frequent re-builds of the GO-CAMs and with PRO IDs in use, opportunities for this kind of disconnect should mostly be eliminated.


6 participants