
Some entities are not available in the graph editor #810

Closed
ukemi opened this issue Dec 8, 2022 · 24 comments

@ukemi commented Dec 8, 2022

Using the 'add individual' functionality of the graph editor, it appears that some entities are not available. For example, try to autocomplete on 'gluconeogenesis' or the mouse (EMAPA) liver.

@kltm (Member) commented Dec 8, 2022

Working single example: gluconeogenesis
Exists in NEO: http://noctua-amigo.berkeleybop.org/amigo/term/GO:0006094

@ukemi (Author) commented Dec 8, 2022

Maybe I should add this additional bit of weirdness: if I use the other entry portals, like 'add annotation', I can get these to autocomplete.

@kltm (Member) commented Dec 8, 2022

So, I suspect, it's getting filtered by the CHEBI:33695 and GO:0032991 filter?

@ukemi (Author) commented Dec 8, 2022

Hmmmm? Not sure how to interpret this query. Since I am able to add anything 'within reason', shouldn't it only be limited by a valid ID? It always used to work for everything.

@kltm (Member) commented Dec 8, 2022

Basically, it's saying to return anything that string-matches, as long as it's in the regulates closure of CHEBI:33695 OR the regulates closure of GO:0032991. Playing around with the query, the issue seems to be that the term is not in the regulates closure of GO:0032991.
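As a rough illustration of the filter being described, here's how such a closure-constrained autocomplete query might be assembled. This is a hedged sketch: the field name `regulates_closure`, the endpoint host, and the parameter shapes are assumptions for illustration, not taken from the actual GOlr configuration.

```python
# Hypothetical sketch of a closure-filtered autocomplete query.
# The field name "regulates_closure" and the host below are illustrative
# assumptions, not the real GOlr schema or endpoint.
from urllib.parse import urlencode

def autocomplete_query(text, closure_roots):
    # OR the closure filters together: a candidate term only needs to fall
    # under ANY one of the roots (here CHEBI:33695 OR GO:0032991).
    fq = " OR ".join('regulates_closure:"%s"' % root for root in closure_roots)
    params = {
        "q": text,  # the user's autocomplete string
        "fq": fq,   # the closure restriction under discussion
        "wt": "json",
    }
    return "http://noctua-golr.example.org/select?" + urlencode(params)

url = autocomplete_query("gluconeogenesis", ["CHEBI:33695", "GO:0032991"])
```

Under a filter like this, a term such as gluconeogenesis (GO:0006094) only comes back if the index places it in one of those closures, which would match the behavior reported above.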

@ukemi (Author) commented Dec 8, 2022

So has the query changed, or has the ontology changed? It's not clear to me why a continuant would be in the regulates closure. But I can get UBERON terms to autocomplete, so they must be there?????????

@kltm (Member) commented Dec 8, 2022

There has been no change in the software for this, and the filters have been there for years (?), so it shouldn't be at that end. As well, gluconeogenesis is in the noctua-amigo instance just fine, so no worries there. It sounds like it might be down to an ontology change of some kind, then, if you were expecting it to work? We can maybe loop in Jim if you want to explore that a little?

@ukemi (Author) commented Dec 8, 2022

Sounds like it has to be on the ontology end then. Maybe we need to loop him in.

@kltm (Member) commented Dec 8, 2022

@balhoff We were wondering if you might have any thoughts on this thread?

@kltm (Member) commented Dec 8, 2022

Although, why would one expect a biological process to be in the closure of protein-containing complex or a CHEBI entity (http://noctua-amigo.berkeleybop.org/amigo/term/GO:0032991)?

@ukemi (Author) commented Dec 8, 2022

I looked at that and was puzzled as well. That's why I asked if the query had changed. Uberon entities certainly don't fit, but they are showing up.

@balhoff (Member) commented Dec 8, 2022

That query is not at all what I always assumed was being searched. It is weird that you can autocomplete things like Uberon liver—it doesn't regulate anything.

@ukemi (Author) commented Dec 8, 2022

Yep. ?????????????

@kltm (Member) commented Dec 8, 2022

Talked to @balhoff; I think we're good on the Uberon liver.
@kltm to trace back the filters on the "add individual" entry

@kltm (Member) commented Dec 8, 2022

Okay, walking through what's going on more carefully, @cmungall 's instincts were correct and there is something else going on here. (The filters were a red herring--I likely pulled that from the wrong entry while trying to debug.)
As an initial check, I'm rerunning the NEO load to see if there was an issue with the index generation, which seems most likely.

@vanaukenk The long story is that the "uber noodle" (the add individual free-for-all input) has historically worked off of a different document type than the other entries--basically a general index that takes the information from /anything/ and any field that is fed into the system. For some reason, a mere 364000 entities got loaded, instead of the expected 1975019. That round number sounds an awful lot like the push break in the loader. No matter what, I want to rerun and track down how a load did not complete and did not throw an error; this is a system issue that needs to be traced.

To answer a second question: why use this "general" document type instead of the usual "ontology_class" one? I believe the original reasoning was that it is a better field for overall searching, as it creates a special search packet with all sorts of things like identifier snippets (NS:123 and '123') which are not in some of the more structured search documents. We could switch this search over to "ontology_class", but searchability would degrade slightly.

Either way, the first step is to track down the apparent loader issue a little more and replace the current index with a more functional one.
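To make the distinction concrete, here is a sketch of the two document shapes as they might sit side by side in the index. All field names and values are illustrative guesses based on the description above, not the actual schema.

```python
# Two hypothetical Solr documents for the same term, GO:0006094.
# Field names here are illustrative assumptions, not the real GOlr schema.

# The "general" document carries a catch-all search blob, including
# identifier snippets (both "GO:0006094" and the bare "0006094").
general_doc = {
    "document_category": "general",
    "entity": "GO:0006094",
    "entity_label": "gluconeogenesis",
    "general_blob": "gluconeogenesis GO:0006094 0006094 glucose biosynthetic process",
}

# The structured "ontology_class" document has no such blob.
ontology_class_doc = {
    "document_category": "ontology_class",
    "annotation_class": "GO:0006094",
    "annotation_class_label": "gluconeogenesis",
}

def loose_match(doc, query):
    # Only the general blob lets a bare-number query like "0006094" hit.
    return query.lower() in doc.get("general_blob", "").lower()
```

So a query of '0006094' matches the general document but not the structured one, which is the search-quality trade-off described above.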

@kltm (Member) commented Dec 9, 2022

General ontology load self-reports that it loaded everything:
[2022-12-07T22:07:03.597Z] 2022-12-07 22:07:03,419 INFO (OntologyGeneralSolrDocumentLoader:55) Doing clean-up (final) commit at 1974266 general ontology documents...
Optimization completed and Solr reported as containing 3949301 documents total, which is what we'd expect (basically 2x). It seems like the index was built properly...
The machine itself has no disk space issues.

I'm continuing the rerun to make sure we're starting clean.

Also, looking at my notes, last week we had two NEO build failures and a "hiccup" when I successfully ran and deployed on Sunday. My current guess is that the issue is at a "devops" level, rather than a construction level (although that does not explain the round number).
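The "basically 2x" consistency check above can be scripted as a quick sanity test; the factor of 2 and the 1% tolerance here are assumptions for illustration, not fixed properties of the load.

```python
# Sanity check: total indexed documents should be roughly a fixed multiple
# of the loaded term count. Factor and tolerance are illustrative assumptions.

def index_count_plausible(total_docs, loaded_terms, factor=2, tolerance=0.01):
    expected = loaded_terms * factor
    return abs(total_docs - expected) <= expected * tolerance

# This run: 3949301 total documents vs 1974266 loaded terms ("basically 2x").
ok = index_count_plausible(3949301, 1974266)
```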

@kltm (Member) commented Dec 9, 2022

In this most recent run, 256000 entities got loaded. I'm not sure why the load seems to drop off part of the way through, but I'm pretty sure this is where the problem exists now.
Noting that these runs are slightly shorter than ones that may have been "better" a few weeks ago. Also noting that we have no rollback mechanism (geneontology/neo#21).

@kltm (Member) commented Dec 9, 2022

Using Luke to examine the index directly, AmiGO is reporting correctly: only 256000 general terms were added, out of nearly 2M. Rebuilding again and diving into the log looking for trouble.

@kltm (Member) commented Dec 9, 2022

Error point found:

[2022-12-09T22:40:55.129Z] 2022-12-09 22:40:54,831 INFO  (OntologyGeneralSolrDocumentLoader:48) Processed 1000 general ontology docs at 257000 and committing...
[2022-12-09T22:42:02.631Z] Exception in thread "main" org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Broken pipe (Write failed)
[2022-12-09T22:42:02.631Z]      at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:475)
[2022-12-09T22:42:02.631Z]      at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:249)

The issue seems to be that owltools does not crash out of this error with a failing status, so Jenkins never picks it up.
I'm not sure if this is a new problem or if this has been quietly happening in the background for some time.
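For illustration, a minimal fail-fast sketch in Python (the actual loader is Java/owltools, and the loader shape here is hypothetical): the point is that a commit failure must surface as a non-zero exit status so Jenkins marks the build as failed instead of green-but-truncated.

```python
# Hypothetical fail-fast loader loop; the real code is Java (owltools).
# This just illustrates the missing behavior: exit non-zero on a failed commit.
import sys

def load_all(batches, commit_batch):
    sent = 0
    for batch in batches:
        try:
            commit_batch(batch)  # may raise, e.g. on a broken pipe
        except Exception as exc:
            # Don't swallow the error: report it and fail the process
            # so CI (Jenkins) sees a red build instead of a silent short load.
            print("load failed after %d docs: %s" % (sent, exc), file=sys.stderr)
            sys.exit(1)
        sent += len(batch)
    return sent
```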

@kltm (Member) commented Dec 11, 2022

@ukemi @vanaukenk I've done a couple more loads and finally got one that completed: 1974291 entities as expected. With that, assuming things are now working as expected at your end, I'd vote to close this ticket, with anything else (e.g. widget filter redo) going into a new ticket. I've created a new ticket in data QC to try and make sure this becomes automatically checked (geneontology/pipeline#309); will be doing a manual check until that's cleared.
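A sketch of the kind of automated post-load check such a QC ticket might cover; the 95% threshold is an assumption for illustration, not a value from the pipeline.

```python
# Hypothetical post-load QC gate: fail if far fewer entities loaded
# than expected. The 0.95 threshold is an illustrative assumption.

def qc_entity_count(actual, expected, min_fraction=0.95):
    # True when the load looks complete enough to publish.
    return actual >= expected * min_fraction

# The completed run (1974291 of ~1975019 entities) passes; the truncated
# 256000-document run would fail.
```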

@pgaudet commented Dec 12, 2022

This looks like it's the same issue as geneontology/neo#111

@ukemi (Author) commented Dec 12, 2022

The EMAPA terms are back.

@kltm (Member) commented Dec 12, 2022

@pgaudet I believe it's likely a different issue.

@kltm closed this as completed Dec 12, 2022
Labels: none
Project status: Done - on production
4 participants