WP4 - API to upload target dataset - followup actions #21
Comments
Two initial requests for changes from discussions on the upload target set functionality:
When running the data loader to load new targets, the inspirations seem to disappear. This is likely because references to primary keys are regenerated during the upload.
Rachael has successfully tested the load. There is some remaining work and this can be tracked under the follow-up actions. Agreed roadmap for the remaining target upload tasks following meeting on 17/12/2020:
The remaining followup tasks on the loader to be tracked by this task are as follows:
Problem
When a target set is reloaded, links to existing compound sets are broken.

Initial prognosis is as follows: In the Fragalysis backend repo, in tasks.process_design_compound, the inspirations field in the compound model is a many-to-many field linking to the molecules model/table.

Likely solution: When a target set is uploaded, examine where molecules are removed or added and make sure that the many-to-many field is retained.

Place to start looking: the "for mol_id in ids:" loop in target_set_upload.analyse_mols. Is the many-to-many field correct after the reload? If not, it needs to be saved or replaced.
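The check suggested above can be sketched in plain Python, without Django. This is only an illustration under stated assumptions: the dictionaries stand in for the compound and molecule tables, and every name here (snapshot_links, restore_links, the "mol-A" style codes) is hypothetical rather than taken from the Fragalysis backend. The idea is to record each compound's inspirations by a stable molecule code before the reload, then rebuild the id-based links afterwards.

```python
# Plain-Python stand-in for the compound -> molecule many-to-many links.
# All names are hypothetical; the real code uses Django models.

def snapshot_links(inspirations, mol_code_by_id):
    """Record each compound's inspirations by molecule code, not by id."""
    return {
        compound: {mol_code_by_id[mol_id] for mol_id in mol_ids}
        for compound, mol_ids in inspirations.items()
    }


def restore_links(snapshot, mol_id_by_code):
    """Rebuild id-based links after a reload and report codes that vanished."""
    restored, missing = {}, {}
    for compound, codes in snapshot.items():
        restored[compound] = {
            mol_id_by_code[code] for code in codes if code in mol_id_by_code
        }
        lost = {code for code in codes if code not in mol_id_by_code}
        if lost:
            missing[compound] = lost
    return restored, missing


# Before the reload: two molecules with ids 1 and 2, both inspirations of cmpd-1.
codes_before = {1: "mol-A", 2: "mol-B"}
links_before = {"cmpd-1": {1, 2}}
snapshot = snapshot_links(links_before, codes_before)

# After the reload the same molecules exist, but with regenerated ids.
ids_after = {"mol-A": 11, "mol-B": 12}
restored, missing = restore_links(snapshot, ids_after)
print(restored, missing)
```

Keying the snapshot on a code rather than on the id is the design choice that matters here: the id is exactly the value that the reload regenerates.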
Analysis
When the proteins are loaded, existing proteins with alternate names are deleted and recreated rather than updated. This changes the id and breaks links. This has been confirmed by running the Mpro upload multiple times: the number of proteins stays the same, but the auto-incremented id increases by 295 each time. The problem is caused by the update to Protein.code that is made when the Protein has an alternate name. The processing is as follows (all in target_set_upload.py, but also present in the current loader):
This produces different results depending on whether the folder name has "_0" in it or not:
Our first attempt at a fix failed because we tried to use just the part up to the colon, but that only works if the whole of the folder name is in the key, not for the ones where the '_0' is stripped off, which is the normal situation.

Solution:
One possible solution is to:
But at the moment the remove_not_added function would fail. I can probably fix this by doing the same thing in the remove_not_added function (or by making a list of the keys I've matched and getting rid of all the others).

Questions:
Discussed with Frank: The current loader also needs to be fixed. Will raise/fix an issue on the fragalysis-loader repo.
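The behaviour described in the analysis (deleting and recreating a protein gives it a new auto-incremented id, while updating it in place keeps the id) can be reproduced in a small standalone sketch. It uses sqlite3 from the Python standard library purely as a stand-in for the real database. The table layout and the function names reload_by_update and reload_by_recreate are assumptions made for illustration, not the loader's actual schema or code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# AUTOINCREMENT means ids of deleted rows are never reused.
conn.execute(
    "CREATE TABLE protein (id INTEGER PRIMARY KEY AUTOINCREMENT, code TEXT UNIQUE)"
)
conn.execute("CREATE TABLE inspiration (compound TEXT, protein_id INTEGER)")

# Initial load: one protein and one link pointing at it.
conn.execute("INSERT INTO protein (code) VALUES ('prot-1')")
old_id = conn.execute("SELECT id FROM protein WHERE code = 'prot-1'").fetchone()[0]
conn.execute("INSERT INTO inspiration VALUES ('cmpd-1', ?)", (old_id,))


def reload_by_update(conn, code):
    """Reuse the existing row if the code is already present."""
    row = conn.execute("SELECT id FROM protein WHERE code = ?", (code,)).fetchone()
    if row is None:
        return conn.execute("INSERT INTO protein (code) VALUES (?)", (code,)).lastrowid
    return row[0]  # other fields would be updated in place here


def reload_by_recreate(conn, code):
    """Delete the row and insert it again, which allocates a new id."""
    conn.execute("DELETE FROM protein WHERE code = ?", (code,))
    return conn.execute("INSERT INTO protein (code) VALUES (?)", (code,)).lastrowid


def dangling_links(conn):
    """Count links whose protein_id no longer exists in the protein table."""
    return conn.execute(
        "SELECT COUNT(*) FROM inspiration "
        "WHERE protein_id NOT IN (SELECT id FROM protein)"
    ).fetchone()[0]


kept_id = reload_by_update(conn, "prot-1")
dangling_after_update = dangling_links(conn)

new_id = reload_by_recreate(conn, "prot-1")
dangling_after_recreate = dangling_links(conn)

print(old_id, kept_id, new_id, dangling_after_update, dangling_after_recreate)
```

In this sketch the update path leaves the link intact, while the recreate path leaves it pointing at an id that no longer exists. How the real database reacts to the delete (for example, whether linked rows are removed) depends on the model definitions and is not modelled here.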
The data upload problem is solved as per my previous message.
I also needed to change the compound set uploader so that when it checked protein.code, it checked up to ":" rather than "_" so it would comply with the new names.
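As a rough illustration of comparing protein codes up to ":" rather than "_", here is a minimal sketch. The helper names code_key and same_protein are hypothetical, and the example codes are invented; the assumption that a code consists of a crystal name optionally followed by ":" and an alternate name is inferred from the comments above, not taken from the uploader itself.

```python
def code_key(code, sep=":"):
    """Return the part of a protein code before the first separator."""
    return code.split(sep, 1)[0]


def same_protein(code_a, code_b, sep=":"):
    """Compare two protein codes on the part before the separator."""
    return code_key(code_a, sep) == code_key(code_b, sep)


# Hypothetical codes: a crystal name, optionally followed by ":" and an alternate name.
stored = "Target-x0001_0A:alt-name"
looked_up = "Target-x0001_0A"

match_on_colon = same_protein(stored, looked_up, sep=":")
key_on_colon = code_key(stored, sep=":")
key_on_underscore = code_key(stored, sep="_")

print(match_on_colon, key_on_colon, key_on_underscore)
```

With these invented codes, splitting on ":" keeps the whole crystal name as the key, whereas splitting on "_" also drops the suffix after the underscore.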
This is a placeholder for follow-up actions to implement the data loader API now that the first version (Minimum Viable Product) has been merged.