Refactor vector generation (lazily?) #1014
Comments
Initial analysis on de-duplicating compounds has been done and can be found on the

In short, the compound de-duplication is going to be a bit more complex than expected.

- viewer_molecule table: no particular problems with updating the compound IDs, but Django code will need to be investigated to ensure that compound IDs are unique.
- viewer_computedmolecule table: no particular problems with updating the compound IDs, but the computed set upload functionality needs to be investigated to ensure that compound IDs are unique.
- viewer_compound_project_id table: this is a join table and it should be relatively easy to de-duplicate, but Django code will need to be investigated to ensure that new rows use unique compound ID.
- hypothesis_vector table: this defines the fragment network vectors and is the most complex to address. De-duplicating seems possible, but the way this data is generated also needs to be investigated. This table is also referenced by the hypothesis_vector3d table that will also need de-duplicating.
- scoring_cmpdchoice, viewer_activitypoint, viewer_compound_inspirations, viewer_designset_compounds tables: currently these contain no data. Need to investigate whether any Django code needs updating should data come into existence (or do we drop these tables completely if they are not used and just add to the confusion).
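The join-table case could be sketched roughly as follows. This is only an illustration run against an in-memory SQLite database, not the project's actual migration: the column names `compound_id` and `project_id`, the helper names, and the shape of `id_map` (old compound ID to kept compound ID) are all assumptions, and the real database and schema may differ.

```python
import sqlite3


def make_join_demo_db():
    """Build a tiny in-memory stand-in for the join table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE viewer_compound_project_id (
            id INTEGER PRIMARY KEY,
            compound_id INTEGER NOT NULL,  -- assumed column name
            project_id INTEGER NOT NULL    -- assumed column name
        );
        INSERT INTO viewer_compound_project_id (id, compound_id, project_id)
        VALUES (1, 1, 7), (2, 3, 7), (3, 2, 7), (4, 3, 8);
        """
    )
    return conn


def dedupe_join_table(conn, id_map):
    """Repoint join rows at the kept compound IDs, then drop repeated pairs.

    id_map maps every old compound ID to the ID that survives de-duplication;
    kept IDs map to themselves. Returns the number of join rows removed.
    """
    cur = conn.cursor()
    # Step 1: rewrite compound IDs that are being merged into a kept ID.
    cur.executemany(
        "UPDATE viewer_compound_project_id SET compound_id = ? WHERE compound_id = ?",
        [(new, old) for old, new in id_map.items() if new != old],
    )
    # Step 2: identical (compound_id, project_id) pairs may now exist;
    # keep the row with the lowest id in each group.
    cur.execute(
        """
        DELETE FROM viewer_compound_project_id
        WHERE id NOT IN (
            SELECT MIN(id) FROM viewer_compound_project_id
            GROUP BY compound_id, project_id
        )
        """
    )
    removed = cur.rowcount
    conn.commit()
    return removed
```

In this toy data, compound 3 is merged into compound 1, so rows 1 and 2 collapse into a single (compound 1, project 7) link while the project 8 link is simply repointed.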
@tdudgeon points to multiple different things:
Scope out later...?
@tdudgeon updates:
What may need updating is:
Also, four tables with "scoring" in the name need to be checked out.
Scope has moved away from IDs: they don't need curation, now that we (@phraenquex) are happy to have them duplicated. The implication is that the vectors are probably / definitely broken, in two ways:
Solution is (probably) to generate them lazily - is that feasible? Action:
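As a rough illustration of what "lazily" could mean here: compute the vectors for a compound only when they are first requested, then reuse the stored result. The class name `LazyVectorStore` and the `compute` callable are hypothetical placeholders; the sketch says nothing about how vectors are actually generated in this project or where they would be persisted.

```python
from typing import Callable, Dict, List


class LazyVectorStore:
    """Compute vectors on first access and cache them, keyed by SMILES.

    `compute` is a placeholder for whatever routine really generates the
    vectors; it is an assumption of this sketch, not project code.
    """

    def __init__(self, compute: Callable[[str], List[str]]) -> None:
        self._compute = compute
        self._cache: Dict[str, List[str]] = {}

    def get(self, smiles: str) -> List[str]:
        # Only generate vectors the first time this SMILES is requested.
        if smiles not in self._cache:
            self._cache[smiles] = self._compute(smiles)
        return self._cache[smiles]

    def invalidate(self, smiles: str) -> None:
        # Drop a cached entry so the next get() regenerates it.
        self._cache.pop(smiles, None)
```

With this shape nothing is generated at upload time, and a stale entry can be regenerated by calling `invalidate` followed by `get`. Whether that trade-off is acceptable depends on how expensive the real generation step is, which is the open feasibility question.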
The viewer_compound table needs to be de-duplicated. The tables referring to this table also need to be updated accordingly. This will need a series of SQL statements to be generated. The only change to Django should be to add a unique constraint to the SMILES.
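A minimal sketch of what such a series of SQL statements might look like, wrapped in Python and run against an in-memory SQLite database so it can be tried in isolation. Several things are assumptions here: the column names `smiles` and `cmpd_id`, the use of viewer_molecule as the only referencing table, the index name, and the rule of keeping the lowest ID per SMILES. It also compares SMILES strings exactly, so two different spellings of the same structure would not be merged.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory stand-in for the two tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE viewer_compound (id INTEGER PRIMARY KEY, smiles TEXT NOT NULL);
        CREATE TABLE viewer_molecule (id INTEGER PRIMARY KEY, cmpd_id INTEGER NOT NULL);
        INSERT INTO viewer_compound (id, smiles)
        VALUES (1, 'CCO'), (2, 'c1ccccc1'), (3, 'CCO');
        INSERT INTO viewer_molecule (id, cmpd_id) VALUES (10, 1), (11, 3), (12, 2);
        """
    )
    return conn


def dedupe_compounds(conn):
    """Merge compounds with identical SMILES; return how many rows were removed."""
    cur = conn.cursor()
    # 1. Map every compound ID to the lowest ID that shares its SMILES.
    cur.execute(
        """
        CREATE TEMP TABLE compound_map AS
        SELECT c.id AS old_id, k.keep_id AS keep_id
        FROM viewer_compound c
        JOIN (SELECT smiles, MIN(id) AS keep_id
              FROM viewer_compound GROUP BY smiles) k
          ON k.smiles = c.smiles
        """
    )
    # 2. Repoint the referencing table at the kept IDs.
    cur.execute(
        """
        UPDATE viewer_molecule
        SET cmpd_id = (SELECT keep_id FROM compound_map
                       WHERE old_id = viewer_molecule.cmpd_id)
        WHERE cmpd_id IN (SELECT old_id FROM compound_map)
        """
    )
    # 3. Delete the compounds that are no longer referenced as keepers.
    cur.execute(
        "DELETE FROM viewer_compound WHERE id NOT IN (SELECT keep_id FROM compound_map)"
    )
    removed = cur.rowcount
    cur.execute("DROP TABLE compound_map")
    # 4. Enforce uniqueness from now on (index name is an assumption).
    cur.execute(
        "CREATE UNIQUE INDEX viewer_compound_smiles_uniq ON viewer_compound (smiles)"
    )
    conn.commit()
    return removed
```

Step 2 would need repeating for each table that references viewer_compound before step 3 is run. On the Django side, the matching change would presumably be `unique=True` on the SMILES field, with the migration applied only after the duplicates are gone.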