Align all canonical sites to first biological assembly #1369

Open
phraenquex opened this issue Mar 5, 2024 · 19 comments
Labels
2024-03-13 green Data dissemination 2024-06-14 mint Data dissemination 2 XChemAlign

Comments

@phraenquex
Collaborator

phraenquex commented Mar 5, 2024

@ConorFWild

The CHIKV upload flushed out a bug in the algorithm, as discussed here in Slack:

Bug: If something binds only once to somewhere specific to a chain not part of the reference biological assembly (the one defined in assemblies.yaml), then that generates a canonical site that is not located around that reference biological assembly.

Required: before creating a new canonical site, ensure its biological assembly is aligned to the reference biological assembly.

Am I right that this will mean explicitly writing out a set of reference coordinates for every canonical site, rather than simply pointing to the original crystallographic file? (Or maybe that isn't what happens anyway.)
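The required behaviour can be sketched as a guard that runs before a canonical site is created. This is a minimal illustration, not XChemAlign code: `Transform`, `CandidateSite`, `place_in_reference_frame` and the precomputed `transforms_to_reference` lookup are all hypothetical, and real sites carry far more than bare coordinates.

```python
from dataclasses import dataclass

Vec3 = tuple[float, float, float]
Matrix3 = tuple[Vec3, Vec3, Vec3]


@dataclass(frozen=True)
class Transform:
    """Rigid transform mapping x to rotation @ x + translation."""
    rotation: Matrix3
    translation: Vec3

    def apply(self, p: Vec3) -> Vec3:
        return tuple(
            sum(self.rotation[i][j] * p[j] for j in range(3)) + self.translation[i]
            for i in range(3)
        )


@dataclass
class CandidateSite:
    """Hypothetical stand-in for a new canonical site: its assembly and atom coordinates."""
    assembly: str
    coords: list[Vec3]


def place_in_reference_frame(site, reference_assembly, transforms_to_reference):
    """Express the site's coordinates in the reference assembly's frame.

    Sites already on the reference assembly are returned unchanged. Any other
    assembly must have a known transform; otherwise we refuse to create the site
    rather than silently leaving it away from the reference assembly.
    """
    if site.assembly == reference_assembly:
        return list(site.coords)
    transform = transforms_to_reference.get(site.assembly)
    if transform is None:
        raise ValueError(
            f"no alignment from assembly {site.assembly!r} to {reference_assembly!r}"
        )
    return [transform.apply(p) for p in site.coords]
```

Raising instead of falling back is the point of the sketch: the bug described above comes from silently accepting a site that was never brought onto the reference assembly.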

@ConorFWild
Collaborator

I think you are misunderstanding the references, as this will always happen. But yes, you are correct that a canonical site will not be aligned to the reference assembly. This is because a canonical site can contain conformer sites from multiple reference assemblies, and hence there is no unique reference assembly to choose!

As such, the required feature is most likely not possible with the current algorithm design.

@phraenquex phraenquex added 2023-11-02 yellow Too big for V2 and removed 2023-08-23 violet V2 full release labels Mar 7, 2024
@phraenquex
Collaborator Author

@ConorFWild says it might be lots of work - or not. Doesn't matter for purple.

@mwinokan mwinokan added 2024-03-15 indigo Data dissemination loose ends and removed 2023-11-02 yellow Too big for V2 labels Mar 13, 2024
@phraenquex
Collaborator Author

@ConorFWild would this ticket merge with #1227?

@phraenquex
Collaborator Author

(FYI @mwinokan, who was taking notes)
@ConorFWild and @phraenquex discussed how to define equivalence, and that robust heuristics are impossible/unlikely. Frank said the answer is that the F/E must support easy curation/updating of assignments. The relevant ticket is #1389.

@ConorFWild
Collaborator

@phraenquex So, I've more or less scoped out the extent of changes that will be required I think:

  1. "Neighbourhood"s, the model for the protein environment of atoms, will need to be updated to carry explicit information on which assembly they came from (currently this is only carried as transforms, which are hard to map reliably to xtalform assemblies)
  2. Code needs to be added to create the assembly transform hierarchy, which will define the alignment relationship between assemblies (sketch algorithm in image below under "hierarchy construction")
  3. A bunch of plumbing code to get the assembly transform hierarchy, and the associated structures, where it needs to be
  4. A mechanism for checkpointing not just neighbourhoods but entire reference structures, as these must be robust to future changes and are currently referenced only by name and not version
  5. Input -and- Output code changes to save the hierarchy, reference alignment checkpoints and new version of Neighbourhood model
  6. Changes to the alignment code to consume assembly hierarchies and neighbourhood assembly membership, decide which assemblies to align to, and then perform the additional alignment operations (again, sketch algorithm in image under "reference alignment")
  7. Probably some additional plumbing on XCA side to move things around, generate metadata and upload
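The hierarchy construction itself is only shown in the image, so the following is a stand-in rather than the algorithm sketched there. It assumes that two assemblies can be aligned whenever they share at least one chain (labelled here by hypothetical biochain names), and builds a breadth-first spanning tree rooted at the reference assembly, recording each assembly's parent. `build_assembly_hierarchy` and its inputs are illustrative only.

```python
from collections import deque


def build_assembly_hierarchy(reference, assembly_chains):
    """Map each assembly to the parent it will be aligned onto (the reference maps to None).

    assembly_chains: assembly name -> set of chain labels it contains.
    Two assemblies are treated as alignable if they share at least one chain label.
    """
    parent = {reference: None}
    queue = deque([reference])
    while queue:
        current = queue.popleft()
        for other in sorted(assembly_chains):  # sorted for deterministic output
            if other in parent:
                continue
            if assembly_chains[current] & assembly_chains[other]:
                parent[other] = current
                queue.append(other)
    unreachable = set(assembly_chains) - set(parent)
    if unreachable:
        # These assemblies share no chains with anything connected to the reference.
        raise ValueError(f"cannot place assemblies: {sorted(unreachable)}")
    return parent
```

Breadth-first search keeps every assembly as few alignment steps from the reference as possible, which limits how far errors can accumulate when transforms are later chained together.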

All in all, I'd say it would most likely take at least 2 weeks of full-time work, or realistically more like a month given the amount of time I actually spend on XCA.

[Image PXL_20240327_150703320: sketch algorithms for "hierarchy construction" and "reference alignment"]

@phraenquex
Collaborator Author

@ConorFWild this appears to address ticket #1227 too, correct?

@ConorFWild
Collaborator

It does, in the sense that this same procedure will need to be applied to artefact atoms/chains, so assembly membership needs to be tracked for them too! (Although that more or less comes "for free" with this change.)

@phraenquex
Collaborator Author

@ConorFWild one more thought from our chat:

It will be important that users can see both chain IDs for non-artefact atoms: the one in the original crystal structure; and the one of the corresponding reference assembly.

(This may need some further front-end/NGL work; but for now, be sure to propagate the names in the relevant yaml file.)
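Propagating both names is mostly a matter of keeping them side by side in whatever record gets written out. A minimal sketch with a hypothetical `ChainLabel` record; the field names and the grouping by crystal chain ID are assumptions, not the schema of any XChemAlign yaml file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChainLabel:
    """Hypothetical record pairing a chain's two identifiers."""
    crystal_chain: str   # chain ID in the original crystal structure
    assembly: str        # reference assembly the chain was mapped onto
    assembly_chain: str  # chain ID within that reference assembly


def chain_labels_for_yaml(labels):
    """Build a plain dict, keyed by crystal chain ID, that a yaml dumper can serialize."""
    return {
        lab.crystal_chain: {"assembly": lab.assembly, "assembly_chain": lab.assembly_chain}
        for lab in labels
    }
```

Keying by crystal chain assumes each chain maps onto exactly one reference-assembly chain; if that does not hold, each key would need a list of mappings instead.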

@ConorFWild
Collaborator

Update: actually trying to implement this has made me realize that we need to define a canonical embedding of the reference assemblies. Once the reference hierarchy is defined, it must then be realized: global alignments (limited to the overlapping chains) must be performed to generate concrete atomic coordinates for each of the assemblies and their final relative positions. Canonical/conformer sites and alignments are then done locally to parts of this franken-assembly!
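Realizing the hierarchy amounts to composing transforms along each path to the root, so that every assembly ends up with a single transform into the shared frame. A minimal sketch, assuming each child-to-parent transform has already been obtained (for example by superposing the overlapping chains) and is stored as a rotation matrix R plus translation t; `compose`, `realize_embedding` and the `parent`/`to_parent` inputs are hypothetical. Applying (R2, t2) after (R1, t1) gives x -> R2 R1 x + (R2 t1 + t2).

```python
Matrix3 = list[list[float]]
Vec3 = list[float]
Rigid = tuple[Matrix3, Vec3]  # (R, t) represents x -> R @ x + t

IDENTITY: Rigid = ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.0, 0.0])


def compose(outer: Rigid, inner: Rigid) -> Rigid:
    """Return the rigid transform x -> outer(inner(x))."""
    r2, t2 = outer
    r1, t1 = inner
    r = [[sum(r2[i][k] * r1[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    t = [sum(r2[i][k] * t1[k] for k in range(3)) + t2[i] for i in range(3)]
    return r, t


def realize_embedding(parent: dict, to_parent: dict) -> dict:
    """Map every assembly to a transform into the root assembly's frame.

    parent: assembly -> parent assembly (None for the root).
    to_parent: assembly -> transform from that assembly's frame into its parent's.
    """
    global_tf: dict = {}

    def resolve(name):
        if name not in global_tf:
            p = parent[name]
            # The root keeps its own frame; children chain onto their parent's global transform.
            global_tf[name] = IDENTITY if p is None else compose(resolve(p), to_parent[name])
        return global_tf[name]

    for name in parent:
        resolve(name)
    return global_tf
```

Memoizing in `global_tf` means each shared ancestor's transform is composed once, and every assembly deeper in the tree reuses it.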

@ConorFWild
Collaborator

Alright, the new files are:

  • hierarchy.yaml
  • biochain_priorities.yaml
  • assembly_landmarks.yaml
  • assembly_transforms.yaml
  • chain_to_assembly.yaml

Changes are on the branches:

Examples can be found here:

@tdudgeon @kaliif @phraenquex If there are no updates to the metadata necessary based on the new files, then only their presence is necessary for alignment. On the assembly_alignment branch, XCA "works" with the new method, but I can't say whether the upload works.

@phraenquex
Collaborator Author

@ConorFWild can you clarify/elaborate on your last sentence, please?

If there are no updates to the metadata necessary based on the new files, then only their presence is necessary for alignment.

What are you explaining here?

@ConorFWild
Collaborator

@phraenquex Basically, these new files contain essential alignment state: new runs of the aligner will not work without them, and hence they should be saved somewhere by the uploader. They also contain interesting information we may one day want to serve to the frontend, so again they should be saved, albeit not necessarily parsed and added to tables (unless there is a table containing a file list).

However, they do not change the form of any currently uploaded and parsed data, so there may actually be no work for @tdudgeon @kaliif, barring keeping track of the fact that there are new files!
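Since new aligner runs depend on these files, an uploader or loader could check for them up front. A minimal sketch: the five file names come from the list earlier in this thread, but `missing_alignment_state` and the assumption that all five sit in a single directory are hypothetical.

```python
from pathlib import Path

# File names as listed in this thread; the single-directory layout is an assumption.
ALIGNMENT_STATE_FILES = (
    "hierarchy.yaml",
    "biochain_priorities.yaml",
    "assembly_landmarks.yaml",
    "assembly_transforms.yaml",
    "chain_to_assembly.yaml",
)


def missing_alignment_state(directory):
    """Return the alignment-state files absent from `directory`, in the order listed above."""
    root = Path(directory)
    return [name for name in ALIGNMENT_STATE_FILES if not (root / name).is_file()]
```

The check only tests presence, which matches the point above that the files need to be kept rather than parsed.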

@kaliif
Collaborator

kaliif commented May 24, 2024

If the loader doesn't need to look into the files, then there should be nothing to do with the target loader - all yaml files are already saved and included in LHS download.

@tdudgeon
Collaborator

I tried this out on the CHIKV_Mac data and hit a bug:

2024-05-24 15:26:36.265 | WARNING  | ligand_neighbourhood_alignment.align_xmaps:read_xmap_from_mtz:633 - Trying DELFWT DELPHWT
Origin for xmap is now: [37.44  46.927 -3.769]
2024-05-24 15:26:36.471 | INFO     | ligand_neighbourhood_alignment.cli:_update:1451 - Writing to: data/lb32633-6/upload_1/aligned_files/CHIKV_MacB-x0692/CHIKV_MacB-x0692_D_304_1_CHIKV_MacB-x0692+D+304+1_event.ccp4
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/data/github/im/fragalysis_api/xchem-align/src/xchemalign/aligner.py", line 879, in <module>
    main()
  File "/data/github/im/fragalysis_api/xchem-align/src/xchemalign/aligner.py", line 866, in main
    a.run()
  File "/data/github/im/fragalysis_api/xchem-align/src/xchemalign/aligner.py", line 232, in run
    new_meta = self._perform_alignments(input_meta)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/github/im/fragalysis_api/xchem-align/src/xchemalign/aligner.py", line 465, in _perform_alignments
    updated_fs_model = _update(
                       ^^^^^^^^
  File "/data/github/im/fragalysis_api/xchem-align/venv_assembly_alignment/lib/python3.11/site-packages/ligand_neighbourhood_alignment/cli.py", line 1482, in _update
    __align_xmap(
  File "/data/github/im/fragalysis_api/xchem-align/venv_assembly_alignment/lib/python3.11/site-packages/ligand_neighbourhood_alignment/align_xmaps.py", line 582, in __align_xmap
    interpolation_range = _get_interpolation_range(neighbourhood, running_transform, reference_xmap)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/github/im/fragalysis_api/xchem-align/venv_assembly_alignment/lib/python3.11/site-packages/ligand_neighbourhood_alignment/align_xmaps.py", line 238, in _get_interpolation_range
    rglb, rgub = get_grid_bounds(tlb, tub, reference_xmap)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/github/im/fragalysis_api/xchem-align/venv_assembly_alignment/lib/python3.11/site-packages/ligand_neighbourhood_alignment/align_xmaps.py", line 76, in get_grid_bounds
    floor(xmap.nu * tlbf.x),
    ^^^^^^^^^^^^^^^^^^^^^^^
ValueError: cannot convert float NaN to integer

@ConorFWild any ideas what's wrong?
I can provide data if needed, but it's huge.

@ConorFWild
Collaborator

@tdudgeon I'm going to guess that one of the alignments has failed and propagated a nonsense transform operator. Can you link me the location where this was run? I can probably take a first stab at the problem by looking at the transform yamls.
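The last line of the traceback matches the error Python's `math.floor` raises when given NaN, which fits the guess that a failed alignment produced a non-finite transform. One way to surface such failures where they happen, rather than deep inside map interpolation, is to validate each transform as it is read or composed. A minimal sketch, assuming a transform is held as a 3×3 rotation plus a translation vector; `check_finite_transform` is hypothetical and not part of ligand_neighbourhood_alignment.

```python
import math


def check_finite_transform(rotation, translation, label="transform"):
    """Raise ValueError if any rotation or translation component is NaN or infinite."""
    values = [v for row in rotation for v in row] + list(translation)
    bad = [v for v in values if not math.isfinite(v)]
    if bad:
        raise ValueError(f"{label} has non-finite components: {bad}")


# The downstream failure mode being guarded against:
try:
    math.floor(float("nan"))
except ValueError as exc:
    print(exc)  # cannot convert float NaN to integer
```

Passing a crystal or site identifier as `label` would make the error point straight at the alignment that went wrong.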

@mwinokan mwinokan moved this to XChemAlign in Fragalysis May 29, 2024
@phraenquex phraenquex added 2024-06-14 mint Data dissemination 2 and removed 2024-03-15 indigo Data dissemination loose ends labels Jun 14, 2024
@mwinokan
Collaborator

@ConorFWild what is the status of this ticket?

@phraenquex
Collaborator Author

@ConorFWild please confirm that this was in fact done for green release.

Adding the green tag for now.

@phraenquex phraenquex added the 2024-03-13 green Data dissemination label Sep 17, 2024
@mwinokan
Collaborator

mwinokan commented Oct 1, 2024

@mwinokan to call @ConorFWild to confirm this has been implemented

@phraenquex phraenquex moved this from Dev Done - Do review (DEV) to In staging - assess function vs spec in Fragalysis Oct 1, 2024
@mwinokan mwinokan moved this from In staging - assess function vs spec to Approved in staging - push to production in Fragalysis Oct 22, 2024
@mwinokan
Collaborator

All ribbons have looked good for uploads in August/September, so concluding this has been merged.

@phraenquex phraenquex moved this from Approved in staging - push to production to In production (Done) in Fragalysis Nov 12, 2024
Projects
Status: In production (Done)
Development

No branches or pull requests

5 participants