Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The path to predictions #260

Closed
electronicsbyjulie opened this issue May 29, 2023 · 21 comments · Fixed by #268
Closed

The path to predictions #260

electronicsbyjulie opened this issue May 29, 2023 · 21 comments · Fixed by #268

Comments

@electronicsbyjulie
Copy link
Contributor

To dock ligands in an active conformation protein model would involve:

  • Do a positional homology based on the active OR51E2 model;
  • If there is a TMR6-EXR2 salt bridge, engage it;
  • Soft-pocket all the TMRs without a ligand to get internal clashes down to pre-homology levels;
    • All TMR6 rotations would be centered on the salt bridge residue, and the bridge would be re-formed at the end;
  • A hard dock could then be done, and the total binding strength to TMR6 times the sum of TMR2 thru TMR5 would be the prediction.

For ligands below a certain total binding strength, the pre-activation raw PDB dock binding energy would be subtracted.

An accuracy of 80% or better would be just as trustowrthy as the empirical measurements.

@primaryodors
Copy link
Owner

Can certainly work on this intermittently while solving higher priority issues.

@primaryodors primaryodors added the medium priority This should be made working as soon as time permits. label May 29, 2023
@primaryodors
Copy link
Owner

Also see:

The takeaway here is that not only are there multiple possible active configurations per receptor, but also there are configurations active for some G proteins but not others, and these dissimilar conformers bind to different sets of agonists. So it is not sufficient to say that any given receptor is sensitive to an odorant, but rather that a receptor-G-protein pair is sensitive to that odorant, or a receptor can be sensitive to it depending on the G protein.

Originally posted by @primaryodors in #256 (comment)

In other words, truly accurate predictions will rely on not just an active configuration model of the receptor protein, but one (or ones) made to fit whichever G protein(s) are expressed in the olfactory cells. Presumably this is usually Golf for in vivo olfaction, but some of the empirical measurements may have used other G proteins besides Golf, so this may explain some of the self contradictions in the scientific data.

On the other hand, this makes the job easier when it comes to extrapolating active receptor structures. A typical active receptor model would be modified from the AlphaFold original to 1.) fit the Golf structure or other G protein on the cytoplasmic end; 2.) maintain a salt bridge between EXR2 and the extracellular end of TMR6; 3.) otherwise resemble the conformational changes found in the active OR51E2 cryo-EM model; 4.) minimize internal clashes of the protein at least to the energy level of the AlphaFold original.

Once such a model is generated of a receptor-G-protein pair, it can be serialized as a PDB file, in a subfolder named active, with a name that identifies both the receptor and the G protein, and then all docks done on this active state conformer would be hard docks for the sake of performance.

primaryodors added a commit that referenced this issue May 29, 2023
@primaryodors
Copy link
Owner

Can use the g-protein-based-reshaping branch for development on this.

@electronicsbyjulie
Copy link
Contributor Author

electronicsbyjulie commented May 30, 2023

Some of the contact points between proteins:

GNAS1 - OR51E2:     BW                Conserved?
--------------------------------------------------
Q35   - N136        *4.38             Y
A39   - A132/A133   *4.34-5/*3.61-2   N
D215  - R130        *3.59             mostly
E392  - K294        7.56              ORs only, less one orphan; not TAARs.

R385  - E232        6.29              >50%
E392  - K296        7.58              mostly
D215  - R130        3.59              usually R or H
Y358  - S229        5.70/6.26         mostly; let's call this residue 56.50.
Y391  - H131        3.60              most receptors have an H or an R in the 3.59 position; OR51E2 is unusual having HR.

@primaryodors primaryodors added high priority Above average priority. and removed medium priority This should be made working as soon as time permits. labels May 30, 2023
@primaryodors
Copy link
Owner

To break the task down into easier pieces:

  • In the Protein class, create a ftn that accepts a pointer to another Protein.

    • For every amino acid in the local object, and every residue in the other protein, sum up their reach values and, if their CA-CA distance is less than or equal to the summed reach (plus an interaction distance of say 2.5A), add both residues to a std::vector (and mark both residues "dirty" to prevent duplicates). Then return that vector.
  • Next, create a new app, with all the necessary entries in makefile.

    • The app will accept two command line arguments: a PDB of a GPCR, and a PDB of a G protein. It will use the three conserved points of contact above, omitting the A39-AA34.xx contact which is not conserved in ORs.
  • Any GPCR lacking BW can still be used if it has the right motifs:

    • DRYXAICXPLXY finds *3.59 as the last X.
    • After that motif comes four more residues and then *4.38.
    • 7.56 is the fourth from last letter of the PMLNPLIYSLRNKD motif.
  • If any motifs are not found and there are no REMARK rows indicating BW numbering, then an error is generated.

  • G protein alignment shall be accomplished entirely by motif:

    • There is an EKQLQKD motif early in the sequence; 2 letters after it is Q35.
    • D215 is the last letter of GIFETKFQVD.
    • E392 is the E of the terminal YELL motif.
  • Once the contact points have been identified, the GPCR shall be uprighted and the G protein positioned so that the YELL-7.56 contact is at first superimposed. Next, the G protein shall be rotated about the Y axis to bring its Q contact near the *4.38 residue.

  • The exact positioning of the G protein will then be adjusted to place GPCR:*4.38 in the +Y direction from GNAX:Q(35), with the optimal Y distance for polar contact between the two residues.

  • If there is a salt bridge for TMR6-EXR2, then the helix shall be rotated in such a way as to facilitate that salt bridge contact. Then the entire TMR6 helix will be rotated outward, by its salt bridge if it has one, else by its 6.48 residue, until it avoids clashes with the G protein. If there is no salt bridge and the +Y end is clashing with the other parts of the protein, then it shall be rotated outward, centering rotation on the -Y end, to avoid clashes.

  • The output of the new inter-protein function can be used to fine tune the positioning of the G protein and of all TMRs, and to conform the side chains.

  • At this point, the two proteins can be written to a single combined PDB. Create the ability to specify a chain letter for each protein. The GPCR can be chain A, and the G protein chain B. Current code is that for docking purposes, PrimaryDock will only load chain A. The file should not be saved in the usual pdbs/OR##/ folder, but there should be a dedicated folder under pdbs/ for storing coupled models, also organized under OR##/ folders like the root of pdbs/ has.

  • It would be optimal to also flex the backbone between TMR helices and rejoin the strands that inevitably get broken. What might be useful for this task would be to use Molecule::conform_molecules() and include the covalent bonds between adjacent residues in the conformer search. Each motion could be automatically accepted if it results in bettering either the worst or the average of the set of covalent bonds.

We've opted to increase this task to the same priority as the unit tests. The two may be developed concurrently, with equal time given to both.

@electronicsbyjulie
Copy link
Contributor Author

Were it only that simple, but the G proteins change shape too. The chain from 8F76 can be extracted with the new update to pepteditor and will at least fit a class I OR.

@electronicsbyjulie
Copy link
Contributor Author

So the GNAL protein from 8iw1 is not even a real human G protein but an engineered chimera.

Wonder if it will still suffice.

@electronicsbyjulie
Copy link
Contributor Author

Very upset. Best binding paired the aliphatic atoms of d-limonene with OR1A1's Asn109 residue. I've added code to prevent pairing polar with nonpolar like that.

@primaryodors
Copy link
Owner

Don't forget #266 is an urgent task.

@electronicsbyjulie
Copy link
Contributor Author

The protein coupling app has a memory leak that interferes with testing it locally.

@electronicsbyjulie
Copy link
Contributor Author

Updated the list of inter-protein contacts.

Recommend rewriting the couple app to accept its own format of config file with lists of coupling pairs. Can offer a way to specify multiple matching aminos, e.g. the one in the list that's usually H or R, as well as a positional tolerance, e.g. specify K7.58 but allow for +/-2 so that K7.56 thru K7.60 will also match.

@primaryodors
Copy link
Owner

primaryodors commented Jun 4, 2023

Agreed. Can use a file extension like .cplcfg to distinguish it from a PrimaryDock .config file.

The two proteins can be specified with PROT1 and PROT2 params, overridable on the command line. Then multiple CONTACT params can follow, with a format similar to CONTACT DE215 RH3.59~2 using the tilde to specify positional tolerance. The first argument would apply to protein 1, and the second to protein 2.

If fewer than 3 of the contacts correspond to residues in both proteins, then the app should exit with an error.

@electronicsbyjulie
Copy link
Contributor Author

There would also want to be params for bridging, homology, and regions for repositioning.

@electronicsbyjulie electronicsbyjulie linked a pull request Jun 4, 2023 that will close this issue
@electronicsbyjulie
Copy link
Contributor Author

The positions of the EXR and CYT regions can be adjusted instead of moving the TM helices for optimizing the inter-protein contacts. For olfactory GPCRs, the movable regions are CYT1 (between TMR 1 and 2), EXR1 (2-3), CYT2 (3-4), CYT3 (5-6), EXR3 (6-7). EXR and CYT regions that do not contact the G protein or form a salt bridge can then further adjust during soft dock. There are four axes of motion: horizontally outward/inward; horizontally perpendicular to protein XZ center; rotation about Y axis; rotation about radial axis from protein XZ center.

Once the piece has been rotated, the adjoining TM helices are rotated about their other-end residue's CA atom to line up their same-end CA atom with where the piece "expects" it to be. The piece is then moved to align itself with the actual same-end locations, which will probably differ by a fraction of an Angstrom from "expected".

@electronicsbyjulie
Copy link
Contributor Author

It keeps flinging the TM helices out away from the rest of the protein! 😡

@primaryodors
Copy link
Owner

Switch gears and work on the unit tests for a while. It will be easier to solve the positioning bug after taking a break from it and coming back fresh.

@electronicsbyjulie
Copy link
Contributor Author

It's finally a weekend with nothing going on. It's finally the perfect time to really dig into the code and figure out why it keeps messing up.

@primaryodors
Copy link
Owner

If you can find the cause of the problem tonight, then go ahead and take the time to fix it.

@electronicsbyjulie
Copy link
Contributor Author

Even if limiting the motions to Δy = 0, TMR6 quickly pulls too far away from the rest of the protein.

The answer won't be improving the existing functionality, but rather performing a 3D conformational search that first lines up the contacts' CA atoms at suitable distances, particularly those of segments that don't move, then adjusts each movable segment to maximize contact with its contact partner. No iterations, just a one-time best fit followed by a one-time fine adjustment.

So, technically, I did find the cause of the problem tonight.

@primaryodors
Copy link
Owner

I should have been more clear.

Proceed.

@electronicsbyjulie
Copy link
Contributor Author

There is a hydrophobic pocket in the cryo-EM model formed by I125, I220, V224, L227 of OR51E2 and L388, L393, L394 of GNAS. As these are not direct contacts between individual aminos but rather a general grouping, there ought to be a way to define such a thing as another form of contact point.

@primaryodors primaryodors added on hold Waiting for something else to be working first. and removed high priority Above average priority. labels Jun 11, 2023
@primaryodors primaryodors removed the on hold Waiting for something else to be working first. label Jun 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants