Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PSFParser fails to read alternative resids #4189

Open
pipitoludovico opened this issue Jul 6, 2023 · 7 comments
Open

PSFParser fails to read alternative resids #4189

pipitoludovico opened this issue Jul 6, 2023 · 7 comments
Labels

Comments

@pipitoludovico
Copy link

Expected behavior

Loading a PSF and a trajectory with alternate conformation inside.

Actual behavior

PSFParser expects only integers and fails when finding alternate residues:
example:
loading a psf that includes
...
2273 P2 52 SER HG1 H 0.430000 1.0080 0
2274 P2 52 SER C C 0.510000 12.0110 0
2275 P2 52 SER O O -0.510000 15.9990 0
2276 P2 52A ALA N NH1 -0.470000 14.0070 0
2277 P2 52A ALA HN H 0.310000 1.0080 0
2278 P2 52A ALA CA CT1 0.070000 12.0110 0
...
raises a ValueError:
ValueError: Failed to construct topology from file ionized.psf with parser <class 'MDAnalysis.topology.PSFParser.PSFParser'>.
Error: invalid literal for int() with base 10: '52A'

Code to reproduce the behavior

    import MDAnalysis as Mda
    from MDAnalysis import transformations
    ext = ('dcd', 'xtc')
    traj_name = 'output'
    for trajectory in os.listdir(os.getcwd()):
        if trajectory.startswith(traj_name) and trajectory.endswith(ext):
            u = Mda.Universe('ionized.psf', trajectory)
            prot = u.select_atoms("protein and name CA")
            ag = u.atoms
            workflow = (transformations.unwrap(ag),
                        transformations.center_in_box(prot),
                        transformations.wrap(ag, compound='fragments'))
            u.trajectory.add_transformations(*workflow)

            with Mda.Writer('wrapped_MDA_TEST_GIUSEPPE.xtc', ag) as w:
                for ts in u.trajectory:
                    if ts is not None:
                        w.write(ag)
import MDAnalysis as mda
from MDAnalysis.tests.datafiles import PSF, DCD,  GRO, PDB, TPR, XTC, TRR,  PRMncdf, NCDF

fails here already when reading resid "52A" with the PSFParser -> u = mda.Universe(PSF, DCD)

....

Current version of MDAnalysis

  • Which version are you using? 2.4.3
  • Which version of Python (python -V)? 3.10.11
  • Which operating system? Ubuntu 20.04
    Thanks!
@ztimol
Copy link
Contributor

ztimol commented Jul 9, 2023

In PDB files I believe the res id needs to be an integer. Is this possibly the same for PSF files?

@pipitoludovico
Copy link
Author

pipitoludovico commented Jul 9, 2023

Generally .pdb and .psf resnums ( 52 SER 52 SER...) can have letters in their code and the alternative conformations can be read from those.
I'm proposing to change the value type from int to str to overcome this issue.

specifically:

#this resids = np.zeros(numlines, dtype=np.int32) to this resids = np.zeros(numlines, dtype=np.object)

@IAlibay
Copy link
Member

IAlibay commented Jul 9, 2023

We already handle altlocs for PDBs and it's a topology attribute in its own rights. i.e. I believe resum has to be an int, so here we'd have to do a check on the type and handle it appropriately. Implementation aside, this simply seems like an edge case we just weren't aware of.

Is there a PSF file standard defined somewhere? It would be easier to make sure we don't miss any further edge cases.

@pipitoludovico
Copy link
Author

To the best of my knowledge the only reference I could find for psf is the CHARMM-GUI section:
https://www.charmm-gui.org/?doc=lecture&module=pdb&lesson=6

These PSFs with alternative conformations derive from VMD's psfgen autopsf when starting from a structure that has altloc.
As you said this is quite an edge case.

Thank you!

@orbeckst
Copy link
Member

orbeckst commented Oct 7, 2023

CHARMM is now free for academic use https://academiccharmm.org/news/free-charmm so one could look at the source. The CHARMM PSF is related to XPLOR PSF... but I could also not find a file format description there.

The summary in CHARMM-GUI: Lesson 6: Protein Structure File Format might be the closest to a format description apart from the source code. It does not mention the alternating conformations, though.

If we were to parse psfgen alternate conformations then we should do it so that

  • reside remains an int for backwards compatibility
  • the altloc is stored as a topology attribute to be consistent with PDB handling

@orbeckst
Copy link
Member

orbeckst commented Oct 7, 2023

@pipitoludovico does the trajectory contain coordinates for BOTH alternatives? (What software would produce such trajectories?)

@IAlibay
Copy link
Member

IAlibay commented Oct 7, 2023

If it's possible to get a copy of the PSF it might be easier to work out what's going on here - a wild guess is that there's only one altloc but that the tag is retained in the PSF.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants