strict PDB parsing #1966
Comments
@kain88-de in what way don't we support resids out of order? Is this if you have more than 10k residues and they're also randomly arranged? |
In this example I do want atom 12 to have resid 12.
In [1]: u = mda.Universe('test.pdb')
In [2]: u.residues.resids
Out[2]:
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 6999, 10012])
I tested a little bit. This doesn't happen for all resid values of atom 11, only when it reaches a value larger than 5013. |
Yeah the code checks for a downwards jump of greater than 5,000 to guess when a resid has looped, this allows small fluctuations. Could easily add a dont-fix-resids kwarg to the parser |
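To make the heuristic described above concrete, here is a minimal sketch. It is not the actual MDAnalysis parser code: the function name `unwrap_resids` and the parameters `max_drop` and `wrap` are hypothetical, and the only facts taken from this thread are the 5,000 threshold and the wrap at 10000.

```python
def unwrap_resids(resids, max_drop=5000, wrap=10000):
    """Guess where a 4-column resid counter has looped and undo the loop.

    A downward jump of more than ``max_drop`` between consecutive raw
    resids is treated as a wrap, and ``wrap`` is added to every later
    resid. Smaller downward jumps are left alone.
    """
    fixed = []
    offset = 0
    prev = None
    for resid in resids:
        # Compare raw values, so one wrap does not mask the next one.
        if prev is not None and prev - resid > max_drop:
            offset += wrap
        fixed.append(resid + offset)
        prev = resid
    return fixed


raw = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 6999, 12]
print(unwrap_resids(raw))
```

Under these assumptions the last resid in `raw` comes out as 10012, which matches the output reported above, and a small out-of-order step such as 20 followed by 12 is not touched.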
I would prefer a strict flag. I don't really like it if we end up with lots of
tiny switches to tune the PDB reader.
|
We used to have two PDB readers... I think we disliked maintaining both, but I can see the appeal of having at least one that actually follows the standard and just fails if the input file does not follow it. |
we could add a |
The strict keyword has two advantages for me. First, it is one simple switch and its meaning is easy to guess, even for a new user. Second, we can write a fast Cython implementation to read a standard-compliant PDB in the future (something pandas is doing with CSV). This is good if having a faster PDB reader is desirable. Instead of a strict flag, a flavor flag would also be OK. This could, for now, have the values permissive and strict, but in the future others can be added, like hybrid-36. This way we can remove a lot of the guesswork code we have right now to read a single frame. Instead it's all done on initialization. |
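As an illustration of the flavor idea, the following sketch picks the resid handling once, at initialization, instead of guessing while reading. Everything named here is hypothetical (`ResidReader`, the `flavor` argument, the handler table) and none of it is existing MDAnalysis API. The only external fact used is that the PDB format stores the residue sequence number in columns 23-26 of ATOM/HETATM records.

```python
def _keep_resids(resids):
    # strict: trust the file, resids are returned exactly as written
    return list(resids)


def _unwrap_resids(resids, max_drop=5000, wrap=10000):
    # permissive: treat a drop of more than max_drop as a wrapped counter
    fixed, offset, prev = [], 0, None
    for resid in resids:
        if prev is not None and prev - resid > max_drop:
            offset += wrap
        fixed.append(resid + offset)
        prev = resid
    return fixed


# New flavors (for example a hybrid-36 decoder) would be added here.
_HANDLERS = {"strict": _keep_resids, "permissive": _unwrap_resids}


class ResidReader:
    """Hypothetical reader that resolves its flavor once, in __init__."""

    def __init__(self, flavor="permissive"):
        if flavor not in _HANDLERS:
            raise ValueError(
                "unknown flavor {!r}, expected one of {}".format(
                    flavor, sorted(_HANDLERS)))
        self._handle = _HANDLERS[flavor]

    def read(self, lines):
        # resSeq occupies columns 23-26 (1-based), i.e. line[22:26].
        # int() raises ValueError on a non-integer field in either flavor.
        raw = [int(line[22:26]) for line in lines
               if line.startswith(("ATOM", "HETATM"))]
        return self._handle(raw)
```

With this layout an unknown flavor fails immediately at construction, and the per-line reading code contains no heuristics of its own.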
Flavor sounds good, btw hybrid-36 is issue #1897 . |
Expected behaviour
We have a PDB with residue IDs that are out of order. MDAnalysis should read this PDB like any other valid PDB file. The standard doesn't say anything about ordering. It would be nice to have a strict flag that only allows standard-conforming PDBs without any of the unofficial extensions that have been added over the decades.
Actual behaviour
Because we support large PDBs with more than 9999 residues, we reset the counter: if a new residue number is smaller than the previous one, we assume the numbering has wrapped around and has reached a value of 10000 or above.
Current version of MDAnalysis:
(run python -c "import MDAnalysis as mda; print(mda.__version__)")
dev