Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding H5MD-format writer #2869

Closed
wants to merge 54 commits into from
Closed

Conversation

edisj
Copy link
Contributor

@edisj edisj commented Jul 24, 2020

Fixes #2866

Changes made in this Pull Request:

  • adds class H5MDWriter to H5MD.py

This is a continuation from #2787

Some info:
H5MD documentation - the writer is following these file format guidelines
pyh5md github - A lot of the code is borrowed (with credit attributed to the author) from this library

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@pep8speaks
Copy link

pep8speaks commented Jul 24, 2020

Hello @edisj! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 118:80: E501 line too long (87 > 79 characters)
Line 119:80: E501 line too long (92 > 79 characters)

Line 205:1: E402 module level import not at top of file

Line 79:29: E261 at least two spaces before inline comment
Line 79:80: E501 line too long (81 > 79 characters)

Line 807:80: E501 line too long (81 > 79 characters)

Line 203:80: E501 line too long (102 > 79 characters)
Line 416:80: E501 line too long (102 > 79 characters)
Line 785:80: E501 line too long (83 > 79 characters)

Line 628:80: E501 line too long (108 > 79 characters)
Line 630:80: E501 line too long (109 > 79 characters)
Line 632:80: E501 line too long (102 > 79 characters)
Line 979:50: E231 missing whitespace after ','
Line 989:80: E501 line too long (96 > 79 characters)
Line 1007:1: E305 expected 2 blank lines after class or function definition, found 1
Line 1007:3: E225 missing whitespace around operator
Line 1007:5: E225 missing whitespace around operator
Line 1007:7: E225 missing whitespace around operator
Line 1008:7: E225 missing whitespace around operator
Line 1009:5: E113 unexpected indentation
Line 1075:50: E231 missing whitespace after ','
Line 1089:80: E501 line too long (103 > 79 characters)
Line 1090:80: E501 line too long (94 > 79 characters)
Line 1091:80: E501 line too long (96 > 79 characters)
Line 1092:80: E501 line too long (86 > 79 characters)
Line 1111:80: E501 line too long (129 > 79 characters)
Line 1112:80: E501 line too long (89 > 79 characters)
Line 1113:80: E501 line too long (91 > 79 characters)
Line 1115:80: E501 line too long (146 > 79 characters)
Line 1116:80: E501 line too long (111 > 79 characters)
Line 1135:80: E501 line too long (91 > 79 characters)
Line 1136:80: E501 line too long (85 > 79 characters)

Line 1087:80: E501 line too long (91 > 79 characters)

Line 27:80: E501 line too long (105 > 79 characters)
Line 276:80: E501 line too long (103 > 79 characters)
Line 309:45: E128 continuation line under-indented for visual indent
Line 606:80: E501 line too long (100 > 79 characters)
Line 608:80: E501 line too long (100 > 79 characters)
Line 610:80: E501 line too long (100 > 79 characters)
Line 612:80: E501 line too long (100 > 79 characters)
Line 614:80: E501 line too long (100 > 79 characters)
Line 616:80: E501 line too long (100 > 79 characters)
Line 618:80: E501 line too long (100 > 79 characters)
Line 620:80: E501 line too long (100 > 79 characters)
Line 622:80: E501 line too long (100 > 79 characters)
Line 624:80: E501 line too long (100 > 79 characters)
Line 626:80: E501 line too long (100 > 79 characters)
Line 628:1: W293 blank line contains whitespace
Line 763:80: E501 line too long (82 > 79 characters)
Line 766:80: E501 line too long (88 > 79 characters)
Line 771:80: E501 line too long (86 > 79 characters)

Line 412:80: E501 line too long (80 > 79 characters)

Line 577:10: E225 missing whitespace around operator

Line 528:80: E501 line too long (83 > 79 characters)
Line 977:80: E501 line too long (81 > 79 characters)
Line 1827:80: E501 line too long (83 > 79 characters)
Line 2063:80: E501 line too long (83 > 79 characters)

Line 268:80: E501 line too long (81 > 79 characters)
Line 271:80: E501 line too long (81 > 79 characters)

Line 45:80: E501 line too long (82 > 79 characters)
Line 235:80: E501 line too long (81 > 79 characters)

Line 417:72: W291 trailing whitespace
Line 419:60: W291 trailing whitespace

Line 121:80: E501 line too long (81 > 79 characters)
Line 143:80: E501 line too long (81 > 79 characters)

Line 1024:31: E201 whitespace after '['
Line 1027:31: E201 whitespace after '['
Line 1067:80: E501 line too long (84 > 79 characters)
Line 1068:80: E501 line too long (133 > 79 characters)
Line 1069:80: E501 line too long (129 > 79 characters)
Line 1070:80: E501 line too long (132 > 79 characters)
Line 1071:80: E501 line too long (127 > 79 characters)
Line 1303:39: E127 continuation line over-indented for visual indent
Line 1315:39: E127 continuation line over-indented for visual indent
Line 1344:39: E127 continuation line over-indented for visual indent
Line 1358:39: E127 continuation line over-indented for visual indent

Line 78:80: E501 line too long (100 > 79 characters)

Line 93:80: E501 line too long (84 > 79 characters)
Line 93:81: E231 missing whitespace after ','

Line 357:21: E127 continuation line over-indented for visual indent

Line 47:80: E501 line too long (87 > 79 characters)

Comment last updated at 2021-03-16 01:33:57 UTC

@edisj
Copy link
Contributor Author

edisj commented Jul 24, 2020

The first commit is a preliminary commit just so I get the pull request up and familiarize again with github branching. I'll remove the draft tag when a working version is pushed.

TODO:

  • add pyh5md licensing information into the LICENCE file
  • push commits with borrowed code with proper credit attributed

ts = ag.trajectory.ts
except AttributeError:
errmsg = "Input obj is neither an AtomGroup or Universe"
raise TypeError(errmsg) from None
Copy link
Member

@tylerjereddy tylerjereddy Jul 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this block I'm also reminded of cases in the code base that look something like this:

        if isinstance(arg, Universe):
            # action 1
        elif isinstance(arg, AtomGroup):
            # action 2
        else:
            raise ValueError("arg is not a Universe or AtomGroup")

I don't know if it is formally cleaner than the try/except, maybe it is just the nesting that made me think an alternative might be desired.

Copy link
Member

@lilyminium lilyminium Jul 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe

try:
    ts = ag.universe.trajectory.ts
except AttributeError:
    ...

?

I think there's a performance difference to try/except vs if/else, where try/except is faster unless the exception is raised.

Edit: that being said, the if/else is probably cleaner because it's more restrictive in the allowed types.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python Zen says "It's easier to ask for forgiveness than to ask for permission." (which supposedly means that idiomatic Python uses try/except instead of if when possible). The try/except also allows duck-typing (if so desired).

For the performance question: could be tested.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is just the nesting that caught my attention really--if the approach from @lilyminium is equivalent that may be nicer.

If you're really keen on avoiding the conditionals, looks like there are some opportunities in the code base.

Copy link
Member

@tylerjereddy tylerjereddy Jul 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt it is worth much energy though.

@orbeckst orbeckst mentioned this pull request Jul 29, 2020
4 tasks
@richardjgowers
Copy link
Member

richardjgowers commented Jul 30, 2020 via email

@orbeckst orbeckst added the NSF REU NSF Research Experience for Undergraduates project label Aug 7, 2020
@orbeckst orbeckst self-assigned this Aug 7, 2020
Cédric Bouysset and others added 16 commits August 21, 2020 17:15
* Overview of work done in PR

  This PR adds the `RDKitConverter` class which converts MDAnalysis
  `Universe`/`Atomgroup` objects to `RDKit rdchem.Mol` objects.

* Limitations
  - Bonds and elements must be present (the former will be inferred if not
    present).
  - This converter mainly aims at supporting cases where explicit hydrogens
    are present.

* Extra implementation details
  - Bond order and formal charges are inferred via atomic valencies & the
    number of unpaired electrons (see `_infer_bo_and_charges`).
  - In part due to the influence of atom read order on inferring, bond
    conjugation and functional groups are standardized using SMARTS
    reactions (see `_standardize_patterns`).

*  Other changes

   Also includes some PEP8 changes and some cleaning up of the tests for the
   `RDKITReader`.

* References
  For more information on the development process for this PR, see:
  1) https://cbouy.github.io/blog/2020/07/01/rdkit-converter
  2) https://cbouy.github.io/blog/2020/07/22/rdkit-converter-part2
…ter (Issue MDAnalysis#2468)

Towards MDAnalysis#2468 (RDKit interoperability GSOC project)

PR MDAnalysis#2916 -> Originally PR MDAnalysis#2775

##Overview of work done in PR

This PR adds the RDKitConverter class which converts MDAnalysis
Universe/Atomgroup objects to RDKit rdchem.Mol objects.

##Limitations

  - Bonds and elements must be present (the former will be inferred if not
     present).
  - This converter mainly aims at supporting cases where explicit hydrogens
     are present.

##Extra implementation details

  - Bond order and formal charges are inferred via atomic valencies & the
     number of unpaired electrons (see _infer_bo_and_charges).
   - In part due to the influence of atom read order on inferring, bond
      conjugation and functional groups are standardized using SMARTS
      reactions (see _standardize_patterns).

##Other changes

Also includes some PEP8 changes and some cleaning up of the tests for the
RDKITReader.

##References

For more information on the development process for this PR, see:
https://cbouy.github.io/blog/2020/07/01/rdkit-converter
https://cbouy.github.io/blog/2020/07/22/rdkit-converter-part2
* modified AtomNames topologyattr to include lookup table index

* cheeky little optimisation

* rework atom name selection to use lookup tables

* Update topologyattrs.py

* fixed test supplying integer as atom name

really topologyattrs need to be statically typed and protective about this

* Update test_topologyattrs.py

* use dict-lookup string attrs EVERYWHERERE

* removed some code duplication

made protein selection faster, 48ms -> 0.5ms on GRO testfile

* improved nucleic/backbone selections

* Added explicit tests for Resnames topologyattr

tests now provide str types for resnames/icodes

* use fnmatchcase to be case sensitive

* Update package/MDAnalysis/core/selection.py

@jbarnoud's fix

* apply suggestions from code review

Co-authored-by: Irfan Alibay <IAlibay@users.noreply.github.com>

* added test for setting multiple segids at once

Co-authored-by: Oliver Beckstein <orbeckst@gmail.com>
Co-authored-by: Irfan Alibay <IAlibay@users.noreply.github.com>
 - Removes `as_Universe` from MDAnalysis.core.universe
 - LeafletFinder no longer accepts paths to files as Universes
Update function signatures in core.Universe
Towards MDAnalysis#2468 

## Work done in this PR
Adds a new SMARTS based selection which uses the RDKit converter.
* Fixes MDAnalysis#2860
* refactor transformations from closure into class (necessary so that a Universe
   with transformations can be pickled).
* change universe pickling to `__reduce__` (instead of `__setstate__`/`__getstate__`, which
   did not work with transformations)
* add test for pickling a Universe with transformations
* update docs for transformations
   - examples (written as class)
   - deprecate transformations as closures (still works but cannot be pickled due
      to limitations in Python's pickle module)
* update changelog
Fixes issue MDAnalysis#2939 

## Work done in this PR
- Adds missing file types of testsuite/setup.py
  (*top, *in, *sdf, and the contents of data/gromacs/gromos54a7_edited.ff/*).
- fix MDAnalysis#2934
- use unistd.h in ENCORE's spe.c when not WIN32 (e.g.  __unix__ or __APPLE__ but to be consistent
  with other uses in the code base we do not specifically test for __unix__ ||
  __APPLE__)
- update CHANGELOG
…apple

ensure that unistd.h is included on recent clang
Towards MDAnalysis#2739 

## Work done in this PR
Removes the old helanal code in favour of helix_analysis
lilyminium and others added 10 commits October 10, 2020 17:50
Fixes issue MDAnalysis#2951 

## Work done in this PR

- Restores the parmed intersphinx link.
Fixes MDAnalysis#2979

## Work done in this PR

The file test_bat_IO.npy was created in one test and used in
another one. This leads to the second test possibly failing
when the tests are run in parallel. This commit moves creating
the file into a fixture that the two tests depends.

The new fixture also uses the tmpdir fixture to avoid the file
created during the tests to remain in the test directory.
Not an issue targetted fix.

## Overview

mamba is a dropin replacement for the conda CLI but faster

## Work done in this PR

Enabled the `MAMBA` environment variable keyword for the CI helper.
* Fixes MDAnalysis#2990
* Update indexing in Charges and Masses get_residues: 
   Avoid indexing a numpy array with a list of arrays, use a tuple of array
   instead. Multidimensional indexing of arrays with a non-tuple argument
   is deprecated.
* Turn warnings into errors to avoid regression over MDAnalysis#2990
* Update CHANGELOG
* TST, CI: add ARM64 Graviton 2 to CI

* start testing MDAnalysis on ARM64 using
new AWS Graviton 2 architecture available
in Travis CI

* testing on my fork shows only two failures
in the MDAnalysis test suite using a fairly
minimal set of dependencies; we can probably
either fix those or skip them and open an issue

* the test failures are:
`test_written_remarks_property`
`TestEncoreClustering.test_clustering_three_ensembles_two_identical`

* run time for this CI entry is about
23 minutes, which is solid for ARM; we save a lot
of time using experimental upstream wheels in some
cases and excluding (source builds of) dependencies
where possible until the ARM64 binary wheel ecosystem
matures a bit more

* MAINT: ARM64 shims

* adjust `CAffinityPropagation()` C-level function to use `double`
instead of `float` in some strategic locations that allow the
`test_clustering_three_ensembles_two_identical()` test to pass
on gcc115 ARM64 machine

* mark `test_written_remarks_property()` as a known failure on
ARM64; it fails in Travis CI on that platform, but not on gcc115
ARM64 machine; that said, this test is already known to be flaky
on Windows 32-bit (may depend on character settings on machine?)

* MAINT: MR 2956 revisions

* reduce optimization level of MDAnalysis builds on
ARM64 to avoid problems with `ap.c` causing test failures
on that architecture

* revert any source changes to `ap.c`
Fixes issue MDAnalysis#1989, completes PR MDAnalysis#2956

## Work done in this PR

* Switches CI to using recently released SciPy `1.5.3` with "official" Linux ARM64 wheels
over the weekend, instead of the previous "special"/unreleased wheels
* Add a `CHANGELOG` entry to reflect preliminary MDAnalysis support
for the Linux ARM64 platform (minimal dependencies)
Fixes MDAnalysis#2993

Redirect URLs in stable/sitemap.xml and dev/sitemap.xml to correct locations
* Add a reindex argument to the PDBWritter

* Tests for PDB reindex

* Remove extra print

* Test PDB CONECT with reindex=False

* Update CHANGELOG

* More precise version change for PDB reindex

* Fix long lines

* Increase test coverage for MDAnalysis#2886

Also silence some warnings for the tests in the PR.

* Remove useless branch in PDB.py

There was a test in PDBWriter._write_pdb_bonds about writing the bonds
for an atomgroup that does not have AG.ids but asking to not reindex.
The test was silencing the problem and would lead to the bonds not being
written. However, that part of the code would never be reached because
writing the atoms would have raised an exception before that method is
called.

This commit removes this test. Now, if that part of the code is called
(which should not happen), an exception is raised when the method tries
to access AG.ids.
@edisj edisj marked this pull request as ready for review March 16, 2021 00:44
@orbeckst
Copy link
Member

@edisj you might want to rebase against latest develop (or at least merge and clean up conflicts) — a lot has changed since last year.

@edisj
Copy link
Contributor Author

edisj commented Mar 16, 2021

@orbeckst rebase is complete and everything is up to date now (I hope)

edit: I feel like I didn't do it correctly... are all of the new commits since then supposed to appear in this pull request?

@orbeckst
Copy link
Member

They are not supposed to be here. The history should look as if you just branched of the latest develop. Maybe your own develop was not up-to-date?

@edisj
Copy link
Contributor Author

edisj commented Mar 16, 2021

My develop seems to be up to date... I'm going to save the code and try to do a hard reset in the morning

@orbeckst
Copy link
Member

If you make a branch "backup" right here in your local repo, then switch back to your issue-2866-h5mdwriter branch, then you can always come back to this state by checking out backup. git keeps stuff around.

Btw, does your local history show all these other commits?

@edisj edisj mentioned this pull request Mar 24, 2021
4 tasks
@edisj
Copy link
Contributor Author

edisj commented Mar 24, 2021

I'm moving this pull request to #3189 to avoid all the issues caused by rebasing.

@edisj edisj closed this Mar 24, 2021
@edisj edisj deleted the issue-2866-h5mdwriter branch April 7, 2024 07:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NSF REU NSF Research Experience for Undergraduates project
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adding H5MDWriter to H5MD.py
10 participants