deprecate hydrogen bond analysis in water bridge analysis #2913

Merged
30 commits merged into MDAnalysis:develop on Feb 5, 2021

xiki-tempula
Contributor

@xiki-tempula xiki-tempula commented Aug 18, 2020

Fixes #

Deprecate the hydrogen bond analysis in the water bridge analysis.
The hydrogen bond analysis is not used in the original water bridge analysis, so I have removed some references to it from the doc sections.
Move the water bridge analysis from hbonds to hydrogenbonds.

Related to #2739 and #2746

Changes made in this Pull Request:

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@pep8speaks

pep8speaks commented Aug 18, 2020

Hello @xiki-tempula! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-01-27 21:11:07 UTC

@xiki-tempula xiki-tempula marked this pull request as draft August 18, 2020 10:52
@IAlibay
Member

IAlibay commented Aug 18, 2020

Going to ping @RMeli here, I believe this move was started in PR #2746, but I don't know why it stalled.
(also @xiki-tempula can you add a reference to #2739 in the PR text + eventually changelog? that way we can remember to update the progress on it).

@xiki-tempula
Contributor Author

@IAlibay I chatted with @RMeli before initiating this PR. This PR is still WIP and I will do this after I have solved all the PEP8 issues.

@RMeli
Member

RMeli commented Aug 18, 2020

@IAlibay, PR #2746 stalled because of #2745 (comment) and for lack of time. @xiki-tempula did ask me if he could supersede my PR as part of his current work and I have no problems with that.

@xiki-tempula xiki-tempula marked this pull request as ready for review August 18, 2020 16:23
@codecov

codecov bot commented Aug 18, 2020

Codecov Report

Merging #2913 (e92d8ce) into develop (0de538e) will increase coverage by 0.00%.
The diff coverage is 98.93%.

@@           Coverage Diff            @@
##           develop    #2913   +/-   ##
========================================
  Coverage    93.17%   93.17%           
========================================
  Files          171      171           
  Lines        22735    22741    +6     
  Branches      3216     3216           
========================================
+ Hits         21183    21189    +6     
  Misses        1504     1504           
  Partials        48       48           
| Impacted Files | Coverage Δ |
|---|---|
| package/MDAnalysis/analysis/hbonds/__init__.py | 100.00% <ø> (ø) |
| ...nalysis/analysis/hydrogenbonds/wbridge_analysis.py | 97.41% <98.92%> (ø) |
| ...kage/MDAnalysis/analysis/hydrogenbonds/__init__.py | 100.00% <100.00%> (ø) |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Member

@IAlibay IAlibay left a comment

Couple of comments + need to work out what the target version will be for full removal.

package/CHANGELOG (outdated, resolved)
package/MDAnalysis/analysis/hbonds/wbridge_analysis.py (outdated, resolved)
self.u.atoms[tmp_acceptors].positions,
box=self.box
)
warnings.warn(
Member

So in the past we've been a little bit more cautious with stubs, specifically setting a simplefilter through something like:

import warnings

with warnings.catch_warnings():
    # Show the DeprecationWarning even if it would otherwise be filtered out.
    warnings.simplefilter('always', DeprecationWarning)
    warnings.warn(("This module has been moved to "
                   "MDAnalysis.analysis.hydrogenbonds.wbridge_analysis. "
                   "It will be removed in release 3.0"),
                  category=DeprecationWarning)

from ..hydrogenbonds.wbridge_analysis import WaterBridgeAnalysis

Whilst I'm not sure the order of the import matters so much, the use of filtering is probably useful in cases where people tend to just blanket remove all warnings. Pinging @orbeckst here who probably has more context on why this was the decision taken for stubs circa ~ v0.21.
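
The effect of the filter can be checked in isolation. The following is a minimal sketch, not the stub from this PR: `warn_moved` is a hypothetical helper, and the outer `catch_warnings(record=True)` block only simulates a user who silences all warnings. It relies solely on the standard-library `warnings` module.

```python
import warnings


def warn_moved(old_name, new_name, removal):
    """Emit a DeprecationWarning that survives a blanket 'ignore' filter.

    Hypothetical helper for illustration; not part of MDAnalysis.
    """
    with warnings.catch_warnings():
        # Temporarily place an 'always' rule in front of the user's filters;
        # the previous filters are restored when the block exits.
        warnings.simplefilter("always", DeprecationWarning)
        warnings.warn(
            f"{old_name} has been moved to {new_name}. "
            f"It will be removed in release {removal}.",
            category=DeprecationWarning,
            stacklevel=2,
        )


# Simulate a user who silences every warning, and record what still gets through.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("ignore")
    warn_moved(
        "MDAnalysis.analysis.hbonds.wbridge_analysis",
        "MDAnalysis.analysis.hydrogenbonds.wbridge_analysis",
        "2.0",
    )

for entry in caught:
    print(entry.category.__name__, "-", entry.message)
```

Without the inner `simplefilter('always', ...)` call, the blanket `ignore` rule would swallow the message, which is the situation the more cautious stub is meant to avoid.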

Member

Sorry for the late (and unhelpful) reply: I don't remember and the more cautious approach by @IAlibay looks good to me.

Removal would be 2.0 (together with old h-bonds).

testsuite/MDAnalysisTests/analysis/test_wbridge.py (outdated, resolved)
@@ -0,0 +1,1799 @@
# -*- Mode: python; tab-width: 4; indent-tabs-mode:nil; coding:utf-8 -*-
Member

From a cursory look, it seems fine as just the moved file + PEP8 changes. Unfortunately the relatively unhelpful git diff makes this hard to check thoroughly (PEP8 probably could have just been dealt with in a separate PR but too late now).

Are there any particular changes here that you'd like us to look at?

Contributor Author

As described in the PR description, I have:

  • Fixed the PEP8 issues
  • Removed references to the old HBA from the docs
  • Added the deprecation-related code

None of the code inside the WBA is modified.

)
warnings.warn(
"This module has been moved to MDAnalysis.analysis.hydrogenbonds. "
"It will be removed in MDAnalysis version 2.0. Please use "
Member

Next target version is 2.0, so removals have to be done afterwards unless we want to make this a v1.1 target instead. Thoughts @orbeckst ?

Member

Are we doing more versions of 1? I would deprecate in 1.xx and remove in 2.0, as was done for #2791 and #2622.

Member

If we have to do a 1.1 for whatever reasons then we keep things as unchanging as possible. But the goal was to keep 1.x boring and as similar to the old stuff as possible, so any removals target 2.0.

I hope that we are only doing bug fixes for 1.x so that this stays 1.0.x

xiki-tempula and others added 6 commits September 29, 2020 16:31
Co-authored-by: Irfan Alibay <IAlibay@users.noreply.github.com>
@xiki-tempula
Contributor Author

@orbeckst Thank you for the advice. I have updated the changelog and removed the water_bridge from the hbond module. I will open another PR for master.

@xiki-tempula
Contributor Author

@orbeckst Thank you for the advice. I was wondering if there is anything that I could do to get this PR merged. I'm working to port @p-j-smith's MDAnalysis.analysis.hydrogenbonds.hbond_analysis.HydrogenBondAnalysis.lifetime() to the water bridge analysis, so it would be nice if this PR could be merged.

@orbeckst
Member

orbeckst commented Jan 19, 2021 via email

@xiki-tempula
Contributor Author

@orbeckst Thanks for all the help. Would you mind taking a look, please? I will make another PR against master once this one is merged.

@orbeckst
Member

orbeckst commented Jan 25, 2021 via email

@orbeckst orbeckst added this to the 2.0 milestone Jan 28, 2021
Member

@orbeckst orbeckst left a comment

This PR moves wbridge_analysis to hydrogenbonds. Please open a PR against master that just includes the deprecations, stating that in 2.0, it will be in hydrogenbonds. When the deprecation PR is approved, I can also approve the move here.

@orbeckst
Member

@xiki-tempula what is your plan for wbridge analysis? Do you want to rewrite it to base it on the new hydrogenbond analysis? It would be really good to have HydrogenBondAnalysis and WaterBridgeAnalysis behave in the same way as far as selections and performance go.

If there's a way to avoid duplication of code then I would be very much in favor of, e.g., creating a base class for both HydrogenBondAnalysis and WaterBridgeAnalysis.

@xiki-tempula
Contributor Author

@orbeckst The PR against master is #3111.

The issue with HydrogenBondAnalysis is quite a tricky one. Since the WaterBridgeAnalysis was written based on the old HydrogenBondAnalysis, it has all the functionality of the old HydrogenBondAnalysis.

Later on, @p-j-smith introduced capped-distance to HydrogenBondAnalysis, which brought a major speed improvement. So I have adopted capped-distance for the distance calculations as well. Currently, WaterBridgeAnalysis and HydrogenBondAnalysis offer the same performance in terms of computing hydrogen bonds.
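
For readers unfamiliar with the idea, a capped distance search only reports pairs closer than a cutoff, so atoms can be binned into a grid and compared against neighbouring cells instead of against every other atom. The sketch below is a pure-Python illustration under stated assumptions, not the MDAnalysis implementation (which, as far as I know, lives in `MDAnalysis.lib.distances.capped_distance`). The name `capped_pairs` is hypothetical and periodic boundary conditions are ignored.

```python
import math
from collections import defaultdict
from itertools import product


def capped_pairs(donors, acceptors, cutoff):
    """Return sorted (donor_index, acceptor_index, distance) within `cutoff`.

    Illustrative grid search only; no periodic box is considered.
    """
    def cell(point):
        # Cell edge equals the cutoff, so any pair within the cutoff
        # lies in the same cell or in directly adjacent cells.
        return tuple(math.floor(c / cutoff) for c in point)

    grid = defaultdict(list)
    for j, acceptor in enumerate(acceptors):
        grid[cell(acceptor)].append(j)

    pairs = []
    for i, donor in enumerate(donors):
        cx, cy, cz = cell(donor)
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                dist = math.dist(donor, acceptors[j])
                if dist <= cutoff:
                    pairs.append((i, j, dist))
    return sorted(pairs)


donors = [(0.0, 0.0, 0.0), (-0.1, 0.0, 0.0)]
acceptors = [(2.9, 0.0, 0.0), (10.0, 0.0, 0.0)]
for i, j, dist in capped_pairs(donors, acceptors, cutoff=3.0):
    print(f"donor {i} - acceptor {j}: {dist:.2f} Angstrom")
```

The distant acceptor is never compared against either donor, which is where the speed-up over an all-pairs distance matrix comes from.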

The selection problem is a bit tricky. As discussed in #2177, the WaterBridgeAnalysis adopted the same selection scheme as the old HydrogenBondAnalysis. I'm currently working on an RDKIT-based selection scheme.

It is difficult to write WaterBridgeAnalysis based on HydrogenBondAnalysis, as WaterBridgeAnalysis needs a more complex data structure, but it is easy to write HydrogenBondAnalysis based on WaterBridgeAnalysis. The only functionality that is present in HydrogenBondAnalysis but not in WaterBridgeAnalysis is the lifetime analysis, which I plan to add once this PR is merged.

@p-j-smith
Member

Hi, I don't know the full extent of the functionality of the WaterBridgeAnalysis class, but I have used HydrogenBondAnalysis to find hydrogen bonds and then used NetworkX to look for water-bridging in the following way:

  1. Find hydrogen bonds. Use a selection that includes e.g. protein and water within 10 Angstrom of the protein, such as in the fourth example seen here: https://docs.mdanalysis.org/2.0.0-dev0/documentation_pages/analysis/hydrogenbonds.html#example-use-of-hydrogenbondanalysis Using the 10 Angstrom cutoff (as opposed to all water) makes both the hydrogen bond analysis and the path-finding faster.

  2. Iterate over frames. At each frame construct an adjacency matrix, A, of size (num_donors, num_acceptors), where Aij=1 if there is a hydrogen bond between donor_i and acceptor_j. Use NetworkX to construct a graph from this adjacency matrix. Iterate over donor-acceptor pairs of the protein only, and use the NetworkX all_shortest_paths function (https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.shortest_paths.generic.all_shortest_paths.html#networkx.algorithms.shortest_paths.generic.all_shortest_paths) to find the shortest water-mediated paths between each donor and acceptor, if any exist. If a path is longer than a cutoff, say 5 water molecules, don't count it as a path. I'm not sure, but I think there is also an all-to-all method in NetworkX that could remove the need to iterate over donor-acceptor pairs.

Another PhD student in my group had the idea of using NetworkX for this. We've only used it to look at the statistics of the path-length distribution, but NetworkX also returns the list of atoms in the through-water path from donor to acceptor. I'm not sure whether this is something you would be interested in, but offloading the path-finding to NetworkX might give a performance boost whilst also keeping the same selections etc. as HydrogenBondAnalysis. If this is something you're interested in @xiki-tempula I'd be happy to work with you on it.
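
The path-finding step described above can be sketched for a single frame as follows. To keep the example dependency-free it uses a standard-library breadth-first search as a stand-in for NetworkX's `all_shortest_paths`; it is not the code used by either analysis class. The function name, the residue labels, and the choice to treat hydrogen bonds as undirected links between whole residues are all assumptions made for illustration.

```python
from collections import deque


def shortest_water_paths(hbonds, source, target, waters, max_waters=5):
    """Return all shortest source->target paths whose intermediates are waters.

    `hbonds` is an iterable of (donor, acceptor) labels for one frame.
    A direct source-target hydrogen bond counts as a path with zero waters.
    """
    # Undirected adjacency list built from the donor/acceptor pairs.
    graph = {}
    for donor, acceptor in hbonds:
        graph.setdefault(donor, set()).add(acceptor)
        graph.setdefault(acceptor, set()).add(donor)

    paths, best = [], None
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        if best is not None and len(path) > best:
            break  # BFS order: every remaining path is longer than the best
        node = path[-1]
        if node == target:
            best = len(path)
            paths.append(path)
            continue
        for nxt in sorted(graph.get(node, ())):
            if nxt in path:
                continue
            # Step onto another water only while under the water cutoff.
            if nxt == target or (nxt in waters and len(path) <= max_waters):
                queue.append(path + [nxt])
    return paths


frame_hbonds = [
    ("ASP10", "WAT1"), ("WAT1", "LYS20"),
    ("ASP10", "WAT2"), ("WAT2", "LYS20"),
    ("WAT2", "WAT3"), ("WAT3", "LYS20"),
]
water_labels = {"WAT1", "WAT2", "WAT3"}
found = shortest_water_paths(frame_hbonds, "ASP10", "LYS20", water_labels)
for path in found:
    print(" -> ".join(path))
```

Collecting `len(path) - 2` over all frames and donor-acceptor pairs would give the path-length distribution mentioned above.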

@xiki-tempula
Contributor Author

@p-j-smith Thanks for the insight. The WaterBridgeAnalysis basically does what you have said with various tweaks for speed improvement and robustness.

The first step is to find the hydrogen bonds within a defined space. Instead of protein and water within 10 Angstrom of the protein, I make three selections, one selection is the start selection (selection 1) another selection is the end selection (selection 2), while the third water selection is a convex which is both within a certain distance from selection 1 and selection 2. Since the hydrogen bond from selection 1 to selection 2 is implicitly detected in this stage, it is easy to go from WaterBridgeAnalysis to HydrogenBondAnalysis but not the other way around.
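
A rough sketch of the third (water) selection, assuming that "within a certain distance" means within a cutoff of at least one atom in each of the two selections. The helper name `waters_near_both` and the coordinates are hypothetical, and the real implementation may define the region differently.

```python
import math


def waters_near_both(sel1, sel2, waters, cutoff):
    """Return indices of waters within `cutoff` of sel1 AND of sel2."""
    def near(point, atoms):
        # True if at least one atom of the selection is within the cutoff.
        return any(math.dist(point, atom) <= cutoff for atom in atoms)

    return [i for i, water in enumerate(waters)
            if near(water, sel1) and near(water, sel2)]


selection_1 = [(0.0, 0.0, 0.0)]
selection_2 = [(6.0, 0.0, 0.0)]
water_oxygens = [(3.0, 0.0, 0.0), (1.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
kept = waters_near_both(selection_1, selection_2, water_oxygens, cutoff=4.0)
print(kept)
```

Only the first water lies close to both selections, so only it can take part in a bridge; restricting the hydrogen bond search to such waters keeps the later path-finding small.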

The second step is finding a path from selection 1 to selection 2. I have written an optimized path-finder for this type of job. Though I have not tested the NetworkX implementation, I think it would be interesting to see how NetworkX would perform.
I don't think NetworkX would be significantly slower than my implementation, and in any case most of the time is spent on the distance search, so it is unlikely to matter much.

I think one strength of the WaterBridgeAnalysis is the flexible analysis functionality. The path-length distribution could be easily computed by providing the custom analysis function to the count_by_type function.
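
As a hedged sketch of what such a custom analysis function might look like: the callback below assumes it receives `current` (the list of hydrogen bonds that make up one bridge from selection 1 to selection 2), `output` (a dict-like accumulator) and the universe `u`, which is how I understand the documented hook. The fake bridges and their tuple contents are placeholders; only the length of each bridge is used.

```python
def count_by_path_length(current, output, u, **kwargs):
    """Accumulate bridges by the number of waters they pass through.

    Assumes `current` is the list of hydrogen bonds forming one bridge,
    so a bridge through n waters consists of n + 1 hydrogen bonds.
    """
    n_waters = len(current) - 1
    key = (n_waters,)  # tuple key, mirroring the tuple keys used by default
    output[key] = output.get(key, 0) + 1


# Stand-in data: each inner list is one bridge; the strings are placeholders
# for the real hydrogen bond records.
fake_bridges = [
    ["sel1-wat", "wat-sel2"],             # first-order bridge (1 water)
    ["sel1-wat", "wat-sel2"],             # another first-order bridge
    ["sel1-wat", "wat-wat", "wat-sel2"],  # second-order bridge (2 waters)
]
histogram = {}
for bridge in fake_bridges:
    count_by_path_length(bridge, histogram, u=None)
print(histogram)
```

With a real `WaterBridgeAnalysis` instance, the callback would presumably be passed as `count_by_type(analysis_func=count_by_path_length)`; the keyword name is my recollection of the API and should be checked against the documentation.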

@IAlibay
Member

IAlibay commented Jan 28, 2021

The selection problem is a bit tricky. As discussed in #2177, the WaterBridgeAnalysis adopted the same selection scheme as the old HydrogenBondAnalysis. I'm currently working on an RDKIT-based selection scheme.

Just jumping in here and focusing very specifically on the last sentence. I don't know what your plans are re: RDKIT-based selections, but if it's possible to make it optional that would be great. WaterBridgeAnalysis is a great core analysis feature of MDA. Since we probably can't make RDKIT a core dependency, having the ability to use it without RDKIT is probably quite important.

@p-j-smith
Member

I make three selections, one selection is the start selection (selection 1) another selection is the end selection (selection 2), while the third water selection is a convex which is both within a certain distance from selection 1 and selection 2.

Sorry - it's been a while since I worked on this. You're completely right - I remember now that in post-processing the hydrogen bonds to find water-bridging I also had to use three selections - for the source, sink and path. So what I suggested would probably complicate rather than simplify things.

@orbeckst
Member

orbeckst commented Feb 5, 2021

With PR #3111 to be merged into 1.0.2-develop, this is good to go from my end.

Member

@IAlibay IAlibay left a comment

Sorry, I've not been keeping up, from a cursory look it seems fine. I'll approve since @orbeckst did / so I'm not blocking merging :)

@@ -156,6 +156,7 @@ Changes
`__call__` (Issue #2860, PR #2859)
* deprecated ``analysis.helanal`` module has been removed in favour of
``analysis.helix_analysis`` (PR #2929)
* Move water bridge analysis from hbonds to hydrogenbonds (Issue #2739 PR #2913)
Member

Technically new changes go to the top of the list, but I'm willing to let that go.

Member

I appreciate your attention to detail.

I merged the whole thing nevertheless, as this has been sitting for way too long.

@orbeckst orbeckst self-assigned this Feb 5, 2021
@orbeckst
Member

orbeckst commented Feb 5, 2021

I want to rebase all of this into a single commit (there's too much clutter in the history) but Rebase and merge will not work automatically.

@xiki-tempula please condense the history of this PR into no more than ~3 commits (or just a single one) and then force-push so that we have a clean history. I leave this to you because you know best what the end result ought to look like.

I will block until the history is cleaned up.

@orbeckst
Member

orbeckst commented Feb 5, 2021

Actually, forget my previous comment: Squash and merge works just fine. Will do this now. Sorry for the clutter!

@orbeckst orbeckst merged commit cbda8a6 into MDAnalysis:develop Feb 5, 2021
IAlibay pushed a commit that referenced this pull request Mar 13, 2021
Related to #2913, #2739 and #2746

## Work done in this PR
  - Moves hbonds.WaterBridgeAnalysis code to hydrogenbonds.WaterBridgeAnalysis
  - Add a temporary stub which links hbonds.WaterBridgeAnalysis to hydrogenbonds.WaterBridgeAnalysis
  - Some docstring changes
PicoCentauri pushed a commit to PicoCentauri/mdanalysis that referenced this pull request Mar 30, 2021
…s#2913)

* Move water bridge analysis from hbonds to hydrogenbonds (PR MDAnalysis#2913)
* part of MDAnalysis#2739
* deprecate hydrogen bond analysis in water bridge in PR MDAnalysis#3111 
* Update CHANGELOG

Co-authored-by: Irfan Alibay <IAlibay@users.noreply.github.com>
@fiona-naughton fiona-naughton added the maintainability and deprecation labels Sep 26, 2023
8 participants