Iterative Average structures #2039

kain88-de · 2018-08-10T10:59:34Z

Is your feature request related to a problem? Please describe.

In a recent paper of mine, I describe an iterative algorithm to calculate an average structure from simulations. Below is the description.

As a reference structure, I used the average structure in the simulations, which for B-DNA was
obtained by RMSD aligning the inner five base pairs and averaging the simulation structures iteratively to convergence. The first frame was used for the initial alignment.

I would expect that this algorithm was already previously described. If anyone knows an earlier paper please add it to this issue so we can properly cite it. I currently do not have time to include this properly into MDAnalysis.

Note: I find this a good example to show how the changes of our analysis module in the last two years allow it to quickly write up a new analysis type that is easy to reuse and share.

Describe the solution you'd like

This is the complete implementation I used including tests. I'm posting it here should anyone be interested in including this into the align module of MDAnalysis. If anyone wants to just use this please consider to cite my paper when you do. The code can be used under the same license that MDAnalysis uses.

class AverageStructure(base.AnalysisBase):
    """Calculate an average structure after RMSD alignment to a reference."""

    def __init__(self, mobile, ref, weights=None, calc_msf=True, **kwargs):
        """init function

        Parameters
        ----------
        mobile : mda.AtomGroup
            atomgroup that is being fitted
        ref : mda.AtomGroup
            reference structure
        weights : str, array_like (optional)
            weights that can be used. If `None` use equal weights, if `'mass'`
            use masses of ref as weights or give an array of arbitrary weights.
        calc_msf : bool (optional)
            calc mean square fluctuation
        kwargs : dict
            options for AnalysisBase to adjust iteration behavior

        """
        super(AverageStructure, self).__init__(mobile.universe.trajectory,
                                               **kwargs)
        self.mobile = mobile
        self.ref_xyz = ref.positions.copy() - ref.centroid()
        self.weights = weights
        self.calc_msf = calc_msf

    def _prepare(self):
        self.structure = np.zeros(self.mobile.positions.shape)
        self.rmsd = 0
        if self.calc_msf:
            self.msf = np.zeros(self.mobile.n_atoms)

    def _single_frame(self):
        self.mobile.translate(-self.mobile.center(self.weights))
        self.rmsd += align._fit_to(
            self.mobile.positions,
            self.ref_xyz,
            self.mobile,
            weights=self.weights,
            mobile_com=np.zeros(3),
            ref_com=np.zeros(3))[1]
        self.structure += self.mobile.positions
        if self.calc_msf:
            self.msf += np.sum(
                (self.ref_xyz - self.mobile.positions)**2, axis=1)

    def _conclude(self):
        self.structure /= self.n_frames
        self.rmsd /= self.n_frames
        if self.calc_msf:
            self.msf /= self.n_frames


def iterative_average(mobile,
                      ref,
                      weights=None,
                      niter=100,
                      eps=1e-8,
                      verbose=False,
                      **kwargs):
    """Calculate an optimal reference that is also the average structure after an
    RMSD alignment. This is done with an iterative algorithm where the previous
    average structure is used as a new reference.

    Parameters
    ----------
    mobile : mda.AtomGroup
        mobile atomgroup to find the average for
    ref : mda.AtomGroup
        initial reference structure. Positions are changed by this function!
    weights : str, array_like (optional)
        weights that can be used. If `None` use equal weights, if `'mass'` use masses of ref as weights or give an array of arbitrary weights.
    niter : int (optional)
        maximum number of iterations
    eps : float (optional)
        RMSD distance at which reference and average are assumed to be equal
    verbose : bool (optional)
        verbosity
    **kwargs : dict (optional)
        AverageStructure kwargs

    Returns
    -------
    res : AverageStructure
        AverageStructure result from the last iteration

    """
    drmsd = np.inf
    for i in range(niter):
        # found a converged structure
        if drmsd < eps:
            break

        res = AverageStructure(mobile, ref, weights, **kwargs).run()
        drmsd = mda.analysis.rms.rmsd(ref.positions, res.structure, weights=weights)
        ref.positions = res.structure

        if verbose:
            print("i = {}, rmsd-change = {:.2f}, ave-rmsd = {:.2f}, ave-msf = {:.2f}".
                  format(i, drmsd, res.rmsd, res.msf.mean()))

    return res

Some reference tests I have. They only check for regressions. Correctness was only validated visually so don't put too much trust into these tests.

from __future__ import absolute_import

import MDAnalysis as mda
from MDAnalysisTests.datafiles import PSF, DCD

import pytest
from numpy.testing import assert_array_almost_equal

# module contains iterative average implementation
import metric


@pytest.fixture
def mobile():
    u = mda.Universe(PSF, DCD)
    u.transfer_to_memory()
    return u.select_atoms('bynum 1:10')


@pytest.fixture
def reference():
    u = mda.Universe(PSF, DCD)
    return u.select_atoms('bynum 1:10')


# Pure regression test
def test_AverageStructure(mobile, reference):
    res = metric.AverageStructure(mobile, reference, stop=10).run()
    assert_array_almost_equal(res.structure,
                              [[0.54083995, -0.42515443, -1.06207113],
                               [1.07435689, -1.16633342, -1.31354781],
                               [1.03657782, 0.31960404, -1.12080256],
                               [-0.15218723, -0.3288244, -1.698552],
                               [0.08886168, -0.68049069, 0.28910915],
                               [0.92952701, -0.77678878, 0.83458763],
                               [-0.7420783, 0.4927999, 0.85065659],
                               [-1.48584671, 0.17381404, 1.22256903],
                               [-1.06299627, 1.13438787, 0.23116275],
                               [-0.22705485, 1.25698588, 1.76688837]], decimal=5)


# Pure regression test
def test_iterative_average(mobile, reference):
    res = metric.iterative_average(mobile, reference, stop=10)
    assert_array_almost_equal(res.structure,
                              [[0.54418358, -0.42490685, -1.06077607],
                               [1.08109621, -1.18605215, -1.30346445],
                               [1.05210044, 0.32313582, -1.11259767],
                               [-0.16157775, -0.3265537, -1.70656559],
                               [0.08676427, -0.66799566, 0.28346055],
                               [0.91055919, -0.76166723, 0.8188435],
                               [-0.74196764, 0.49133729, 0.85177611],
                               [-1.48156064, 0.15684868, 1.23166217],
                               [-1.0767725, 1.13997174, 0.22508162],
                               [-0.21282517, 1.25588206, 1.77257982]], decimal=5)

edit kain88-de: Removed class unrelated to this feature from example code.

orbeckst · 2018-08-10T17:52:25Z

The paper can be added to the class as well, using duecredit and its DOI 10.1021/acs.jpcb.7b11988.

* added align.AverageStructure * added more tests and example * related to #2039

kain88-de · 2020-01-22T12:53:17Z

with the implementation of #2430 the iterative algorithm can be implemented using only

def iterative_average(mobile,
                      ref,
                      weights=None,
                      niter=100,
                      eps=1e-8,
                      verbose=False,
                      **kwargs):
    """Calculate an optimal reference that is also the average structure after an
    RMSD alignment. The optional reference is defined as average structure of a trajectory, with the optimal reference used as input. This function computes the optimal reference by using a starting reference for the average structure, which is used to calculate the average structure again. This is repeated until the reference structure has converged.

    Parameters
    ----------
    mobile : mda.AtomGroup
        mobile atomgroup to find the average for
    ref : mda.AtomGroup
        initial reference structure. Positions are changed by this function!
    weights : str, array_like (optional)
        weights that can be used. If `None` use equal weights, if `'mass'` use masses of ref as weights or give an array of arbitrary weights.
    niter : int (optional)
        maximum number of iterations
    eps : float (optional)
        RMSD distance at which reference and average are assumed to be equal
    verbose : bool (optional)
        verbosity
    **kwargs : dict (optional)
        AverageStructure kwargs

    Returns
    -------
    res : AverageStructure
        AverageStructure result from the last iteration

    """
    drmsd = np.inf
    for i in range(niter):
        # found a converged structure
        if drmsd < eps:
            break

        res = AverageStructure(mobile, ref, weights, **kwargs).run()
        drmsd = mda.analysis.rms.rmsd(ref.positions, res.structure, weights=weights)
        ref.positions = res.positions

        if verbose:
            print("i = {}, rmsd-change = {:.2f}, ave-rmsd = {:.2f}, ave-msf = {:.2f}".
                  format(i, drmsd, res.rmsd, res.msf.mean()))

    return res

Neel-Shah-29 · 2022-03-14T18:08:47Z

Hello , I would like to take up this issue, Kindly guide me through what needs to be done in this issue.
As per the issue, Do we need to add the iterative approach stated above to be implemented in the codebase?

orbeckst · 2022-03-17T01:21:08Z

@Neel-Shah-29 , you'd add the iterative_average() function to the MDAnalysis.analysis.rms module. You'd add tests for the code that you add to the analysis/test_rms.py file. You'd add documentation for the new functionality.

There hasn't been a lot of discussion on this issue so it's quite possible that reviewers of your code might say that we want this functionality to follow the standard schema for analysis tools, namely being a class derived from AnalysisBase (see https://docs.mdanalysis.org/stable/documentation_pages/analysis/base.html).

You can start with a PR where you add the code provided by @kain88-de to the existing code base and make the new tests work. We'll then use your PR for further discussions.

Please also introduce yourself on the developer list with your GitHub handle (if you haven't done so already).

kain88-de added enhancement help wanted Component-Analysis Difficulty-easy GSOC Starter labels Aug 10, 2018

lilyminium mentioned this issue Dec 26, 2019

align.AverageStructure for out-of-memory average structures #2430

Merged

4 tasks

orbeckst pushed a commit that referenced this issue Jan 20, 2020

align.AverageStructure for out-of-memory average structures (#2430)

72ce35a

* added align.AverageStructure * added more tests and example * related to #2039

leonwehrhan mentioned this issue Mar 21, 2024

Implement iterative average structure (Issue#2039) #4524

Merged

5 tasks

orbeckst closed this as completed in #4524 Mar 29, 2024

orbeckst closed this as completed in ff222fe Mar 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Iterative Average structures #2039

Iterative Average structures #2039

kain88-de commented Aug 10, 2018 •

edited

Loading

orbeckst commented Aug 10, 2018

kain88-de commented Jan 22, 2020

Neel-Shah-29 commented Mar 14, 2022

orbeckst commented Mar 17, 2022

Iterative Average structures #2039

Iterative Average structures #2039

Comments

kain88-de commented Aug 10, 2018 • edited Loading

orbeckst commented Aug 10, 2018

kain88-de commented Jan 22, 2020

Neel-Shah-29 commented Mar 14, 2022

orbeckst commented Mar 17, 2022

kain88-de commented Aug 10, 2018 •

edited

Loading