Faster name selections #2755
Conversation
This is a different way of storing atom names in the topology: each atom holds a small integer index into an array of unique names. This is then beneficial for quickly matching strings (i.e. names), as you match a pattern against the set of unique names to yield integers, then quickly match integers. There's an assumption that the number of unique atom names is very small compared to the number of atoms in the system, i.e. atom names are repeated often. I did some timing with this sequence of actions:

```python
import MDAnalysis as mda
from MDAnalysisTests.datafiles import GRO

u = mda.Universe(GRO)
names = u.atoms.names
ag = u.select_atoms('name O*')
ag.names = 'X'
```

I think I can probably make Universe creation faster by rewriting how the Parsers are building the AtomName attribute (i.e. build the lookup array as I go). I think atoms are selected more often than they are set, so I'm not too concerned about the performance drop in attribute setting. In a very hand-wavy way, I think this will also help serialisation efforts, as this is more compact to store/send.
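To make the scheme concrete, here is a minimal sketch of the idea with made-up data (an illustration only, not the actual implementation):

```python
import fnmatch
import numpy as np

# the topology stores one small array of unique names...
name_lookup = np.array(['N', 'CA', 'C', 'O'], dtype=object)
# ...and one integer per atom indexing into it
nmidx = np.array([0, 1, 2, 3, 1, 3])

# a selection like 'name O*' first matches the pattern against the few
# unique names to get the wanted indices...
wanted = [i for i, n in enumerate(name_lookup)
          if fnmatch.fnmatchcase(n, 'O*')]
# ...then picks atoms by fast integer comparison instead of per-atom
# string matching
mask = np.isin(nmidx, wanted)
print(np.where(mask)[0])  # atom indices whose names match 'O*': [3 5]
```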
So if @MDAnalysis/coredevs could try this on their typical workloads and see if it works for them.
Codecov Report

```
@@           Coverage Diff            @@
##           develop    #2755   +/-   ##
===========================================
+ Coverage    92.97%   93.01%   +0.03%
===========================================
  Files          187      187
  Lines        24787    24937     +150
  Branches      3231     3258      +27
===========================================
+ Hits         23046    23195     +149
- Misses        1693     1694       +1
  Partials        48       48
```
@IAlibay as a way of slowing you down from making more PRs, can you review this? Is it understandable/maintainable and does it show performance improvements on your typical systems?
Looks really good to me, my main worry was that selection might not work if atoms were renamed but it all works :) Would be good to add a test to check that post-rename selection still holds up (there might already be one but I couldn't find it).
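For what it's worth, a sketch of the kind of test being suggested (illustrative only, not an existing MDAnalysis test; it assumes the GRO test file contains CA atoms):

```python
import MDAnalysis as mda
from MDAnalysisTests.datafiles import GRO

def test_select_after_rename():
    u = mda.Universe(GRO)
    ag = u.select_atoms('name CA')
    ag.names = 'XYZ'  # rename; the name lookup table must pick this up
    # selections after renaming should reflect the new names
    assert len(u.select_atoms('name XYZ')) == len(ag)
    assert len(u.select_atoms('name CA')) == 0
```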
See below for benchmark data: on my million-atom prmtop system we get a ~337x speedup for `u.select_atoms("name CA")`.
Universe creation is pretty much within error of what's on develop, so the gain is in my opinion worth it.
It might be good to try to expand this to resnames: as can be seen from the "select all water oxygens" case, there seems to be a high cost to matching residue names too.
Benchmark data (million-atom system):

Universe creation:

- This code: 11.1 s ± 180 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
- Current develop: 10.9 s ± 157 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

For a basic "select all alpha carbons", code used:

```python
def select_ca(u):
    ca = u.select_atoms('name CA')
```

- This code: 2.83 ms ± 66.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
- Current develop: 956 ms ± 12.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

For a basic "select all the water oxygens", code used:

```python
def select_water_oxygens(u):
    w_oh = u.select_atoms('resname WAT and name O')
```

- This code: 1.09 s ± 8.85 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
- Current develop: 1.98 s ± 27.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Rename waters loop, code used:

```python
def rename_waters(u):
    u_new = mda.Universe('large.prmtop')
    water_o = u_new.select_atoms('name O and resname WAT')
    water_h = u_new.select_atoms('name H* and resname WAT')
    water_o.atoms.names = "OW"
    water_h.atoms.names = "HW"
```

- This code: 14.2 s ± 439 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
- Current develop: 16 s ± 281 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
On these lines of the diff:

```python
self.nmidx[i] = nextidx
...
self.name_lookup = np.array(name_lookup, dtype=object)
```
Rather than doing the individual append operations, it might be cheaper/cleaner to just do `np.array([i for i in namedict.keys()], dtype=object)` here?
Yeah there's also possibly a way to exploit the guaranteed ordering of the keys that I'm not exploiting here.
@richardjgowers is this optimization something you still want to look into?
I realise in practice this isn't going to happen over many iterations, but `append` can be quite slow, especially if you're just doing a numpy conversion at the end anyway. If there's any way to avoid it (and make the code look cleaner at the same time), I would encourage it (same thing for the similar appends below).
I don't think that with this code you know in advance what size `name_lookup` will be, and finding out in advance will be relatively costly. I benchmarked a typical system:

```python
import numpy as np
import MDAnalysis as mda
from MDAnalysisTests import datafiles as data

u = mda.Universe(data.GRO_MEMPROT)
%timeit build_name_index_numpy(u.atoms.names)
```

with `len(set(u.atoms.names)) == 219` different atom names. Just finding out how many different names there are:

```python
%timeit len(set(u.atoms.names))
```

costs a sizable 2.3 ms (on my ageing laptop):

2.29 ms ± 39.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
How does this compare to what we have?
Current code
The current code (@richardjgowers's version):
```python
def build_name_index_append(vals):
    namedict = dict()  # maps str to nmidx
    name_lookup = []   # maps idx to str
    # eg namedict['O'] = 5 & name_lookup[5] = 'O'
    nmidx = np.zeros_like(vals, dtype=int)  # the lookup for each atom
    # eg Atom 5 is 'C', so nmidx[5] = 7, where name_lookup[7] = 'C'
    for i, val in enumerate(vals):
        try:
            nmidx[i] = namedict[val]
        except KeyError:
            nextidx = len(namedict)
            namedict[val] = nextidx
            name_lookup.append(val)
            nmidx[i] = nextidx
    name_lookup = np.array(name_lookup, dtype=object)
    values = name_lookup[nmidx]
    return namedict, nmidx, name_lookup, values
```
is already pretty fast:

```python
%timeit build_name_index_append(u.atoms.names)
```

gives the speed to beat:

9.91 ms ± 664 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Note how `len(set(vals))` alone would already account for 1/5 of the total run time.
Replace append with pre-allocated array

Following @IAlibay's suggestion and replacing append with a pre-allocated array:
```python
def build_name_index_insert(vals):
    namedict = dict()  # maps str to nmidx
    name_lookup = np.empty(len(set(vals)), dtype=object)  # maps idx to str
    # eg namedict['O'] = 5 & name_lookup[5] = 'O'
    nmidx = np.zeros_like(vals, dtype=int)  # the lookup for each atom
    # eg Atom 5 is 'C', so nmidx[5] = 7, where name_lookup[7] = 'C'
    nextidx = 0
    for i, val in enumerate(vals):
        try:
            nmidx[i] = namedict[val]
        except KeyError:
            namedict[val] = nextidx
            name_lookup[nextidx] = val
            nmidx[i] = nextidx
            nextidx += 1
    values = name_lookup[nmidx]
    return namedict, nmidx, name_lookup, values
```
gives

```python
%timeit build_name_index_insert(u.atoms.names)
```

11.1 ms ± 712 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

which is a tad slower than the version with append.
So I would say unless we expect O(10^3) different string values, the append version is almost certainly ok.
NumPy
I tried out a "numpy" version:

```python
def build_name_index_numpy(vals):
    name_lookup = np.unique(vals)
    namedict = {name: idx for idx, name in enumerate(name_lookup)}
    nmidx = np.zeros_like(vals, dtype=int)
    for name in name_lookup:
        nmidx[vals == name] = namedict[name]
    values = name_lookup[nmidx]
    return namedict, nmidx, name_lookup, values
```
but that gave an abysmal

240 ms ± 34.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

which is 24 times slower than the try/except/append version. You can shave off 60 ms by replacing `np.unique` with `name_lookup = np.array(list(set(vals)))`, but the >200 repeated assignments `nmidx[vals == name] = namedict[name]` kill performance:

```python
name = "H1"
vals = u.atoms.names
%timeit nmidx[vals == name] = namedict[name]
```

752 µs ± 24.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Maybe if there was a way to do all assignments in one go, one would have a hope of doing better. But even just building the dict itself:

```python
%timeit namedict = {name: idx for idx, name in enumerate(name_lookup)}
```

19.5 µs ± 776 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

is slower than the existing code, so I think it's going to be tough to do much better; maybe someone with better numpy skills than I can see it.
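One possible way to do all the assignments in one go, for what it's worth (a sketch under the same assumptions, not benchmarked here): `np.unique` can return the per-element inverse indices directly.

```python
import numpy as np

def build_name_index_unique(vals):
    # np.unique computes the sorted unique names AND, with
    # return_inverse=True, each atom's index into that array in one call,
    # avoiding the per-name boolean-mask assignments entirely
    name_lookup, nmidx = np.unique(vals, return_inverse=True)
    namedict = {name: idx for idx, name in enumerate(name_lookup)}
    values = name_lookup[nmidx]
    return namedict, nmidx, name_lookup, values
```

Note that the indices follow the sorted order of the names rather than first-seen order, so the labelling differs from the append version, but the outputs are internally consistent.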
So my suggestion was actually to do this:
```python
def build_name_index_append(vals):
    namedict = dict()  # maps str to nmidx
    # eg namedict['O'] = 5 & name_lookup[5] = 'O'
    nmidx = np.zeros_like(vals, dtype=int)  # the lookup for each atom
    # eg Atom 5 is 'C', so nmidx[5] = 7, where name_lookup[7] = 'C'
    for i, val in enumerate(vals):
        try:
            nmidx[i] = namedict[val]
        except KeyError:
            nextidx = len(namedict)
            namedict[val] = nextidx
            nmidx[i] = nextidx
    name_lookup = np.array([i for i in namedict.keys()], dtype=object)
    values = name_lookup[nmidx]
    return namedict, nmidx, name_lookup, values
```
The idea here was that if you end up having to do lots of irregular append operations in the original construct, you could avoid some of the memory-access overheads (plus it looks a bit nicer imho).
However, in practice that idea doesn't hold up, in part because the KeyError is triggered so infrequently: you end up with code whose performance is within error (13.5 ms vs 13.6 ms on the above GRO_MEMPROT system).
Even if all 43480 elements in the name array are unique, you only save ~2 ms [44 ms -> 42 ms], although this is mainly an academic point, as passing an array of unique names is out of scope for this function (i.e. triggering the KeyError probably becomes the biggest cost).
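Note that this variant relies on dicts preserving insertion order (an implementation detail in CPython 3.6, guaranteed from Python 3.7), so `namedict.keys()` comes out in exactly the order the indices were assigned. A quick demonstration of the idiom (illustrative snippet, not from the PR):

```python
# dict preserves insertion order, so each key's position matches the
# index it was assigned when first seen
d = {}
for name in ['O', 'CA', 'O', 'N']:
    d.setdefault(name, len(d))  # len(d) is evaluated before insertion
print(d)        # {'O': 0, 'CA': 1, 'N': 2}
print(list(d))  # ['O', 'CA', 'N'] -- same order as the stored indices
```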
Ok I've made all string topology attrs use this approach. I think I can improve selections like …
So the code adds a numpy array plus a smaller (unless everything is unique) list and dictionary per attribute. At what point do we need to consider the extra memory cost of Universe creation?
The memory cost is likely almost always smaller. Yes, there are more Python objects, but each resname is stored as an integer (2 bytes for uint16) rather than a Python string (likely 3+ bytes?). I'll double-check these assumptions though :)
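A rough back-of-envelope check of that claim (illustrative CPython numbers with hypothetical data, not measurements from this PR):

```python
import sys
import numpy as np

names = ['HW1'] * 100_000                  # highly repetitive atom names
as_objects = np.array(names, dtype=object) # stores one pointer per atom
as_indices = np.zeros(len(names), dtype=np.uint16)  # 2 bytes per atom

print(as_objects.nbytes)     # 800000 -- 8-byte pointers, one per atom
print(as_indices.nbytes)     # 200000 -- 2-byte indices, one per atom
print(sys.getsizeof('HW1'))  # ~52 -- the single shared str object itself
```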
Update on my million-atom benchmark for this new patch.
It's probably worth noting that I suspect the TOPParser is particularly inefficient, so the difference in Universe creation will likely look quite different if, say, a GRO file is read.
@IAlibay thanks for benchmarking. I'm not shocked that this is a little slower to construct: we're constructing an index of possible values, so there's a Python loop over all atom values, for each attribute. I'll see if I can take the edge off that... Killing a 2 s selection time is nice though :)
Is the only thing left here the refactoring of the duplicated code? It would be nice to get this regression fixed sooner rather than later (also for PR #2798).
CHANGELOG, too.
```diff
@@ -828,14 +845,20 @@ class ProteinSelection(Selection):
         'CLEU', 'CILE', 'CVAL', 'CASF', 'CASN', 'CGLN', 'CARG', 'CHID', 'CHIE',
         'CHIP', 'CTRP', 'CPHE', 'CTYR', 'CGLU', 'CASP', 'CLYS', 'CPRO', 'CCYS',
         'CCYX', 'CMET', 'CME', 'ASF',
-    ])
+    }

     def __init__(self, parser, tokens):
         pass

     def apply(self, group):
```
`select_atoms('protein')` went from 50 ms to 0.5 ms on GRO.
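To illustrate where a speedup like that comes from (a simplified sketch with made-up data, not the actual selection code): the membership test against the residue-name set runs once per unique resname, and atoms are then picked out with a fast boolean lookup.

```python
import numpy as np

prot_res = {'ALA', 'ARG', 'GLY', 'LYS'}  # truncated stand-in for the real set
resname_lookup = np.array(['SOL', 'ALA', 'NA+', 'GLY'], dtype=object)
nmidx = np.array([0, 0, 1, 2, 3, 3, 3])  # per-residue index into the lookup

# test each *unique* resname once against the set...
matching = np.array([name in prot_res for name in resname_lookup])
# ...then expand to all residues by integer indexing
protein_mask = matching[nmidx]
print(protein_mask)  # [False False  True False  True  True  True]
```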
Something is messing up the Azure Pipelines stuff here, but it's not immediately obvious to me what it is.
Is there anything in there that would not work in the backport PR #2798, i.e., anything that requires Python ≥ 3.6?
(It would be ok, but it would be helpful to have a note on how to backport any Py 3.6+ constructs. Fixing the performance regression on selections is the main reason why we haven't publicly announced 1.0!)
AppVeyor fails on TestICodes.test_set_residues_singular – is this related to this PR?
@MDAnalysis/coredevs does someone fancy giving this a once-over?
@MDAnalysis/coredevs please |
package/MDAnalysis/core/selection.py (outdated):

```python
    .. versionchanged:: 2.0.0
       sug_atoms changed to set (from numpy array)
       performance improved by ~100x on larger systems
    """
    token = 'nucleicsugar'
    sug_atoms = np.array(["C1'", "C2'", "C3'", "C4'", "O4'"])
```
Shouldn't this be a set?
That's what the versionchanged says....
On these lines of the diff:

```python
name_lookup = []  # maps idx to str
# eg namedict['O'] = 5 & name_lookup[5] = 'O'

self.nmidx = np.zeros_like(vals, dtype=int)  # the lookup for each atom
```
I would not mind a more explicit name than `nmidx`. `name_index` perhaps? But there may be something even better I do not think of.
Does `nmidx` need to be public, or could it just be hidden behind an underscore?
Note that there are duplicates of this code in `_ResidueStringAttr` and `_SegmentStringAttr` that would also need to be changed for consistency.
On balance I can live with `nmidx` in what's clearly an internal class, especially as there are comments that explain what it is.
Ping @IAlibay for approval; ping @richardjgowers to address @jbarnoud's comments.
Generally looking good to me. The only real issue is the `array` -> `set` fix that @jbarnoud found.
I don't know how to reduce the duplicated code in a simple manner, so I'll just be content with it.
There was one comment by @IAlibay on a potential performance improvement. I'll leave it to @richardjgowers and @IAlibay to figure that one out.
I am not in a position to review in depth, though from what I read on my phone it looks good to me. I am uneasy with the amount of code duplication, but it can be a separate pull request.
Overall lgtm, mainly:

- docstring changes needed (we probably need a better way to capture bad `versionchanged` placements, since it doesn't error out but just doesn't render the right way)
- test for the uncovered `else`
- can we clean up the append operations (if not, never mind, maybe some other time)?
On these lines of the diff:

```python
newidx = len(self.namedict)
self.namedict[values] = newidx
newnames.append(values)
else:
```
This part of the code (i.e. the `else` part of the `if` construct) isn't being tested (at least according to codecov). Could we add at least a quick test for this? Since it's core, it's probably a bad idea to leave stuff untested.
hopefully covered by new test on setting multiple segids
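For illustration, roughly the shape such a test could take (a hypothetical sketch, not the test actually added; the GRO test file provides a single segment):

```python
import MDAnalysis as mda
from MDAnalysisTests.datafiles import GRO

def test_set_segids_twice():
    u = mda.Universe(GRO)
    # first assignment adds 'A' to the lookup table (the KeyError path)...
    u.segments.segids = ['A'] * len(u.segments)
    assert set(u.atoms.segids) == {'A'}
    # ...the second assignment hits the existing-entry (else) branch
    u.segments.segids = ['A'] * len(u.segments)
    assert set(u.atoms.segids) == {'A'}
```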
@richardjgowers could you please have a look at @IAlibay's comments? Then this should be finally good to go!
Re: avoiding the append loop. Not sure it's possible, as we don't know the length beforehand, and we definitely want it as a numpy array later, as we fancy-index it to death. There's some code duplication; this is because the different versions work on atoms/residues/segments or have different name arrays (nucl_res / sug_atoms). You could make it all more generic by looking up the attribute names dynamically, but I think it gets less readable as you try and play code golf with it.
codecov seems reasonable, lgtm!
Backport commit message (cherry picked from commit 45e56e8):

* fix #2751
* Python 2.7/3 backport of PR #2755 (use six.string_types, update CHANGELOG and versionchanged as 1.0.1)
* modified AtomNames topologyattr to include lookup table index
* rework atom name selection to use lookup tables
* fixed test supplying integer as atom name
* Update test_topologyattrs.py
* use dict-lookup string attrs EVERYWHERE
* made protein selection faster, 48ms -> 0.5ms on GRO testfile
* improved nucleic/backbone selections
* Added explicit tests for Resnames topologyattr; tests now provide str types for resnames/icodes
* use fnmatchcase to be case sensitive (this was a small unreported bug in 1.0.0: the matching was done case-insensitively)

Co-authored-by: Irfan Alibay <IAlibay@users.noreply.github.com>
Co-authored-by: Oliver Beckstein <orbeckst@gmail.com>
Squashed commit message for this PR:

* modified AtomNames topologyattr to include lookup table index
* cheeky little optimisation
* rework atom name selection to use lookup tables
* Update topologyattrs.py
* fixed test supplying integer as atom name (really, topologyattrs need to be statically typed and protective about this)
* Update test_topologyattrs.py
* use dict-lookup string attrs EVERYWHERE
* removed some code duplication
* made protein selection faster, 48ms -> 0.5ms on GRO testfile
* improved nucleic/backbone selections
* Added explicit tests for Resnames topologyattr; tests now provide str types for resnames/icodes
* use fnmatchcase to be case sensitive
* Update package/MDAnalysis/core/selection.py (@jbarnoud's fix)
* apply suggestions from code review
* added test for setting multiple segids at once

Co-authored-by: Oliver Beckstein <orbeckst@gmail.com>
Co-authored-by: Irfan Alibay <IAlibay@users.noreply.github.com>
Fixes #2751