Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pbc dist bench #3475

Open
wants to merge 25 commits into
base: develop
Choose a base branch
from
Open

Conversation

Miro-Astore
Copy link

@Miro-Astore Miro-Astore commented Dec 6, 2021

Fixes #3206 and #2794

Rewrote the whole benchmark since last pull request so made a new one.

Changes made in this Pull Request:

Added, for relevant functions, benchmarks for testing orthogonal and triclinic
PBC settings
Rewrote all distance benchmarks to use a universe of randomly generated
atoms. This lets the benchmarks scale to arbitrary sizes.

It also makes realistic atom densities which are important for the contact
matrix distance functions. If the universe is too dense then a matrix with
too many connections is made and the benchmark will time out. As is the
10 000 atom option requires a modification to timeout for asv to run the
benchmark.

I had to split the DistancesBench class into 2 classes to avoid
uncessary reruns of the benchmars which do not take pbc type as an argument. Hence I
created BetweenBench.

The 2 classes and the distribution of testing functions is

BetweenBench

time_between

DistancesBench
time_distance_array
time_distance_array_pre_allocated
time_self_distance_array
time_self_distance_array_pre_allocated
time_dist
time_dist_offsets
time_contact_matrix
time_contact_matrix_sparse

Splitting the classes like this was necessary because asv does not
allow us to easily skip certain combinations of parameters at the
function level, only in the setup()
function of the class.

…enchmarks

The benchmarked distances functions now all test 3 PBC conditions:
None, triclinic, orthogonal.

Separated matrix contacts functions and simple distance functions into
DistanceBench and ContactsBench classes to deal with time complexity issues.
Added, for relevant functions, benchmarks for testing orthogonal PBC
settings and also removed 10000 atom benchmarks for contact matrix
functions to avoid timeout failiures.

Had to split the DistancesBench class into 3 classes to avoid
failiures and uncessary reruns of the benchmark. These failiures
were due to time outs when too many atoms were run in the
benchmarks or distances.betweeen not taking box type as an argument.
The latter case meant, to avoid failiure, we had to either split
the classes out or run redundant benchmarks.

The 3 classes and the distribution of testing functions is

BetweenBench
    time_between
DistancesBench
    time_distance_array
    time_distance_array_pre_allocated
    time_self_distance_array
    time_self_distance_array_pre_allocated
    time_dist
    time_dist_offsets
ContactsBench
    time_contact_matrix
    time_contact_matrix_sparse

Splitting the classes like this was necessary because asv does not
allow us to easily skip certain combinations of parameters at the
function level, only in the setup()
function of the class.
Took out redundant definitions of universe at the class level and
innapropriate raising of Exceptions.
@pep8speaks
Copy link

pep8speaks commented Dec 6, 2021

Hello @Miro-Astore! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 206:5: E303 too many blank lines (2)
Line 344:5: E303 too many blank lines (2)
Line 355:5: E303 too many blank lines (2)
Line 375:1: W391 blank line at end of file

Comment last updated at 2023-10-27 23:16:00 UTC

@codecov
Copy link

codecov bot commented Dec 6, 2021

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (2197a2c) 93.40% compared to head (3d8be13) 93.37%.
Report is 27 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3475      +/-   ##
===========================================
- Coverage    93.40%   93.37%   -0.04%     
===========================================
  Files          170      184      +14     
  Lines        22255    23403    +1148     
  Branches      4071     4075       +4     
===========================================
+ Hits         20788    21853    +1065     
- Misses         951     1035      +84     
+ Partials       516      515       -1     

see 20 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Miro-Astore
Copy link
Author

Hi, just wondering if there's anything that still needs fixing for this pull request to be merged.

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand your changes, you changed the underlying data for DistanceBench — correct me if I am wrong. My concern with changing a benchmark case but keeping the name the same will break the history so looking at the time series of performances may now show an artificial increase or decrease in performance that has nothing to do with the code itself.

My opinion is that we should

  • keep old benchmarks as they are or retire them
  • add new benchmarks with new names

@tylerjereddy @hmacdope @IAlibay @richardjgowers if you have different opinions I am all ears. This looks like something we should be deciding now.

@orbeckst
Copy link
Member

Sorry @Miro-Astore for the delay in reviewing. I think we have to first decide how we want to progress in cases like this one where the benchmarks themselves are changed.

@Miro-Astore
Copy link
Author

No problem. I will be happy to adapt the code to whatever framework the development team decide on.

@orbeckst
Copy link
Member

After a quick discussion on the developer list there are at least two developers (@richardjgowers and I) who think that we need to do what I said in my comment:

  • keep old benchmarks as they are
  • create new ones for anything new

Can you please change your PR in this way, @Miro-Astore ? Thank you very much! (I do look forward to seeing the results!)

@orbeckst
Copy link
Member

Please ping me when you need me to review again, I try to do it reasonably quickly.

@orbeckst
Copy link
Member

@hmacdope if you have time to look at any updates then that would also be very helpful.

@hmacdope
Copy link
Member

@hmacdope if you have time to look at any updates then that would also be very helpful.

I'll get onto it asap

@hmacdope
Copy link
Member

@orbeckst @Miro-Astore agree that we probably need to keep original benchmark names as otherwise performance history is not really linear.

@Miro-Astore
Copy link
Author

No problem. I'll get to it after the BPS meeting. If any of you are here I'll be sure to say hi!

Copy link
Member

@hmacdope hmacdope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good in general , just a few things.

Also worth considering is having a benchmark with a significant portion of the coordinates outside the primary cell.

I could be wrong but for triclinic boxes I think this is slightly more expensive due to need to call _triclinic_pbc which has heaps of branching logic. For orthorhombic boxes I am fairly sure all the code paths are the same.

…means that we will be running time_between 3 times instead of 1, without any difference because it doesn't take PBC types.
@Miro-Astore
Copy link
Author

Miro-Astore commented Mar 6, 2022

I think my latest commits need a bit more organisation sorry. I'll ping you when they're ready for reveiw.

Also worth considering is having a benchmark with a significant portion of the coordinates outside the primary cell.

Just a small note that I don't think this is a good idea because the original reason behind the bug in #2794 was because contact matrix calculations made massive matrices which cause MemoryErrors when lots of atoms are very close.

If we make atoms outside the box it could cause the same error because then they are wrapped into the smaller box they would closely overlap. I'll try and test it out at some point.

@hmacdope
Copy link
Member

@Miro-Astore the diff seems enormous for some reason?

@Miro-Astore
Copy link
Author

Miro-Astore commented Jun 14, 2023

Sorry about that, I think I've done something silly and merged the current development branch because my current one was so old. It seems to have created a problem. Can you just look at the benchmarks/benchmarks/analysis/distances.py script or would you like me to try and fix the weird diffs ?

@hmacdope
Copy link
Member

You will need to sort out diffs before merge is possible. @MDAnalysis/coredevs I forget what the fix is here.

@orbeckst
Copy link
Member

orbeckst commented Jun 14, 2023

Yes, please fix the patch. You also have conflicts. I'd go back to 087cb46. Then I'd try merging from GitHub. Once the merge is done cleanly, you can put the changes from 1eb6bdd back.

(Perhaps you can try a rebase ... but be careful with any history rewriting. If in doubt, ask first.)

@Miro-Astore
Copy link
Author

Hi, sorry for the messy commit history. I think this is now ready?

@github-actions
Copy link

github-actions bot commented Jun 15, 2023

Linter Bot Results:

Hi @Miro-Astore! Thanks for making this PR. We linted your code and found the following:

There are currently no issues detected! 🎉

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very brief/terse review

  • The original DistancesBench needs to have the old data generation. If there's a need for a version with the new realistic density then make a copy.
  • You removed the time_calc_* methods. They should be added back, at least to DistancesBench.
  • Add .gitignore back.

.gitignore Outdated
@@ -1,54 +0,0 @@
# Ignore python bytecoded files
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please don't delete our .gitignore

def setup(self, num_atoms, pbc_type):

# This setup function is virtually identical to the one
# in the DistancesBench, we just have to do some extra
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DistancesBench2

box=None)
class DistancesBenchPBC(object):
"""Benchmarks for MDAnalysis.analysis.distances
functions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with PBC

# calculating reasonable density for an atomistic system.
ex_universe = MDAnalysis.Universe(GRO)
density = ex_universe.coord.volume/ex_universe.atoms.n_atoms
desired_box_volume = universe_num_atoms*density
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space around operators

Comment on lines +198 to +199
sides = [float(cube_side_length)]*3
ortho_angles = [float(90)]*3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space around operators

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP 8 — space around operators

@Miro-Astore
Copy link
Author

Miro-Astore commented Oct 28, 2023

Hi sorry this is taking so long, and for all the trouble. I think I'm ready for another round of reviews.

(accidentally made a giant commit again but i've reset it. Ping me if there's an issue. ) @orbeckst

@Miro-Astore
Copy link
Author

Hi sorry this is taking so long, and for all the trouble. I think I'm ready for another round of reviews.

(accidentally made a giant commit again but i've reset it. Ping me if there's an issue. ) @orbeckst

bump

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good; I only have one question (see comments) and please do some PEP8 formatting. Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't delete lines from the file. They might be there for a reason.

If some clean-up is needed, do a separate mini PR.

memory implementation of contact_matrix intended
for larger systems.
"""
distances.contact_matrix(coord=self.ag1.positions,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you briefly explain how your changes address #2794 ? Does the numatom == 10000 case not time out anymore?

Perhaps we can just skip this case

from asv_runner.benchmarks.mark import skip_for_params

@skip_for_params([(10000, 'numatoms')])
def time_contact_matrix_sparse(self, num_atoms):
   ...

Comment on lines +39 to +40
sides = [float(cube_side_length)]*3
ortho_angles = [float(90)]*3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP 8 — space around operators


# Make fake universe with 3*num_atoms because some of our benchmarks
# require 3 atom groups.
universe_num_atoms = num_atoms*3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP 8 — space around operators

Comment on lines +35 to +36
density = ex_universe.coord.volume/ex_universe.atoms.n_atoms
desired_box_volume = universe_num_atoms*density
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP 8 — space around operators


# Make fake universe with 3*num_atoms because some of our benchmarks
# require 3 atom groups.
universe_num_atoms = num_atoms*3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP 8 — space around operators

Comment on lines +222 to +223
density = ex_universe.coord.volume/ex_universe.atoms.n_atoms
desired_box_volume = universe_num_atoms*density
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP 8 — space around operators

Comment on lines +198 to +199
sides = [float(cube_side_length)]*3
ortho_angles = [float(90)]*3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP 8 — space around operators

@orbeckst orbeckst removed the close? Evaluate if issue/PR is stale and can be closed. label Nov 21, 2023
@orbeckst orbeckst self-assigned this Nov 21, 2023
@orbeckst
Copy link
Member

@hmacdope can you also have a look if your previous concerns have been addressed?

@Miro-Astore
Copy link
Author

@hmacdope
if i fix up the syntax errors is this ready to roll?

@orbeckst
Copy link
Member

@Miro-Astore sorry ... hadn't seen you question.

If you address my comments then we should be good to go. Please ping me when you're done! Thank you!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Distance benchmarks should cover all box types
6 participants