-
Notifications
You must be signed in to change notification settings - Fork 678
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pbc dist bench #3475
base: develop
Are you sure you want to change the base?
Pbc dist bench #3475
Conversation
…enchmarks The benchmarked distances functions now all test 3 PBC conditions: None, triclinic, orthogonal. Separated matrix contacts functions and simple distance functions into DistanceBench and ContactsBench classes to deal with time complexity issues.
Added, for relevant functions, benchmarks for testing orthogonal PBC settings and also removed 10000 atom benchmarks for contact matrix functions to avoid timeout failiures. Had to split the DistancesBench class into 3 classes to avoid failiures and uncessary reruns of the benchmark. These failiures were due to time outs when too many atoms were run in the benchmarks or distances.betweeen not taking box type as an argument. The latter case meant, to avoid failiure, we had to either split the classes out or run redundant benchmarks. The 3 classes and the distribution of testing functions is BetweenBench time_between DistancesBench time_distance_array time_distance_array_pre_allocated time_self_distance_array time_self_distance_array_pre_allocated time_dist time_dist_offsets ContactsBench time_contact_matrix time_contact_matrix_sparse Splitting the classes like this was necessary because asv does not allow us to easily skip certain combinations of parameters at the function level, only in the setup() function of the class.
… 3.7, changed back to 2.7
Took out redundant definitions of universe at the class level and innapropriate raising of Exceptions.
…ix timer with timeout
…rse contact matrix class
Hello @Miro-Astore! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2023-10-27 23:16:00 UTC |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #3475 +/- ##
===========================================
- Coverage 93.40% 93.37% -0.04%
===========================================
Files 170 184 +14
Lines 22255 23403 +1148
Branches 4071 4075 +4
===========================================
+ Hits 20788 21853 +1065
- Misses 951 1035 +84
+ Partials 516 515 -1 ☔ View full report in Codecov by Sentry. |
Hi, just wondering if there's anything that still needs fixing for this pull request to be merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand your changes, you changed the underlying data for DistanceBench — correct me if I am wrong. My concern with changing a benchmark case but keeping the name the same will break the history so looking at the time series of performances may now show an artificial increase or decrease in performance that has nothing to do with the code itself.
My opinion is that we should
- keep old benchmarks as they are or retire them
- add new benchmarks with new names
@tylerjereddy @hmacdope @IAlibay @richardjgowers if you have different opinions I am all ears. This looks like something we should be deciding now.
Sorry @Miro-Astore for the delay in reviewing. I think we have to first decide how we want to progress in cases like this one where the benchmarks themselves are changed. |
No problem. I will be happy to adapt the code to whatever framework the development team decide on. |
After a quick discussion on the developer list there are at least two developers (@richardjgowers and I) who think that we need to do what I said in my comment:
Can you please change your PR in this way, @Miro-Astore ? Thank you very much! (I do look forward to seeing the results!) |
Please ping me when you need me to review again, I try to do it reasonably quickly. |
@hmacdope if you have time to look at any updates then that would also be very helpful. |
I'll get onto it asap |
@orbeckst @Miro-Astore agree that we probably need to keep original benchmark names as otherwise performance history is not really linear. |
No problem. I'll get to it after the BPS meeting. If any of you are here I'll be sure to say hi! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good in general , just a few things.
Also worth considering is having a benchmark with a significant portion of the coordinates outside the primary cell.
I could be wrong but for triclinic boxes I think this is slightly more expensive due to need to call _triclinic_pbc
which has heaps of branching logic. For orthorhombic boxes I am fairly sure all the code paths are the same.
…means that we will be running time_between 3 times instead of 1, without any difference because it doesn't take PBC types.
I think my latest commits need a bit more organisation sorry. I'll ping you when they're ready for reveiw.
Just a small note that I don't think this is a good idea because the original reason behind the bug in #2794 was because contact matrix calculations made massive matrices which cause MemoryErrors when lots of atoms are very close. If we make atoms outside the box it could cause the same error because then they are wrapped into the smaller box they would closely overlap. I'll try and test it out at some point. |
@Miro-Astore the diff seems enormous for some reason? |
Sorry about that, I think I've done something silly and merged the current development branch because my current one was so old. It seems to have created a problem. Can you just look at the |
You will need to sort out diffs before merge is possible. @MDAnalysis/coredevs I forget what the fix is here. |
…eenBench class." This reverts commit 3261f78.
Hi, sorry for the messy commit history. I think this is now ready? |
Linter Bot Results:Hi @Miro-Astore! Thanks for making this PR. We linted your code and found the following: There are currently no issues detected! 🎉 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very brief/terse review
- The original DistancesBench needs to have the old data generation. If there's a need for a version with the new realistic density then make a copy.
- You removed the time_calc_* methods. They should be added back, at least to DistancesBench.
- Add .gitignore back.
.gitignore
Outdated
@@ -1,54 +0,0 @@ | |||
# Ignore python bytecoded files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please don't delete our .gitignore
def setup(self, num_atoms, pbc_type): | ||
|
||
# This setup function is virtually identical to the one | ||
# in the DistancesBench, we just have to do some extra |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DistancesBench2
box=None) | ||
class DistancesBenchPBC(object): | ||
"""Benchmarks for MDAnalysis.analysis.distances | ||
functions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with PBC
# calculating reasonable density for an atomistic system. | ||
ex_universe = MDAnalysis.Universe(GRO) | ||
density = ex_universe.coord.volume/ex_universe.atoms.n_atoms | ||
desired_box_volume = universe_num_atoms*density |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space around operators
sides = [float(cube_side_length)]*3 | ||
ortho_angles = [float(90)]*3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space around operators
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PEP 8 — space around operators
Hi sorry this is taking so long, and for all the trouble. I think I'm ready for another round of reviews. (accidentally made a giant commit again but i've reset it. Ping me if there's an issue. ) @orbeckst |
bump |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very good; I only have one question (see comments) and please do some PEP8 formatting. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't delete lines from the file. They might be there for a reason.
If some clean-up is needed, do a separate mini PR.
memory implementation of contact_matrix intended | ||
for larger systems. | ||
""" | ||
distances.contact_matrix(coord=self.ag1.positions, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you briefly explain how your changes address #2794 ? Does the numatom == 10000
case not time out anymore?
Perhaps we can just skip this case
from asv_runner.benchmarks.mark import skip_for_params
@skip_for_params([(10000, 'numatoms')])
def time_contact_matrix_sparse(self, num_atoms):
...
sides = [float(cube_side_length)]*3 | ||
ortho_angles = [float(90)]*3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PEP 8 — space around operators
|
||
# Make fake universe with 3*num_atoms because some of our benchmarks | ||
# require 3 atom groups. | ||
universe_num_atoms = num_atoms*3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PEP 8 — space around operators
density = ex_universe.coord.volume/ex_universe.atoms.n_atoms | ||
desired_box_volume = universe_num_atoms*density |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PEP 8 — space around operators
|
||
# Make fake universe with 3*num_atoms because some of our benchmarks | ||
# require 3 atom groups. | ||
universe_num_atoms = num_atoms*3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PEP 8 — space around operators
density = ex_universe.coord.volume/ex_universe.atoms.n_atoms | ||
desired_box_volume = universe_num_atoms*density |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PEP 8 — space around operators
sides = [float(cube_side_length)]*3 | ||
ortho_angles = [float(90)]*3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PEP 8 — space around operators
@hmacdope can you also have a look if your previous concerns have been addressed? |
@hmacdope |
@Miro-Astore sorry ... hadn't seen you question. If you address my comments then we should be good to go. Please ping me when you're done! Thank you!! |
Fixes #3206 and #2794
Rewrote the whole benchmark since last pull request so made a new one.
Changes made in this Pull Request:
Added, for relevant functions, benchmarks for testing orthogonal and triclinic
PBC settings
Rewrote all distance benchmarks to use a universe of randomly generated
atoms. This lets the benchmarks scale to arbitrary sizes.
It also makes realistic atom densities which are important for the contact
matrix distance functions. If the universe is too dense then a matrix with
too many connections is made and the benchmark will time out. As is the
10 000 atom option requires a modification to timeout for asv to run the
benchmark.
I had to split the DistancesBench class into 2 classes to avoid
uncessary reruns of the benchmars which do not take pbc type as an argument. Hence I
created BetweenBench.
The 2 classes and the distribution of testing functions is
BetweenBench
DistancesBench
time_distance_array
time_distance_array_pre_allocated
time_self_distance_array
time_self_distance_array_pre_allocated
time_dist
time_dist_offsets
time_contact_matrix
time_contact_matrix_sparse
Splitting the classes like this was necessary because asv does not
allow us to easily skip certain combinations of parameters at the
function level, only in the setup()
function of the class.