GitHub Action + atime test to observe the performance regression introduced by PR #4491 and fixed by PR #5463 #2
Conversation
Generated via commit 9b08cee
Download link for the artifact containing the test results: atime-results.zip
Time taken to finish the standard R installation steps: 11 minutes and 28 seconds
Time taken to run
This is a good start.

Also, can you please change the PR title to be consistent with the plot title? (introduced by PR#4491 and fixed by PR#5463)
Done! (please check)
Now that I think of it, going with the original title might be more appropriate, since I'm only focusing on the merge commit of Rdatatable#4491 and its parent, i.e., the point where the regression happened and the point just before it was introduced, respectively. I'm not covering the fix or rollback from Rdatatable#5463. The other way to go about this would be to replicate the PR that fixed the regression (as opposed to the one which introduced it, the two being different PRs), or to make a PR from a branch with the merge commit of #5463 to a branch with a commit before it got merged. (For example, I did that for #4440 here.)

Question: How should I name the titles of test cases if more than one test case exists for a PR that got merged?
Also, "Time taken to install atime and run atime_pkg on the test(s): 17 minutes and 18 seconds" is good, but can you please break that down into at least two steps? 1. time to install/setup, 2. time to run the performance tests. I think devs will like to know how many minutes we are using just for the testing, beyond what is required by any CI job to set up R.
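One way to implement that split (a sketch only; the `sleep` commands below are placeholders for the real installation and test steps, which aren't reproduced here) is to record a timestamp around each phase and report the two durations separately:

```shell
#!/bin/sh
# Hypothetical sketch: capture a timestamp before and after each phase,
# then report the two durations separately.
install_start=$(date +%s)
sleep 1   # placeholder for the standard R installation/setup steps
install_end=$(date +%s)

tests_start=$(date +%s)
sleep 1   # placeholder for installing atime and running atime_pkg
tests_end=$(date +%s)

install_secs=$((install_end - install_start))
tests_secs=$((tests_end - tests_start))
echo "Setup/installation took: $((install_secs / 60)) minutes and $((install_secs % 60)) seconds"
echo "Performance tests took: $((tests_secs / 60)) minutes and $((tests_secs % 60)) seconds"
```

In a GitHub Actions workflow, the timestamps can be passed between steps by appending them to `$GITHUB_ENV`.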
… step runtime into installation and test time)
The first comment in this PR is "To reproduce on another fork, ...", which is helpful for reproducibility, but it would be even better to start with a couple of sentences explaining the intent / big picture of the PR (a demonstration of new performance testing software, trying to reproduce a historical regression, and showing that we can detect and avoid it).
Done, in terms of fixing an error: previously, 'Before' was in between the 'Regression' and 'Fix' commit points, but now it is aligned with 'Regression', which I think is correct since Rdatatable#4440 is the PR that fixes the regression. So 'Before' should not be fast, and only 'Fix' in this case should be fast, as we are seeing.
I was in the process of testing things and working on a fix for that (just done), but I'm not sure why this happened, since I was using the same logic to extract both this and the time taken to run the tests; the latter worked but this didn't, for some reason.
I got it done after some trial and error - please check! And I agree that it would be better, as it's also something you might not get directly by looking at the job (only the time taken to execute each step and the workflow as a whole is openly shown). That being said, installing
Yup, I used a similar approach before (for reference, here is the workflow file I was modifying), and I'm currently saving to a file as well, although that is for timing the tests.
I had the exact same thoughts when writing that comment initially, but the reason I didn't include the full context here is that I'll be making an issue in

Please let me know if the current state of things in the PR looks reasonable/explainable to you. (If so, I'll post the issue tomorrow morning.)
Hi again, the times look good/useful.
Thanks for clarifying!
It could've been a PR or a direct commit, but the source hasn't been identified in the issue that discovered/reported the regression (Rdatatable#4311). However, I see one point of reference: Jan mentioned in that thread that the regression was already present at 1.12.8, so I'm thinking of trying commits from older tags.
… data.table before the regression that got reported in Rdatatable#4311 was introduced
If there is no older fast historical reference commit known, then just use the other two commits, but call them Fast and Slow (instead of Fixed and Regression).
…state of data.table before that regression was introduced are not working.
Time taken to run atime::atime_pkg on the tests: -28536194 minutes and -50 seconds ????
… updated workflow)
I'm in the process of implementing the changes we discussed yesterday and testing them, so please check back later. (I'll ping you here once I'm done so you can review then.)
That was an undefined-value case (I used an older variable to compute the timings), and I fixed it in the last change to my workflow. (I also added a check for negatives in the simple shell function I wrote to avoid repeating the arithmetic conversion from the total second count to minutes and seconds.)
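For reference, here is a minimal sketch of what such a helper could look like; the function name `format_duration` and the clamp-to-zero behavior are my assumptions, not necessarily what the actual workflow uses:

```shell
#!/bin/sh
# Hypothetical helper: convert a total second count to "M minutes and S seconds".
# Negative inputs (e.g. from an unset start-time variable) are clamped to 0
# instead of producing output like "-28536194 minutes and -50 seconds".
format_duration() {
  total=$1
  if [ "$total" -lt 0 ]; then
    total=0
  fi
  echo "$((total / 60)) minutes and $((total % 60)) seconds"
}

format_duration 688   # prints "11 minutes and 28 seconds"
format_duration -50   # prints "0 minutes and 0 seconds"
```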
…emoving Before made the Regression/Fixed lines change positions)
Also, I see the following below, which seems reasonable with respect to the timings of recent GHA R CMD check runs:

Time taken to finish the standard R installation steps: 12 minutes and 3 seconds
Time taken to run atime::atime_pkg on the tests: 4 minutes and 9 seconds

It seems like the "standard R installation" is about twice as long (10 min) as the R CMD check workflow (5 min); why is that?
The third test (Test regression fixed in Rdatatable#4440) doesn't seem to be affected by this PR to me. The only thing that is slower and different there is the 'Regression' label, which is associated with the commit that exhibits the performance regression for that test; that should always be slower in that plot, and in the same spot, since it's a fixed commit SHA, right? (As is the case with the other two labels.) I'm not sure what's going on with the second one (Test regression fixed in Rdatatable#4558), though.

Also, another thing that confused me earlier is that the 'Regression' and 'Fixed' labels (or just two labels for a test case with any names) move places when 'Before' (or a third label) is not specified. For example, I got this when I didn't specify 'Before' for the first test case: [plot omitted]. After including a commit for 'Before' again, the other two labels switched back to their original positions (i.e., how it is currently). Do you know why this happens? (This makes it look like specifying a third label would be necessary.)
I'm not sure why you are comparing to an R CMD check, but the bulk of that time has to do with installing [screenshot omitted]. (Here is the link to that run for reference.) By the way, I termed it 'standard R installation steps' since you mentioned that here:

Please let me know if you want to change the wording in case it does not accurately represent the duration that we are measuring here.
The labels have the same order as the mean of the last/largest N data points.
Because that is what data.table is running after every push already, so that is the standard of "acceptable" time usage (we don't want to go too much over that).
Looks like the 15 minutes comes from installing all of the R package deps of atime (like ggplot2 plus the tidy deps; there are a lot).
This sounds like a good idea. I would have to extend the container I'm using, doing something like:

```dockerfile
FROM ghcr.io/iterative/cml:0-dvc2-base1

# Install the libgit2 system dependency for git2r, and reduce the image size
# by removing the package lists post-installation:
RUN apt-get update && \
    apt-get install -y libgit2-dev && \
    rm -rf /var/lib/apt/lists/*

# Install atime dependencies:
RUN R -e "install.packages('atime', dependencies = TRUE, repos = 'https://cloud.r-project.org')"
```
… of .Rprofile, it is time to test with the CRAN mirror being only set therein [Reverted the last change of removing the git switch step; see Anirban166/Autocomment-atime-results#33 (comment) for more details]
Why? I would suggest the opposite; please include as many performance tests as possible.
Because the difference between them is clear. For 'Test regression introduced from #4491', you can see that base, merge-base, CRAN, Fixed, and Before are fast while Regression and HEAD are slow for this PR, which replicates the regression-introducing PR; and in #3, where it should not be affected, all commits are fast with the expected exception of Regression. Similarly for 'Test regression fixed in #4440', you can see in this PR that it is the reverse case, with all commits being fast except the one associated with Regression (as expected, since this PR should not affect that test case); and in #3, where it replicates the regression-fixing PR, you can see Fixed, HEAD, CRAN, and Before as fast while Regression, merge-base, and base are slow, as we would expect.

Both of them make perfect sense to me, but 'Test regression fixed in #4558' doesn't in relation, especially in terms of the commit associated with 'Regression'.

Also, I'll be sharing my work today with the
OK, sure; if it would be easier to follow in a first communication, that is fine.
… the remaining two cases
Yes, but like I mentioned before, I'm thinking of placing all that information in the issue itself and avoiding repeating myself in the PR (these PRs will be a link in the issue; ideally readers would visit them for the plot, but read everything in one place, i.e., in the issue itself, first).
Can I just use the existing ones? And yup, I'll write all of that and try to incorporate as much detail as I can, but in my issue instead of the PR(s).
Since I didn't hear back from you on this, I presume you want me to make new PRs, so I will go ahead with that; onto creating two new PRs now. I do prefer going for new ones in the sense that they'll be minimal (i.e., without our discussion here), and now that we know they're working, it'll be less of a 'test' (which I used in the title here; this was more of a work in progress with feedback cycles than a working demo).
I'm supposing that you meant PRs here or in my fork of

I've decided upon dividing the things to mention between those historical-performance-regression-mirroring PRs and the PR/issue I'll post to data.table. The former would include details based on the PR context (like, as you mentioned, 'what historical issue this PR is trying to replicate'), and the latter would include other details (such as my testing procedure, motivation, and time-related notes, as you mentioned). I'll be posting the issue (or sending the PR directly) on Monday morning, since I won't be available over the weekend to comment on or respond to any feedback I get from the community. (I'll be spending some time over the weekend to finish constructing/adding the information to write for that issue/PR, though.)
To reproduce on another fork, one can use the `git` commands I used locally on my clone of this repository:

Then place my workflow under `.github/workflows` for the `before-4491-got-merged` branch, and include the tests under `inst/atime` for the `after-4491-got-merged` branch, like I did here.