GitHub Action + atime test to observe the performance regression introduced by PR #4491 and fixed by PR #5463 #2

Merged
Anirban166 merged 16 commits into before-4491-got-merged from after-4491-got-merged on Apr 6, 2024

Conversation

Anirban166
Owner

To reproduce on another fork, one can use the git commands I used locally on my clone of this repository:

git checkout -b before-4491-got-merged
git reset 3fa8b20
git push --set-upstream origin before-4491-got-merged
git switch -f master
git checkout -b after-4491-got-merged
git reset 85adf09
git push --set-upstream origin after-4491-got-merged

Then place my workflow under .github/workflows on the before-4491-got-merged branch, and include the tests under inst/atime on the after-4491-got-merged branch, like I did here.
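For reference, the resulting layout looks something like this (the workflow filename is illustrative; the inst/atime/tests.R path follows the convention that atime::atime_pkg reads tests from):

.github/workflows/performance-tests.yml    (on the before-4491-got-merged branch)
inst/atime/tests.R                         (on the after-4491-got-merged branch)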


github-actions bot commented Mar 14, 2024

Comparison Plot

Generated via commit 9b08cee

Download link for the artifact containing the test results: atime-results.zip

Time taken to finish the standard R installation steps: 11 minutes and 28 seconds

Time taken to run atime::atime_pkg on the tests: 3 minutes and 6 seconds

@tdhock

tdhock commented Mar 14, 2024

This is a good start.
For the test names, one is "Test regression introduced from Rdatatable#4491" and the other is "Test regression fixed in Rdatatable#4558".
I find this confusing. Why is one named "introduced from" and the other "fixed in"? Can they please be named consistently, both using "fixed in"?
Both plots show Regression and HEAD timings, which are slower than the current CRAN release.
Are these two tests related somehow?
I was expecting that this PR (HEAD in the plots) would be slower in only one of the test plots, but above we see HEAD is slower in both.
Can you please add a third plot with a test that is not related at all? Hopefully we should see no effect on performance for that third test.

@tdhock

tdhock commented Mar 15, 2024

Also can you please change the PR title to be consistent with the plot title? (introduced by PR#4491 and fixed by PR#5463)

@Anirban166 Anirban166 changed the title GitHub Action + atime test to observe the performance regression introduced by PR #4491 GitHub Action + atime test to observe the performance regression introduced by PR #4491 and fixed by PR#5463 Mar 15, 2024
@Anirban166 Anirban166 changed the title GitHub Action + atime test to observe the performance regression introduced by PR #4491 and fixed by PR#5463 GitHub Action + atime test to observe the performance regression introduced by PR #4491 and fixed by PR #5463 Mar 15, 2024
@Anirban166
Owner Author

Can you please add a third plot with a test that is not related at all? Hopefully we should see no effect on performance for that third test.

Done! (please check)

Also can you please change the PR title to be consistent with the plot title? (introduced by PR#4491 and fixed by PR#5463)

Now that I think of it, going with the original title might be appropriate, since I'm only focusing on the merge commit of Rdatatable#4491 and its parent, i.e., the point where the regression happened and the point just before it was introduced, respectively. I'm not covering the fix or rollback from Rdatatable#5463: covering that would mean replicating the PR that fixed the regression (as opposed to the one that introduced it, the two being different PRs), or making a PR from a branch with the merge commit of #5463 to a branch with a commit from before it got merged. (For example, I did that for #4440 here.)

Question: how should I name the titles of test cases if more than one test case exists for a PR that got merged?
(For instance, there is another test case that I can include here which would also classify as 'Test regression fixed in #5463'.)

@tdhock

This comment was marked as outdated.

@tdhock

tdhock commented Mar 29, 2024

also "Time taken to install atime and run atime_pkg on the test(s): 17 minutes and 18 seconds" is good but can you please break that down into at least two steps? 1. time to install/setup, 2. time to run performance tests. devs I think will like to know how many minutes we are using just for the testing, beyond what is required by any CI job to setup R.

… step runtime into installation and test time)
@tdhock

tdhock commented Mar 31, 2024

Time taken to install atime: 0 minutes and 0 seconds? Instead of just the time for installing atime, it would be better to see the total time from the start of the CI job to when the standard R installation steps finish. Have you tried saving the current time to a file before the first GHA checkout step?

(base) tdhock@tdhock-MacBook:~$ date +%s > before_time
(base) tdhock@tdhock-MacBook:~$ date +%s > after_time
(base) tdhock@tdhock-MacBook:~$ echo $((`cat after_time`-`cat before_time`))
4
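Converting that difference into the reported "X minutes and Y seconds" format is one more arithmetic step; a minimal sketch (the variable name is illustrative):

total=$((`cat after_time` - `cat before_time`))
echo "$((total / 60)) minutes and $((total % 60)) seconds"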

@tdhock

tdhock commented Mar 31, 2024

The first comment in this PR is "To reproduce on another fork,..." which is helpful for reproducibility, but it would be even better to start with a couple of sentences explaining the intent / big picture of the PR (a demonstration of new performance testing software, trying to reproduce a historical regression, and showing that we can detect and avoid such regressions).

@Anirban166
Owner Author

can you please fix before in the 4440 test? before should be fast, but it is showing up as just as slow as the regression.

Done, in terms of an error: previously, 'Before' was in between the 'Regression' and 'Fix' commit points, but now it is aligned with 'Regression', which I think is correct since Rdatatable#4440 is the PR that fixes the regression. So 'Before' should not be fast, and only the 'Fix' in this case should be fast, as we are seeing.

Time taken to install atime: 0 minutes and 0 seconds ?

I was in the process of testing things and working on a fix for that (just done), but I'm not sure why this happened, since I was using the same logic to extract both this and the time taken to run the tests; the latter worked but this didn't, for some reason.

Instead of just the time for installing atime, it would be better to see the total time from the start of the CI job to when the standard R installation steps finish.

I got it done after some trial and error - please check!

And I agree that it would be better, as it's also something you might not get directly by looking at the job (only the time taken to execute each step and the workflow as a whole is shown openly). That being said, installing atime with all its dependencies accounts for the lion's share of the duration.

Have you tried saving the current time to a file before the first GHA checkout step?

(base) tdhock@tdhock-MacBook:~$ date +%s > before_time
(base) tdhock@tdhock-MacBook:~$ date +%s > after_time
(base) tdhock@tdhock-MacBook:~$ echo $((`cat after_time`-`cat before_time`))
4

Yup, I used a similar approach before (for reference, here is the workflow file I was modifying), and I'm currently saving to a file too, although that is for timing the tests.

The first comment in this PR is "To reproduce on another fork,..." which is helpful for reproducibility, but it would be even better to start with a couple of sentences explaining the intent / big picture of the PR (a demonstration of new performance testing software, trying to reproduce a historical regression, and showing that we can detect and avoid such regressions).

I had the exact same thought when writing that comment initially, but the reason I didn't include the full context here is that I'll be making an issue in data.table (and, after discussing therein, follow up with a PR including the GitHub Action and tests). In that issue I intend to explain everything plus link other PRs apart from this one, so to avoid repeating the information when people visit this PR from there, I thought to keep it condensed this way.

Please let me know if the current state of things in the PR looks reasonable/explainable to you. (If so, I'll post the issue tomorrow morning.)

@tdhock

tdhock commented Apr 1, 2024

Hi again, the times look good/useful.
But you wrote "now it is aligned with 'Regression' which I think is correct since Rdatatable#4440 is the PR that fixes the regression. So 'Before' should not be fast, and only the 'Fix' in this case should be fast as we are seeing." and that is incorrect.
Before means the code before the regression, which should be fast (just as fast as Fixed).
So if that Before commit is taken from the PR that fixes the regression, then that is incorrect. Before should be a commit from before the regression (one way to do that is to go to the PR that caused the regression, and take the parent of the first commit in that PR).
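In git terms, that parent is one revision step back from the PR's first commit; a minimal sketch, where <first-commit-sha> stands in for that first commit:

git rev-parse <first-commit-sha>^   # the ^ suffix selects the commit's first parent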

@Anirban166 Anirban166 force-pushed the after-4491-got-merged branch 2 times, most recently from 7d09869 to f49bee3 Compare April 1, 2024 08:02
@Anirban166
Owner Author

Before means the code before the regression, which should be fast (just as fast as Fixed).
So if that Before commit is taken from the PR that fixes the regression, then that is incorrect.

Thanks for clarifying!
My bad, I confused 'Before' in relation to the regression with 'Before' in relation to the PR.

(one way to do that is to go to the PR that caused the regression, and take the parent of the first commit in that PR).

It could've been a PR or a direct commit, but the source hasn't been identified in the issue that discovered/reported the regression (Rdatatable#4311).

However, I see one point of reference: Jan mentioned in that thread that the regression was already present at 1.12.8, so I'm thinking of trying commits from older tags.

@tdhock

tdhock commented Apr 2, 2024

If there is no older fast historical reference commit known, then just use the other two commits, but call them Fast and Slow (instead of Fixed and Regression).
A regression means that something which was working/fast previously is now broken/slow. So if there was never any point in the past which worked, then we can't say it is a regression.

@Anirban166 Anirban166 force-pushed the after-4491-got-merged branch from a9e529f to 74ed5e9 Compare April 2, 2024 19:39
…state of data.table before that regression was introduced are not working.
@Anirban166 Anirban166 force-pushed the after-4491-got-merged branch from 05743ad to 988f918 Compare April 3, 2024 18:56
@tdhock

tdhock commented Apr 3, 2024

Time taken to run atime::atime_pkg on the tests: -28536194 minutes and -50 seconds ????

@Anirban166
Owner Author

I'm in the process of implementing the changes we discussed yesterday and testing them, so please check back later. (I'll ping you here once I'm done, so you can review then.)

Time taken to run atime::atime_pkg on the tests: -28536194 minutes and -50 seconds ????

That was an undefined-value case (it used an older variable to compute the timings), and I fixed it in the last change to my workflow. (I also added a check for negatives in the simple shell function I wrote to avoid repeating the arithmetic conversion from the total second count to minutes and seconds.)
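That helper isn't shown in this thread; a minimal sketch of what such a function could look like (the name and exact handling are illustrative, not the actual workflow code):

# Convert a total second count into "X minutes and Y seconds",
# flagging negative (i.e., miscomputed) inputs instead of printing them:
format_duration() {
  local total=$1
  if [ "$total" -lt 0 ]; then
    echo "Invalid (negative) duration: $total seconds" >&2
    return 1
  fi
  echo "$((total / 60)) minutes and $((total % 60)) seconds"
}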

…emoving Before made the Regression/Fixed lines change positions)
@tdhock

tdhock commented Apr 3, 2024

Great, now I am seeing the image below, which looks good. The only thing missing is some explanation about why two tests are affected. What is the similarity/difference between those two tests?
[Comparison plot image]

@tdhock

tdhock commented Apr 3, 2024

Also, I see the timings below, which seem reasonable with respect to the timings of recent GHA R CMD check runs.

[Screenshot of the workflow run's step timings]

Time taken to finish the standard R installation steps: 12 minutes and 3 seconds

Time taken to run atime::atime_pkg on the tests: 4 minutes and 9 seconds

Seems like "standard R installation" is about twice as long (10 min) as the R CMD check workflow (5 min); why is that?

@Anirban166
Owner Author

The only thing missing is some explanation about why two tests are affected. What is the similarity/difference between those two tests?

The third test (Test regression fixed in Rdatatable#4440) doesn't seem to be affected by this PR, as far as I can tell. The only thing that is slower and different is the 'Regression' label there, which is associated with the commit that exhibits the performance regression for that test; that label should always be slower in that plot (i.e., for that test case) and always in the same spot, since it's a fixed commit SHA, right? (As is the case with the other two labels.)

I'm not sure what's going on with the second one (Test regression fixed in Rdatatable#4558), though.

Also, another thing that confused me earlier is that the 'Regression' and 'Fixed' labels (or just the two labels for a test case, whatever their names) swap places when 'Before' (or a third label) is not specified. For example, I got this when I didn't specify 'Before' for the first test case:

[Plot without a 'Before' commit specified]

After including a commit for 'Before' again, the other two labels switched back to their original positions (i.e., how it is currently). Do you know why this happens? (This makes it look like specifying a third label would be necessary.)

@Anirban166
Owner Author

Seems like "standard R installation" is about twice as long (10 min) as the R CMD check workflow (5 min); why is that?

I'm not sure why you are comparing to an R CMD check, but the bulk of that time has to do with installing atime and its dependencies, not just setting up R, which takes considerably less time:

[Screenshot of the job's step timings]

(Here is the link to that run for reference)

Btw, I termed it 'standard R installation steps' since you mentioned that here:

it would be better to see the total time from the start of the CI job to when the standard R installation steps finish

Please let me know if you want to change the wording, in case it does not accurately represent the duration that we are measuring here.

@tdhock

tdhock commented Apr 4, 2024

the other two labels switched back to their original positions (i.e., how it is currently). Do you know why this happens?

The labels have the same order as the mean of the last/largest N data points.

I'm not sure why you are comparing to an R CMD check,

Because that is what data.table is already running after every push, so that is the standard of "acceptable" time usage (we don't want to go too much over that).

@tdhock

tdhock commented Apr 4, 2024

Looks like the 15 minutes come from installing all of the R package dependencies of atime (like ggplot2 plus its tidyverse dependencies; there are a lot).
For now I guess that is acceptable. If data.table devs think this is unacceptable, then we could optimize by installing these packages into the Docker container used in the "R setup" step.

@Anirban166 Anirban166 force-pushed the after-4491-got-merged branch from f38000e to eac907b Compare April 4, 2024 21:13
@Anirban166
Owner Author

If data.table devs think this is unacceptable, then we could optimize by installing these packages into the Docker container used in the "R setup" step

This sounds like a good idea. I would have to extend the container I'm using, doing something like:

FROM ghcr.io/iterative/cml:0-dvc2-base1

# Installing the libgit2 system dependency for git2r and reducing the image size by removing package lists post-installation:
RUN apt-get update && \
    apt-get install -y libgit2-dev && \
    rm -rf /var/lib/apt/lists/*

# Installing atime along with all of its R package dependencies:
RUN R -e "install.packages('atime', dependencies = TRUE, repos = 'https://cloud.r-project.org')"
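Building and pushing that extended image would then be a one-time cost outside the CI job; a minimal sketch, assuming the Dockerfile above and a hypothetical image tag:

docker build -t ghcr.io/anirban166/atime-cml:latest .   # tag is hypothetical
docker push ghcr.io/anirban166/atime-cml:latest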

… of .Rprofile, it is time to test with the CRAN mirror being only set therein [Reverted the last change of removing the git switch step; see Anirban166/Autocomment-atime-results#33 (comment) for more details]
@Anirban166 Anirban166 force-pushed the after-4491-got-merged branch from f42185a to f2ba635 Compare April 4, 2024 23:34
@Anirban166
Owner Author

@tdhock I think it would be good to omit the second test case (Test regression fixed in #4558) for now and just do a 1:1 comparison for the remaining two historical regression tests ('Test regression introduced from #4491' and 'Test regression fixed in #4440') between this PR and #3. Thoughts?

@tdhock

tdhock commented Apr 5, 2024

Why? I would suggest the opposite, please include as many performance tests as possible.

@Anirban166
Owner Author

Because the difference between them is clear:

For 'Test regression introduced from #4491', you can see that base, merge-base, CRAN, Fixed, and Before are fast while Regression and HEAD are slow in this PR (which replicates the regression-introducing PR), whereas in #3, where it should not be affected, all commits are fast with the expected exception of Regression.

Similarly, for 'Test regression fixed in #4440', this PR shows the reverse case, with all commits being fast except the one associated with Regression (as expected, since this PR should not affect that test case), whereas in #3, which replicates the regression-fixing PR, you can see Fixed, HEAD, CRAN, and Before as fast while Regression, merge-base, and base are slow, as we would expect.

Both of them make perfect sense to me, but 'Test regression fixed in #4558' doesn't in comparison, especially in terms of the commit associated with 'Regression'.

Also, I'll be sharing my work today with the data.table community via an issue, and I think it would be better to have such a case-to-case comparison of two historical regressions; it's likely easier to follow, and easier for me to explain as well. (After that, I agree that we should definitely try to add more test cases.)

@tdhock

tdhock commented Apr 5, 2024

Ok, sure, if it would be easier to follow in a first communication, that is fine.
Make sure to create a new PR with a clear discussion of motivation (why you want to do performance testing, what your approach is, how much time it takes in relation to the CI that data.table is already running, what historical issue this PR is trying to replicate, and what your interpretation of the plot is).
This PR starts with "To reproduce on another fork..." which is helpful for reproducibility, but please add more info so that the data.table devs can understand what you did, and how it would be helpful to them.

@Anirban166
Owner Author

This PR starts with "To reproduce on another fork..." which is helpful for reproducibility, but please add more info so that the data.table devs can understand what you did, and how it would be helpful to them.

I had the exact same thought when writing that comment initially, but the reason I didn't include the full context here is that I'll be making an issue in data.table (and, after discussing therein, follow up with a PR including the GitHub Action and tests). In that issue I intend to explain everything plus link other PRs apart from this one, so to avoid repeating the information when people visit this PR from there, I thought to keep it condensed this way.

Yes, but like I mentioned before, I'm thinking of placing all that information in the issue itself and avoiding repeating myself in the PR(s). (These PRs will be linked in the issue, and ideally people would visit them for the plots but read everything in one place, the issue itself, first.)

Make sure to create a new PR with a clear discussion of motivation (why you want to do performance testing, what your approach is, how much time it takes in relation to the CI that data.table is already running, what historical issue this PR is trying to replicate, and what your interpretation of the plot is).

Can I just use the existing ones?
I can create new PRs, but I'm not sure what to title them or how to make them different from #2 and #3 if they're based on the same thing. Also, this PR has some valuable discussion.

And yup, I'll write all of that and try to incorporate as much detail as I can, but in my issue instead of the PR(s).

@Anirban166 Anirban166 merged commit fa25e82 into before-4491-got-merged Apr 6, 2024
1 check passed
@Anirban166
Owner Author

Can I just use the existing ones?

Since I didn't hear back from you on this, I presume you want me to make new PRs, so I will go ahead with that: onto creating two new PRs now. I do prefer going for new ones in the sense that they'll be minimal (i.e., without our discussion here), and now that we know they're working, they'll be less of a 'test' (the word I used in the title here, back when this was more of a work in progress with feedback cycles than a working demo).

Make sure to create a new PR with a clear discussion of motivation (why you want to do performance testing, what your approach is, how much time it takes in relation to the CI that data.table is already running, what historical issue this PR is trying to replicate, and what your interpretation of the plot is).

I'm supposing that you meant PRs here or in my fork of data.table to replicate historical regressions (especially since you mentioned 'new' as well), but if you meant a PR to Rdatatable/data.table contributing the tests and my action directly, instead of to my clone here, then I was intending to do that after posting an issue and collecting feedback therein. (If you would like me to skip the issue and send a PR directly, that works too; please let me know!)

I've decided on how to divide the things to mention between those historical-regression-mirroring PRs and the PR/issue I'll post to data.table. The former will include details based on each PR's context (like, as you mentioned, 'what historical issue this PR is trying to replicate'), and the latter will include the other details (such as my testing procedure, motivation, and time-related notes, as you mentioned).

I'll be posting the issue (or sending the PR directly) on Monday morning, since I won't be available over the weekend to respond to any feedback I get from the community. (I'll spend some time over the weekend finishing the write-up for that issue/PR, though.)
