Test suite sometimes picks up way more tests; takes ~6 times longer #343
Do you think it would be a problem to unconditionally use |
The logs are already extremely verbose, so if we could avoid adding another ~40k lines, that would be great. But otherwise fine. Speaking of, if you have any ideas on that - I'd love to get rid of another chunk of thousands of lines with ~0 information content; everything about:
should just go away. In other words, the logs are currently displaying the entire content of the wheel 3 times per python version, for essentially zero gain. I'm already down to |
These are coming from setuptools, and I think I could reduce their verbosity. I'll try.
This seems to work for me.
I think the problem is actually a cuda-detection issue. Looking at the most recent logs in #326, the "fast" test suite runs skip all the |
Hmm, that would explain why I couldn't reproduce on a non-GPU system. I'll try on a GPU system later today.
Ah, indeed. It collects a total of 8992 tests with the CPU build, and 15892 with the CUDA build. I wonder if we could and should assert on that somehow. Since we can't unconditionally assume that everyone building it will do so on a GPU system, the assert could be conditional on some external CUDA detection method.
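For concreteness, a minimal sketch of what such a conditional assertion could look like, assuming `nvidia-smi` as the external detection method (hypothetical; not something the recipe currently does):

```python
import shutil
import subprocess

import torch


def nvidia_smi_sees_gpu() -> bool:
    """External CUDA detection: ask nvidia-smi instead of torch itself."""
    smi = shutil.which("nvidia-smi")
    if smi is None:
        return False
    try:
        result = subprocess.run([smi, "-L"], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


# Only enforce the GPU test path when this is a CUDA build *and* the build
# machine visibly has a GPU; CPU-only builders are left alone.
if torch.version.cuda is not None and nvidia_smi_sees_gpu():
    assert torch.cuda.is_available(), "CUDA build on a GPU machine, but torch cannot see the GPU"
```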
I don't think that's necessary. I'd even go so far as to say that I prefer the current setup. The GPU tests do get run on average at least once or twice per megabuild (so we have the coverage), but having the CUDA detection fail and only run the CPU tests in much less time actually keeps the overall testing time sane. Though I'm not opposed to fixing it properly (and perhaps reducing the tests and/or skipping some long-running ones, like I did in 5640240).
Sounds like xkcd#1172 ("workflows") :-).
I would somewhat prefer that we be more "reactionary" with the tests. Developer time (yours) is quite important, and waiting an extra hour during an iteration cycle is really a downer. conda-forge used to prioritize "existence tests" rather than exhaustive tests, and I think we should steer back in that direction. Having 5-30 very targeted lines like:
will likely be just as effective in finding build errors.
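(The example lines in the comment above weren't captured in this thread; purely for illustration, targeted existence tests in that spirit might look roughly like the following hypothetical sketch.)

```python
# Hypothetical smoke checks: verify the build imports and core ops work,
# rather than running the full upstream test suite.
import torch

assert torch.__version__                         # the package imports and reports a version
t = torch.ones(2, 3)
assert (t @ t.T).shape == (2, 2)                 # basic linear algebra works
assert torch.nn.Linear(3, 4)(t).shape == (2, 4)  # nn layers are usable
print("CUDA build:", torch.version.cuda, "- CUDA available:", torch.cuda.is_available())
```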
Generally yes, but since the iteration cycle for the GPU builds here is >10h in any case, it barely makes a difference. And where speed matters for debugging cycles, it's really easy to comment out the pytest call, restrict it to a single python version, or just delete some of the tested modules...
I actually prefer running the full test suite in feedstocks that I maintain, especially if the feedstock does non-trivial surgery. Too often have I seen bizarre stuff break due to trivial oversights (381fcb8 is a recent example of something I wouldn't have found without extensive tests), and in the vast majority of cases the conclusion was that it had been my fault. 😅 Running more than "existence tests" is (in my experience) relatively concrete insurance that the surgery left the patient alive and (mostly) healthy. In the case of pytorch, the full test suite is too massive to make that approach feasible, but in terms of overall philosophy, I'd rather wait an hour longer than end up with avoidable bugs that I don't even know about. As we discussed across a couple of issues recently, I believe there's a middle ground to be found between our positions on this, and I'm happy to iterate on the set of modules and tests.
I would like to use build time as a proxy for "scope". I know many of our jobs depend on / support conda-forge, but honestly, the recipe has become too complicated already. If we can decrease the scope by 10%, I call that a win.
I agree that adding a test here is the right move, and in fact the test that was added is narrow in scope, which is what I'm advocating for. conda-forge shouldn't be treated as an "I'm going to run my business on this" platform. Anaconda provides packages for this, and also a support line for more targeted support. We can't recreate that for "free". Honestly, the scope increase of the pytorch recipe has made me less interested in helping maintain it. The simplicity of "you can build this recipe on any linux box, GPU or not" shouldn't be undervalued, IMO. I know I've said this many times, but unless this whole thread is just a place to bikeshed, I think you are also concerned with the scope and flakiness of the tests. My proposal is as follows:
If no GPU is detected, we should indicate it in the logs. A pytorch conda-forge maintainer should then use their judgement in pressing the merge button if they feel comfortable doing so. Key takeaway: I don't think we need to mandate the GPU test suite in the CIs.
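A possible sketch of the "indicate it in the logs" part, e.g. printed at the top of the test script (hypothetical; not what the recipe currently does):

```python
import torch

# Make the GPU situation obvious in the CI log before any tests are collected.
if torch.cuda.is_available():
    names = [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
    print(f"GPU(s) detected: {names}; the CUDA portion of the test suite will run.", flush=True)
else:
    print("No GPU detected; only the CPU tests will be collected. "
          "A maintainer should decide whether that is acceptable before merging.", flush=True)
```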
One thing we could consider is rerunning failing tests too. Normally I'd do that via |
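(The specific mechanism referred to above wasn't captured in this thread; one common option is the pytest-rerunfailures plugin, sketched here as an assumption rather than the commenter's actual suggestion.)

```python
import sys

import pytest

# Retry flaky tests up to twice before reporting them as failures.
# --reruns/--reruns-delay come from the pytest-rerunfailures plugin, which
# would need to be added to the test requirements; the module below is only
# a placeholder for whatever subset the recipe actually runs.
sys.exit(pytest.main(["--reruns", "2", "--reruns-delay", "5", "test/test_torch.py"]))
```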
There's something strange with the tests; in the following run for linux-64+CUDA+MKL (d04bba8 in #340), py311 collects a whole bunch more tests and takes longer than elsewhere (even than py312, which is the only version where we run the inductor tests).
The set of modules and skips is exactly the same as on 3.9 or 3.10, so I don't know what would explain this difference in test collection.
(note: it's expected that 3.12 runs longer due to being the only version where we include the torchinductor tests, and that 3.13 has more skips because dynamo doesn't yet support 3.13 in pytorch 2.5)

However, after merging #340 to main, the exact same job yielded completely different behaviour, with every test run collecting 13k+ tests and taking ~50min instead of <10.
Originally posted by @h-vetinari in #340 (comment)