Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduced memory usage in HighPerformance tests #3408

Merged
merged 8 commits into from
Aug 8, 2020

Conversation

Sergio0694
Copy link
Member

Fixes random CI SystemOutOfMemoryException-s

PR Type

What kind of change does this PR introduce?

  • Build or CI related changes

What is the current behavior?

The CI will randomly fail with an out of memory error.

What is the new behavior?

The error should be gone now. All the Count tests for the HighPerformance package are now reusing arrays from the same ArrayPool<byte> instance (since they use unmanaged types, so they can just be reinterpret-cast-ed to the necessary type every time). This should greatly reduce the total number of allocated arrays.

PR Checklist

Please check if your PR fulfills the following requirements:

  • Tested code with current supported SDKs
  • Pull Request has been submitted to the documentation repository instructions. Link:
  • Sample in sample app has been added / updated (for bug fixes / features)
  • Tests for the changes have been added (for bug fixes / features) (if applicable)
  • Header has been added to all new source files (run build/UpdateHeaders.bat)
  • Contains NO breaking changes

@Sergio0694 Sergio0694 added this to the 7.0 milestone Aug 3, 2020
@Sergio0694 Sergio0694 self-assigned this Aug 3, 2020
@ghost
Copy link

ghost commented Aug 3, 2020

Thanks Sergio0694 for opening a Pull Request! The reviewers will test the PR and highlight if there is any conflict or changes required. If the PR is approved we will proceed to merge the pull request 🙌

@ghost ghost requested review from michael-hawker, azchohfi and Kyaa-dost August 3, 2020 21:41
@azchohfi
Copy link
Contributor

azchohfi commented Aug 3, 2020

Nice!

@Sergio0694
Copy link
Member Author

@azchohfi Thanks! Sorry it didn't occur to me earlier to make this change! 😅
Hopefully this will finally make the CI happy when running those tests!

@azchohfi
Copy link
Contributor

azchohfi commented Aug 3, 2020

@azchohfi Thanks! Sorry it didn't occur to me earlier to make this change! 😅
Hopefully this will finally make the CI happy when running those tests!

Time will tell... I hate random issues....

@azchohfi
Copy link
Contributor

azchohfi commented Aug 3, 2020

Nop, still got an error:

Test_ReadOnlySpanExtensions_FilledCount64
Test method UnitTests.HighPerformance.Extensions.Test_ReadOnlySpanExtensions.Test_ReadOnlySpanExtensions_FilledCount64 threw exception: 
System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.

@Sergio0694
Copy link
Member Author

Sergio0694 commented Aug 3, 2020

Ok this is weird though. has the memory limit been reduced recently for the CI?
This was never a problem while working on the HighPerformance package originally, it's weird that it's crashing so often now.

One easy solution would be to just drop the tests using 64 bit values. The overflow threshold is the same as the one for 32 bit values anyway, and the code path is the same as well. Technically that'd still be a slightly less "tested" API as a result, but maybe that's a compromise you guys are willing to make if these tests are such a problem for the CI?

I mean we could nuke these (and the ones with the "filled" variant too):

https://github.com/windows-toolkit/WindowsCommunityToolkit/blob/a294b8542a1f233d81642ec979ea1e378a8b3ff9/UnitTests/UnitTests.HighPerformance.Shared/Extensions/Test_ReadOnlySpanExtensions.Count.cs#L52-L57

@azchohfi Let me know what you think! 👍

@azchohfi
Copy link
Contributor

azchohfi commented Aug 3, 2020

I don't want to nuke it... If we support it and there is a way to test it, lets test it. Is there a way to split this? Maybe its because we are now running tests in parallel.
Btw, I had this issue running the tests on my box, which is strong, very strong (64GB RAM + 2 Xeon CPUs + SSD).

@Sergio0694
Copy link
Member Author

Sergio0694 commented Aug 3, 2020

Is there a way to split this? Maybe its because we are now running tests in parallel.

Possibly a dumb question, but if we're running tests in parallel, wouldn't splitting this test into multiple tests (eg. one per type) cause this error to be even more frequent? I mean, since each type test requires an array to be allocated, and if those were separate tests that could all be run in paralell, the test runner could end up allocating even more memory at once? 🤔

On this point... What if we instead merge the random/filled tests into a single test?
That way eg. no more than a given large array could possibly be allocated at once. Worth a shot?

P.S. that's a nice box you have there! 😄🚀

@azchohfi
Copy link
Contributor

azchohfi commented Aug 3, 2020

Parallel tests are configured to be by class already, so splitting it into the same class would only make sure that per-test tear down happens completely (if there is anything that is split there), and hopefully some memory gets released, but I don't think it would make much different. If we had a more complete error message/dump, we could know what is crashing with what, but I don't think we can know for a fact what will be too much memory with what, so maybe your suggestion actually works.

@Sergio0694
Copy link
Member Author

Ah, yeah if they're parallelized per class then I guess that wouldn't really have helped much.
You know what, you mentioning that "hopefully some memory gets released" made me realize that in this case it might make sense to just handle the memory manually to be sure that all rented memory just vanishes completely when those tests are done. I did an attempt in 611d687 using Marshal.AllocHGlobal, let's see how this goes 🤣

@azchohfi
Copy link
Contributor

azchohfi commented Aug 4, 2020

It ran once fine and I thought you did it, but I ran it again and these 2 failed:
Test_ReadOnlySpanExtensions_FilledCount64
Test_ReadOnlySpanExtensions_RandomCount64
Same OOM Exception....

@michael-hawker
Copy link
Member

Current code looks good, signed-off to unblock any updates. Seems like you both have a handle on things! :)

@Sergio0694
Copy link
Member Author

Sergio0694 commented Aug 4, 2020

It ran once fine and I thought you did it, but I ran it again and these 2 failed:

Ok, ok, not too bad,that's progress 😄
The fact that the first run worked fine makes me thing that we're going in the right direction here, and that the crash in the following run might be caused by the exact order of tests being executed in parallel. In particular, I noticed that lots of other tests (specifically, the ones for the ParallelHelper APIs, and for HashCode<T>) were allocating large buffers, sometimes without even pooling. If those tests were executed before the ones in question, or in parallel, I'd say it's possible those buffers might have increased the GC pressure too much and caused the exception we saw.

I made some changes in 1dcdb24 and 7e7121c and moved all those tests to unmanaged memory management, so that they should all just straight up bypass the managed heap and cause no problems anymore. The CI is running again, fingers crossed! 🤞

EDIT: it worked! 🎉
Feel free to re-trigger the CI (I don't see the option on my end) to double checked this actually fixed the issue.

@michael-hawker
Copy link
Member

Ugh, I think they moved/removed the option to re-run the pipeline (at least on a successful run?)??

@azchohfi do you know where they moved it?

@Sergio0694 Sergio0694 changed the title Reduced memory usage in Count tests Reduced memory usage in HighPerformance tests Aug 4, 2020
@Sergio0694
Copy link
Member Author

@michael-hawker @azchohfi Well the merge triggered the CI, and it run just fine again for the 2nd time in a row!

Success? 🎉🎉🎉

@azchohfi
Copy link
Contributor

azchohfi commented Aug 8, 2020

I'll run it locally a couple of extra times now.

Copy link
Contributor

@azchohfi azchohfi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, 3 times in a row in my box, all passing 🎉🎉🎉🦙🐱‍💻

@azchohfi azchohfi merged commit e4da174 into CommunityToolkit:master Aug 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants