-
Notifications
You must be signed in to change notification settings - Fork 7.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Heap Trace] Perf: use hash map to speed up leaks mode 34x (IDFGH-9425) #10793
[Heap Trace] Perf: use hash map to speed up leaks mode 34x (IDFGH-9425) #10793
Conversation
aaa21ca
to
9feb0b4
Compare
166973e
to
cbc5a59
Compare
ef0ba89
to
b255bfd
Compare
I've noticed a weird issue, when This happens with and without the hashmap, and regardless of I wont have time to dig into this further. I don't think it is caused by this PR, and doesn't seem like the linked list PR would cause it either. We did not change the lock, and the |
b255bfd
to
58d5fd2
Compare
Hi @chipweinberger, I will get to this PR today :) I will investigate the 1919 total allocs issue as I will review and test the feature. |
Very curious to know what you find! Hopefully you reproduce it. |
This is my sdkconfig: |
// copy to destination | ||
if (r_get) { | ||
memcpy(r_out, r_get, sizeof(heap_trace_record_t)); | ||
} else { | ||
// this should not happen since we already | ||
// checked that index < records.count, | ||
// but could be indicative of memory corruption | ||
result = ESP_ERR_INVALID_STATE; | ||
memset(r_out, 0, sizeof(heap_trace_record_t)); | ||
} | ||
// We already checked that index < records.count, | ||
// This could be indicative of memory corruption. | ||
assert(rget != NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it normal that the copy of r_get
to r_out
was removed here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh good spot. That is a bug.
I only intended to remove the error code because I changed my mind about my original code - asserting on memory corruption is probably more appropriate here.
I see that you renamed a lot of variables. The thing is that we are using snake case for variable names is esp-idf so it would be nice if you could update them. Otherwise, I will have to commit on top of your commits to change the name again. I leave a patch with the fixed variable names below feel free to use it :) I hope I didn't miss any. |
I think it would be wise to wrap the whole hashmap related logic under a config so that if the hashmap is not used, the impact of the heap trace on the RAM will remain the same. If you incorporate the comments left above I can add a commit to handle the placement of the functions. |
@SoucheSouche Thanks for the comments
Yes I think "rget" is still considered valid snake case. "fflush", "printf", etc all use that format. Personally I find the "r_" prefix hurts readability here. If "rget" is not suitable, I would then prefer "get" to keep the variable name short. This is of course personal preference. Do you prefer the "r_" prefixes? I'll go with your choice.
Yes, that's a good point. A kconfig makes sense. 👍🏻 |
I'd keep the |
Okay. Updated the PR. |
58d5fd2
to
75a4fc4
Compare
Concerning the I tried the code on esp32s2 (randomly picked board), I added a This is what I get:
Which means that 215000 bytes were successfully allocated. Given that for each allocation, there are a few overhead bytes related to the memory allocator algorithm, I would say that it adds up. |
if I reduce the allocation size to 16 bytes, Then I get the expected value of total allocations. It makes sense since only 16Kbytes were allocated during the first iteration which is way below 217KB
|
of course! 🤦🏻♂️ Thanks for investigating it! |
@SoucheSouche latest status? |
@SoucheSouche bump. hoping this makes it into 5.1 |
Hi @chipweinberger, sorry for the late reply, I was away from work. However, your PR was reviewed while I was away and a suggestion was made for a more efficient implementation. I will have a look at it today and get back at you to let you know if we will go with the suggested implementation or if we will stick to yours. |
no problem I hope your time away was enjoyable!
😀 nice! Hopefully not too much more work for you. Hard to imagine what could be faster. |
Hi @chipweinberger!
This is removing collisions in the For 1000000 allocations and free, the code with no tracing executes in 5779ms, The code tracing enabled and no hash map takes 40262ms, the code with your implementation of the has map (using 1000 entries) takes 20632ms and the code with the hash map (using 50 entries) using the linked list as entries executes in 8129ms (7381ms with 1000 entries in the |
good approach! Yes in my design the hash map must be big to remain fast. Your approach seems like a nice simplification for the user. Good work. I imagine that was tricky to code! |
The update was fairly straightforward and sat just well in your design. I mostly had to update the The new implementation has been approved by 1 reviewer out of 2 so hopefully we should see your feature in the master branch soon. I will keep you updated as usual. Also, unrelated update: I will start working on your console PRs for the duration of April, after what I will be out of work for an extended period. |
Glad to hear it! I'm glad Espressif has taken an interest in the Console PRs. Hopefully not too much work - they're fully functional but of course "polish" takes awhile too. On that topic, It would be great to solve this issue: #9366 This issue started about CDC OTG console but in reality USB Serial/JTAG is what we need. I should probably open a separate issue. We just need an official way to determine if the USB Serial JTAG is connected. I created this PR as a workaround (#10097), but in reality that PR should be denied in favor of a way that uses the hardware registers. Someone did suggest some code, but it unclear if it is actually correct. |
merged: b699033 |
Previous PR: #10521
This finishes the perf work on heap trace standalone. Leaks mode is now fast enough to be enabled in production, or during development!
Adds
esp_err_t heap_trace_set_hashmap(heap_trace_hashmap_entry_t *entries_buffer, size_t num_entries);
Motivation:
Testing:
Perf:
Most importantly! using a map makes record add & remove O(1), so we don't see a performance cliff when an old allocation is freed. 34x faster!
SPIRAM (NUMTABLE 2500, NUMITER 2, NUMENTRY 6000, NUMREC 3000)
total 41ms // no trace
total 3401ms // standard trace
total 109ms // with map
For internal ram and a more typical NUMTABLE 500, we are 1.8x faster with the hashmap.
internal memory:
total 665ms // no trace
total 1814ms // standard trace
total 1065ms // with map
For SPIRAM, also NUMTABLE 500, I'm surprised its not as dramatic. But still ~1.4x faster.
SPIRAM:
total 683ms // no trace
total 2118ms // standard trace
total 1491ms // with map