[Heap Trace] Perf: use hash map to speed up leaks mode 34x (IDFGH-9425) #10793

chipweinberger · 2023-02-17T22:27:12Z

Previous PR: #10521

This finishes the perf work on heap trace standalone. Leaks mode is now fast enough to be enabled in production, or during development!

Adds esp_err_t heap_trace_set_hashmap(heap_trace_hashmap_entry_t *entries_buffer, size_t num_entries);

Motivation:

leaks mode should be fast enough to be enabled in production
users can leave this mode enabled after boot & print leaks automatically (or save a dump to disk) when a low ram threshold is hit
if enabled before boot, leaks mode is a simple tool to find ways to save heap memory. This PR makes that use case fast.

Testing:

tested on an esp32s3.

Perf:

Most importantly! using a map makes record add & remove O(1), so we don't see a performance cliff when an old allocation is freed. 34x faster!

SPIRAM (NUMTABLE 2500, NUMITER 2, NUMENTRY 6000, NUMREC 3000)
total 41ms // no trace
total 3401ms // standard trace
total 109ms // with map

For internal ram and a more typical NUMTABLE 500, we are 1.8x faster with the hashmap.

internal memory:
total 665ms // no trace
total 1814ms // standard trace
total 1065ms // with map

For SPIRAM, also NUMTABLE 500, I'm surprised its not as dramatic. But still ~1.4x faster.

SPIRAM:
total 683ms // no trace
total 2118ms // standard trace
total 1491ms // with map

    const int NUMREC = 600;
    const int NUMENTRY = 600;
    const int NUMTABLE = 500;
    const int NUMITER = 100;

    uint32_t caps = MALLOC_CAP_INTERNAL;
    heap_trace_record_t* r = heap_caps_calloc(NUMREC, sizeof(heap_trace_record_t), caps);
    heap_trace_hashmap_entry_t* e = heap_caps_calloc(NUMENTRY, sizeof(heap_trace_hashmap_entry_t), caps);
    void** table = (void**) calloc(NUMTABLE, sizeof(void*));

    for (int i = 0; i < 3; i++) {

        int64_t t0 = esp_timer_get_time();

        if (i == 1 || i == 2) {
            heap_trace_init_standalone(r, NUMREC);
            if (i == 2) {heap_trace_set_hashmap(e, NUMENTRY);}
            heap_trace_start(HEAP_TRACE_LEAKS);
        }

        for(int j = 0; j < NUMITER; j++) {
            for (int k = 0; k < NUMTABLE; k++) {table[k] = heap_caps_malloc(1000,caps);}
            for (int k = 0; k < NUMTABLE; k++) {free(table[k]);}
        }

        int64_t t1 = esp_timer_get_time();
        printf("total %llims\n", (t1 - t0) / 1000);
        if (i == 1 || i == 2) {
            heap_trace_stop();
            heap_trace_dump();
        }
    }

chipweinberger · 2023-02-17T22:37:19Z

@SoucheSouche =)

chipweinberger · 2023-02-18T01:39:46Z

I've noticed a weird issue, when NUMTABLE is > 1919, total_allocations in the summary does not go past 1919.

This happens with and without the hashmap, and regardless of SPIRAM vs INTERNAL. It seems worse when using INTERNAL. Could there be a locking issue on master? Maybe the lock we are using does not always work?

I wont have time to dig into this further. I don't think it is caused by this PR, and doesn't seem like the linked list PR would cause it either. We did not change the lock, and the total_allocations++ logic has not changed.

SoucheSouche · 2023-02-23T07:05:58Z

Hi @chipweinberger, I will get to this PR today :) I will investigate the 1919 total allocs issue as I will review and test the feature.

chipweinberger · 2023-02-23T07:46:26Z

Very curious to know what you find! Hopefully you reproduce it.

chipweinberger · 2023-02-23T07:47:19Z

This is my sdkconfig:
sdkconfig.txt

SoucheSouche · 2023-02-27T12:33:31Z

components/heap/heap_trace_standalone.c

-        // copy to destination
-        if (r_get) {
-            memcpy(r_out, r_get, sizeof(heap_trace_record_t));
-        } else {
-            // this should not happen since we already
-            // checked that index < records.count,
-            // but could be indicative of memory corruption
-            result = ESP_ERR_INVALID_STATE;
-            memset(r_out, 0, sizeof(heap_trace_record_t));
-        }
+        // We already checked that index < records.count,
+        // This could be indicative of memory corruption.
+        assert(rget != NULL);


Is it normal that the copy of r_get to r_out was removed here ?

Oh good spot. That is a bug.

I only intended to remove the error code because I changed my mind about my original code - asserting on memory corruption is probably more appropriate here.

SoucheSouche · 2023-02-27T12:36:26Z

I see that you renamed a lot of variables. The thing is that we are using snake case for variable names is esp-idf so it would be nice if you could update them. Otherwise, I will have to commit on top of your commits to change the name again.

I leave a patch with the fixed variable names below feel free to use it :) I hope I didn't miss any.
heap_trace.patch

SoucheSouche · 2023-02-27T12:50:47Z

map_add, map_remove, map_find and subsequently the static variables and function they utilize can be placed in flash by the compiler but are called from function that are specifically placed in IRAM. Everything should be placed in RAM but this means that - in the state of your PR - regardless of whether the hashmap is used or not, a substantial amount of RAM memory will be used to support the feature.

I think it would be wise to wrap the whole hashmap related logic under a config so that if the hashmap is not used, the impact of the heap trace on the RAM will remain the same.

If you incorporate the comments left above I can add a commit to handle the placement of the functions.

chipweinberger · 2023-02-27T18:24:54Z

@SoucheSouche Thanks for the comments

variable names

Yes I think "rget" is still considered valid snake case. "fflush", "printf", etc all use that format. Personally I find the "r_" prefix hurts readability here. If "rget" is not suitable, I would then prefer "get" to keep the variable name short. This is of course personal preference. Do you prefer the "r_" prefixes? I'll go with your choice.

large amount of IRAM

Yes, that's a good point. A kconfig makes sense. 👍🏻

SoucheSouche · 2023-02-28T07:04:08Z

I'd keep the r_ everywhere to be consistent across the file.

chipweinberger · 2023-02-28T07:29:05Z

Okay. Updated the PR.

SoucheSouche · 2023-02-28T09:18:01Z

Concerning the 1919 allocations behavior, could it be that you simply run out of available heap?

I tried the code on esp32s2 (randomly picked board), I added a heap_caps_get_free_size(caps) after the table calloc and got ~217000 bytes free. With a NUMTABLE set to 1000, it means that during the first iteration, I will try to allocate 1000000 bytes.

This is what I get:

------- free size : 217656 ---------------

====== Heap Trace: 0 records (600 capacity) ======
====== Heap Trace Summary ======
Mode: Heap Trace Leaks
0 bytes 'leaked' in trace (0 allocations)
records: 0 (600 capacity, 215 high water mark)
hashmap: 600 capacity (21500 hits, 0 misses)
total allocations: 21500
total frees: 21500

Which means that 215000 bytes were successfully allocated. Given that for each allocation, there are a few overhead bytes related to the memory allocator algorithm, I would say that it adds up.

SoucheSouche · 2023-02-28T09:22:09Z

if I reduce the allocation size to 16 bytes, Then I get the expected value of total allocations. It makes sense since only 16Kbytes were allocated during the first iteration which is way below 217KB

------- free size : 217656 ---------------

====== Heap Trace: 0 records (600 capacity) ======
====== Heap Trace Summary ======
Mode: Heap Trace Leaks
0 bytes 'leaked' in trace (0 allocations)
records: 0 (600 capacity, 600 high water mark)
hashmap: 0 capacity (0 hits, 0 misses)
total allocations: 100000
total frees: 100000
(NB: Internal Buffer has overflowed, so trace data is incomplete.)
================================

====== Heap Trace: 0 records (600 capacity) ======
====== Heap Trace Summary ======
Mode: Heap Trace Leaks
0 bytes 'leaked' in trace (0 allocations)
records: 0 (600 capacity, 600 high water mark)
hashmap: 600 capacity (60000 hits, 40000 misses)
total allocations: 100000
total frees: 100000
(NB: Internal Buffer has overflowed, so trace data is incomplete.)

chipweinberger · 2023-02-28T18:35:39Z

could it be that you simply run out of available heap?

of course! 🤦🏻‍♂️ Thanks for investigating it!

chipweinberger · 2023-03-10T22:24:55Z

@SoucheSouche latest status?

chipweinberger · 2023-03-23T09:24:13Z

@SoucheSouche bump.

hoping this makes it into 5.1

SoucheSouche · 2023-03-27T06:07:06Z

Hi @chipweinberger, sorry for the late reply, I was away from work. However, your PR was reviewed while I was away and a suggestion was made for a more efficient implementation. I will have a look at it today and get back at you to let you know if we will go with the suggested implementation or if we will stick to yours.

chipweinberger · 2023-03-27T07:26:19Z

no problem I hope your time away was enjoyable!

more efficient implementation

😀 nice! Hopefully not too much more work for you. Hard to imagine what could be faster.

SoucheSouche · 2023-04-03T05:47:11Z

Hi @chipweinberger!
So I modified your hash map implementation as follow:

hash_map array is now an array of linked list.
when adding an entry to the hash map, we compute the hash of the entry and add it to the corresponding list in hash_map. So entries with the same hash will just be added to the same linked list at the given index in the hash_map array instead of trying to find an available spot in the array and potentially ending up not adding the hash in the array.

This is removing collisions in the hash_map array and gives better performances for less RAM used overall.
In the case when HEAP_TRACE_ALL is used, it gives even better results :

For 1000000 allocations and free, the code with no tracing executes in 5779ms, The code tracing enabled and no hash map takes 40262ms, the code with your implementation of the has map (using 1000 entries) takes 20632ms and the code with the hash map (using 50 entries) using the linked list as entries executes in 8129ms (7381ms with 1000 entries in the hash_map).

chipweinberger · 2023-04-03T06:14:10Z

good approach!

Yes in my design the hash map must be big to remain fast.

Your approach seems like a nice simplification for the user.

Good work. I imagine that was tricky to code!

SoucheSouche · 2023-04-03T06:57:53Z

The update was fairly straightforward and sat just well in your design. I mostly had to update the map_x functions.

The new implementation has been approved by 1 reviewer out of 2 so hopefully we should see your feature in the master branch soon. I will keep you updated as usual.

Also, unrelated update: I will start working on your console PRs for the duration of April, after what I will be out of work for an extended period.

chipweinberger · 2023-04-04T07:15:22Z

Glad to hear it!

I'm glad Espressif has taken an interest in the Console PRs. Hopefully not too much work - they're fully functional but of course "polish" takes awhile too.

On that topic, It would be great to solve this issue: #9366 This issue started about CDC OTG console but in reality USB Serial/JTAG is what we need. I should probably open a separate issue. We just need an official way to determine if the USB Serial JTAG is connected. I created this PR as a workaround (#10097), but in reality that PR should be denied in favor of a way that uses the hardware registers. Someone did suggest some code, but it unclear if it is actually correct.

chipweinberger · 2023-04-10T22:39:43Z

merged: b699033

chipweinberger force-pushed the user/chip/heap-trace-hashmap branch from aaa21ca to 9feb0b4 Compare February 17, 2023 22:34

espressif-bot added the Status: Opened Issue is new label Feb 17, 2023

github-actions bot changed the title ~~[Heap Trace] Perf: use hash map to speed up leaks mode~~ [Heap Trace] Perf: use hash map to speed up leaks mode (IDFGH-9425) Feb 17, 2023

chipweinberger force-pushed the user/chip/heap-trace-hashmap branch 5 times, most recently from 166973e to cbc5a59 Compare February 18, 2023 01:00

chipweinberger changed the title ~~[Heap Trace] Perf: use hash map to speed up leaks mode (IDFGH-9425)~~ [Heap Trace] Perf: use hash map to speed up leaks mode 34x (IDFGH-9425) Feb 18, 2023

chipweinberger force-pushed the user/chip/heap-trace-hashmap branch 2 times, most recently from ef0ba89 to b255bfd Compare February 18, 2023 01:20

chipweinberger force-pushed the user/chip/heap-trace-hashmap branch from b255bfd to 58d5fd2 Compare February 18, 2023 02:44

espressif-bot assigned SoucheSouche Feb 20, 2023

espressif-bot added Status: Selected for Development Issue is selected for development and removed Status: Opened Issue is new labels Feb 22, 2023

espressif-bot added Status: In Progress Work is in progress and removed Status: Selected for Development Issue is selected for development labels Feb 23, 2023

chipweinberger mentioned this pull request Feb 24, 2023

serial output using ESP32-S3 usb_serial_jtag controller is unreliable (IDFGH-7781) #9318

Closed

2 tasks

SoucheSouche reviewed Feb 27, 2023

View reviewed changes

[Heap Trace] Perf: use hash map to speed up leaks mode

75a4fc4

chipweinberger force-pushed the user/chip/heap-trace-hashmap branch from 58d5fd2 to 75a4fc4 Compare February 28, 2023 07:29

espressif-bot added Resolution: NA Issue resolution is unavailable Status: Done Issue is done internally Resolution: Done Issue is done internally and removed Status: In Progress Work is in progress Resolution: NA Issue resolution is unavailable labels Apr 5, 2023

chipweinberger closed this Apr 10, 2023

This was referenced Apr 10, 2023

[v5.1] [Heap Trace Standalone] support external memory for hash map (IDFGH-9845) #11172

Closed

[v5.1] [Heap Trace Standalone] hash map uses doubly linked list, but single linked would be better (IDFGH-9846) #11173

Closed

chipweinberger deleted the user/chip/heap-trace-hashmap branch May 20, 2023 10:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Heap Trace] Perf: use hash map to speed up leaks mode 34x (IDFGH-9425) #10793

[Heap Trace] Perf: use hash map to speed up leaks mode 34x (IDFGH-9425) #10793

chipweinberger commented Feb 17, 2023 •

edited

Loading

chipweinberger commented Feb 17, 2023

chipweinberger commented Feb 18, 2023 •

edited

Loading

SoucheSouche commented Feb 23, 2023

chipweinberger commented Feb 23, 2023 •

edited

Loading

chipweinberger commented Feb 23, 2023

SoucheSouche Feb 27, 2023

chipweinberger Feb 27, 2023 •

edited

Loading

SoucheSouche commented Feb 27, 2023

SoucheSouche commented Feb 27, 2023 •

edited

Loading

chipweinberger commented Feb 27, 2023 •

edited

Loading

SoucheSouche commented Feb 28, 2023

chipweinberger commented Feb 28, 2023

SoucheSouche commented Feb 28, 2023 •

edited

Loading

SoucheSouche commented Feb 28, 2023 •

edited

Loading

chipweinberger commented Feb 28, 2023

chipweinberger commented Mar 10, 2023

chipweinberger commented Mar 23, 2023

SoucheSouche commented Mar 27, 2023

chipweinberger commented Mar 27, 2023 •

edited

Loading

SoucheSouche commented Apr 3, 2023 •

edited

Loading

chipweinberger commented Apr 3, 2023 •

edited

Loading

SoucheSouche commented Apr 3, 2023 •

edited

Loading

chipweinberger commented Apr 4, 2023 •

edited

Loading

chipweinberger commented Apr 10, 2023

[Heap Trace] Perf: use hash map to speed up leaks mode 34x (IDFGH-9425) #10793

[Heap Trace] Perf: use hash map to speed up leaks mode 34x (IDFGH-9425) #10793

Conversation

chipweinberger commented Feb 17, 2023 • edited Loading

Motivation:

Testing:

Perf:

chipweinberger commented Feb 17, 2023

chipweinberger commented Feb 18, 2023 • edited Loading

SoucheSouche commented Feb 23, 2023

chipweinberger commented Feb 23, 2023 • edited Loading

chipweinberger commented Feb 23, 2023

SoucheSouche Feb 27, 2023

Choose a reason for hiding this comment

chipweinberger Feb 27, 2023 • edited Loading

Choose a reason for hiding this comment

SoucheSouche commented Feb 27, 2023

SoucheSouche commented Feb 27, 2023 • edited Loading

chipweinberger commented Feb 27, 2023 • edited Loading

SoucheSouche commented Feb 28, 2023

chipweinberger commented Feb 28, 2023

SoucheSouche commented Feb 28, 2023 • edited Loading

SoucheSouche commented Feb 28, 2023 • edited Loading

chipweinberger commented Feb 28, 2023

chipweinberger commented Mar 10, 2023

chipweinberger commented Mar 23, 2023

SoucheSouche commented Mar 27, 2023

chipweinberger commented Mar 27, 2023 • edited Loading

SoucheSouche commented Apr 3, 2023 • edited Loading

chipweinberger commented Apr 3, 2023 • edited Loading

SoucheSouche commented Apr 3, 2023 • edited Loading

chipweinberger commented Apr 4, 2023 • edited Loading

chipweinberger commented Apr 10, 2023

chipweinberger commented Feb 17, 2023 •

edited

Loading

chipweinberger commented Feb 18, 2023 •

edited

Loading

chipweinberger commented Feb 23, 2023 •

edited

Loading

chipweinberger Feb 27, 2023 •

edited

Loading

SoucheSouche commented Feb 27, 2023 •

edited

Loading

chipweinberger commented Feb 27, 2023 •

edited

Loading

SoucheSouche commented Feb 28, 2023 •

edited

Loading

SoucheSouche commented Feb 28, 2023 •

edited

Loading

chipweinberger commented Mar 27, 2023 •

edited

Loading

SoucheSouche commented Apr 3, 2023 •

edited

Loading

chipweinberger commented Apr 3, 2023 •

edited

Loading

SoucheSouche commented Apr 3, 2023 •

edited

Loading

chipweinberger commented Apr 4, 2023 •

edited

Loading