[PROF-9476] Reduce heap profiling overhead by using libdatadog's managed string storage #4331

ivoanjo · 2025-01-30T13:46:12Z

What does this PR do?

This PR changes the heap profiler to use the new libdatadog managed string storage API added in DataDog/libdatadog#725. (And thanks to Alex for starting this work in #3628.)

Prior to this PR, the heap profiler was doing all of the work of keeping stack traces around for the objects being tracked. This involved keeping copies of all strings for path / function names, and also having a clever deduplication mechanism that allowed us to dedup stack traces without first copying all of the strings.

The libdatadog managed string storage allows us to simplify this work by giving us a string table we can use to store strings and trade them for long-lived ids. These ids can then be used when adding samples to libdatadog, instead of direct pointers to the strings.
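To make the "trade strings for long-lived ids" idea concrete, here is a minimal sketch of a string table. The `StringTable` class and its `intern`/`lookup` methods are hypothetical names used only for illustration; this is not the libdatadog API, just the general shape of the technique.

```python
class StringTable:
    """Toy string table: hands out one stable integer id per distinct string."""

    def __init__(self):
        self._ids = {}      # string -> id
        self._strings = []  # id -> string

    def intern(self, s):
        existing = self._ids.get(s)
        if existing is not None:
            return existing  # already stored, reuse the same id
        new_id = len(self._strings)
        self._ids[s] = new_id
        self._strings.append(s)
        return new_id

    def lookup(self, string_id):
        return self._strings[string_id]


table = StringTable()
path_id = table.intern("/path/a.rb")
name_id = table.intern("method_a")
same_path_id = table.intern("/path/a.rb")  # second intern stores nothing new
```

Callers keep only the small integer ids and hand those over when reporting samples, so each distinct string is stored once no matter how many stacks mention it.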

This has a bunch of advantages:

1. We were not previously deduplicating strings in the heap profiler, only whole stacks.

   Thus if we had /path/a.rb:method_a -> /path/b.rb:method_b -> /path/c.rb:method_c multiple times, we would deduplicate them. But if we also had /path/a.rb:method_a -> /path/b.rb:method_b, then we'd store yet another copy of these strings. This is no longer the case when using the new API.
2. Representing stacks as arrays of ids means deduplication, hashing and comparison are much faster -- we're only comparing ids and not the contents of strings.
3. When feeding stacks again to libdatadog, it's cheaper to use ids since libdatadog doesn't need to keep re-hashing the strings to de-duplicate them; it can use the ids to do that.
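The example from the first point can be played out in a small sketch. Everything here (the `string_ids` dict standing in for the string storage, `intern`, `intern_stack`) is a hypothetical stand-in rather than the profiler's actual code; it only shows that a full stack and its two-frame prefix end up sharing string storage once frames are ids.

```python
from collections import Counter

string_ids = {}  # string -> id; stands in for the managed string storage


def intern(s):
    # The default is computed before insertion, so a new string gets the next free id.
    return string_ids.setdefault(s, len(string_ids))


def intern_stack(frames):
    # Each frame becomes a (path_id, name_id) pair; the stack is a hashable tuple of pairs.
    return tuple((intern(path), intern(name)) for path, name in frames)


full = [("/path/a.rb", "method_a"), ("/path/b.rb", "method_b"), ("/path/c.rb", "method_c")]
prefix = full[:2]

stack_counts = Counter()
for frames in (full, full, prefix):
    stack_counts[intern_stack(frames)] += 1
```

After the loop there are two distinct stacks but only six stored strings: the prefix stack reuses the ids of the first two frames instead of keeping its own copies, and stack equality and hashing only ever touch integers.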

This does mean that this PR is a bit... on the noisy side in terms of changes to the heap profiler. Specifically, it takes advantage of most of the simplification opportunities that managed string storage allows:

- It merges heap_record and heap_stack structures as they no longer need to exist independently, and then rips out as much code as possible based on that merge
- It simplifies stack comparison and hashing

Update: As discussed, I've pushed a commit to set experimental_heap_sample_rate to 1, raising the heap sampling rate from its previous default of 10.

Motivation:

The key objective of this work is allowing us to reduce the memory and cpu overhead of heap profiling.

~~Earlier versions of this PR already showed reduced memory usage. I decided to open this PR already and do benchmarking in parallel, so I don't yet have updated numbers yet.~~

See below.

Change log entry

Yes. Reduce overhead of heap profiling

Additional Notes:

Currently CI is broken for this PR because it depends on DataDog/libdatadog#844 which has not yet been merged into libdatadog.

Obviously I plan to make sure that PR is in libdatadog, and that master is using the latest libdatadog before merging this PR, but I don't think it makes sense to hold off on review until that's the case.

How to test the change?

The existing test coverage was already quite reasonable. I added a few more tests to make sure that our stack recording/retrieval/deduping was in good shape, as well as a benchmark for the efficiency of the interning operations.

Benchmarks:

I wrote up my results in this doc (Datadog-internal). Here's a quick recap of what I saw.

High load gitlab:

Here, we’re comparing 4 configurations:

- baseline: Unmodified application, no profiling at all
- only-profiling-heap: Application running cpu + wall-time + allocation + heap profiling, without managed string storage; using heap sampling defaults
- only-profiling-heap-mss: Same as above, but using managed string storage branch
- only-profiling-heap-mss-sr1: Same as above, but with heap profiling sampling 1:1 with allocation profiler

| Configuration | Min memory RSS (MiB) | Max memory RSS (MiB) | Avg memory RSS (MiB) | Avg virtual memory (GiB) |
|---|---|---|---|---|
| baseline worker 0 | 893 | 908 | 900 | 1.4 |
| baseline worker 1 | 894 | 911 | 902 | 1.4 |
| only-profiling-heap worker 0 | 959 | 985 | 972 | 1.7 |
| only-profiling-heap worker 1 | 959 | 980 | 970 | 1.7 |
| only-profiling-heap-mss worker 0 | 937 | 964 | 955 | 1.6 |
| only-profiling-heap-mss worker 1 | 934 | 962 | 951 | 1.6 |
| only-profiling-heap-mss-sr1 worker 0 | 956 | 980 | 968 | 1.6 |
| only-profiling-heap-mss-sr1 worker 1 | 946 | 979 | 965 | 1.6 |
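One way to read the numbers above is as average RSS overhead relative to baseline. The sketch below does that arithmetic using only the "Avg memory RSS" column; averaging the two workers of each configuration is my own simplification, not something the benchmark write-up prescribes.

```python
from statistics import mean

# Avg memory RSS (MiB) for worker 0 and worker 1, copied from the table above.
avg_rss_mib = {
    "baseline": [900, 902],
    "only-profiling-heap": [972, 970],
    "only-profiling-heap-mss": [955, 951],
    "only-profiling-heap-mss-sr1": [968, 965],
}

baseline = mean(avg_rss_mib["baseline"])

# Overhead of each profiling configuration over the unprofiled baseline, in MiB.
overhead = {
    name: mean(values) - baseline
    for name, values in avg_rss_mib.items()
    if name != "baseline"
}
```

Under that reading, the managed string storage branch brings the average RSS overhead down from about 70 MiB to about 52 MiB at default heap sampling, and 1:1 sampling lands at about 65.5 MiB, still below the pre-change overhead at default sampling.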

Rails app:

Here, we’re comparing 5 configurations:

- baseline: Unmodified application, no profiling at all
- candidate-heap-profiling: Application running cpu + wall-time + allocation + heap profiling, without managed string storage; using heap sampling defaults
- candidate-heap-profiler-mss: Same as above, but using managed string storage branch
- candidate-heap-profiler-mss-sr1: Same as above, but with heap profiling sampling 1:1 with allocation profiler
- candidate-heap-profiling-sr1: Additional data point: what if we enabled heap profiling sampling 1:1 with the allocation profiler, but without the managed string storage?


At some point in the past, class name gathering was optional. This is now removed, but we still had this vestigial `optional_class_name` variable + passing a pointer to it, rather than passing the `class_name` as a value. This commit cleans this up and makes all APIs require/assume there's a `class_name` for an allocated object. (This enables a few other cleanups I want to make next.)


**What does this PR do?** This PR updates the crashtracker C code to build with the latest libdatadog changes (in main) that will become part of libdatadog 15. **Motivation:** I'm working on a libdatadog branch and had to do this to unblock my work, so I decided I'll create a PR with it so nobody needs to repeat this work. **Additional Notes:** I'm opening this PR as draft as we shouldn't merge this until libdatadog 15 is out. **How to test the change?** Existing test coverage is enough to validate this.

This is shared by both the stack recorder as well as the heap recorder, so there's no reason for it to live in either and should instead live in a more common location. (I also expect usage of this API may spread to more components in the future.)

A lot of these comments were leftovers from when the last thing we did in `_native_new` was the call to `TypedData_Wrap_Struct`. Refactoring work done over time, as well as the evolution of the libdatadog APIs, meant we could allocate the Ruby object sooner, and thus a lot of these comments are no longer relevant.

We don't actually need to keep re-interning these strings in the current version of the string storage; I suspect this was a leftover of the time when the string storage automatically cleaned up strings that were not used in a certain generation, but this is no longer the case. In fact, now that libdatadog relies on counting intern/unintern calls to know which strings to keep, calling `intern` forever on these strings would eventually lead to an overflow, and libdatadog would start returning errors on the calls to `intern`. (In turn, on our side, we'd raise the exception and thus stop profiling which is not amazing)
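The failure mode described above can be illustrated with a toy reference-counted storage. `RefCountedStrings` and its `max_count` limit are hypothetical and deliberately tiny; the real counter width and error behaviour in libdatadog are not stated in this PR, so treat this purely as a sketch of why unbalanced `intern` calls eventually fail.

```python
class RefCountedStrings:
    """Toy string storage that tracks how many users each string has."""

    def __init__(self, max_count):
        self.max_count = max_count  # hypothetical counter limit, kept tiny for the demo
        self._counts = {}

    def intern(self, s):
        count = self._counts.get(s, 0)
        if count >= self.max_count:
            raise OverflowError(f"usage count overflow for {s!r}")
        self._counts[s] = count + 1

    def unintern(self, s):
        self._counts[s] -= 1
        if self._counts[s] == 0:
            del self._counts[s]  # no users left, so the string can be dropped

    def __contains__(self, s):
        return s in self._counts


storage = RefCountedStrings(max_count=3)

# Balanced use: one intern per user, one unintern when that user goes away.
storage.intern("/path/a.rb")
storage.unintern("/path/a.rb")
balanced_string_dropped = "/path/a.rb" not in storage

# Unbalanced use: re-interning the same string on every pass, never uninterning.
overflowed = False
try:
    for _ in range(4):
        storage.intern("method_a")
except OverflowError:
    overflowed = True
```

Interning a long-lived string once and holding on to its id avoids the unbalanced case entirely.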

…ly ids now With the move to interned ids in libdatadog, the same stacks are represented by the same ids, which means the same underlying bytes. Thus, rather than having per-item comparisons and whatnot (which we needed because before we had pointers to strings, and strings were not interned/unique), we can now compare and hash the whole stack at once as a binary blob. This is crucially missing tests! I'll come back to add them, we definitely want some test coverage on this key element.
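A rough sketch of what "compare and hash the whole stack at once as a binary blob" can look like. The frame layout used here (two 32-bit ids plus a 32-bit line number) is an assumption for illustration; the actual `heap_frame` fields are not shown in this PR. The packed format has no padding bytes, which is what makes byte-wise comparison safe.

```python
import struct

# Assumed frame layout: name id, filename id, line number; little-endian, no padding.
FRAME = struct.Struct("<IIi")


def pack_stack(frames):
    # Concatenate the fixed-size frames into one contiguous blob.
    return b"".join(FRAME.pack(name_id, file_id, line) for name_id, file_id, line in frames)


a = pack_stack([(1, 2, 10), (3, 4, 20)])
b = pack_stack([(1, 2, 10), (3, 4, 20)])
c = pack_stack([(1, 2, 10), (3, 4, 21)])  # differs only in the last line number

# Equal stacks are equal bytes, so the blob itself can serve as the hash table key.
records = {a: "record for stack a"}
found = records.get(b)
missing = records.get(c)
```

Because interned ids are unique per string, equal blobs imply equal stacks, so there is no need to walk frames one by one or look at string contents.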


**What does this PR do?** This PR includes the changes documented in the "Releasing a new version to rubygems.org" part of the README: https://github.com/datadog/libdatadog/tree/main/ruby#releasing-a-new-version-to-rubygemsorg **Motivation:** Enable Ruby to use libdatadog v16.0.1. Of particular interest, this includes improvements to crashtracking and the managed string table needed by DataDog/dd-trace-rb#4331 . **Additional Notes:** N/A **How to test the change?** I've tested this release locally using the changes in DataDog/dd-trace-rb#4353 . As a reminder, new libdatadog releases don't get automatically picked up by dd-trace-rb, so the PR that bumps the Ruby profiler will also test this release against all supported Ruby versions.

**What does this PR do?** This PR upgrades the datadog gem to use libdatadog 16.0.1. It includes a few changes to match breaking API updates in crashtracking. **Motivation:** Libdatadog 16 is needed to unblock #4331. This version also brings a few crashtracking improvements. **Change log entry** Yes. Upgrade libdatadog dependency to 16.0.1 **Additional Notes:** As usual, I'm opening this PR as a draft as libdatadog 16.0.1 is not yet available on rubygems.org, and I'll come back to re-trigger CI and mark this as non-draft once it is. **How to test the change?** Our existing test coverage includes libdatadog testing, so a green CI is good here :)

codecov-commenter · 2025-02-12T09:42:12Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.77%. Comparing base (6021b3d) to head (de97855).
Report is 89 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #4331      +/-   ##
==========================================
+ Coverage   97.75%   97.77%   +0.01%     
==========================================
  Files        1351     1362      +11     
  Lines       82684    84824    +2140     
  Branches     4197     4411     +214     
==========================================
+ Hits        80827    82933    +2106     
- Misses       1857     1891      +34



ivoanjo · 2025-02-13T10:13:37Z

Don't forget to update the changelog once benchmarking results are ready. (e.g. claim both memory and cpu improvements)

👍 Updated PR description with results!

**What does this PR do?** This reverts commit 569bf56. **Motivation:** We temporarily disabled this benchmark in #4331 for the reasons explained in the comment. Once that PR is in master we can re-enable the benchmark. **Additional Notes:** This sharp edge of our benchmarks setup is annoying, but I'll fight it another day... **How to test the change?** Validate this benchmark is now running in the benchmarking platform results page.

pr-commenter · 2025-02-13T11:03:15Z

Benchmarks

Benchmark execution time: 2025-02-18 11:32:43

Comparing candidate commit de97855 in PR branch ivoanjo/prof-9476-managed-string-storage-try2 with baseline commit 6021b3d in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 33 metrics, 2 unstable metrics.

Given our current benchmark results, we've decided to use some of the headroom we gained with the managed string storage branch to improve heap sampling results. Thus, we'll take a heap sample for every object that the allocation profiler samples, rather than every 10th object. We still expose this control -- via both code and environment variable, so it can be reset back to 10 on a case-by-case basis. BUT if you're considering raising this value, please let us know about it -- we'd love to know your use-case.
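For readers unfamiliar with the setting, here is a minimal sketch of what "every object" versus "every 10th object" means. `HeapSampler` and `tracked` are hypothetical names, and the counter-based approach is an assumption; the profiler's real sampling logic is not shown in this PR.

```python
class HeapSampler:
    """Toy 1-in-N sampler applied on top of objects the allocation profiler sampled."""

    def __init__(self, sample_rate):
        if sample_rate < 1:
            raise ValueError("sample_rate must be >= 1")
        self.sample_rate = sample_rate
        self._seen = 0

    def should_track(self):
        # Track one object out of every `sample_rate` allocation samples.
        self._seen += 1
        if self._seen >= self.sample_rate:
            self._seen = 0
            return True
        return False


def tracked(sample_rate, allocation_samples):
    sampler = HeapSampler(sample_rate)
    return sum(sampler.should_track() for _ in range(allocation_samples))


with_previous_default = tracked(10, 100)  # every 10th allocation sample
with_new_default = tracked(1, 100)        # every allocation sample
```

With 100 allocation samples, a rate of 10 tracks 10 objects for the heap profile while a rate of 1 tracks all 100, which is why the change gives more detailed heap data at the cost of some of the headroom gained.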

ivoanjo · 2025-02-18T11:08:28Z

Update: As discussed, I've pushed a commit to set experimental_heap_sample_rate to 1, raising the heap sampling rate from its previous default of 10.


AlexJF and others added 29 commits January 27, 2025 09:35

[PROF-9476] Managed string storage PoC

bfcab7f

Disable deprecated specs

434254e

Still reviewing what to do with them

Fix using incorrect enum entry (even though they're equivalent)

e69c701

Minor: Remove leftover debug prints

4108ca4

Update to match evolving API on the libdatadog side

2e017a0

Move advance_gen call to avoid leaking profile memory if it fails

4d84a5c

Refactor intern_or_raise to avoid having two copies of it

cacfdc2

Store ManagedStringStorage as a value in heap recorder, rather than…

0bdb138

… a pointer Having a pointer back to the stack recorder memory holding the string storage seems a bit weird, although this is a small detail.

Remove leftover function declaration from header

7b64e74

Minor code cleanups

0e652cd

Avoid using struct when declaring types and use typedefs instead

3d16793

Simplify code by inlining the old struct heap_record_update_data

2eabd66

Simplify code by inlining the old struct object_record_update_data

95f3a04

First step of merging heap_stack and heap_record: move all fields ins…

06a9daa

…ide heap_stack In the current state of the heap profiler, a heap_record is 1:1 with a heap_stack so let's consolidate them.

Second step of merging heap_stack and heap_record: rename heap_stack …

378535a

…to heap_record Refactoring complete, now only `heap_record` is left.

Adopt unintern_all to avoid lots of small libdatadog calls

0a729db

Adopt intern_all to avoid lots of small libdatadog calls

284ddc4

Add benchmark for intern vs intern_all

a711484

Minor: Add note about duplicate object id detection

f261140

Improve naming of constant

afb319b

Minor tweaks to comments

0066658

Remove disabled test

46b6391

This test no longer makes sense, since it was testing that the two different approaches we had to stack hashing were correct and in sync, and all this was removed now that we use libdatadog's managed string storage.

Improve test coverage for stacks stored by heap recorder

373856e

Since these stacks go through a very different code path, these tests validate that the round trip goes fine.

ivoanjo requested a review from a team as a code owner January 30, 2025 13:46

ivoanjo mentioned this pull request Feb 6, 2025

[PROF-11306] Upgrade libdatadog dependency to 16.0.1 #4353

Merged

ivoanjo mentioned this pull request Feb 7, 2025

[PROF-11306] Package libdatadog v16.0.1 for Ruby DataDog/libdatadog#864

Merged

ivoanjo added 2 commits February 11, 2025 16:56

Merge branch 'master' into ivoanjo/prof-9476-managed-string-storage-try2

d4bbeb2

Fix to match libdatadog v16.0.1 APIs

032e7a3

ivoanjo added 4 commits February 12, 2025 10:15

Modify managed string storage benchmarking

c29be53

Rather than comparing intern with intern_all (less useful now that DataDog/libdatadog#844 is merged), let's keep this around as a more generic measurement of managed string storage performance.

Add design note explaining why heap_records is useful

7d0acaa

Add test so that heap_frame can be safely compared with memcmp

3a6b887

Move ddog_prof_ManagedStringStorage_advance_gen to happen outside t…

1947ea0

…he GVL

AlexJF approved these changes Feb 12, 2025


Temporary disable benchmark to make CI happy

569bf56

ivoanjo mentioned this pull request Feb 13, 2025

[PROF-9476] Revert "Temporary disable benchmark to make CI happy" #4376

Merged

github-actions bot added the core Involves Datadog core libraries label Feb 18, 2025

ivoanjo merged commit 744c421 into master Feb 18, 2025
524 checks passed

ivoanjo deleted the ivoanjo/prof-9476-managed-string-storage-try2 branch February 18, 2025 13:22

github-actions bot added this to the 2.11.0 milestone Feb 18, 2025

ivoanjo added a commit that referenced this pull request Feb 19, 2025

Revert "Merge pull request #4331 from DataDog/ivoanjo/prof-9476-manag…

ded9fcb

…ed-string-storage-try2" This reverts commit 744c421, reversing changes made to e9adecb.

ivoanjo mentioned this pull request Feb 19, 2025

[PROF-11394] Revert changes to heap profiling in PRs #4401, #4376, #4331 #4409

Merged

ivoanjo added a commit that referenced this pull request Feb 19, 2025

Merge pull request #4409 from DataDog/ivoanjo/prof-11394-revert-heap-…

30f6e21

…profiling [PROF-11394] Revert changes to heap profiling in PRs #4401, #4376, #4331

ivoanjo added a commit that referenced this pull request Mar 5, 2025

Re-apply "Merge pull request #4331 from DataDog/ivoanjo/prof-9476-man…

2428c34

…aged-string-storage-try2" This reverts commit ded9fcb, thus re-applying the changes from #4331.

ivoanjo mentioned this pull request Mar 5, 2025

[PROF-11405] Graduate heap profiling from alpha to preview, second try #4460

Open

ivoanjo removed the core Involves Datadog core libraries label Mar 5, 2025
