From d30cb1878c8603cce5a71ec66104d39ede19ef8d Mon Sep 17 00:00:00 2001 From: Ivo Anjo Date: Mon, 2 Dec 2024 10:04:39 +0000 Subject: [PATCH 1/4] Minor: Disable gc profiling in specs to avoid warnings On Ruby 3.0, the default for `gc_enabled = true` emits a warning stating it can't be used. This warning is a bit annoying in our tests; let's disable it and only rely on it being enabled on the specs that are actually testing this feature. --- spec/datadog/profiling/component_spec.rb | 2 ++ 1 file changed, 2 insertions(+) diff --git a/spec/datadog/profiling/component_spec.rb b/spec/datadog/profiling/component_spec.rb index f91d021c865..573c76b8bdf 100644 --- a/spec/datadog/profiling/component_spec.rb +++ b/spec/datadog/profiling/component_spec.rb @@ -52,6 +52,8 @@ skip_if_profiling_not_supported(self) settings.profiling.enabled = true + # Disabled to avoid warnings on Rubies where it's not supported; there's separate specs that test it when enabled + settings.profiling.advanced.gc_enabled = false allow(profiler_setup_task).to receive(:run) end From ea1a525fa369dcc60de04a0aef8e332c21b4e830 Mon Sep 17 00:00:00 2001 From: Ivo Anjo Date: Mon, 2 Dec 2024 10:09:01 +0000 Subject: [PATCH 2/4] Fix outdated comment It's no longer true that we don't use the new Ruby profiler on Ruby 2.5; what's true is that we enable the "no signals workaround" to avoid the potentially-problematic code path. --- ext/datadog_profiling_native_extension/private_vm_api_access.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ext/datadog_profiling_native_extension/private_vm_api_access.c b/ext/datadog_profiling_native_extension/private_vm_api_access.c index dbbebd531d0..59e5c644882 100644 --- a/ext/datadog_profiling_native_extension/private_vm_api_access.c +++ b/ext/datadog_profiling_native_extension/private_vm_api_access.c @@ -158,7 +158,7 @@ bool is_current_thread_holding_the_gvl(void) { // // Thus an incorrect `is_current_thread_holding_the_gvl` result may lead to issues inside `rb_postponed_job_register_one`. // - // For this reason we currently do not enable the new Ruby profiler on Ruby 2.5 by default, and we print a + // For this reason we default to use the "no signals workaround" on Ruby 2.5 by default, and we print a // warning when customers force-enable it. bool gvl_acquired = vm->gvl.acquired != 0; rb_thread_t *current_owner = vm->running_thread; From 26a70ad47bbd92868f2f3e757fc9d1ed992a3551 Mon Sep 17 00:00:00 2001 From: Ivo Anjo Date: Mon, 2 Dec 2024 10:18:03 +0000 Subject: [PATCH 3/4] [PROF-10978] Require Ruby 3.1+ for heap profiling **What does this PR do?** This PR raises the minimum Ruby version required for heap profiling from the previous value of >= 2.7 to >= 3.1 due to a new VM bug discovered (see below for details). It's mostly a revert of #3366, where we had first tried to workaround a Ruby 2.7/3.0 bug, but it turns out we missed a spot, and we could trigger VM crashes because of that. **Motivation:** Ruby versions prior to 3.1 had a special optimization called `rb_gc_force_recycle` which would allow objects to directly be garbage collected (e.g. without needing to wait for the GC). It turns out that `rb_gc_force_recycle` did not play well with the changes in Ruby 2.7 to how object ids worked. We uncovered this earlier on during the development of the heap profiler, and put in a workaround for the bug that we thought was enough... Unfortunately, it turns out that the workaround is not enough. The following reproducer, when run on Ruby 2.7 or 3.0 shows how the Ruby VM can segfault inside `id2ref` due to the issue above: ```ruby puts RUBY_DESCRIPTION require "datadog" require "objspace" require "pry" NUM_OBJECTS = 10_000_000 recycled_ids = Array.new(NUM_OBJECTS) { 123 } many_objects = Array.new(NUM_OBJECTS) { Object.new } (0...NUM_OBJECTS).each do |i| recycled_ids[i] = many_objects[i].object_id end puts "Seeded objects!" gets (0...NUM_OBJECTS).each do |i| Datadog::Profiling::StackRecorder::Testing._native_gc_force_recycle(many_objects[i]) many_objects[i] = nil end puts GC.stat puts "Recycled objects!" gets many_objects = nil 10.times { GC.start } Array.new(10_000) { Object.new } 10.times { GC.start } puts GC.stat puts "GC'd objects! (Ruby should have released pages?)" gets recycled_ids.each { |i| begin (nil == ObjectSpace._id2ref(i)) rescue nil end } puts "Done!" ``` Crash details: ``` Program received signal SIGSEGV, Segmentation fault. is_swept_object (ptr=93825033355200, objspace=) at gc.c:3868 3868 return page->flags.before_sweep ? FALSE : TRUE; (gdb) bt #0 is_swept_object (ptr=93825033355200, objspace=) at gc.c:3868 #1 is_garbage_object (objspace=0x55555555d220, objspace=0x55555555d220, ptr=93825033355200) at gc.c:3887 #2 is_live_object (ptr=93825033355200, objspace=0x55555555d220) at gc.c:3909 #3 is_live_object (ptr=93825033355200, objspace=0x55555555d220) at gc.c:3898 #4 id2ref (objid=8264881) at gc.c:3999 #5 os_id2ref (os=, objid=) at gc.c:4019 ``` This crash happens because of two things: 1. Ruby does not clean the object id entry for a recycled object from its internal hash map 2. If the memory page where the object lived is returned back to the OS, trying to `id2ref` on that id will cause Ruby to try to read invalid memory and crash. **Additional Notes:** I've chosen to disable heap profiling on 2.7 and 3.0 because I can't think of a good workaround for the bug above, especially not one that does not increase the overhead of heap profiling. **How to test the change?** This PR updates the test coverage to expect Ruby 3.1+ as the minimum for the feature. You can also quickly validate it doesn't get enabled on the older Rubies using: ``` $ DD_PROFILING_ENABLED=true DD_PROFILING_EXPERIMENTAL_HEAP_ENABLED=true bundle exec ddprofrb exec ruby -e "puts RUBY_DESCRIPTION" W, [2024-12-02T10:42:28.771611 #112585] WARN -- datadog: [datadog] Current Ruby version (3.0.5) cannot support heap profiling due to VM bugs/limitations. Please upgrade to Ruby >= 3.1 in order to use this feature. Heap profiling has been disabled. ``` --- .../extconf.rb | 8 -- .../heap_recorder.c | 96 +------------ .../stack_recorder.c | 34 ----- lib/datadog/profiling/component.rb | 20 +-- spec/datadog/profiling/component_spec.rb | 23 +--- spec/datadog/profiling/stack_recorder_spec.rb | 130 ------------------ 6 files changed, 14 insertions(+), 297 deletions(-) diff --git a/ext/datadog_profiling_native_extension/extconf.rb b/ext/datadog_profiling_native_extension/extconf.rb index 4acadd68215..bac21c65b69 100644 --- a/ext/datadog_profiling_native_extension/extconf.rb +++ b/ext/datadog_profiling_native_extension/extconf.rb @@ -170,11 +170,6 @@ def skip_building_extension!(reason) # On older Rubies, there was no jit_return member on the rb_control_frame_t struct $defs << "-DNO_JIT_RETURN" if RUBY_VERSION < "3.1" -# On older Rubies, rb_gc_force_recycle allowed to free objects in a way that -# would be invisible to free tracepoints, finalizers and without cleaning -# obj_to_id_tbl mappings. -$defs << "-DHAVE_WORKING_RB_GC_FORCE_RECYCLE" if RUBY_VERSION < "3.1" - # On older Rubies, there are no Ractors $defs << "-DNO_RACTORS" if RUBY_VERSION < "3" @@ -184,9 +179,6 @@ def skip_building_extension!(reason) # On older Rubies, objects would not move $defs << "-DNO_T_MOVED" if RUBY_VERSION < "2.7" -# On older Rubies, there was no RUBY_SEEN_OBJ_ID flag -$defs << "-DNO_SEEN_OBJ_ID_FLAG" if RUBY_VERSION < "2.7" - # On older Rubies, rb_global_vm_lock_struct did not include the owner field $defs << "-DNO_GVL_OWNER" if RUBY_VERSION < "2.6" diff --git a/ext/datadog_profiling_native_extension/heap_recorder.c b/ext/datadog_profiling_native_extension/heap_recorder.c index ea1c2ee5e4c..d186a1a0fba 100644 --- a/ext/datadog_profiling_native_extension/heap_recorder.c +++ b/ext/datadog_profiling_native_extension/heap_recorder.c @@ -7,10 +7,6 @@ #include "libdatadog_helpers.h" #include "time_helpers.h" -#if (defined(HAVE_WORKING_RB_GC_FORCE_RECYCLE) && ! defined(NO_SEEN_OBJ_ID_FLAG)) - #define CAN_APPLY_GC_FORCE_RECYCLE_BUG_WORKAROUND -#endif - // Minimum age (in GC generations) of heap objects we want to include in heap // recorder iterations. Object with age 0 represent objects that have yet to undergo // a GC and, thus, may just be noise/trash at instant of iteration and are usually not @@ -123,9 +119,6 @@ typedef struct { // Pointer to the (potentially partial) object_record containing metadata about an ongoing recording. // When NULL, this symbolizes an unstarted/invalid recording. object_record *object_record; - // A flag to track whether we had to force set the RUBY_FL_SEEN_OBJ_ID flag on this object - // as part of our workaround around rb_gc_force_recycle issues. - bool did_recycle_workaround; } recording; struct heap_recorder { @@ -342,46 +335,12 @@ void start_heap_allocation_recording(heap_recorder *heap_recorder, VALUE new_obj rb_raise(rb_eRuntimeError, "Detected a bignum object id. These are not supported by heap profiling."); } - bool did_recycle_workaround = false; - - #ifdef CAN_APPLY_GC_FORCE_RECYCLE_BUG_WORKAROUND - // If we are in a ruby version that has a working rb_gc_force_recycle implementation, - // its usage may lead to an object being re-used outside of the typical GC cycle. - // - // This re-use is in theory invisible to us unless we're lucky enough to sample both - // the original object and the replacement that uses the recycled slot. - // - // In practice, we've observed (https://github.com/DataDog/dd-trace-rb/pull/3366) - // that non-noop implementations of rb_gc_force_recycle have an implementation bug - // which results in the object that re-used the recycled slot inheriting the same - // object id without setting the FL_SEEN_OBJ_ID flag. We rely on this knowledge to - // "observe" implicit frees when an object we are tracking is force-recycled. - // - // However, it may happen that we start tracking a new object and that object was - // allocated on a recycled slot. Due to the bug, this object would be missing the - // FL_SEEN_OBJ_ID flag even though it was not recycled itself. If we left it be, - // when we're doing our liveness check, the absence of the flag would trigger our - // implicit free workaround and the object would be inferred as recycled even though - // it might still be alive. - // - // Thus, if we detect that this new allocation is already missing the flag at the start - // of the heap allocation recording, we force-set it. This should be safe since we - // just called rb_obj_id on it above and the expectation is that any flaggable object - // that goes through it ends up with the flag set (as evidenced by the GC_ASSERT - // lines in https://github.com/ruby/ruby/blob/4a8d7246d15b2054eacb20f8ab3d29d39a3e7856/gc.c#L4050C14-L4050C14). - if (RB_FL_ABLE(new_obj) && !RB_FL_TEST(new_obj, RUBY_FL_SEEN_OBJ_ID)) { - RB_FL_SET(new_obj, RUBY_FL_SEEN_OBJ_ID); - did_recycle_workaround = true; - } - #endif - heap_recorder->active_recording = (recording) { .object_record = object_record_new(FIX2LONG(ruby_obj_id), NULL, (live_object_data) { .weight = weight * heap_recorder->sample_rate, .class = alloc_class != NULL ? string_from_char_slice(*alloc_class) : NULL, .alloc_gen = rb_gc_count(), - }), - .did_recycle_workaround = did_recycle_workaround, + }), }; } @@ -685,41 +644,6 @@ static int st_object_record_update(st_data_t key, st_data_t value, st_data_t ext // If we got this far, then we found a valid live object for the tracked id. - #ifdef CAN_APPLY_GC_FORCE_RECYCLE_BUG_WORKAROUND - // If we are in a ruby version that has a working rb_gc_force_recycle implementation, - // its usage may lead to an object being re-used outside of the typical GC cycle. - // - // This re-use is in theory invisible to us and would mean that the ref from which we - // collected the object_record metadata may not be the same as the current ref and - // thus any further reporting would be innacurately attributed to stale metadata. - // - // In practice, there is a way for us to notice that this happened because of a bug - // in the implementation of rb_gc_force_recycle. Our heap profiler relies on object - // ids and id2ref to detect whether objects are still alive. Turns out that when an - // object with an id is re-used via rb_gc_force_recycle, it will "inherit" the ID - // of the old object but it will NOT have the FL_SEEN_OBJ_ID as per the experiment - // in https://github.com/DataDog/dd-trace-rb/pull/3360#discussion_r1442823517 - // - // Thus, if we detect that the ref we just resolved above is missing this flag, we can - // safely say re-use happened and thus treat it as an implicit free of the object - // we were tracking (the original one which got recycled). - if (RB_FL_ABLE(ref) && !RB_FL_TEST(ref, RUBY_FL_SEEN_OBJ_ID)) { - - // NOTE: We don't really need to set this flag for heap recorder to work correctly - // but doing so partially mitigates a bug in runtimes with working rb_gc_force_recycle - // which leads to broken invariants and leaking of entries in obj_to_id and id_to_obj - // tables in objspace. We already do the same thing when we sample a recycled object, - // here we apply it as well to objects that replace recycled objects that were being - // tracked. More details in https://github.com/DataDog/dd-trace-rb/pull/3366 - RB_FL_SET(ref, RUBY_FL_SEEN_OBJ_ID); - - on_committed_object_record_cleanup(recorder, record); - recorder->stats_last_update.objects_dead++; - return ST_DELETE; - } - - #endif - if ( recorder->size_enabled && recorder->update_include_old && // We only update sizes when doing a full update @@ -803,18 +727,12 @@ static int update_object_record_entry(DDTRACE_UNUSED st_data_t *key, st_data_t * object_record *new_object_record = recording.object_record; if (existing) { object_record *existing_record = (object_record*) (*value); - if (recording.did_recycle_workaround) { - // In this case, it's possible for an object id to be re-used and we were lucky enough to have - // sampled both the original object and the replacement so cleanup the old one and replace it with - // the new object_record (i.e. treat this as a combined free+allocation). - on_committed_object_record_cleanup(update_data->heap_recorder, existing_record); - } else { - // This is not supposed to happen, raising... - VALUE existing_inspect = object_record_inspect(existing_record); - VALUE new_inspect = object_record_inspect(new_object_record); - rb_raise(rb_eRuntimeError, "Object ids are supposed to be unique. We got 2 allocation recordings with " - "the same id. previous=%"PRIsVALUE" new=%"PRIsVALUE, existing_inspect, new_inspect); - } + + // This is not supposed to happen, raising... + VALUE existing_inspect = object_record_inspect(existing_record); + VALUE new_inspect = object_record_inspect(new_object_record); + rb_raise(rb_eRuntimeError, "Object ids are supposed to be unique. We got 2 allocation recordings with " + "the same id. previous=%"PRIsVALUE" new=%"PRIsVALUE, existing_inspect, new_inspect); } // Always carry on with the update, we want the new record to be there at the end (*value) = (st_data_t) new_object_record; diff --git a/ext/datadog_profiling_native_extension/stack_recorder.c b/ext/datadog_profiling_native_extension/stack_recorder.c index 31165ba8b3b..710b17356e2 100644 --- a/ext/datadog_profiling_native_extension/stack_recorder.c +++ b/ext/datadog_profiling_native_extension/stack_recorder.c @@ -258,8 +258,6 @@ static VALUE _native_check_heap_hashes(DDTRACE_UNUSED VALUE _self, VALUE locatio static VALUE _native_start_fake_slow_heap_serialization(DDTRACE_UNUSED VALUE _self, VALUE recorder_instance); static VALUE _native_end_fake_slow_heap_serialization(DDTRACE_UNUSED VALUE _self, VALUE recorder_instance); static VALUE _native_debug_heap_recorder(DDTRACE_UNUSED VALUE _self, VALUE recorder_instance); -static VALUE _native_gc_force_recycle(DDTRACE_UNUSED VALUE _self, VALUE obj); -static VALUE _native_has_seen_id_flag(DDTRACE_UNUSED VALUE _self, VALUE obj); static VALUE _native_stats(DDTRACE_UNUSED VALUE self, VALUE instance); static VALUE build_profile_stats(profile_slot *slot, long serialization_time_ns, long heap_iteration_prep_time_ns, long heap_profile_build_time_ns); static VALUE _native_is_object_recorded(DDTRACE_UNUSED VALUE _self, VALUE recorder_instance, VALUE object_id); @@ -297,10 +295,6 @@ void stack_recorder_init(VALUE profiling_module) { _native_end_fake_slow_heap_serialization, 1); rb_define_singleton_method(testing_module, "_native_debug_heap_recorder", _native_debug_heap_recorder, 1); - rb_define_singleton_method(testing_module, "_native_gc_force_recycle", - _native_gc_force_recycle, 1); - rb_define_singleton_method(testing_module, "_native_has_seen_id_flag", - _native_has_seen_id_flag, 1); rb_define_singleton_method(testing_module, "_native_is_object_recorded?", _native_is_object_recorded, 2); rb_define_singleton_method(testing_module, "_native_heap_recorder_reset_last_update", _native_heap_recorder_reset_last_update, 1); rb_define_singleton_method(testing_module, "_native_recorder_after_gc_step", _native_recorder_after_gc_step, 1); @@ -1006,34 +1000,6 @@ static VALUE _native_debug_heap_recorder(DDTRACE_UNUSED VALUE _self, VALUE recor return heap_recorder_testonly_debug(state->heap_recorder); } -#pragma GCC diagnostic push -// rb_gc_force_recycle was deprecated in latest versions of Ruby and is a noop. -#pragma GCC diagnostic ignored "-Wdeprecated-declarations" -#pragma GCC diagnostic ignored "-Wunused-parameter" -// This method exists only to enable testing Datadog::Profiling::StackRecorder behavior using RSpec. -// It SHOULD NOT be used for other purposes. -static VALUE _native_gc_force_recycle(DDTRACE_UNUSED VALUE _self, VALUE obj) { - #ifdef HAVE_WORKING_RB_GC_FORCE_RECYCLE - rb_gc_force_recycle(obj); - #endif - return Qnil; -} -#pragma GCC diagnostic pop - -// This method exists only to enable testing Datadog::Profiling::StackRecorder behavior using RSpec. -// It SHOULD NOT be used for other purposes. -static VALUE _native_has_seen_id_flag(DDTRACE_UNUSED VALUE _self, VALUE obj) { - #ifndef NO_SEEN_OBJ_ID_FLAG - if (RB_FL_TEST(obj, RUBY_FL_SEEN_OBJ_ID)) { - return Qtrue; - } else { - return Qfalse; - } - #else - return Qfalse; - #endif -} - static VALUE _native_stats(DDTRACE_UNUSED VALUE self, VALUE recorder_instance) { struct stack_recorder_state *state; TypedData_Get_Struct(recorder_instance, struct stack_recorder_state, &stack_recorder_typed_data, state); diff --git a/lib/datadog/profiling/component.rb b/lib/datadog/profiling/component.rb index 19239a70c78..e0bafc0cd3f 100644 --- a/lib/datadog/profiling/component.rb +++ b/lib/datadog/profiling/component.rb @@ -207,28 +207,16 @@ def self.build_profiler_component(settings:, agent_settings:, optional_tracer:) return false unless heap_profiling_enabled - if RUBY_VERSION.start_with?("2.") && RUBY_VERSION < "2.7" + if RUBY_VERSION < "3.1" Datadog.logger.warn( - "Heap profiling currently relies on features introduced in Ruby 2.7 and will be forcibly disabled. " \ - "Please upgrade to Ruby >= 2.7 in order to use this feature." + "Current Ruby version (#{RUBY_VERSION}) cannot support heap profiling due to VM bugs/limitations. " \ + "Please upgrade to Ruby >= 3.1 in order to use this feature. Heap profiling has been disabled." ) return false end - if RUBY_VERSION < "3.1" - Datadog.logger.debug( - "Current Ruby version (#{RUBY_VERSION}) supports forced object recycling which has a bug that the " \ - "heap profiler is forced to work around to remain accurate. This workaround requires force-setting " \ - "the SEEN_OBJ_ID flag on objects that should have it but don't. Full details can be found in " \ - "https://github.com/DataDog/dd-trace-rb/pull/3360. This workaround should be safe but can be " \ - "bypassed by disabling the heap profiler or upgrading to Ruby >= 3.1 where forced object recycling " \ - "was completely removed (https://bugs.ruby-lang.org/issues/18290)." - ) - end - unless allocation_profiling_enabled - raise ArgumentError, - "Heap profiling requires allocation profiling to be enabled" + raise ArgumentError, "Heap profiling requires allocation profiling to be enabled" end Datadog.logger.warn( diff --git a/spec/datadog/profiling/component_spec.rb b/spec/datadog/profiling/component_spec.rb index 573c76b8bdf..2320c482ce6 100644 --- a/spec/datadog/profiling/component_spec.rb +++ b/spec/datadog/profiling/component_spec.rb @@ -308,15 +308,15 @@ stub_const("RUBY_VERSION", testing_version) end - context "on a Ruby older than 2.7" do - let(:testing_version) { "2.6" } + context "on a Ruby older than 3.1" do + let(:testing_version) { "3.0" } it "initializes StackRecorder without heap sampling support and warns" do expect(Datadog::Profiling::StackRecorder).to receive(:new) .with(hash_including(heap_samples_enabled: false, heap_size_enabled: false)) .and_call_original - expect(Datadog.logger).to receive(:warn).with(/upgrade to Ruby >= 2.7/) + expect(Datadog.logger).to receive(:warn).with(/upgrade to Ruby >= 3.1/) build_profiler_component end @@ -367,23 +367,6 @@ build_profiler_component end end - - context "on a Ruby older than 3.1" do - let(:testing_version) { "2.7" } - - it "initializes StackRecorder with heap sampling support but shows warning and debug messages" do - expect(Datadog::Profiling::StackRecorder).to receive(:new) - .with(hash_including(heap_samples_enabled: true)) - .and_call_original - - expect(Datadog.logger).to receive(:debug).with(/Enabled allocation profiling/) - expect(Datadog.logger).to receive(:warn).with(/experimental heap profiling/) - expect(Datadog.logger).to receive(:warn).with(/experimental heap size profiling/) - expect(Datadog.logger).to receive(:debug).with(/forced object recycling.+upgrading to Ruby >= 3.1/) - - build_profiler_component - end - end end end diff --git a/spec/datadog/profiling/stack_recorder_spec.rb b/spec/datadog/profiling/stack_recorder_spec.rb index 9b2c644a192..9f1ab10e6a0 100644 --- a/spec/datadog/profiling/stack_recorder_spec.rb +++ b/spec/datadog/profiling/stack_recorder_spec.rb @@ -666,136 +666,6 @@ def sample_allocation(obj) end end - context "on Rubies supporting rb_gc_force_recycle" do - before do - skip "rb_gc_force_recycle is a no-op in current Ruby version" if RUBY_VERSION >= "3.1" - @recycled_sample_allocation_line = 0 - end - - def has_seen_id_flag(obj) - described_class::Testing._native_has_seen_id_flag(obj) - end - - # This method attempts to allocate an object on a recycled heap slot. - # - # Heap slot recycling was a troublesome feature that has been removed from Rubies >= 3.1 - # in which an object could be freed through a fast-path that bypassed a lot of runtime - # machinery such as finalizers or object id tracking and thus introduced a fair amount - # of buggy behaviour. Some of this buggy behaviour manifests when a recycled slot gets - # re-used by a new live object: the new live object id will be the same as the id of - # the object that was recycled, violating a core constraint of Ruby objects: object ids - # are unique and non-repeatable. - # - # Recycling an object slot is easy (accomplished by a rb_gc_force_recycle native method call). - # More difficult is allocating an object on a recycled slot. Ruby gives us no control on - # where to allocate an object so we have to play a probability game. This method attempts to - # maximize our chances of quickly getting an object in a recycled slot by: - # 1. Force recycling 1000 objects. - # 2. Repeatedly allocating 1000 objects and keeping references to them, thus preventing GC - # from reclaiming their slots. - # 3. Checking if any of the ids of the 1000 recycled objects now map to a live object. If - # that happens, then we know that live object was allocated on a recycled slot and we - # can return it. - def create_obj_in_recycled_slot(should_sample_original: false) - # Force-recycle 1000 objects. - # NOTE: In theory, a single force recycle would suffice but the more recycled slots - # there are to use the more probable it is for a new allocation to use it. - recycled_obj_ids = [] - 1000.times do - obj = Object.new - sample_allocation(obj) if should_sample_original - @recycled_sample_allocation_line = __LINE__ - 1 - - # Get the id of the object we're about to recycle - recycled_obj_ids << obj.object_id - - # Force recycle the given object - described_class::Testing._native_gc_force_recycle(obj) - end - - # Repeatedly allocate objects until we find one that resolves to the id of one of - # the force recycled objects - objs = [] - 100.times do - # Instead of doing this one at a time which would be slow given id2ref will - # raise on failure, allocate a ton of objects each time, increasing the - # probability of getting a hit on each iteration - # NOTE: We keep the object references around to prevent GCs from constantly - # freeing up slots from the previous iteration. Thus each consecutive - # iteration should get one step closer to re-using one of the recycled - # slots. This should not lead to OOMs since we know there are 1000 - # free recycled slots available (we recycled them above). At the very - # limit we'd expect the Ruby VM to prefer to re-use those slots rather - # than expand heap pages and when that happens we'd stop iterating. - 1000.times { objs << Object.new } - recycled_obj_ids.each do |obj_id| - return ObjectSpace._id2ref(obj_id) - rescue RangeError # rubocop:disable Lint/SuppressedException - end - end - raise "could not allocate an object in a recycled slot" - end - - it "enforces seen id flag on objects on recycled slots that get sampled" do - recycled_obj = create_obj_in_recycled_slot - - expect(has_seen_id_flag(recycled_obj)).to be false - - sample_allocation(recycled_obj) - - expect(has_seen_id_flag(recycled_obj)).to be true - end - - it "enforces seen id flag on untracked objects that replace tracked recycled objects" do - recycled_obj = create_obj_in_recycled_slot(should_sample_original: true) - - expect(has_seen_id_flag(recycled_obj)).to be false - - serialize - - expect(has_seen_id_flag(recycled_obj)).to be true - end - - it "correctly handles lifecycle of objects on recycled slots that get sampled" do - recycled_obj = create_obj_in_recycled_slot - - sample_allocation(recycled_obj) - sample_line = __LINE__ - 1 - - GC.start # Ensure recycled sample has age > 0 so it shows up in serialized profile - - recycled_sample = heap_samples.find { |s| s.has_location?(path: __FILE__, line: sample_line) } - expect(recycled_sample).not_to be nil - end - - it "supports allocation samples with duplicate ids due to force recycling" do - recycled_obj = create_obj_in_recycled_slot(should_sample_original: true) - - expect { sample_allocation(recycled_obj) }.not_to raise_error - end - - it "raises on allocation samples with duplicate ids that are not due to force recycling" do - obj = Object.new - - sample_allocation(obj) - - expect { sample_allocation(obj) }.to raise_error(/supposed to be unique/) - end - - it "can detect implicit frees due to slot recycling" do - live_objects = [] - live_objects << create_obj_in_recycled_slot(should_sample_original: true) - - # If we act on implicit frees, then we assume that even though there's a live object - # in the same slot as the original one we were tracking, we'll be able to detect this - # recycling, clean up that record and not include it in the final heap samples - relevant_sample = heap_samples.find do |s| - s.has_location?(path: __FILE__, line: @recycled_sample_allocation_line) - end - expect(relevant_sample).to be nil - end - end - # NOTE: This is a regression test that exceptions in end_heap_allocation_recording_with_rb_protect are safely # handled by the stack_recorder. context "when the heap sampler raises an exception during _native_sample" do From d760e052a723f450165648fceed1c76cbb8cb574 Mon Sep 17 00:00:00 2001 From: Ivo Anjo Date: Mon, 2 Dec 2024 12:42:13 +0000 Subject: [PATCH 4/4] Minor: Tweak text in error message --- lib/datadog/profiling/component.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/datadog/profiling/component.rb b/lib/datadog/profiling/component.rb index e0bafc0cd3f..86f8fb798df 100644 --- a/lib/datadog/profiling/component.rb +++ b/lib/datadog/profiling/component.rb @@ -209,7 +209,7 @@ def self.build_profiler_component(settings:, agent_settings:, optional_tracer:) if RUBY_VERSION < "3.1" Datadog.logger.warn( - "Current Ruby version (#{RUBY_VERSION}) cannot support heap profiling due to VM bugs/limitations. " \ + "Current Ruby version (#{RUBY_VERSION}) cannot support heap profiling due to VM limitations. " \ "Please upgrade to Ruby >= 3.1 in order to use this feature. Heap profiling has been disabled." ) return false