Propagate trace through Concurrent::Promises.future #1522
Conversation
Duh, one lousy test on my part. Will fix.
It took me quite a bit to untangle the many bug fixes and features in this one commit:

- Support `Concurrent::Promises::Future`
- Args were not being passed to tasks by the `ContextCompositeExecutorService`
- Switch to `prepend` in `FuturePatch` to avoid (potential?) infinite recursion
- Support wrapping the global executors (e.g. whenever the future gets an `executor: :io` argument)
- Fix piecemeal requires of concurrent-ruby modules in the integration specs, which are not supported by upstream

Some of the items were alluded to in the commit message, but I admit that I couldn't quite understand what they were until I reverse engineered them from the diff.
Having all these changes together made it quite hard for me to review the PR, as I had to separate them out for myself to understand why each change was there and why it was needed.
It may not be worth doing the separation after the fact for this PR (up to you), but I'd really like to see such a separation in the future, as it would make non-trivial changes such as this one a lot easier to review.
parent_context = datadog_configuration.tracer.provider.context

@composited_executor.post(*args) do
executor = @composited_executor.is_a?(Symbol) ? Concurrent.executor(@composited_executor) : @composited_executor
I think we can move this check to `#initialize`, since these pools are global.
The default executor pools are global, but in `Concurrent::Promises` a custom executor instance can be passed as an argument in some public API methods.
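To illustrate, a minimal sketch using concurrent-ruby's public API (not code from this PR):

```ruby
require 'concurrent-ruby'

# Global pool referenced by symbol -- resolved internally via Concurrent.executor(:io):
Concurrent::Promises.future_on(:io) { 1 + 1 }

# Custom executor instance passed directly through the public API:
pool = Concurrent::FixedThreadPool.new(2)
Concurrent::Promises.future_on(pool) { 2 + 2 }
```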
Interesting. If that's the case, how would it affect this line of code (vs if it was executed at initialization time)? e.g. would `@composited_executor` or `Concurrent.executor(@composited_executor)` change based on the promise configuration? I wouldn't quite expect them to (??).
Ah, I believe you are correct, my mind fumbled and for some reason I thought you were referring to another initializer.
Hmm, in fact I now recall it's the `Concurrent.executor` caller behaviour I'm more concerned with. Doing the call here is consistent with the moment where the same method is called in `evaluate_to` (it's just a little bit earlier in the call stack), whereas if called in `initialize` the timeline is noticeably different, including being called in another thread entirely. With the current implementation of concurrent-ruby it does not look like it could be a problem, but I feel it would be introducing some brittleness by depending on a second internal behaviour.
I find that usually things break in the ways we didn't think of (lolol) but it's harmless so I'm happy to keep it -- I just asked to make sure that I wasn't missing anything :)
# DEV We save an unmodified copy of Concurrent::Future.
let!(:unmodified_promises) { ::Concurrent::Promises.dup }
let!(:unmodified_future) { ::Concurrent::Future.dup }

# DEV We then restore Concurrent::Future, a dangerous game.
after do
  ::Concurrent.send(:remove_const, :Promises)
  ::Concurrent.const_set('Promises', unmodified_promises)
  ::Concurrent.send(:remove_const, :Future)
  ::Concurrent.const_set('Future', unmodified_future)
  remove_patch!(:concurrent_ruby)
end
I wonder if we could use RSpec's constant stubbing for this purpose, rather than doing our own homegrown constant stubbing implementation.
My understanding is that we want to test on the real constant (integration testing) but we also want to be able to restore the original inheritance chain. (Ideally we would just remove the module from the chain, but that's not possible in Ruby; I have on occasion resorted to inserting dynamically created modules and emptying them afterwards.)
Are you suggesting we could do something like this to achieve a similar effect?
stub_const('Concurrent::Promises', ::Concurrent::Promises.dup)
stub_const('Concurrent::Future', ::Concurrent::Future.dup)
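For context, a sketch of how that could look in the spec, assuming RSpec's `stub_const` behaves as hoped (it restores the original constants automatically after each example):

```ruby
before do
  # Replace the real constants with dup'd copies for the duration of the example;
  # rspec-mocks restores the originals automatically, so no custom after block is needed.
  stub_const('Concurrent::Promises', ::Concurrent::Promises.dup)
  stub_const('Concurrent::Future', ::Concurrent::Future.dup)
end
```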
> My understanding is that we want to test on the real constant (integration testing) but we also want to be able to restore the original inheritance chain. (Ideally we would just remove the module from the chain, but that's not possible in Ruby; I have on occasion resorted to inserting dynamically created modules and emptying them afterwards.)
To be honest, I think we could just dependency inject the modules to modify and have the production ones as defaults, but that's usually not what has been done in the gem historically (I also think we shouldn't have as many module functions) so I didn't suggest going that far :)
> Are you suggesting we could do something like this to achieve a similar effect?
Yeah, I was thinking that they should be equivalent. I can't think of any cases where it wouldn't be -- but it may just be an unknown-unknown.
This `after` block is pure dark magic. It works, but no one knows if it will continue to work after the next change.
If you can find a better way to reset patching, please go ahead and make that change.
Alternatively, this is the only integration that does this: maybe such strict testing is not truly required?
Either way, you might also just leave it as is if you can't find a good alternative.
Moved to `stub_const`, which seems to work well and is more readable.
I do have some facilities to handle this in the upcoming Graft instrumentation, but further change here feels like it would be way out of scope for this PR. The other ways I can come up with feel even hackier.
As for testing this, pragmatically I definitely found these tests useful to check that this PR works.
That triggered during an rspec run, so unless the behaviour is spec-induced it had a chance of happening for real.
I agree. It turns out the order of discovering the issues and corresponding fixes was basically reversed from what would be a sensible sequence of changes for review, and by that point it wasn't quite straightforward to retcon these into non-broken individual commits, which I'm sorry for not doing. I tried to make the code comments relevant, but I could definitely have been more explicit in the commit message/PR description though.
I'd be curious to see what happened -- was the patch applied more than once? Just to make sure we're fixing it properly with the prepend version.
No worries, there's always the ideal and then there's reality. Hope you don't mind if I poke when it made my life harder, and please do the same and get revenge on my PRs as well :)
Yup, now that you mention it, the
No offense taken! It was definitely called for :)
Force-pushed from 9281c98 to 7a61b6b.
The JRuby 9.2.0.0 test failure looks like it is deterministic -- https://app.circleci.com/pipelines/github/DataDog/dd-trace-rb/3520/workflows/c571f62b-cf7f-44ed-bd05-2d134b54a8a8/jobs/138556/tests#failed-test-0 . We have 9.2.0.0 because of #1409 but to be honest I'm not entirely a fan. At most, I'd suggest just skipping support for whatever is broken on those JRuby versions, and printing out a warning "did not patch concurrent ruby... your JRuby version is too old, please update".
Regarding the lint failure, rebasing with master will fix it now.
The PR looks very good and complete, kudos @lloeki!
There are a few discussions in the comments, but overall it's in shape.
Force-pushed from 7a61b6b to 22c0550.
Rebased!
Can you look into #1522 (comment) as well?
I was just typing it out ;)
I wasn't aware of this PR, and I'm not a fan either, not least because I've already had to face non-conformant breakage on JRuby. TBH I'd rather officially support only the latest release of each minor, given the regular stream of "oops we were not quite conformant on that not-so-corner-case" fixes that regularly land in JRuby. Not my call though.
Is there a sanctioned DSL-ish way to do this, or is this kind of thing made of hardcoded checks?
Pfft I'll see your issue and raise you this one 🤣 For real though, since this looks like one of those "it breaks your app" kind of deal, I think it's worth doing the small detour, since people are known to get stuck on random versions and not update them. (But I also don't think we should spend too much time searching for new ones -- e.g. let's not add all minor versions of JRuby and MRI to CI)
#1503 had a few examples that got removed recently. My specific suggestion would be to: find out which JRuby version fixed it (binary search 9.2 releases?) and then do something like (in concurrent_ruby/integration.rb)
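A rough sketch of what such a gate could look like (purely illustrative: the module name and the 9.2.x cut-off below are placeholders, not the code that was removed from #1503; the real minimum version would come from bisecting JRuby releases as suggested above):

```ruby
# Hypothetical sketch -- not the actual dd-trace-rb integration code.
module ConcurrentRubyCompatibility
  MINIMUM_JRUBY_VERSION = Gem::Version.new('9.2.9.0') # placeholder value

  def self.compatible?
    # On non-JRuby runtimes there is nothing to gate on.
    return true unless RUBY_PLATFORM == 'java'

    Gem::Version.new(JRUBY_VERSION) >= MINIMUM_JRUBY_VERSION
  end
end

unless ConcurrentRubyCompatibility.compatible?
  warn 'Did not patch concurrent-ruby: your JRuby version is too old, please update.'
end
```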
Thanks Ivo, I'll look into these examples and attempt something.
Prior to this change, the profiler tried to sample all threads each time it decided to sample. This could have a negative impact on the latency of Ruby services with a lot of threads, because each sampling pass would be quite expensive.

Just as an illustration, consider the `benchmarks/profiler_sample_loop` benchmark, which tries to sample as fast as possible 4 threads with very deep (500+ frame) stacks. As the number of threads to sample increases, sampling performance degrades:

| Sampled threads | Iterations per second |
|-----------------|-----------------------|
| 4 threads | 390.921 (± 6.1%) i/s - 3.920k in 10.075624s |
| 16 threads | 97.037 (± 6.2%) i/s - 972.000 in 10.072298s |
| 32 threads | 49.576 (± 4.0%) i/s - 496.000 in 10.038491s |
| 64 threads | 24.060 (± 8.3%) i/s - 240.000 in 10.033481s |

(All numbers from my laptop: i7-1068NG7 + ruby 2.7.4p191 [x86_64-darwin19])

When combined with our dynamic sampling mechanism, if the profiler starts to take longer to sample, it just samples less often. BUT this means that it can impact service latency, because suddenly we may be spending 40ms+ of CPU time (64-thread example + 500 frames, very extreme), which from experience (#1511, #1522) is not good.

Instead, this PR changes the stack sampler to sample up to a set maximum of threads (16), randomly selected among all threads. This means that adding more threads to the system causes very little degradation:

| Sampled threads | Iterations per second |
|-----------------|-----------------------|
| 16 threads | 104.859 (± 6.7%) i/s - 1.050k in 10.068781s |
| 32 threads | 100.957 (± 5.9%) i/s - 1.010k in 10.047250s |
| 64 threads | 98.098 (± 5.1%) i/s - 981.000 in 10.038244s |
| 256 threads | 84.740 (± 8.3%) i/s - 848.000 in 10.074037s |

There's still a bit of degradation that I suspect is from pure VM overhead -- 256+ threads on a single Ruby VM adds up.

Because we pick the threads to sample randomly, we'll eventually sample all threads -- just not at once. Finally, regarding the dynamic sampling mechanism: because the profiler will not take longer to sample, it will sample more often, and thus over a longer period we should take roughly the same samples.

One downside of this approach is that if there really are many threads, the resulting wall clock times in a one-minute profile may "drift" around the 60-second mark, e.g. maybe we only sampled a thread once per second and only 59 times, so we'll report 59s, but on the next report we'll include the missing one, so then the result will be 61s. I've observed 60 ± 1.68 secs for an app with ~65 threads, given the default maximum of 16 threads. This seems a reasonable enough margin of error given the improvement to latency (especially on such a large application! -> even bigger latency impact if we tried to sample all threads).

If it helps, I've used this trivial Ruby app for testing with many threads:

```ruby
32.times do |n|
  eval("
    def sleeper#{n}
      sleep
    end
  ")

  Thread.new { send(:"sleeper#{n}") }
end

sleep
```
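To make the selection idea concrete, here is a tiny sketch of "sample at most N randomly chosen threads" (illustrative only, not the profiler's actual implementation; the constant mirrors the default of 16 mentioned above):

```ruby
MAX_THREADS_SAMPLED = 16 # mirrors the default mentioned above

# Pick the threads to sample on this pass: everything if we are under the cap,
# otherwise a random subset, so that over many passes all threads get sampled.
def threads_to_sample
  threads = Thread.list
  return threads if threads.size <= MAX_THREADS_SAMPLED

  threads.sample(MAX_THREADS_SAMPLED)
end
```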
@lloeki any update on this one?
We are facing the same issue. We use graphql-ruby with a new usage of
We already have issue #1316
I've requested the team take another look at this! Hopefully they'll be able to reply soon :)
Force-pushed from 22c0550 to 87eb055.
Force-pushed from 87eb055 to e72489b.
@ivoanjo @lloeki JRuby CI is currently passing (JRuby 9.2.0 is no longer tested), I think this is good to merge?
@ekump LGTM, thanks for picking it up!
@composited_executor.post(*args) do
# Pass the original arguments to the composited executor, which
# pushes them (possibly transformed) as block args
executor.post(*args) do |*block_args|
  begin
    original_context = tracer.provider.context
    tracer.provider.context = parent_context
I don't think this type of context sharing works as we expect anymore. It's not safe to share the same context object between different threads: they will fight over what the `active_span` is at any point in time, and span parenting will likely get intertwined between all threads sharing the same context.
We should use TraceDigest propagation instead:

dd-trace-rb/lib/datadog/tracing.rb, line 23 (at c6c3935):
def continue_trace!(digest, &block)
updated to use continue_trace!
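For reference, a minimal sketch of what digest-based propagation can look like inside the wrapped executor, assembled from the calls discussed in this thread (not necessarily the exact shape of the final code):

```ruby
# Capture a digest of the active trace at post time, on the calling thread...
digest = Datadog::Tracing.active_trace.to_digest

executor.post(*args) do |*block_args|
  # ...and continue that trace on the worker thread, so spans created inside
  # the future are parented correctly without sharing a mutable context object.
  Datadog::Tracing.continue_trace!(digest) do
    yield(*block_args)
  end
end
```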
expect(inner_span.parent_id).to eq(outer_span.span_id)
end

context 'when there are multiple futures with inner spans that have the same parent' do
base.class_eval do
  alias_method :ns_initialize_without_datadog, :ns_initialize
  remove_method(:ns_initialize)
  def ns_initialize(value, opts)
Nice cleanup!
Awesome work!
Argument passing is preserved through the composited executor. Concurrent::Future has been moved to a prepend style to resolve an infinite call loop situation.
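A minimal sketch of what a prepend-style patch looks like in general (illustrative only; the module name comes from this discussion, but this is not necessarily the PR's exact code):

```ruby
# Illustrative only: with prepend, `super` dispatches to Concurrent::Future's
# own ns_initialize, so there is no alias/re-entry loop to worry about.
module FuturePatch
  def ns_initialize(value, opts)
    super # let Concurrent::Future do its normal setup first

    # Wrap the executor chosen by the future (ContextCompositeExecutorService
    # is the wrapper from this PR) so posted tasks can carry the parent
    # tracing context across threads.
    @executor = ContextCompositeExecutorService.new(@executor)
  end
end

::Concurrent::Future.prepend(FuturePatch)
```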
This allows the effects of `prepend` to be rolled back when the original module is restored after each test.
…_trace method for thread safety
Force-pushed from 5c1c3d2 to 8278d5d.
Hey @Stashchenko, this change has just been released in 1.16.0. Let us know if this addresses your issue!
Just checked. Unfortunately, it doesn't work :( In the second screenshot there is no faraday trace :( Our setup:

# frozen_string_literal: true

require 'datadog/statsd'
require 'ddtrace'
require 'datadog/appsec'
require 'faraday'

Datadog.configure do |c|
  service_name = 'my-service'
  c.service = service_name
  c.tracing.analytics.enabled = true
  c.tracing.instrument(:concurrent_ruby)
  c.tracing.instrument(:rails)
  c.tracing.instrument(:redis, service_name: "#{service_name}-redis")
  c.tracing.instrument(:active_support, cache_service: service_name)
  c.tracing.instrument(:active_record, service_name: "#{service_name}-postgres")
  c.tracing.instrument(:http, split_by_domain: true)
  c.tracing.instrument(:faraday, service_name:)
  c.runtime_metrics.enabled = true
  c.runtime_metrics.statsd = Datadog::Statsd.new('localhost', (ENV['STATSD_PORT'] || '8120').to_i)
  c.ci.enabled = ENV['CI'].present?
  c.ci.instrument :rspec
  c.appsec.instrument :rails
end
I tried to experiment with our monkey patch, and here's what I've found: I've made changes only to the

# frozen_string_literal: true

require 'concurrent/executor/executor_service'

module Datadog
  module Tracing
    module Contrib
      module ConcurrentRuby
        # This patches the Future - to wrap executor service using ContextCompositeExecutorService
        class ContextCompositeExecutorService
          extend Forwardable
          include Concurrent::ExecutorService

          attr_accessor :composited_executor

          def initialize(composited_executor)
            @composited_executor = composited_executor
          end

          # The post method runs the task within composited executor - in a
          # different thread. The original arguments are captured to be
          # propagated to the composited executor post method
          def post(*)
            tracer = Tracing.send(:tracer)
            parent_context = tracer.provider.context

            executor = @composited_executor
            executor = Concurrent.executor(@composited_executor) if @composited_executor.is_a?(Symbol)

            # Pass the original arguments to the composited executor, which
            # pushes them (possibly transformed) as block args
            executor.post(*) do |*block_args|
              original_context = tracer.provider.context
              tracer.provider.context = parent_context

              # Pass the executor-provided block args as they should have been
              # originally passed without composition, see ChainPromise#on_resolvable
              yield(*block_args)
            ensure
              # Restore context in case the current thread gets reused
              tracer.provider.context = original_context
            end
          end

          delegate %i[can_overflow? serialized?] => :composited_executor
        end
      end
    end
  end
end

Looks like the issue for us is somewhere in the
# Restore context in case the current thread gets reused
tracer.provider.context = original_context
end
digest = Tracing.active_trace.to_digest
This is causing crashes on launch for us when `Tracing.active_trace` is `nil`.
@segiddins, oops, sorry for this. I am addressing it, and the fix should be released with 1.16.1, which is just around the corner.
Proposed fix is up at #3242
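For illustration, one possible shape of such a guard (a sketch only; the actual fix is whatever landed in #3242):

```ruby
# Only build a digest when there is an active trace on the posting thread;
# this avoids calling to_digest on nil when nothing is being traced.
active_trace = Datadog::Tracing.active_trace
digest = active_trace && active_trace.to_digest
```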
👋 @Stashchenko, with the 1.16 change, you can see that there's a "floating" block of tracing (the one with a big
The original patch, where
The new implementation actually represents the accurate trace flow in a parallel application, and won't cause span parenting issues.
Now this is unexpected, given this change has nothing to do with Faraday.
👋 @Stashchenko, I was not able to reproduce your case. Could you provide me with something more to further investigate the missing faraday span? The snippet I used to mimic your behavior:

require 'ddtrace'
require 'graphql'
require 'faraday'
require 'concurrent-ruby'
require 'ostruct'
require 'pry'

class GraphQL::Types::BaseObject < GraphQL::Schema::Object
end

class PersonType < GraphQL::Types::BaseObject
  field :name, String, null: false
end

class PostType < GraphQL::Types::BaseObject
  field :id, ID, null: false
  field :author, PersonType

  def author
    Concurrent::Promises.future do
      Faraday.get 'https://www.google.com'
      OpenStruct.new(id: object.author_id, name: object.author_id)
    end.value
  end
end

class QueryType < GraphQL::Schema::Object
  field :post, PostType do
    argument :id, ID
  end

  # Then provide an implementation:
  def post(id:)
    OpenStruct.new(id: id, author_id: id)
  end
end

class Schema < GraphQL::Schema
  query QueryType

  # define schema lazy_resolve for Future
  lazy_resolve(Concurrent::Promises::Future, :value)
end

Datadog.configure do |c|
  c.service = 'Bug Reproduction'
  c.diagnostics.debug = true
  c.tracing.instrument :faraday
  c.tracing.instrument :concurrent_ruby
  c.tracing.instrument :graphql, schemas: [Schema]
end

query_string = <<-GraphqlQueryString
  {
    p1: post(id: 1) { author { name } }
    p2: post(id: 2) { author { name } }
    p3: post(id: 3) { author { name } }
  }
GraphqlQueryString

result_hash = Datadog::Tracing.trace("outer_span") do
  Schema.execute(query_string)
end
Hi @TonyCTHsu, you should not get the value in the same method call:

Concurrent::Promises.future do
  Faraday.get 'https://www.google.com'
  OpenStruct.new(id: object.author_id, name: object.author_id)
end.value # This piece

It probably runs the query immediately, leaving nothing for the lazy resolver to do. One more step to reproduce: try to get a list of products with the author's name:

query_string = <<-GraphqlQueryString
  {
    posts { author { name } }
  }
GraphqlQueryString

That's what we have in our code. Something like this should call Faraday:

Concurrent::Promises.future do
  response = Faraday.get 'https://api.publicapis.org/entries'
  value = JSON.parse(response.body).sample
  OpenStruct.new(id: object.author_id, name: value['API'])
end
Hi @marcotc ,
It's not Faraday related :(
Hi guys, I tried to reproduce @TonyCTHsu's example on a clean project.

class PersonType < BaseObject
  field :name, String, null: false
end

class PostType < BaseObject
  field :author, PersonType
  field :id, ID, null: false

  def author
    Concurrent::Promises.future do
      Datadog::Tracing.trace('example.trace') do
        response = Faraday.get 'https://api.publicapis.org/entries'
        value = JSON.parse(response.body)['entries'].sample
        OpenStruct.new(id: object.author_id, name: value['API'])
      end
    end
  end
end

field :posts, [PostType], null: false

def posts
  Array.new(10) { OpenStruct.new(id: SecureRandom.uuid, author_id: SecureRandom.uuid) }
end

Looks like something incompatible inside our project. Will dig deeper to compare what the diff is. Sorry for the confusion, and thanks a lot for the support!