Merge pull request #2128 from DataDog/feature-single-span-sampling
marcotc authored Sep 6, 2022
2 parents 7fa72c2 + 17093d8 commit eaa1469
Showing 30 changed files with 1,317 additions and 96 deletions.
3 changes: 2 additions & 1 deletion .rubocop.yml
@@ -124,8 +124,9 @@ Lint/EmptyClass: # (new in 1.3)
Enabled: true
Lint/LambdaWithoutLiteralBlock: # (new in 1.8)
Enabled: true
# Prevents `return` inside a `begin`/`end` block used in an assignment, e.g. `var = begin; return; end`.
Lint/NoReturnInBeginEndBlocks: # (new in 1.2)
Enabled: true
Enabled: false
Lint/NumberedParameterAssignment: # (new in 1.9)
Enabled: true
Lint/OrAssignmentToConstant: # (new in 1.9)
8 changes: 8 additions & 0 deletions docs/GettingStarted.md
@@ -87,6 +87,7 @@ To contribute, check out the [contribution guidelines][contribution docs] and [d
- [Sampling](#sampling)
- [Application-side sampling](#application-side-sampling)
- [Priority sampling](#priority-sampling)
- [Single Span Sampling](#single-span-sampling)
- [Distributed tracing](#distributed-tracing)
- [HTTP request queuing](#http-request-queuing)
- [Processing pipeline](#processing-pipeline)
@@ -2042,6 +2043,7 @@ end
| `tracing.sampler` | | `nil` | Advanced usage only. Sets a custom `Datadog::Tracing::Sampling::Sampler` instance. If provided, the tracer will use this sampler to determine sampling behavior. See [Application-side sampling](#application-side-sampling) for details. |
| `tracing.sampling.default_rate` | `DD_TRACE_SAMPLE_RATE` | `nil` | Sets the trace sampling rate between `0.0` (0%) and `1.0` (100%). See [Application-side sampling](#application-side-sampling) for details. |
| `tracing.sampling.rate_limit` | `DD_TRACE_RATE_LIMIT` | `100` (per second) | Sets a maximum number of traces per second to sample. Set a rate limit to avoid the ingestion volume overages in the case of traffic spikes. |
| `tracing.sampling.span_rules` | `DD_SPAN_SAMPLING_RULES`, `DD_SPAN_SAMPLING_RULES_FILE` | `nil` | Sets [Single Span Sampling](#single-span-sampling) rules. These rules allow you to keep spans even when their respective traces are dropped. |
| `tracing.report_hostname` | `DD_TRACE_REPORT_HOSTNAME` | `false` | Adds hostname tag to traces. |
| `tracing.test_mode.enabled` | `DD_TRACE_TEST_MODE_ENABLED` | `false` | Enables or disables test mode, for use of tracing in test suites. |
| `tracing.test_mode.trace_flush` | | `nil` | Object that determines trace flushing behavior. |
@@ -2186,6 +2188,12 @@ trace.reject!
trace.keep!
```

#### Single Span Sampling

You can configure sampling rules that allow you to keep spans even when their respective traces are dropped by a trace-level sampling rule.

[//]: # (TODO: See <Single Span Sampling documentation URL here> for the full documentation on Single Span Sampling.)
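
As a minimal sketch, the rules can be written as a JSON string. The rule keys used below (`service`, `name`, `sample_rate`, `max_per_second`) and the service and operation names are assumptions for illustration, pending the full documentation:

```ruby
require 'json'

# Hypothetical rule: keep half of the `db.query` spans from services whose
# name starts with `billing-`, up to 100 spans per second.
span_rules = JSON.generate(
  [{ service: 'billing-*', name: 'db.query', sample_rate: 0.5, max_per_second: 100 }]
)

# With the tracer loaded, the string could be assigned in code:
#   Datadog.configure { |c| c.tracing.sampling.span_rules = span_rules }
# or exported through the DD_SPAN_SAMPLING_RULES environment variable.
puts span_rules
```

The same JSON could instead be stored in a file referenced by `DD_SPAN_SAMPLING_RULES_FILE`.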

### Distributed Tracing

Distributed tracing allows traces to be propagated across multiple instrumented applications so that a request can be presented as a single trace, rather than a separate trace per service.
8 changes: 8 additions & 0 deletions lib/datadog/core/configuration/components.rb
@@ -11,6 +11,8 @@
require_relative '../../tracing/tracer'
require_relative '../../tracing/flush'
require_relative '../../tracing/sync_writer'
require_relative '../../tracing/sampling/span/rule_parser'
require_relative '../../tracing/sampling/span/sampler'

module Datadog
module Core
@@ -80,6 +82,7 @@ def build_tracer(settings, agent_settings)
enabled: settings.tracing.enabled,
trace_flush: trace_flush,
sampler: sampler,
span_sampler: build_span_sampler(settings),
writer: writer,
tags: build_tracer_tags(settings),
)
@@ -183,6 +186,11 @@ def writer_update_priority_sampler_rates_callback(sampler)
end
end

def build_span_sampler(settings)
rules = Tracing::Sampling::Span::RuleParser.parse_json(settings.tracing.sampling.span_rules)
Tracing::Sampling::Span::Sampler.new(rules || [])
end

def build_profiler(settings, agent_settings, tracer)
return unless settings.profiling.enabled
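
A rough sketch of the contract `build_span_sampler` relies on: a parser that finds nothing usable returns `nil`, and the caller falls back to an empty rule list. `parse_span_rules` and its hash layout are hypothetical stand-ins for `RuleParser.parse_json`, and the JSON keys are assumed. Only the default values (`1.0` and `-1`) come from `Sampling::Span::Ext` in this change.

```ruby
require 'json'

# Hypothetical stand-in for RuleParser.parse_json: returns an Array of rule
# Hashes, or nil when the input is missing or malformed.
def parse_span_rules(json, default_rate: 1.0, default_max_per_second: -1)
  return nil if json.nil?

  list = JSON.parse(json)
  return nil unless list.is_a?(Array) && list.all?(Hash)

  list.map do |rule|
    {
      name: rule.fetch('name', '*'),
      service: rule.fetch('service', '*'),
      sample_rate: rule.fetch('sample_rate', default_rate),
      max_per_second: rule.fetch('max_per_second', default_max_per_second)
    }
  end
rescue JSON::ParserError
  nil
end

# Mirrors `rules || []`: no configuration means a sampler with no rules.
rules = parse_span_rules(nil) || []
puts rules.inspect
```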

44 changes: 44 additions & 0 deletions lib/datadog/core/configuration/settings.rb
@@ -540,6 +540,7 @@ def initialize(*_)
option :sampler

# Client-side sampling configuration.
# @see https://docs.datadoghq.com/tracing/trace_ingestion/mechanisms/
# @public_api
settings :sampling do
# Default sampling rate for the tracer.
@@ -566,6 +567,48 @@ def initialize(*_)
o.default { env_to_float(Tracing::Configuration::Ext::Sampling::ENV_RATE_LIMIT, 100) }
o.lazy
end

# Single span sampling rules.
# These rules allow a span to be kept when its encompassing trace is dropped.
#
# The syntax for single span sampling rules can be found here:
# TODO: <Single Span Sampling documentation URL here>
#
# @default `DD_SPAN_SAMPLING_RULES` environment variable.
# Otherwise, `DD_SPAN_SAMPLING_RULES_FILE` environment variable.
# Otherwise `nil`.
# @return [String,nil]
# @public_api
option :span_rules do |o|
o.default do
rules = ENV[Tracing::Configuration::Ext::Sampling::Span::ENV_SPAN_SAMPLING_RULES]
rules_file = ENV[Tracing::Configuration::Ext::Sampling::Span::ENV_SPAN_SAMPLING_RULES_FILE]

if rules
if rules_file
Datadog.logger.warn(
'Both DD_SPAN_SAMPLING_RULES and DD_SPAN_SAMPLING_RULES_FILE were provided: only ' \
'DD_SPAN_SAMPLING_RULES will be used. Please do not provide DD_SPAN_SAMPLING_RULES_FILE when ' \
'also providing DD_SPAN_SAMPLING_RULES as their configuration conflicts. ' \
"DD_SPAN_SAMPLING_RULES_FILE=#{rules_file} DD_SPAN_SAMPLING_RULES=#{rules}"
)
end
rules
elsif rules_file
begin
File.read(rules_file)
rescue => e
# `File.read` errors have clear and actionable messages, no need to add extra exception info.
Datadog.logger.warn(
"Cannot read span sampling rules file `#{rules_file}`: #{e.message}. " \
'No span sampling rules will be applied.'
)
nil
end
end
end
o.lazy
end
end

# [Continuous Integration Visibility](https://docs.datadoghq.com/continuous_integration/) configuration.
@@ -644,6 +687,7 @@ def initialize(*_)
end
end
end

# rubocop:enable Metrics/BlockLength
# rubocop:enable Metrics/ClassLength
# rubocop:enable Layout/LineLength
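
The default resolution for `span_rules` in the hunk above can be sketched outside the settings DSL: inline rules win, the rules file is the fallback, and an unreadable file yields `nil`. `resolve_span_rules` and its `warn_with` parameter are hypothetical names used only for this illustration.

```ruby
# Hypothetical helper mirroring the precedence of the `span_rules` default.
# `env` is any Hash-like object (such as ENV); `warn_with` receives warnings.
def resolve_span_rules(env, warn_with: ->(msg) { warn(msg) })
  rules = env['DD_SPAN_SAMPLING_RULES']
  rules_file = env['DD_SPAN_SAMPLING_RULES_FILE']

  if rules
    # Inline rules take precedence; the file is ignored with a warning.
    warn_with.call('Both variables are set: only DD_SPAN_SAMPLING_RULES is used.') if rules_file
    rules
  elsif rules_file
    begin
      File.read(rules_file)
    rescue StandardError => e
      warn_with.call("Cannot read span sampling rules file `#{rules_file}`: #{e.message}.")
      nil
    end
  end
end

puts resolve_span_rules({ 'DD_SPAN_SAMPLING_RULES' => '[]' }).inspect
```

Passing the environment in as an argument keeps the sketch testable without touching the real `ENV`.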
6 changes: 6 additions & 0 deletions lib/datadog/tracing/configuration/ext.rb
@@ -32,6 +32,12 @@ module NET
module Sampling
ENV_SAMPLE_RATE = 'DD_TRACE_SAMPLE_RATE'.freeze
ENV_RATE_LIMIT = 'DD_TRACE_RATE_LIMIT'.freeze

# @public_api
module Span
ENV_SPAN_SAMPLING_RULES = 'DD_SPAN_SAMPLING_RULES'.freeze
ENV_SPAN_SAMPLING_RULES_FILE = 'DD_SPAN_SAMPLING_RULES_FILE'.freeze
end
end

# @public_api
92 changes: 57 additions & 35 deletions lib/datadog/tracing/flush.rb
@@ -3,71 +3,93 @@
module Datadog
module Tracing
module Flush
# Consumes only completed traces (where all spans have finished)
class Finished
# Consumes and returns completed traces (where all spans have finished)
# from the provided \trace_op, if any.
# Consumes and returns a {TraceSegment} to be flushed, from
# the provided {TraceOperation}.
#
# Only finished spans are consumed. Any spans consumed are
# removed from +trace_op+ as a side effect. Unfinished spans are
# unaffected.
#
# @abstract
class Base
# Consumes and returns a {TraceSegment} to be flushed, from
# the provided {TraceOperation}.
#
# Any traces consumed are removed from +trace_op+ as a side effect.
# Only finished spans are consumed. Any spans consumed are
# removed from +trace_op+ as a side effect. Unfinished spans are
# unaffected.
#
# @param [TraceOperation] trace_op
# @return [TraceSegment] trace to be flushed, or +nil+ if the trace is not finished
def consume!(trace_op)
return unless full_flush?(trace_op)
return unless flush?(trace_op)

get_trace(trace_op)
end

def full_flush?(trace_op)
trace_op && trace_op.sampled? && trace_op.finished?
# Should we consume spans from the +trace_op+?
# @abstract
def flush?(trace_op)
raise NotImplementedError
end

protected

# Consumes all finished spans from trace.
# @return [TraceSegment]
def get_trace(trace_op)
trace_op.flush!
trace_op.flush! do |spans|
spans.select! { |span| single_sampled?(span) } unless trace_op.sampled?

spans
end
end

# Single Span Sampling has chosen to keep this span
# regardless of the trace-level sampling decision
def single_sampled?(span)
span.get_metric(Sampling::Span::Ext::TAG_MECHANISM) == Sampling::Span::Ext::MECHANISM_SPAN_SAMPLING_RATE
end
end

# Consumes and returns completed traces (where all spans have finished),
# if any, from the provided +trace_op+.
#
# Spans consumed are removed from +trace_op+ as a side effect.
class Finished < Base
# Are all spans finished?
def flush?(trace_op)
trace_op && trace_op.finished?
end
end

# Performs partial trace flushing to avoid large traces residing in memory for too long
class Partial
# Consumes and returns completed or partially completed
# traces from the provided +trace_op+, if any.
#
# Partial trace flushing avoids large traces residing in memory for too long.
#
# Partially completed traces, where not all spans have finished,
# will only be returned if there are at least
# +@min_spans_for_partial+ finished spans.
#
# Spans consumed are removed from +trace_op+ as a side effect.
class Partial < Base
# Start flushing partial trace after this many active spans in one trace
DEFAULT_MIN_SPANS_FOR_PARTIAL_FLUSH = 500

attr_reader :min_spans_for_partial

def initialize(options = {})
super()
@min_spans_for_partial = options.fetch(:min_spans_before_partial_flush, DEFAULT_MIN_SPANS_FOR_PARTIAL_FLUSH)
end

# Consumes and returns completed or partially completed
# traces from the provided +trace_op+, if any.
#
# Partially completed traces, where not all spans have finished,
# will only be returned if there are at least
# +@min_spans_for_partial+ finished spans.
#
# Any spans consumed are removed from +trace_op+ as a side effect.
#
# @return [TraceSegment] partial or complete trace to be flushed, or +nil+ if no spans are finished
def consume!(trace_op)
return unless partial_flush?(trace_op)

get_trace(trace_op)
end

def partial_flush?(trace_op)
return false unless trace_op.sampled?
def flush?(trace_op)
return true if trace_op.finished?
return false if trace_op.finished_span_count < @min_spans_for_partial

true
end

protected

def get_trace(trace_op)
trace_op.flush!
end
end
end
end
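
To illustrate the filtering in `get_trace`: when the trace-level decision is to drop, only spans carrying the single span sampling mechanism metric survive the flush. `SketchSpan` and `spans_to_flush` are hypothetical stand-ins for the library's span and trace operation objects, so this is a sketch of the idea rather than the actual implementation.

```ruby
# Values taken from Sampling::Span::Ext in this change.
TAG_MECHANISM = '_dd.span_sampling.mechanism'
MECHANISM_SPAN_SAMPLING_RATE = 8

# Hypothetical span stand-in that only exposes metric lookup.
SketchSpan = Struct.new(:name, :metrics) do
  def get_metric(key)
    metrics[key]
  end
end

# Returns the spans that would be flushed for a finished trace.
def spans_to_flush(spans, trace_sampled)
  # A sampled trace keeps every span.
  return spans if trace_sampled

  # A dropped trace keeps only spans chosen by Single Span Sampling.
  spans.select { |span| span.get_metric(TAG_MECHANISM) == MECHANISM_SPAN_SAMPLING_RATE }
end

kept = SketchSpan.new('db.query', { TAG_MECHANISM => MECHANISM_SPAN_SAMPLING_RATE })
dropped = SketchSpan.new('web.request', {})
puts spans_to_flush([kept, dropped], false).map(&:name).inspect
```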
9 changes: 9 additions & 0 deletions lib/datadog/tracing/metadata/tagging.rb
@@ -65,6 +65,15 @@ def set_tags(tags)
tags.each { |k, v| set_tag(k, v) }
end

# Returns true if the provided `tag` was set to a non-nil value.
# False otherwise.
#
# @param [String] tag the tag or metric to check for presence
# @return [Boolean] if the tag is present and not nil
def has_tag?(tag) # rubocop:disable Naming/PredicateName
!get_tag(tag).nil? # nil is considered not present, thus we can't use `Hash#has_key?`
end

# This method removes a tag for the given key.
def clear_tag(key)
meta.delete(key)
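
The reason `has_tag?` avoids `Hash#has_key?` can be seen with a plain hash, assuming tags live in a Hash where a key may be stored with a `nil` value. The `meta` contents and the `has_tag` lambda below are hypothetical.

```ruby
# Hypothetical tag storage with one real value and one nil value.
meta = { 'http.method' => 'GET', 'error.msg' => nil }

# Same check as `has_tag?`: a tag stored as nil counts as absent.
has_tag = ->(tag) { !meta[tag].nil? }

puts has_tag.call('http.method') # true
puts has_tag.call('error.msg')   # false
puts meta.key?('error.msg')      # true: the key exists even though its value is nil
```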
3 changes: 3 additions & 0 deletions lib/datadog/tracing/sampling/rate_limiter.rb
@@ -39,6 +39,9 @@ class TokenBucket < RateLimiter
def initialize(rate, max_tokens = rate)
super()

raise ArgumentError, "rate must be a number: #{rate}" unless rate.is_a?(Numeric)
raise ArgumentError, "max_tokens must be a number: #{max_tokens}" unless max_tokens.is_a?(Numeric)

@rate = rate
@max_tokens = max_tokens

10 changes: 10 additions & 0 deletions lib/datadog/tracing/sampling/rate_sampler.rb
@@ -20,6 +20,16 @@ class RateSampler < Sampler
# * +sample_rate+: the sample rate as a {Float} between 0.0 and 1.0. 0.0
# means that no trace will be sampled; 1.0 means that all traces will be
# sampled.
#
# DEV-2.0: Allow `sample_rate` zero (drop all). This eases
# DEV-2.0: usage for all internal users of the {RateSampler} class: both
# DEV-2.0: RuleSampler and Single Span Sampling leverage the RateSampler, but want
# DEV-2.0: `sample_rate` zero to mean "drop all". They work around this by hard-
# DEV-2.0: setting the `sample_rate` to zero like so:
# DEV-2.0: ```
# DEV-2.0: sampler = RateSampler.new
# DEV-2.0: sampler.sample_rate = sample_rate
# DEV-2.0: ```
def initialize(sample_rate = 1.0)
super()
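
The workaround described in the `DEV-2.0` note can be sketched with a reduced sampler. `SketchRateSampler` is hypothetical, and its constructor behaviour (replacing zero and out-of-range rates with `1.0`) is an assumption inferred from the note, not a copy of `RateSampler`.

```ruby
# Hypothetical reduced sampler; not the library's RateSampler.
class SketchRateSampler
  attr_accessor :sample_rate

  def initialize(sample_rate = 1.0)
    # Assumed behaviour: zero and out-of-range rates fall back to 1.0.
    sample_rate = 1.0 unless sample_rate > 0.0 && sample_rate <= 1.0
    @sample_rate = sample_rate
  end
end

# The workaround: construct with the default, then set the rate directly,
# so that a rate of zero is preserved and means "drop all".
def drop_all_capable_sampler(sample_rate)
  sampler = SketchRateSampler.new
  sampler.sample_rate = sample_rate
  sampler
end

puts SketchRateSampler.new(0.0).sample_rate
puts drop_all_capable_sampler(0.0).sample_rate
```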

29 changes: 29 additions & 0 deletions lib/datadog/tracing/sampling/span/ext.rb
@@ -0,0 +1,29 @@
# frozen_string_literal: true

module Datadog
module Tracing
module Sampling
module Span
# Single Span Sampling constants.
module Ext
# Accept all spans (100% retention).
DEFAULT_SAMPLE_RATE = 1.0
# Unlimited.
# @see Datadog::Tracing::Sampling::TokenBucket
DEFAULT_MAX_PER_SECOND = -1

# Sampling decision method used to come to the sampling decision for this span
TAG_MECHANISM = '_dd.span_sampling.mechanism'
# Sampling rate applied to this span, if a rule applies
TAG_RULE_RATE = '_dd.span_sampling.rule_rate'
# Rate limit configured for this span, if a rule applies
TAG_MAX_PER_SECOND = '_dd.span_sampling.max_per_second'

# This span was sampled on account of a Span Sampling Rule
# @see Datadog::Tracing::Sampling::Span::Rule
MECHANISM_SPAN_SAMPLING_RATE = 8
end
end
end
end
end
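
As an illustration of how these constants fit together, a span kept by a rule would presumably carry all three metrics. `span_sampling_metrics` is a hypothetical helper, and when each metric is actually written is an assumption based on the comments above.

```ruby
# Hypothetical helper: metrics recorded on a span kept by a span sampling rule.
def span_sampling_metrics(sample_rate, max_per_second)
  {
    '_dd.span_sampling.mechanism' => 8, # MECHANISM_SPAN_SAMPLING_RATE
    '_dd.span_sampling.rule_rate' => sample_rate,
    '_dd.span_sampling.max_per_second' => max_per_second
  }
end

# Using the defaults from this file: keep everything, no rate limit.
puts span_sampling_metrics(1.0, -1).inspect
```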
9 changes: 9 additions & 0 deletions lib/datadog/tracing/sampling/span/matcher.rb
@@ -6,6 +6,8 @@ module Sampling
module Span
# Checks if a span conforms to the matching criteria.
class Matcher
attr_reader :name, :service

# Pattern that matches any string
MATCH_ALL_PATTERN = '*'

@@ -54,6 +56,13 @@ def match?(span)
end
end

def ==(other)
return super unless other.is_a?(Matcher)

name == other.name &&
service == other.service
end

private

# @param pattern [String]
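
A reduced sketch of what value equality on a matcher buys: two matchers built from the same patterns compare equal, which makes parsed rules easy to compare. `SketchMatcher` is hypothetical, and its glob handling (`*` for any run of characters, `?` for a single character) is an assumption about the pattern syntax.

```ruby
# Hypothetical reduced matcher; not the library's Matcher.
class SketchMatcher
  attr_reader :name, :service

  def initialize(name: '*', service: '*')
    @name = name
    @service = service
    @name_regex = glob_to_regex(name)
    @service_regex = glob_to_regex(service)
  end

  def match?(span_name, span_service)
    @name_regex.match?(span_name) && @service_regex.match?(span_service)
  end

  # Value equality: same patterns mean the same matcher.
  def ==(other)
    return super unless other.is_a?(SketchMatcher)

    name == other.name && service == other.service
  end

  private

  # Escape everything, then re-enable the two assumed wildcards.
  def glob_to_regex(pattern)
    escaped = Regexp.escape(pattern).gsub('\*', '.*').gsub('\?', '.')
    Regexp.new("\\A#{escaped}\\z")
  end
end

matcher = SketchMatcher.new(name: 'db.*', service: 'billing-*')
puts matcher.match?('db.query', 'billing-api')
puts matcher == SketchMatcher.new(name: 'db.*', service: 'billing-*')
```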