Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for http.client_ip tag for Rack-based frameworks #2248

Merged
merged 15 commits into from
Sep 8, 2022
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 21 additions & 0 deletions lib/datadog/core/configuration/settings.rb
Original file line number Diff line number Diff line change
Expand Up @@ -618,6 +618,27 @@ def initialize(*_)
# @default `{}`
# @return [Hash,nil]
option :writer_options, default: ->(_i) { {} }, lazy: true

# Client IP configuration
# @public_api
settings :client_ip do
# Whether client IP collection is disabled. This disables client IPs from HTTP requests to be reported in traces.
#
# @default `DD_TRACE_CLIENT_IP_HEADER_DISABLED` environment variable, otherwise `false`.
# @return [Boolean]
option :disabled do |o|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semantically, prefer enabled to disabled

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually agree but since the environment variable is DISABLED I figured I'd keep it disabled.
Do you still think it's worth the setting name?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also rename the environment variable to DD_TRACE_CLIENT_IP_HEADER_ENABLED to be consistent with other environment variable?

o.default { env_to_bool(Tracing::Configuration::Ext::ClientIp::ENV_DISABLED, false) }
o.lazy
end
# An optional name of a custom header to resolve the client IP from.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# An optional name of a custom header to resolve the client IP from.
# An optional name of a custom header to resolve the client IP from.

Linter should have caught this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops.
Rubocop seems to pass but I shouldv'e considered the formatting in the other options.
Fixed

#
# @default `DD_TRACE_CLIENT_IP_HEADER` environment variable, otherwise `nil`.
# @return [String,nil]
option :header_name do |o|
o.default { ENV.fetch(Tracing::Configuration::Ext::ClientIp::ENV_HEADER_NAME, nil) }
o.lazy
end
end
end

# The `version` tag in Datadog. Use it to enable [Deployment Tracking](https://docs.datadoghq.com/tracing/deployment_tracking/).
Expand Down
191 changes: 191 additions & 0 deletions lib/datadog/tracing/client_ip.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,191 @@
# typed: true

require 'ipaddr'

require_relative '../core/configuration'
require_relative 'metadata/ext'
require_relative 'span'

module Datadog
module Tracing
# Common functions for supporting the `http.client_ip` span attribute.
module ClientIp
DEFAULT_IP_HEADERS_NAMES = %w[
x-forwarded-for
x-real-ip
x-client-ip
x-forwarded
x-cluster-client-ip
forwarded-for
forwarded
via
true-client-ip
].freeze

# A collection of headers.
class HeaderCollection
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this could go to the Core namespace.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

# Gets a single value of the header with the given name, case insensitive.
#
# @param [String] header_name Name of the header to get the value of.
# @returns [String, nil] A single value of the header, or nil if the header with
# the given name is missing from the collection.
def get(header_name)
nil
end

def self.from_hash(hash)
HashHeaderCollection.new(hash)
end
end

# Sets the `http.client_ip` tag on the given span.
#
# This function respects the user's settings: if they disable the client IP tagging,
# or provide a different IP header name.
#
# @param [Span] span The span that's associated with the request.
# @param [HeaderCollection, #get, nil] headers A collection with the request headers.
# @param [String, nil] remote_ip The remote IP the request associated with the span is sent to.
def self.set_client_ip_tag(span, headers, remote_ip)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noob question: should this have ! in the name?

Copy link
Contributor

@TonyCTHsu TonyCTHsu Aug 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

! is a Ruby convention for hinting side-effect. For example, Array#map returns a new array while Array#map! returns a mutated self. Another common pattern is whether a method raise an exception, such as find_by vs find_by!.

I don't think this method deserve !, but this one seems to be a better candidate.


Since headers, remote_ip are optional. It is awkward to see

client_ip.set_client_ip_tag(span, nil, '1.1.1.1').

Consider making those keyword argument, so the caller makes the invokation a bit more explicit like,

client_ip.set_client_ip_tag(span, remote_ip: '1.1.1.1') or
client_ip.set_client_ip_tag(span, headers: headers).


Control flow using exception handling is slow in Ruby. This method need to be changed for

  1. Reconsider whether if the scenario is truly an exception
  2. Streamline the exceptional handling cases to avoid nested rescue/raise

Copy link
Contributor Author

@barweiss barweiss Aug 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool thanks!

As for using exceptions, I was indeed wondering whether this would have a performance impact. I'll change it so it doesn't use any exceptions.

return if configuration.disabled

ip = client_address_from_request(headers, remote_ip)
if !configuration.header_name && ip.nil?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is only done when there's no IP reported? This feels off somehow (like, when would this happen), so I must be missing something.

Copy link
Contributor Author

@barweiss barweiss Aug 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if this is still confusing after the refactor

header_names = ip_headers(headers).keys
span.set_tag(TAG_MULTIPLE_IP_HEADERS, header_names.join(',')) unless header_names.empty?
return
end

unless valid_ip?(ip)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this nesting of unless is confusing.

Is this necessary since extract_ip_from_full_address seems to return the address when its internal criteria don't match?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completely refactored that part, let me know if it makes more sense now :)

ip = extract_ip_from_full_address(ip)
return unless valid_ip?(ip)
end
ip = strip_zone_specifier(ip) if valid_ipv6?(ip)

span.set_tag(Tracing::Metadata::Ext::HTTP::TAG_CLIENT_IP, ip)
end

TAG_MULTIPLE_IP_HEADERS = '_dd.multiple-ip-headers'.freeze

# A header collection implementation that looks up headers in a Hash.
class HashHeaderCollection < HeaderCollection
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this could go to the Core namespace.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, fixed

def initialize(hash)
super()
@hash = hash.transform_keys(&:downcase)
end

def get(header_name)
@hash[header_name.downcase]
end
end

def self.ip_headers(headers)
return {} unless headers

{}.tap do |result|
DEFAULT_IP_HEADERS_NAMES.each do |name|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a job for DEFAULT_IP_HEADERS_NAMES.each_with_object({}) do |name, result|

Copy link
Contributor Author

@barweiss barweiss Aug 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with reduce (each_with_object returns an array)

EDIT: No it does not, I misread the documentation. Changed

value = headers.get(name)
next if value.nil?

result[name] = value
end
end
end

def self.client_address_from_request(headers, remote_ip)
return headers.get(configuration.header_name) if configuration.header_name && headers

ip_values_from_headers = ip_headers(headers).values
case ip_values_from_headers.size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
case ip_values_from_headers.size
case ip_values_from_headers.size

Method return value of this method comes from the case return value, so I'd suggest to space this to conform with the Ruby style guide where implicit returns live on their own "line".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, done

when 0
remote_ip
when 1
ip_values_from_headers.first
end
end

def self.strip_zone_specifier(ipv6)
if /\A(.*?)%.*/ =~ ipv6
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I had to add this as well at Sqreen.

This could be achieved with ipv6.gsub(/%.*$/, '') instead of relying on intermediate match objects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Fixed

return Regexp.last_match(1)
end

ipv6
end

# Extracts the IP part from a full address (`ipv4:port` or `[ipv6]:port`).
#
# @param [String] address Full address to split
# @returns [String] The IP part of the full address.
def self.extract_ip_from_full_address(address)
if /\A\[(.*)\]:\d+\Z/ =~ address
return Regexp.last_match(1)
end

if /\A(.*):\d+\Z/ =~ address
return Regexp.last_match(1)
end

address
end

def self.configuration
Datadog.configuration.tracing.client_ip
end

# Determines whether the given IP is valid.
#
# @param [String] ip The IP to validate.
# @returns [Boolean]
def self.valid_ip?(ip)
valid_ipv4?(ip) || valid_ipv6?(ip)
end

# --- Section vendored from the ipaddress gem --- #
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a rationale for not using IPAddr.new(addr).ipv4? and IPAddr.new(addr).ipv6??

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we decide to vendor third party code, we should find a way to slip the corresponding license, at least in LICENSE-3rd_party.csv.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I avoided IPAddr initially because it has the netmask handling which I didn't want but I guess it's solvable with some verification and it spares vendoring from ipaddress.
I'll give IPAddr a shot


# rubocop:disable Layout/LineLength, Style/SpecialGlobalVars

#
# Checks if the given string is a valid IPv4 address
#
# Example:
#
# IPAddress::valid_ipv4? "2002::1"
# #=> false
#
# IPAddress::valid_ipv4? "172.16.10.1"
# #=> true
#
# Vendored from `ipaddress` gem from file 'lib/ipaddress.rb', line 198.
def self.valid_ipv4?(addr)
if /^(0|[1-9]{1}\d{0,2})\.(0|[1-9]{1}\d{0,2})\.(0|[1-9]{1}\d{0,2})\.(0|[1-9]{1}\d{0,2})$/ =~ addr
return $~.captures.all? { |i| i.to_i < 256 }
end

false
end

#
# Checks if the given string is a valid IPv6 address
#
# Example:
#
# IPAddress::valid_ipv6? "2002::1"
# #=> true
#
# IPAddress::valid_ipv6? "2002::DEAD::BEEF"
# #=> false
#
# Vendored from `ipaddress` gem from file 'lib/ipaddress.rb', line 230.
def self.valid_ipv6?(addr)
# https://gist.github.com/cpetschnig/294476
# http://forums.intermapper.com/viewtopic.php?t=452
if /^\s*((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?\s*$/ =~ addr
return true
end

false
end
# rubocop:enable Layout/LineLength, Style/SpecialGlobalVars
end
end
end
6 changes: 6 additions & 0 deletions lib/datadog/tracing/configuration/ext.rb
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,12 @@ module Transport
ENV_DEFAULT_PORT = 'DD_TRACE_AGENT_PORT'.freeze
ENV_DEFAULT_URL = 'DD_TRACE_AGENT_URL'.freeze
end

# @public_api
module ClientIp
ENV_DISABLED = 'DD_TRACE_CLIENT_IP_HEADER_DISABLED'.freeze
ENV_HEADER_NAME = 'DD_TRACE_CLIENT_IP_HEADER'.freeze
end
end
end
end
Expand Down
35 changes: 35 additions & 0 deletions lib/datadog/tracing/contrib/rack/header.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
require_relative '../../client_ip'

module Datadog
module Tracing
module Contrib
module Rack
# Classes and utilities for handling headers in Rack.
module Header
# An implementation of a header collection that looks up headers from a Rack environment.
class RequestHeaderCollection < Datadog::Tracing::ClientIp::HeaderCollection
# Creates a header collection from a rack environment.
def initialize(env)
super()
@env = env
end

# Gets the value of the header with the given name.
def get(header_name)
@env[Header.header_to_rack_header(header_name)]
end

# Tests whether a header with the given name exists in the environment.
def key?(header_name)
@env.key?(Header.header_to_rack_header(header_name))
end
end

def self.header_to_rack_header(name)
"HTTP_#{name.to_s.upcase.gsub(/[-\s]/, '_')}"
end
end
end
end
end
end
27 changes: 15 additions & 12 deletions lib/datadog/tracing/contrib/rack/middlewares.rb
Original file line number Diff line number Diff line change
Expand Up @@ -3,12 +3,14 @@
require 'date'

require_relative '../../../core/environment/variable_helpers'
require_relative '../../client_ip'
require_relative '../../metadata/ext'
require_relative '../../propagation/http'
require_relative '../analytics'
require_relative '../utils/quantization/http'
require_relative 'ext'
require_relative 'header'
require_relative 'request_queue'
require_relative '../utils/quantization/http'

module Datadog
module Tracing
Expand Down Expand Up @@ -133,8 +135,9 @@ def set_request_tags!(trace, request_span, env, status, headers, response, origi
# So when its not available, we want the original, unmutated PATH_INFO, which
# is just the relative path without query strings.
url = env['REQUEST_URI'] || original_env['PATH_INFO']
request_headers = parse_request_headers(env)
response_headers = parse_response_headers(headers || {})
request_header_collection = Header::RequestHeaderCollection.new(env)
request_headers_tags = parse_request_headers(request_header_collection)
response_headers_tags = parse_response_headers(headers || {})

# The priority
# 1. User overrides span.resource
Expand Down Expand Up @@ -177,6 +180,10 @@ def set_request_tags!(trace, request_span, env, status, headers, response, origi
)
end

if request_span.get_tag(Tracing::Metadata::Ext::HTTP::TAG_CLIENT_IP).nil?
Tracing::ClientIp.set_client_ip_tag(request_span, request_header_collection, env['REMOTE_ADDR'])
end

if request_span.get_tag(Tracing::Metadata::Ext::HTTP::TAG_BASE_URL).nil?
request_obj = ::Rack::Request.new(env)

Expand All @@ -195,12 +202,12 @@ def set_request_tags!(trace, request_span, env, status, headers, response, origi
end

# Request headers
request_headers.each do |name, value|
request_headers_tags.each do |name, value|
request_span.set_tag(name, value) if request_span.get_tag(name).nil?
end

# Response headers
response_headers.each do |name, value|
response_headers_tags.each do |name, value|
request_span.set_tag(name, value) if request_span.get_tag(name).nil?
end

Expand All @@ -219,13 +226,13 @@ def configuration
Datadog.configuration.tracing[:rack]
end

def parse_request_headers(env)
def parse_request_headers(headers)
{}.tap do |result|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly to above, looks like a job for each_with_object

Copy link
Contributor Author

@barweiss barweiss Aug 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

whitelist = configuration[:headers][:request] || []
whitelist.each do |header|
rack_header = header_to_rack_header(header)
if env.key?(rack_header)
result[Tracing::Metadata::Ext::HTTP::RequestHeaders.to_tag(header)] = env[rack_header]
if headers.key?(rack_header)
result[Tracing::Metadata::Ext::HTTP::RequestHeaders.to_tag(header)] = headers[rack_header]
end
end
end
Expand All @@ -248,10 +255,6 @@ def parse_response_headers(headers)
end
end
end

def header_to_rack_header(name)
"HTTP_#{name.to_s.upcase.gsub(/[-\s]/, '_')}"
end
end
end
end
Expand Down
1 change: 1 addition & 0 deletions lib/datadog/tracing/metadata/ext.rb
Original file line number Diff line number Diff line change
Expand Up @@ -68,6 +68,7 @@ module HTTP
TYPE_OUTBOUND = 'http'
TYPE_PROXY = 'proxy'
TYPE_TEMPLATE = 'template'
TAG_CLIENT_IP = 'http.client_ip'

# General header functionality
module Headers
Expand Down
Loading