Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

integrate and implement open telemetry tracing #18534

Conversation

achen2289
Copy link
Contributor

@achen2289 achen2289 commented Oct 20, 2022

Test plan -

  • Set Open Telemetry tracer and ensured query state changes are tracked as well as query completes execution successfully.
  • Ensured spans of same query are grouped under same parent span / traceId and any prior context, if provided, is properly propagated
  • Ensured baggage propagation is supported as well as set as tags of new spans
  • Tested trace exporting (export to Datadog agent)
  • Configured without open telemetry as well as with SimpleTracer / NoopTracer to ensure backwards compatibility
  • Tested edge cases with system config not matching plugin config
== RELEASE NOTES ==

General Changes
* Introduced a new plugin type for tracing. The legacy tracer module can be replaced by the new plugin for loading customized tracing infrastructure.

Open Telemetry Changes
* Introduced a new Open Telemetry tracer implementation. Users are able to enable the tracer by installing the ``presto-open-telemetry`` plugin and updating the application configuration (``config.properties``). Open Telemetry tracer can take in propagated context (only B3 specification currently supported) and baggage (W3C specification) headers, if provided, and inject into new traces / spans. Traces can be exported to any specified backend with the ``OTEL_EXPORTER_OTLP_ENDPOINT`` environment variable.

@achen2289 achen2289 requested a review from a team as a code owner October 20, 2022 22:17
@achen2289 achen2289 requested a review from presto-oss October 20, 2022 22:17
@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Oct 20, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: achen2289 / name: Alex Chen (21888a73662611aaee0fd34fc6ee20aa48646f34, f1099bde939e1aa1cc1568d93faa9f33a8285ace, 1eab0578bfbad5c047744cf68c9ad5e4fdd828d5, 9c47baad2aaeeca5a41fe418ccfce1400932cfc2)

@achen2289 achen2289 force-pushed the achen/integrate-and-implement-open-telemetry-tracing branch 8 times, most recently from 119f1c7 to 49d6041 Compare October 26, 2022 00:16
@achen2289 achen2289 force-pushed the achen/integrate-and-implement-open-telemetry-tracing branch from f070e75 to 3fd2443 Compare October 26, 2022 16:52
@cliandy
Copy link
Contributor

cliandy commented Oct 27, 2022

@highker could we get a first pass on this?

Copy link
Contributor

@tanjialiang tanjialiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the first commit

presto-main/etc/config.properties Outdated Show resolved Hide resolved
presto-main/etc/config.properties Outdated Show resolved Hide resolved
@Override
public void endTrace(String annotation)
{
addPoint(annotation);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to take care of any unfinished Span?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you help me understanding the closing logic of opentelemetry here? Thank you

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am looking into this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tanjialiang Addressed by ending any unended spans at the time of ending trace.

@achen2289 achen2289 force-pushed the achen/integrate-and-implement-open-telemetry-tracing branch 2 times, most recently from a2a2097 to e932e63 Compare October 28, 2022 04:30
Copy link
Contributor

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high-level comments only

@achen2289 achen2289 force-pushed the achen/integrate-and-implement-open-telemetry-tracing branch 9 times, most recently from 5173a23 to 35a6292 Compare November 3, 2022 17:42
@achen2289 achen2289 force-pushed the achen/integrate-and-implement-open-telemetry-tracing branch 2 times, most recently from 936cf9e to ee19b16 Compare November 23, 2022 02:28
@achen2289
Copy link
Contributor Author

Hey @highker, I want to try to get my changes pushed through by the end of next week if possible. If there's anything in this PR that still needs further discussion or you still want revised, I'm happy to remove it from this PR and address it as a new one for the future if that works.

Copy link
Contributor

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please squash all commits into one.

I have one major comment in the end. Please check that one first.

Comment on lines 55 to 57
public static final String PRESTO_W3C_PROPAGATION = "traceparent";
public static final String PRESTO_B3_SINGLE_HEADER_PROPAGATION = "b3";
public static final String PRESTO_BAGGAGE_HEADER = "baggage";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove these headers to presto-open-telemetry. Check my other comment

Comment on lines 1233 to 1237
stringProperty(
CONTEXT_PROPAGATOR,
"Context propagator for Open Telemetry tracing. w3c, b3_single_header, or b3_multiple_header",
tracingConfig.getContextPropagator(),
false),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the header instead of session property to control the flags?

Comment on lines 219 to 224
String tunnelTraceId = trimEmptyToNull(servletRequest.getHeader(PRESTO_TRACE_TOKEN));
String contextPropagator = sessionPropertyManager
.map(manager -> manager.decodeSystemPropertyValue(CONTEXT_PROPAGATOR, null, String.class))
.orElse(TracingConfig.ContextPropagator.B3_SINGLE_HEADER);
String propagatedContext = getPropagatedContext(contextPropagator, servletRequest);
String baggage = servletRequest.getHeader(PRESTO_BAGGAGE_HEADER);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's refactor the code a bit so we don't have to use session property or the header explicitly. Check my other comment

* @param servletRequest http request with headers
* @return header value extracted from http request based on context propagator
*/
private static String getPropagatedContext(String contextPropagator, HttpServletRequest servletRequest)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to presto-open-telemetry module. Check my other comment

{
private final String tracerType;
private final boolean systemTracingEnabled;
private @Nullable TracerProvider tracerProvider;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

@Nullable
private TracerProvider tracerProvider;


public TracerProvider getTracerProvider()
{
return tracerProvider;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check non null

Comment on lines 66 to 71
OpenTelemetry openTelemetry = OpenTelemetrySdk.builder()
.setTracerProvider(sdkTracerProvider)
.setPropagators(ContextPropagators.create(OpenTelemetryBuilder.getPropagatorInstance(contextPropagator)))
.buildAndRegisterGlobal();

return openTelemetry;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

directly return instead of creating openTelemetry

Comment on lines 16 to 25
public interface TracerProvider
{
String getName();

String getTracerType();
/**
*
* @return A @Tracer that should be kept throughout the whole duration of tracing.
*/
Tracer getNewTracer();
Tracer getNewTracer(String traceToken, String contextPropagator, String propagatedContext, String baggage);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Major] this is my major comment.

Similar to many other "handle" classes in our codebase, let's create a handle interface in com.facebook.presto.spi.tracing:

public interface TracerHandle
{
}

Change the signature of getNewTracer to

Tracer getNewTracer(TracerHandle handle);

Create a new method of TracerProvider:

/**
 * The method returns a function to take a set of HTTP headers and generate the tracer handle
 */
Function<Set<String>, TracerHandle> getHandleGenerator();

In your OpenTelemetry implementation, you should have

public class OpenTelemetryTracerHandle
        implements TracerHandle
{
    private final String traceToken;
    private final String contextPropagator;
    private final String propagatedContext;
    private final String baggage;

    ... 
}

In your OpenTelemetryTracerProvider, you should have

@Override
public Tracer getNewTracer(TracerHandle handle)
{
    OpenTelemetryTracerHandle tracerHandle = (TracerHandle) handle;
    return new OpenTelemetryTracer(tracerHandle.getXXX, ...);
}

In your getHandleGenerator implementation, move all the logics you have in presto-main here so we can hide all those headers and specialized extraction methods.

@@ -115,4 +116,9 @@ default Iterable<HistoryBasedPlanStatisticsProvider> getHistoryBasedPlanStatisti
{
return emptyList();
}

default Iterable<TracerProvider> getTracerProviders()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add javadoc

Comment on lines 18 to 26
String getName();

String getTracerType();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add javadoc

@achen2289 achen2289 force-pushed the achen/integrate-and-implement-open-telemetry-tracing branch 6 times, most recently from d171ea9 to 1e44c8f Compare November 29, 2022 19:02
@achen2289
Copy link
Contributor Author

@highker Trace handle implementation done

@highker highker self-requested a review November 30, 2022 04:32
Copy link
Contributor

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome change! All nits. Otherwise, good to go

@@ -211,19 +211,33 @@ else if (nameParts.size() == 2) {

this.sessionFunctions = parseSessionFunctionHeader(servletRequest);
this.sessionPropertyManager = requireNonNull(sessionPropertyManager, "sessionPropertyManager is null");
String tunnelTraceId = trimEmptyToNull(servletRequest.getHeader(PRESTO_TRACE_TOKEN));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's a bit anti pattern; but in order to keep the logic so we don't pollute the traceToken below. Check my comment below.


// If tunnel trace token is null, we expose the Presto tracing id.
// Otherwise we preserve the ability of trace token tunneling but
// still trace Presto internally for aggregation purposes.
traceToken = Optional.ofNullable(tunnelTraceId == null ? this.tracer.get().getTracerId() : tunnelTraceId);
traceToken = Optional.ofNullable(this.tracer.get().getTracerId());
}
else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a new branch here: else if (trimEmptyToNull(servletRequest.getHeader(PRESTO_TRACE_TOKEN)) != null) or something like that. And inside the body, we can either create a SimpleTracerProvider and get the token or directly assign traceToken from PRESTO_TRACE_TOKEN

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated trace token to replicate similar logic as before


private static Map<String, String> getRequestHeaderMap(HttpServletRequest servletRequest)
{
Map<String, String> headerMap = new HashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use immutableMap builder

}
}

private static Map<String, String> getRequestHeaderMap(HttpServletRequest servletRequest)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's not critical but good to follow:

we don't usually name a function or var with word "Map"; instead, we use getRequestHeaders. Same for the variable headerMap --> headers. And everything else in the PR.

@@ -211,19 +211,33 @@ else if (nameParts.size() == 2) {

this.sessionFunctions = parseSessionFunctionHeader(servletRequest);
this.sessionPropertyManager = requireNonNull(sessionPropertyManager, "sessionPropertyManager is null");
String tunnelTraceId = trimEmptyToNull(servletRequest.getHeader(PRESTO_TRACE_TOKEN));

Map<String, String> requestHeaderMap = getRequestHeaderMap(servletRequest);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: requestHeaders

if (headers.containsKey(PRESTO_B3_SINGLE_HEADER_PROPAGATION)) {
return "b3_single_header";
}
else if (headers.containsKey(PRESTO_W3C_PROPAGATION)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove else; it's redundant

CONTEXT_PROPAGATOR_ERROR,
"Only b3 single header context propagation mode is currently supported.");
}
else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove else; it's redundant

if (contextPropagator.equals(TracingConfig.ContextPropagator.B3_SINGLE_HEADER)) {
return headers.get(PRESTO_B3_SINGLE_HEADER_PROPAGATION);
}
else if (contextPropagator.equals(TracingConfig.ContextPropagator.W3C)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove else; it's redundant

@@ -113,6 +113,7 @@
QUERY_PLANNING_TIMEOUT(0x0001_0027, INTERNAL_ERROR),
NATIVE_EXECUTION_TASK_ERROR(0x0001_0028, INTERNAL_ERROR),
UNSUPPORTED_ANALYZER_TYPE(0x0001_0029, INTERNAL_ERROR),
CONTEXT_PROPAGATOR_ERROR(0x0001_002A, INTERNAL_ERROR),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not a generic error; check my other comment

Comment on lines 92 to 87
throw new PrestoException(
CONTEXT_PROPAGATOR_ERROR,
"Only b3 single header context propagation mode is currently supported.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two options:

  • you can create your own error code class within the module with a brand new base code; check any other implementation of ErrorCodeSupplier; or
  • you can just throw runtime error or put generic code to be a catch-all to save the effort, which is not recommended but fine

@achen2289
Copy link
Contributor Author

@highker Addressed nits, and made one further change for open telemetry baggage propagation. Put it in new commit so it's easier to see and will squash if it looks good.

@highker highker self-requested a review November 30, 2022 22:57
@highker
Copy link
Contributor

highker commented Nov 30, 2022

Thanks. Please squash all commits into one.

@achen2289 achen2289 force-pushed the achen/integrate-and-implement-open-telemetry-tracing branch from 11edf44 to 19a4184 Compare November 30, 2022 23:07
Copy link
Contributor

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more nits; otherwise LGTM

@@ -211,19 +212,39 @@ else if (nameParts.size() == 2) {

this.sessionFunctions = parseSessionFunctionHeader(servletRequest);
this.sessionPropertyManager = requireNonNull(sessionPropertyManager, "sessionPropertyManager is null");
String tunnelTraceId = trimEmptyToNull(servletRequest.getHeader(PRESTO_TRACE_TOKEN));

ImmutableMap<String, String> requestHeaders = getRequestHeaders(servletRequest);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ImmutableMap<String, String> -> Map<String, String>

traceToken = Optional.ofNullable(tunnelTraceId == null ? this.tracer.get().getTracerId() : tunnelTraceId);
String tunnelTraceId = trimEmptyToNull(servletRequest.getHeader(PRESTO_TRACE_TOKEN));
if (tunnelTraceId != null) {
traceToken = Optional.ofNullable(tunnelTraceId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Optional.of(tunnelTraceId)

it's not null already

traceToken = Optional.ofNullable(tunnelTraceId);
}

private static ImmutableMap<String, String> getRequestHeaders(HttpServletRequest servletRequest)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ImmutableMap<String, String> -> Map<String, String>

Comment on lines 30 to 34
public static class ContextPropagator
{
public static final String W3C = "w3c";
public static final String B3_SINGLE_HEADER = "b3_single_header";
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


OpenTelemetryErrorCode(int code, ErrorType type)
{
errorCode = new ErrorCode(code, name(), type);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a base so we don't collide with other error codes. Find a unique one closest to an existing one. I assume it's starting with 0x05... something
Screen Shot 2022-11-30 at 3 56 58 PM

this.openTelemetry = OpenTelemetryBuilder.build(contextPropagator);
this.currentContextPropagator = contextPropagator;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not expert in open telemetry but are we sure the logic is correct by modifying static variables?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it should be fine. If we have a different context propagator, we need to rebuild the open telemetry instance for all tracers that may exist.

@highker highker self-assigned this Dec 1, 2022
@achen2289 achen2289 force-pushed the achen/integrate-and-implement-open-telemetry-tracing branch 3 times, most recently from d227668 to 6ad17a0 Compare December 1, 2022 04:22
Copy link
Contributor

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last nit

public static final String OTEL = "otel";
}

public static class ContextPropagator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we delete this class from this file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

support simple tracer in new open telemetry implementation

bump okhttp3 version to 4.5.0

use system val if client val not set

add support for context extraction from b3 propagation and overall interface

revert back to okhttp3 version 3.9.0 for this PR

implement W3C baggage header extraction

tracer handle implementation, remove context propagator from system property, general refactor

address nits and open telemetry baggage updates
@achen2289 achen2289 force-pushed the achen/integrate-and-implement-open-telemetry-tracing branch from 6ad17a0 to 54c6530 Compare December 1, 2022 05:32
@highker highker changed the title achen/integrate and implement open telemetry tracing integrate and implement open telemetry tracing Dec 1, 2022
@highker highker merged commit 2c1d402 into prestodb:master Dec 1, 2022
@achen2289
Copy link
Contributor Author

@highker Thanks for the helpful reviews and merge!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants