Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Q] How to trace microservices? #1076

Closed
mrdulin opened this issue Jul 26, 2019 · 13 comments
Closed

[Q] How to trace microservices? #1076

mrdulin opened this issue Jul 26, 2019 · 13 comments
Assignees
Labels
api: cloudtrace Issues related to the googleapis/cloud-trace-nodejs API. type: question Request for information or clarification. Not an issue.

Comments

@mrdulin
Copy link

mrdulin commented Jul 26, 2019

My application is designed as microservice. Here is my application arch:

service A Web API => event-driven cloud function => service B API

I want to use Cross-Service Trace Contexts to trace a completed business transaction which means I expect below trace span.

Here is a demo for Cross-Service Trace Contexts feature. But I use the local nodejs web server. There is no background cloud function.

I deployed my real application services above on GCP. But the cross-service trace contexts doesn't work well. I use trace context in my background cloud function like this:

const pkg = require('./package.json');
const tracer = require('@google-cloud/trace-agent').start({
  serviceContext: {
    service: pkg.name,
    version: pkg.version
  }
});
const fetch = require('node-fetch');
const CLOUD_FUNCTION_PUBSUB_SERVICE_URL = process.env.CLOUD_FUNCTION_PUBSUB_SERVICE_URL || '';
console.log('CLOUD_FUNCTION_PUBSUB_SERVICE_URL: ', CLOUD_FUNCTION_PUBSUB_SERVICE_URL);

exports.TestCloudTracePubSubService = async function TestCloudTrace(data, context) {
  const pubSubMessage = data;
  const jsonString = pubSubMessage.data ? Buffer.from(pubSubMessage.data, 'base64').toString() : '{}';
  const message = JSON.parse(jsonString);
  const span = tracer.createChildSpan({ name: pkg.name, traceContext: message.traceContext });
  span.addLabel('traceContext', message.traceContext);
  console.log(`message: ${JSON.stringify(message, null, 2)}`);

  const body = await fetch(CLOUD_FUNCTION_PUBSUB_SERVICE_URL).then((res) => res.json());
  console.log(`body: ${JSON.stringify(body, null, 2)}`);

  span.endSpan();
};

I try to create child span with traceContext passed by event message. But it doesn't work.

As you can see, there is only a root span. Not only for background cloud function, but also for HTTP cloud function.

So, how can I trace a whole business transaction? The microservice arch is designed like this:

GKE service A API => background cloud function => GKE service B API

I can't even add a label to cloud trace within HTTP and background functions.

I got below error in my HTTP cloud function:

@google-cloud/trace-agent ERROR StackdriverTracer#start: Tracing might not work as the following modules were loaded before the trace agent was initialized: [accepts, array-flatten, body-parser, bytes, content-disposition, content-type, cookie, cookie-signature, debug, depd, destroy, ee-first, encodeurl, escape-html, etag, express, finalhandler, forwarded, fresh, http-errors, iconv-lite, inherits, ipaddr.js, media-typer, merge-descriptors, methods, mime, mime-db, mime-types, ms, negotiator, on-finished, parseurl, path-to-regexp, proxy-addr, qs, range-parser, raw-body, safe-buffer, send, serve-static, setprototypeof, statuses, type-is, unpipe, utils-merge, vary]

I am thinking use trace REST API within HTTP and backgroud Cloud function, but I am not sure that how can I use traceContext generated from cloud trace library with trace REST API.

I want to generate a trace and spans like below:

图片

I want to trace a whole business transaction, even some microservice is called asynchronously(background cloud function).

Is this possible? Any advice will be good!

@kjin kjin self-assigned this Jul 26, 2019
@kjin
Copy link
Contributor

kjin commented Jul 27, 2019

Hi @mrdulin, I assume you are using Google Cloud Functions. GCF has been working on supporting this library but I believe this support is not yet released. Unfortunately because of how the platform works, using require('@google-cloud/trace-agent') in your cloud function will not work and result in the error message you encountered above (note: v4.1.0+ of this library makes it a warning rather than an error).

In the meantime, you can try this workaround that uses the custom span API:

https://gist.github.com/kjin/3b86f1eaf6c8ee9c458df8708e77ae53

Note two things: that (1) you would be manually be creating root spans via tracer.runInRootSpan (instead of having them created automatically by our express plugin), and (2) you would be manually specifying the span context via the traceContext option in conjunction with tracer.propagation.extract to process the incoming headers.

Let me know how that works for you!

@mrdulin
Copy link
Author

mrdulin commented Jul 29, 2019

@kjin Thanks for replying. Your example is very helpful, but what about the background cloud function? Should I pass traceContext parsed from message to runInRootSpan? Is this a correct way?

I am trying to use tracer.runInRootSpan in backgroud cloud function and pass traceContext parsed from message payload to it. But it seems doesn't work like HTTP cloud function.

const tracer = require('@google-cloud/trace-agent').start({
  samplingRate: 0,
  bufferSize: 1,
  serviceContext: { service: 'TestCloudTracePubSubService' }
});
const fetch = require('node-fetch');
const CLOUD_FUNCTION_HTTP_SERVICE_URL = process.env.CLOUD_FUNCTION_HTTP_SERVICE_URL || '';
console.log('CLOUD_FUNCTION_HTTP_SERVICE_URL: ', CLOUD_FUNCTION_HTTP_SERVICE_URL);

exports.TestCloudTracePubSubService = function TestCloudTrace(data, context, callback) {
  const pubSubMessage = data;
  const jsonString = pubSubMessage.data ? Buffer.from(pubSubMessage.data, 'base64').toString() : '{}';
  const message = JSON.parse(jsonString);
  console.log(`message: ${JSON.stringify(message, null, 2)}`);
  const traceContext = parseContextFromHeader(message.traceContext);
  tracer.runInRootSpan(
    {
      name: 'TestCloudTracePubSubService', // Your function name here
      traceContext
    },
    async (rootSpan) => {
      // Your logic here. Just be sure to call endSpan() after sending a response.
      const fetchChildSpan = tracer.createChildSpan({ name: 'fetch data from http cloud function' });
      const fetchChildSpanTraceContext = fetchChildSpan.getTraceContext();
      console.log('fetchChildSpanTraceContext: ', fetchChildSpanTraceContext);
      const body = await fetch(CLOUD_FUNCTION_HTTP_SERVICE_URL).then((res) => res.json());
      console.log(`body: ${JSON.stringify(body, null, 2)}`);
      fetchChildSpan.endSpan();
      rootSpan.endSpan();
      callback();
    }
  );
};

function parseContextFromHeader(str) {
  if (!str) {
    return null;
  }
  const matches = str.match(/^([0-9a-fA-F]+)(?:\/([0-9]+))(?:;o=(.*))?/);
  if (!matches || matches.length !== 4 || matches[0] !== str || (matches[2] && isNaN(Number(matches[2])))) {
    return null;
  }
  return {
    traceId: matches[1],
    spanId: matches[2],
    options: isNaN(Number(matches[3])) ? undefined : Number(matches[3])
  };
}

The trace timeline for background cloud function:

图片

update

Here is my nodejs web service (service A):

https://github.com/mrdulin/nodejs-gcp/blob/master/src/stackdriver/trace/cloud-function/server.js

The trace timeline:

图片

Here is my HTTP cloud function (service B):

https://github.com/mrdulin/nodejs-gcp/blob/master/src/stackdriver/trace/cloud-function/index.js

The trace timeline:

图片

As you can see, they are separated into two traces. I want to associate them with the same trace context. How can I do this? I am trying to set x-cloud-trace-context request header when I call service B(The HTTP cloud function) and use it as traceContext for service B. But the trace timeline is weird.

For a business transaction, the traceId for service A is aae391c35efd448cb6bdc49c1a5fd87d, the traceId for service B is 64ed01636afb93f10f8478eae63b9bf3.

They aren't same, so I think that's why they are separated into two traces. I think if service A and service B use the same traceId, there will be one trace timeline for my business transaction. Does it make sense?

If there are examples for using cross-service trace context and. context propagation, that will be great.

@mrdulin
Copy link
Author

mrdulin commented Jul 29, 2019

For now, I am using the same traceId and pass it to each downstream service like this:

web service 1(publish message) => background cloud function => HTTP cloud function => web service 2

Here is the trace timeline:

图片

It seems work, but I don't know does it make sense.

@kjin
Copy link
Contributor

kjin commented Jul 29, 2019

So it seems like there are two separate issues at play here:

  1. Trace ID doesn't match up between A -> B (where A is a Node server, B is an HTTP cloud function)
    • After looking at this a little bit I think this is because node-fetch doesn't work with the Trace Agent. I was surprised by this but I think it's due to the fact that we use it internally. This is a bug in our library and I'll open an issue to track its fix shortly.
  2. Background cloud functions don't get correlated
    • This might be an impossible problem to solve. IIUC background cloud functions have no direct communication channel to the server/function that indirectly triggered them, so there's no proper way I can think of to pass the Trace ID to a background cloud function. You could somehow embed the trace ID in the incoming data (like for a PubSub object, an "attribute"?). But at the same time, I would not expect a background function to have a latency impact; in your screenshot, the spans describing it are completely disjoint from the main request, right?

@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label Jul 29, 2019
@mrdulin
Copy link
Author

mrdulin commented Jul 30, 2019

@kjin

I made a demo for tracing cross services. Here are my services:

.
├── 01-web-api
├── 02-create-budget
├── 03-create-campaign
├── 04-create-adgroup
├── 05-create-ad
├── 06-google-adwords-api
├── deploy.sh
├── docs
└── utils.js

The workflow is from 01-web-api to 06-google-adwords-api

02 - 05 services are background cloud functions

Here is the timeline, it seems work. But I am not sure does it make sense.

The key point for trace cross services is to use the same traceContext.

I pass the traceContext to background cloud function by pubsub message payload(attribute will be fine as well). And forward traceContext to 06-google-adwords-api(RESTful web service)

图片

The question is, can you tell me does this make sense? I found an issue that the cold start time of cloud function will affect the latency.

@kjin
Copy link
Contributor

kjin commented Jul 30, 2019

I think it makes sense. If I understand correctly, this is how I would guess things work:

  1. User hits /campaign endpoint, which is part of 01-web-api. This results in a PubSub publish.
  2. Four background cloud functions 02-05 are triggered sequentially as a result of the publish. Each one makes a request to 06-google-adwords-api. However, the background cloud function spans do not show up in the screenshot, so all we see is that the /GoogleAdwordsAPI endpoint gets hit four times over non-overlapping time segments.

And that you were able to get 06 to show up because you passed the trace context from 01 to 06 via each of 02-05. You receive the value of the trace context in each of these four background functions 02-05 via the input data attribute, but only pass it onto 06 rather than using it in these functions. So the one missing part of this diagram, if any, is the spans that represent the running time of each of 02-05 (which should be technically possible since you have that information).

@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Jul 31, 2019
@bcoe bcoe added type: question Request for information or clarification. Not an issue. and removed 🚨 This issue needs some love. triage me I really want to be triaged. labels Jul 31, 2019
@kjin
Copy link
Contributor

kjin commented Aug 8, 2019

Hi @mrdulin -- I'm going to close this issue. Please comment if you have any other Q's and I'll re-open. Thanks!

@kjin kjin closed this as completed Aug 8, 2019
@klerick
Copy link

klerick commented Aug 12, 2019

@kjin I think, The problem is in root span. If I close root span in the first microservice, I can not see all the time of work my microservices. But If I do not close root span in the first microservice, I will get a memory leak. I think If we have the feature: send trace data from children span and clear memory without close root span(We can do it in another service), it will be a good point for trace microservices.

@kjin
Copy link
Contributor

kjin commented Aug 12, 2019

@klerick Thanks for the suggestion. I don't think we'll have the bandwidth to implement such a feature, though. (I'm also not sure it's possible to start a span in one service and end it in another, the Stackdriver Trace API may not allow that.)

@klerick
Copy link

klerick commented Aug 13, 2019

@kjin I think, the Stackdriver Trace API does not depend on it because this behavior is common for Stackdriver. I tried to extend the class RootSpanData but the filed "children" is private.
For example:
In the first microservice we create rootSpan and create childSpan if needed. Then we close childSpan, send trace to StrackDriver and clean the child filled in rootspan class. We can get data from rootSpan, such us getCurrentRootSpan, getCurrentContextId. This data pushes to the next microservice and create rootspan with these details. if this microservice is the last one, we close rootspan.
I can try to create PR, but I’m not sure that I know all the details of the library.

@klerick
Copy link

klerick commented Aug 14, 2019

@kjin I think If will add the feature, create root span from existing data. We can use the trace agent for microservice.
From StrackDriver documantation:

Sends new traces to Stackdriver Trace or updates existing traces. If the ID of a trace that you send matches that of an existing trace, any fields in the existing trace and its spans are overwritten by the provided values, and any new fields provided are merged with the existing trace data. If the ID does not match, a new trace is created.

We will have the opportunity to change endTime for root span and we will have correctly charts)

@google-cloud-label-sync google-cloud-label-sync bot added the api: cloudtrace Issues related to the googleapis/cloud-trace-nodejs API. label Jan 31, 2020
@manwithsteelnerves
Copy link

Can someone share a workflow for implementing distributed tracing in cloud functions/cloud run?

@manwithsteelnerves
Copy link

@bcoe @mrdulin Can you guys confirm if its still the same way we need to do?
I was expecting it is automated by now but unfortunately, only see first service trace alone(cloud run) but not the pubsub function trigger trace.

Right now I just have imported at start of each service alone and don't have any further code. Do I need to pass the context from each service and pass it to others(pubsub subscriptions/other services)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: cloudtrace Issues related to the googleapis/cloud-trace-nodejs API. type: question Request for information or clarification. Not an issue.
Projects
None yet
Development

No branches or pull requests

6 participants