Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracing query comments #6387

Closed
wants to merge 2 commits into from

Conversation

arka-g
Copy link
Collaborator

@arka-g arka-g commented Jun 26, 2020

Setup

While trying to enable the opentracing-jaeger tracer for vitess, I ran into an issue with how the span information is parsed in vitess. We connect to vtgates using the mysql protocol, and the trace information is passed to vitess using leading query comments.

Problem

In extractMapFromString (code), the string is split on the : delimeter and then each of those as key/value pairs are passed to the opentracing library.

In the opentracing library, there is similar validation done where it expects key value pairs (code) and then depending on the key, the value is further split on :. The code for that is here.

So, we are splitting on a delimeter that is expected to be passed along as values in the key/value pairs containing trace information. The docs for the jaeger client also say this is the expected format of values (ref)

arka-g added 2 commits June 12, 2020 08:04
…. doing this in vtgate is not compatible with the opentracing-jaeger library

Signed-off-by: Arka Ganguli <arkaganguli1@gmail.com>
Signed-off-by: Arka Ganguli <arkaganguli1@gmail.com>
@sougou sougou requested a review from systay June 29, 2020 20:38
@systay
Copy link
Collaborator

systay commented Jun 30, 2020

The code looks good, but I think we need to solve this in a way that does not introduce new conflicts. I coded up an alternative solution here: #6398

@systay
Copy link
Collaborator

systay commented Jul 1, 2020

I think the current way of doing this is broken, and changing the splitting character is not going to be enough. Since this part of the API, and changing it is a breaking change, we should make sure that we come up with a way that is future proof.

Instead of coming up with out own format, WDYT about simply using an established format, such as protobufs TextFormat, or maybe even just using JSON?

/*VT_SPAN_CONTEXT={"uber-trace-id":"123:456:789:1"}*/ SELECT * FROM table

and this works fine - until someone sticks a */ into a key or value. Not sure how to cleanly handle that situation. Ideas? /cc @sougou @harshit-gangal @deepthi

@arka-g
Copy link
Collaborator Author

arka-g commented Jul 2, 2020

Parsing it as JSON sounds good to me. Doesn't the */ problem exist today as well if you rely on passing trace information via leading query comments?

/*VT_SPAN_CONTEXT=uber-trac*/e-id=123:456:789:1*/select * from table;
ERROR 1105 (HY000): vtgate: unrecognized statement:  e-id=123:456:789:1*/select * from table

@shlomi-noach
Copy link
Contributor

Once we parse tracer information, can we throw away the comment? If so, we can use "custom" comment delimiters, such as

  • /*vt:comment
  • vt:endcomment*/`

which are valid comments SQL-wise; if vitess is the first to intercept these, then it's:

  • reasonable to assume an innocent client will not use vt:uncomment anywhere
  • safe to strip out that comment from the query when delegating the query down the line

I could be out of context here, or maybe there's SQL parsers that need to intercept the query before vitess does?

@sougou
Copy link
Contributor

sougou commented Jul 13, 2020

What kind of metadata do we send through here? If we are the ones generating it, can we make sure that the metadata has no delimiters, allowing only ids and numbers. If so, we can safely use , and : as delimiters, and things should just parse out fine with the simple strings.Split.

@systay
Copy link
Collaborator

systay commented Jul 14, 2020

The key/value pairs are generated by the open tracing backend, and not us. Don't think we can change what they use.

@systay
Copy link
Collaborator

systay commented Jul 14, 2020

Ok, how about json that is base64 encoded?

instead of

/*VT_SPAN_CONTEXT={"uber-trace-id":"123:456:789:1"}*/ SELECT * FROM table

it would be

/*VT_SPAN_CONTEXT=eyJ1YmVyLXRyYWNlLWlkIjoiMTIzOjQ1Njo3ODk6MSJ9*/ SELECT * FROM table

base64/json might not be the most efficient way of doing it, but it's simple and has good support on all platforms. WDYT?

@sougou
Copy link
Contributor

sougou commented Jul 14, 2020

Base64 encoding is a brilliant idea :).

But, I still have one last question before we commit to it. Eventhough that info is generated by the tool, it mostly comes from our span tags right? So, as long as the tool doesn't intentionally add crazy stuff the content should be normalized.

@arka-g
Copy link
Collaborator Author

arka-g commented Jul 14, 2020

In our case, we are generating the span ids in our application which are passed along to the jaeger-client adapter in vitess. The ids we generate should be valid for both zipkin and jaeger clients, but the reason why we need to pass the information encoded in this way is because those libraries expect the values in a certain format.

@arka-g
Copy link
Collaborator Author

arka-g commented Jul 22, 2020

Closing this PR in favor of #6463. Thanks @systay!

@arka-g arka-g closed this Jul 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants