Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: remove ambiguity on '\v' from line-protocol parser #8720

Merged
merged 1 commit into from
Jan 20, 2021

Conversation

adrian-thurston
Copy link
Contributor

The line protocol parser allows vertical tab ('\v') in whitespace, but also
allows it in measurements, tags keys, tag values, and field keys. This
ambiguity causes a blowup in the parsing state machine and triggers undefined
behaviour when the vertical tab character is seen. The parser will attempt to
simultaneously extend a compoent of the line, and move on to the next one.

This patch removes vertical tab from measurements, tags, and fields. The
resulting state machine goes from 745 states to 90, with a similar reduction in
the number of lines of generated code.

The line protocol parser allows vertical tab ('\v') in whitespace, but also
allows it in measurements, tags keys, tag values, and field keys. This
ambiguity causes a blowup in the parsing state machine and triggers undefined
behaviour when the vertical tab character is seen. The parser will attempt to
simultaneously extend a compoent of the line, and move on to the next one.

This patch removes vertical tab from measurements, tags, and fields. The
resulting state machine goes from 745 states to 90, with a similar reduction in
the number of lines of generated code.
Copy link
Contributor

@telegraf-tiger telegraf-tiger bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for the pull request!
🤝 ✒️ Just a reminder that the CLA has not yet been signed, and we'll need it before merging. Please sign the CLA when you get a chance, then post a comment here saying !signed-cla

Copy link
Contributor

@ssoroka ssoroka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️ 🎉

@srebhan
Copy link
Member

srebhan commented Jan 20, 2021

While this change makes definitively sense I wonder if this is a breaking change. Could you guy please outline if the vertical tab ever worked in any case!? If not we should merge this right away but otherwise ppl might rely on it...

@srebhan srebhan self-assigned this Jan 20, 2021
@srebhan srebhan added the fix pr to fix corresponding bug label Jan 20, 2021
@ssoroka
Copy link
Contributor

ssoroka commented Jan 20, 2021

This change would only affect key names, key values, and field names, not field values.
I suspect it's more likely that if you're using odd characters in these you're writing data that you won't be able to query or do anything with. I'm in support of merging this.

@adrian-thurston
Copy link
Contributor Author

@srebhan In general, when it comes to parser ambiguities in ragel parsers, the parser would be entering into undefined behaviour and bad things would be happening. It would be simultaneously considering both possibilities as it moves forward.

In this particular case though, the ambiguity is between recognizing whitespace, which simply terminates the previous string, and extending the string. So all that happens is some strings get terminated early or have the wrong value in them. But it only happens in places where whitespace is permitted.. So to sum up what happened before this patch, a \v in:

  1. measurement name: measurement parses wrong
  2. key name: \v accepted
  3. key value: parses wrong
  4. value name: \v accepted

So to answer your question, \v worked only in key name and value name. In the other places it parsed wrong. So we would be breaking for anywone who is currently using \v successfully in those places.

@ssoroka ssoroka merged commit 4462b17 into influxdata:master Jan 20, 2021
@adrian-thurston adrian-thurston deleted the fix/lp-vertical-tab branch January 20, 2021 23:40
adrian-thurston added a commit to influxdata/line-protocol that referenced this pull request Feb 8, 2021
arstercz pushed a commit to arstercz/telegraf that referenced this pull request Mar 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix pr to fix corresponding bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants