fix: remove ambiguity on '\v' from line-protocol parser #8720
The line protocol parser allows vertical tab ('\v') in whitespace, but also allows it in measurements, tag keys, tag values, and field keys. This ambiguity causes a blowup in the parsing state machine and triggers undefined behaviour when the vertical tab character is seen: the parser attempts to simultaneously extend a component of the line and move on to the next one. This patch removes vertical tab from the characters allowed in measurements, tags, and field keys. The resulting state machine goes from 745 states to 90, with a similar reduction in the number of lines of generated code.
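To make the ambiguity concrete, here is a minimal Go sketch. The character classes are simplified assumptions modeled on the description above, not copied from Telegraf's Ragel grammar, and `isWhitespace`, `isNameCharBefore`, and `isNameCharAfter` are hypothetical helpers. The sketch lists every byte that could either extend a name or count as whitespace; each such byte is a point where a machine built from both patterns has to follow two paths at once.

```go
package main

import "fmt"

// Hypothetical whitespace class: space, tab, vertical tab, form feed.
func isWhitespace(c byte) bool {
	return c == ' ' || c == '\t' || c == '\v' || c == '\f'
}

// Hypothetical name class before the patch: it excludes the other
// whitespace bytes, separators, and the escape byte, but NOT '\v'.
func isNameCharBefore(c byte) bool {
	switch c {
	case ' ', '\t', '\f', '\r', '\n', ',', '\\':
		return false
	}
	return true
}

// Hypothetical name class after the patch: '\v' is excluded as well.
func isNameCharAfter(c byte) bool {
	return c != '\v' && isNameCharBefore(c)
}

// ambiguous returns every byte accepted by both the whitespace class and
// the given name class, i.e. every byte with two possible interpretations.
func ambiguous(isName func(byte) bool) []byte {
	var out []byte
	for i := 0; i < 256; i++ {
		c := byte(i)
		if isWhitespace(c) && isName(c) {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	fmt.Printf("before: %q\n", ambiguous(isNameCharBefore))
	fmt.Printf("after:  %q\n", ambiguous(isNameCharAfter))
}
```

Under these assumed classes the overlap before the patch is exactly `'\v'`; removing it from the name class leaves the two classes disjoint, so the scanner never has to choose between "extend" and "terminate".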
Thanks so much for the pull request!
🤝 ✒️ Just a reminder that the CLA has not yet been signed, and we'll need it before merging. Please sign the CLA when you get a chance, then post a comment here saying !signed-cla
❤️ 🎉
While this change definitely makes sense, I wonder whether it is a breaking change. Could you please outline whether the vertical tab ever worked in any case? If not, we should merge this right away; otherwise people might rely on it...
This change would only affect measurement names, tag keys, tag values, and field keys, not field values.
@srebhan In general, when a Ragel parser has an ambiguity, it enters undefined behaviour and bad things happen: it simultaneously considers both possibilities as it moves forward. In this particular case, though, the ambiguity is between recognizing whitespace, which simply terminates the previous string, and extending the string. So all that happens is that some strings get terminated early or have the wrong value in them, and only in places where whitespace is permitted. So to sum up what happened before this patch, a
So to answer your question,
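The "terminated early or wrong value" outcome described in this comment can be illustrated with a hypothetical `readMeasurement` helper (not part of Telegraf) that reads the measurement name at the start of a line under each of the two readings of `'\v'`. The input line `"cpu\vload value=1"` is made up for illustration, and the separator set is a simplifying assumption.

```go
package main

import "fmt"

// readMeasurement returns the measurement name at the start of a line.
// It stops at a comma (start of the tag set) or at whitespace. The
// vIsWhitespace flag selects which interpretation of '\v' to use.
// Hypothetical illustration only; this is not Telegraf's parser.
func readMeasurement(line string, vIsWhitespace bool) string {
	for i := 0; i < len(line); i++ {
		switch line[i] {
		case ',', ' ', '\t':
			return line[:i]
		case '\v':
			if vIsWhitespace {
				return line[:i]
			}
		}
	}
	return line
}

func main() {
	line := "cpu\vload value=1"
	// Reading 1: '\v' extends the measurement name.
	fmt.Printf("%q\n", readMeasurement(line, false))
	// Reading 2: '\v' is whitespace and ends the name early.
	fmt.Printf("%q\n", readMeasurement(line, true))
}
```

The first reading yields the name `cpu\vload`, the second ends it at `cpu`. Before the patch both readings were live at once; after it, `'\v'` can only be whitespace, so only the second reading remains.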
See influxdata/telegraf#8720 for discussion.