Use simdjson for json_parse #7658
This pull request was exported from Phabricator. Differential Revision: D51469435
Summary: When the input to `json_parse` is invalid, we throw an exception, which is slow (both creating it and throwing it). To avoid both costs, we switch the implementation to `simdjson` and set a pre-canned exception when the input is invalid. This reduces CPU time in some queries (those with a JSON validity check in the filter) by more than 20 times (from 2.34 days to 2.68 hours). Differential Revision: D51469435
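The pre-canned exception idea from the summary can be sketched in standard C++ as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `looksLikeValidJson` is a hypothetical stand-in for the real `simdjson` validation (it only checks bracket balance), and `invalidJsonError` and `validateRows` are illustrative names. The point it shows is that the error object is built once and then shared by every invalid row, so nothing is constructed or thrown per row.

```cpp
#include <cctype>
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a real validator such as simdjson. It only checks
// that braces and brackets are balanced outside of string literals.
bool looksLikeValidJson(std::string_view input) {
  std::vector<char> open;
  bool inString = false;
  bool sawValue = false;
  for (std::size_t i = 0; i < input.size(); ++i) {
    const char c = input[i];
    if (inString) {
      if (c == '\\') {
        ++i; // Skip the escaped character.
      } else if (c == '"') {
        inString = false;
      }
      continue;
    }
    if (c == '"') {
      inString = true;
      sawValue = true;
    } else if (c == '{' || c == '[') {
      open.push_back(c);
      sawValue = true;
    } else if (c == '}' || c == ']') {
      const char expected = (c == '}') ? '{' : '[';
      if (open.empty() || open.back() != expected) {
        return false;
      }
      open.pop_back();
    } else if (!std::isspace(static_cast<unsigned char>(c))) {
      sawValue = true;
    }
  }
  return sawValue && !inString && open.empty();
}

// The pre-canned error: created on first use and reused afterwards, so
// invalid rows do not each pay for constructing a new exception.
const std::exception_ptr& invalidJsonError() {
  static const std::exception_ptr kError =
      std::make_exception_ptr(std::invalid_argument("Invalid JSON"));
  return kError;
}

// Returns one entry per row; the entry is null when the row is valid and
// points at the shared pre-canned error otherwise. Nothing is thrown here.
std::vector<std::exception_ptr> validateRows(
    const std::vector<std::string>& rows) {
  std::vector<std::exception_ptr> errors(rows.size());
  for (std::size_t i = 0; i < rows.size(); ++i) {
    if (!looksLikeValidJson(rows[i])) {
      errors[i] = invalidJsonError();
    }
  }
  return errors;
}
```

A caller that only needs a validity check (for example, a filter) can test each entry against `nullptr`; a caller that must surface the failure can pass the entry to `std::rethrow_exception`, paying the cost of a throw only at that point.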
This is clever. Thanks.
@@ -114,6 +139,9 @@ class JsonParseFunction : public exec::VectorFunction {
        .argumentType("varchar")
        .build()};
  }

 private:
  mutable simdjson::dom::parser parser_;
Yes, I tried the ondemand parser first and found that it fails to reject some malformed JSON.
Reviewed By: mbasmanova
This pull request has been merged in 0460044.
Conbench analyzed the 1 benchmark run for this commit. There were no benchmark performance regressions. 🎉