[SPARK-22130][Core] UTF8String.trim() scans " " twice #19355
}
// skip all of the space (0x20) in the right side
while (e >= 0 && getByte(e) == 0x20) e--;
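For context, the pre-PR two-pass structure can be sketched as below. This is a minimal sketch, not Spark's code. It assumes the left-side loop mirrors the quoted right-side loop, and it works on a plain `byte[]` instead of `UTF8String`. The class `TwoPassTrim` and its `reads` counter are hypothetical and exist only to count byte reads.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of a two-pass trim over a plain byte[].
// It is not Spark's UTF8String.trim(); names are illustrative only.
public class TwoPassTrim {
    // Counts byte reads so the double scan of an all-space input is visible.
    static int reads = 0;

    static byte getByte(byte[] bytes, int i) {
        reads++;
        return bytes[i];
    }

    static byte[] trim(byte[] bytes) {
        int s = 0;
        int e = bytes.length - 1;
        // skip all of the space (0x20) in the left side
        while (s < bytes.length && getByte(bytes, s) == 0x20) s++;
        // skip all of the space (0x20) in the right side; this loop is not
        // bounded by s, so an all-space input is read a second time
        while (e >= 0 && getByte(bytes, e) == 0x20) e--;
        if (s > e) {
            return new byte[0]; // empty or all-space input
        }
        return Arrays.copyOfRange(bytes, s, e + 1);
    }

    public static void main(String[] args) {
        byte[] allSpaces = "     ".getBytes(StandardCharsets.UTF_8);
        reads = 0;
        byte[] result = trim(allSpaces);
        System.out.println(result.length + " bytes left after " + reads + " reads");
    }
}
```

Running `main` prints `0 bytes left after 10 reads`: each of the five spaces is read once by the left-side loop and once more by the right-side loop, which is the double scan named in the PR title.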
Nit: while you're optimizing this, you can bring the declaration of e down here, as it won't be used unless there's a non-space char.
I think this condition can start with e > s too. At the end, s and e point to the first and last non-space char. When the loop starts, s points to a non-space char, so you can stop when e == s; this is the case of one non-space char.
Might be worth adding test cases for an empty string and a string with a single non-space char, too.
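The suggestions above (declare `e` only after the all-space check, and bound the right-side loop by `e > s`) combine into a one-pass version, checked here against the edge cases the reviewer mentions. This is a hypothetical sketch over a plain `byte[]`, not the merged Spark code; `OnePassTrim` and its `trim(String)` convenience overload are illustrative names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: a one-pass trim over a plain byte[] that applies the
// review suggestions. It is not Spark's UTF8String.trim().
public class OnePassTrim {
    static byte[] trim(byte[] bytes) {
        int s = 0;
        // skip all of the space (0x20) in the left side
        while (s < bytes.length && bytes[s] == 0x20) s++;
        if (s == bytes.length) {
            // empty or all-space input: return early, no right-side scan
            return new byte[0];
        }
        // e is declared only here, since it is needed only when a non-space byte exists
        int e = bytes.length - 1;
        // bytes[s] is non-space, so the scan can stop at e == s
        while (e > s && bytes[e] == 0x20) e--;
        return Arrays.copyOfRange(bytes, s, e + 1);
    }

    // Convenience overload for the checks below.
    static String trim(String input) {
        return new String(trim(input.getBytes(StandardCharsets.UTF_8)), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String[][] cases = {
            {"", ""},          // empty string
            {"a", "a"},        // single non-space char
            {"     ", ""},     // only spaces
            {"  a  ", "a"},    // right-side loop stops exactly at s
            {" a b ", "a b"},  // interior spaces are kept
        };
        for (String[] c : cases) {
            String actual = trim(c[0]);
            if (!actual.equals(c[1])) {
                throw new AssertionError("trim(\"" + c[0] + "\") = \"" + actual
                        + "\", expected \"" + c[1] + "\"");
            }
        }
        System.out.println("all " + cases.length + " cases passed");
    }
}
```

For an all-space input, the early return means only the left-side loop reads the bytes. The `"  a  "` case exercises the `e > s` bound: the right-side loop stops on the same byte `s` already points to, without reading it again.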
Test build #82205 has finished for PR 19355 at commit
Test build #82209 has finished for PR 19355 at commit
Retest this please.
+1, LGTM, too.
Test build #82210 has finished for PR 19355 at commit
Merged to master.
What changes were proposed in this pull request?
This PR allows us to scan a string containing only spaces (e.g. " ") once, while the current implementation scans it twice (once from each side).
How was this patch tested?
Existing test suites