[SPARK-22130][Core] UTF8String.trim() scans " " twice #19355

Closed
kiszk wants to merge 2 commits

kiszk commented Sep 26, 2017

What changes were proposed in this pull request?

This PR makes UTF8String.trim() scan a string that consists only of spaces (e.g. " ") once, whereas the current implementation scans such a string twice (right to left, and then left to right).
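
To make the saving concrete, here is a minimal sketch that counts byte reads for both approaches. It is not Spark's code: `TrimScanSketch`, `trimTwoScans`, `trimOneScan`, and `countReads` are hypothetical names, the helper works on a plain `byte[]` instead of `UTF8String`, and the order of the two scans is an assumption of the sketch.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for UTF8String: works on a plain byte[] and counts
// how many bytes each variant inspects. This is not Spark's actual code.
final class TrimScanSketch {
  static int reads = 0;

  private static byte getByte(byte[] bytes, int i) {
    reads++;
    return bytes[i];
  }

  // Two-scan variant: both ends are always scanned, even for an all-space input.
  static String trimTwoScans(byte[] bytes) {
    int s = 0;
    int e = bytes.length - 1;
    while (s < bytes.length && getByte(bytes, s) == 0x20) s++;
    while (e >= 0 && getByte(bytes, e) == 0x20) e--;
    if (s > e) return "";
    return new String(bytes, s, e - s + 1, StandardCharsets.UTF_8);
  }

  // Single-scan variant: return as soon as the first scan has consumed every byte.
  static String trimOneScan(byte[] bytes) {
    int s = 0;
    while (s < bytes.length && getByte(bytes, s) == 0x20) s++;
    if (s == bytes.length) return "";
    int e = bytes.length - 1;
    while (e >= 0 && getByte(bytes, e) == 0x20) e--;
    return new String(bytes, s, e - s + 1, StandardCharsets.UTF_8);
  }

  // Resets the counter, runs one variant, and returns the number of bytes inspected.
  static int countReads(String input, boolean oneScan) {
    byte[] bytes = input.getBytes(StandardCharsets.UTF_8);
    reads = 0;
    if (oneScan) trimOneScan(bytes); else trimTwoScans(bytes);
    return reads;
  }
}
```

For a four-space input, `countReads` reports 8 inspected bytes for the two-scan variant and 4 for the single-scan variant. For an input such as " a " both variants inspect the same 4 bytes, so the change only matters for all-space strings.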

How was this patch tested?

Existing test suites

}
// skip all of the space (0x20) in the right side
while (e >= 0 && getByte(e) == 0x20) e--;

Nit: while you're optimizing this, you can bring the declaration of e down here, as it won't be used unless there's a non-space char.

I think this condition can start with e > s too. At the end, s and e point to the first and last non-space char. When the loop starts, s points to a non-space char. So you can stop when e == s; this is the case of one non-space char.

It might be worth adding test cases for an empty string and for a string with a single non-space character, too.
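
The suggestions in this comment could look roughly like the sketch below. It is only an illustration under stated assumptions: `TrimBoundsSketch` and its `trim` method are hypothetical and operate on a `byte[]` taken from a `String`, not on Spark's `UTF8String`. The declaration of `e` sits after the early return, and the right-side loop uses `e > s` as its bound.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; UTF8String itself is not used here.
final class TrimBoundsSketch {
  static String trim(String input) {
    byte[] bytes = input.getBytes(StandardCharsets.UTF_8);
    int s = 0;
    // skip all of the space (0x20) in the left side
    while (s < bytes.length && bytes[s] == 0x20) s++;
    if (s == bytes.length) {
      // empty input, or nothing but spaces
      return "";
    }
    // bytes[s] is a non-space byte at this point, so e is declared only now
    // and the loop can stop as soon as e reaches s
    int e = bytes.length - 1;
    while (e > s && bytes[e] == 0x20) e--;
    return new String(bytes, s, e - s + 1, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // Edge cases named in the review: empty string and a single non-space char.
    String[] inputs = {"", " ", "a", " a", "a ", "  a b  "};
    for (String in : inputs) {
      System.out.println("[" + in + "] -> [" + trim(in) + "]");
    }
  }
}
```

With the `e > s` bound, the single-character case ends with `s == e`, so the copied range has length 1 and the byte at `s` is never re-read by the right-side loop.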

SparkQA commented Sep 26, 2017

Test build #82205 has finished for PR 19355 at commit 243f681.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Sep 26, 2017

Test build #82209 has finished for PR 19355 at commit d39c648.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun

Retest this please.

dongjoon-hyun left a comment

+1, LGTM, too.

SparkQA commented Sep 26, 2017

Test build #82210 has finished for PR 19355 at commit d39c648.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon

Merged to master.

asfgit closed this in 12e740b on Sep 27, 2017