[SPARK-22130][Core] UTF8String.trim() scans " " twice #19355

kiszk · 2017-09-26T16:54:56Z

What changes were proposed in this pull request?

This PR makes UTF8String.trim() scan a string consisting only of spaces (e.g. " ") once, whereas the current implementation scans it twice (once from each end).
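A minimal sketch of the difference. It assumes a plain byte[] in place of Spark's UTF8String, and uses a hypothetical TrimScanCount class whose getByte helper counts reads. Neither is Spark code. On an all-space input, the original loop order reads every byte twice. Returning early once the left-side scan has consumed the whole string reads each byte only once:

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical, simplified model of UTF8String.trim() over a byte[].
 * Not Spark's code: the getByte helper here only exists to count
 * how many bytes each variant inspects.
 */
public class TrimScanCount {
  static int reads = 0;

  // Counting stand-in for UTF8String.getByte(i).
  static byte getByte(byte[] b, int i) {
    reads++;
    return b[i];
  }

  // Before: scan from the left, then always scan from the right.
  static String trimTwoPass(byte[] b) {
    int s = 0;
    int e = b.length - 1;
    while (s < b.length && getByte(b, s) == 0x20) s++;
    while (e >= 0 && getByte(b, e) == 0x20) e--;
    if (s > e) return "";
    return new String(b, s, e - s + 1, StandardCharsets.UTF_8);
  }

  // After: return early when the left-side scan consumed every byte.
  static String trimOnePass(byte[] b) {
    int s = 0;
    while (s < b.length && getByte(b, s) == 0x20) s++;
    if (s == b.length) return "";
    int e = b.length - 1;
    while (e >= 0 && getByte(b, e) == 0x20) e--;
    return new String(b, s, e - s + 1, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] spaces = "   ".getBytes(StandardCharsets.UTF_8);
    reads = 0;
    trimTwoPass(spaces);
    System.out.println("two-pass reads: " + reads); // prints "two-pass reads: 6"
    reads = 0;
    trimOnePass(spaces);
    System.out.println("one-pass reads: " + reads); // prints "one-pass reads: 3"
  }
}
```

For inputs that contain at least one non-space byte, both variants read exactly the same bytes. Only the all-space case changes.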

How was this patch tested?

Existing test suites

srowen · 2017-09-26T17:45:40Z

common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java

    }
+    // skip all of the space (0x20) in the right side
+    while (e >= 0 && getByte(e) == 0x20) e--;


Nit: while you're optimizing this, you can bring the declaration of e down here, as it won't be used unless there's a non-space char.

I think this condition can be e > s instead. At the end, s and e point to the first and last non-space char. When this loop starts, s already points to a non-space char, so you can stop when e == s; that is the case of a single non-space char.

Might be worth adding test cases for an empty string and a single non-space-char string too.
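The two nits and the suggested tests can be combined in a small standalone sketch. It again works on a byte[] rather than UTF8String. The names TrimEdgeCases and check are hypothetical. This is an illustration under those assumptions, not the merged Spark code:

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of trim() with both review suggestions applied:
 * e is declared only once a non-space byte is known to exist, and the
 * right-side loop stops at e == s.
 */
public class TrimEdgeCases {

  static String trim(byte[] b) {
    int s = 0;
    // skip all of the spaces (0x20) on the left side
    while (s < b.length && b[s] == 0x20) s++;
    if (s == b.length) {
      // empty or all-space input: no second scan needed
      return "";
    }
    // b[s] is a non-space byte, so the right-side scan can stop at s
    int e = b.length - 1;
    while (e > s && b[e] == 0x20) e--;
    return new String(b, s, e - s + 1, StandardCharsets.UTF_8);
  }

  static String trim(String str) {
    return trim(str.getBytes(StandardCharsets.UTF_8));
  }

  // Throws if trim(input) does not match the expected result.
  static void check(String input, String expected) {
    String actual = trim(input);
    if (!actual.equals(expected)) {
      throw new AssertionError("trim(\"" + input + "\") = \"" + actual
          + "\", expected \"" + expected + "\"");
    }
  }

  public static void main(String[] args) {
    check("", "");            // empty string
    check("a", "a");          // single non-space char: e starts at s
    check(" ", "");           // only spaces
    check("   ", "");
    check("  a  ", "a");      // single non-space char surrounded by spaces
    check(" ab c ", "ab c");  // inner spaces are kept
    System.out.println("all trim cases passed");
  }
}
```

Comparing raw bytes against 0x20 is safe on UTF-8 input: every byte of a multi-byte UTF-8 sequence has its high bit set, so a 0x20 byte is always a real space character.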

SparkQA · 2017-09-26T20:05:40Z

Test build #82205 has finished for PR 19355 at commit 243f681.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-09-26T20:27:11Z

Test build #82209 has finished for PR 19355 at commit d39c648.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2017-09-26T20:32:57Z

Retest this please.

dongjoon-hyun

+1, LGTM, too.

SparkQA · 2017-09-26T23:48:15Z

Test build #82210 has finished for PR 19355 at commit d39c648.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2017-09-27T14:20:11Z

Merged to master.

initial commit

243f681

srowen reviewed Sep 26, 2017

address review comment

d39c648

srowen approved these changes Sep 26, 2017

dongjoon-hyun approved these changes Sep 26, 2017

HyukjinKwon approved these changes Sep 27, 2017

asfgit closed this in 12e740b Sep 27, 2017