[SPARK-22130][CORE] UTF8String.trim() scans " " twice
## What changes were proposed in this pull request?

This PR lets `UTF8String.trim()` scan a string that contains only spaces (e.g. `"     "`) once, whereas the current implementation scans it twice (left to right, and then right to left).
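
To make the saving concrete, here is a minimal sketch that mirrors the old and new loop structure on a plain `byte[]` and counts byte reads. `TrimSketch`, `trimOld`, `trimNew`, and the `reads` counter are hypothetical names for illustration only; they are not part of `UTF8String`, and the sketch returns a `String` instead of calling `copyUTF8String`.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for UTF8String: wraps a byte[] and counts byte reads.
final class TrimSketch {
  private static final byte SPACE = 0x20;
  private final byte[] bytes;
  int reads = 0; // number of getByte calls made so far

  TrimSketch(String s) {
    this.bytes = s.getBytes(StandardCharsets.UTF_8);
  }

  private byte getByte(int i) {
    reads++;
    return bytes[i];
  }

  // Mirrors the pre-patch logic: both loops always run to completion.
  String trimOld() {
    int s = 0;
    int e = bytes.length - 1;
    while (s < bytes.length && getByte(s) == SPACE) s++;
    while (e >= 0 && getByte(e) == SPACE) e--;
    if (s > e) {
      return "";
    }
    return new String(bytes, s, e - s + 1, StandardCharsets.UTF_8);
  }

  // Mirrors the patched logic: return early when the left scan consumed everything,
  // and stop the right scan at s, which is known to be a non-space byte.
  String trimNew() {
    int s = 0;
    while (s < bytes.length && getByte(s) == SPACE) s++;
    if (s == bytes.length) {
      return "";
    }
    int e = bytes.length - 1;
    while (e > s && getByte(e) == SPACE) e--;
    return new String(bytes, s, e - s + 1, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    TrimSketch before = new TrimSketch("     "); // five spaces
    before.trimOld();
    TrimSketch after = new TrimSketch("     ");
    after.trimNew();
    // prints "old reads: 10, new reads: 5"
    System.out.println("old reads: " + before.reads + ", new reads: " + after.reads);
  }
}
```

For inputs that contain at least one non-space byte, the two versions read nearly the same number of bytes; the difference shows up mainly on all-space input, which is the case this PR targets.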

## How was this patch tested?

Existing test suites

Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>

Closes #19355 from kiszk/SPARK-22130.
kiszk authored and HyukjinKwon committed Sep 27, 2017
1 parent d2b8b63 commit 12e740b
Showing 2 changed files with 8 additions and 6 deletions.
@@ -498,17 +498,16 @@ private UTF8String copyUTF8String(int start, int end) {

   public UTF8String trim() {
     int s = 0;
-    int e = this.numBytes - 1;
     // skip all of the space (0x20) in the left side
     while (s < this.numBytes && getByte(s) == 0x20) s++;
-    // skip all of the space (0x20) in the right side
-    while (e >= 0 && getByte(e) == 0x20) e--;
-    if (s > e) {
+    if (s == this.numBytes) {
       // empty string
       return EMPTY_UTF8;
-    } else {
-      return copyUTF8String(s, e);
     }
+    // skip all of the space (0x20) in the right side
+    int e = this.numBytes - 1;
+    while (e > s && getByte(e) == 0x20) e--;
+    return copyUTF8String(s, e);
   }

/**
@@ -222,10 +222,13 @@ public void substring() {

   @Test
   public void trims() {
+    assertEquals(fromString("1"), fromString("1").trim());
+
     assertEquals(fromString("hello"), fromString(" hello ").trim());
     assertEquals(fromString("hello "), fromString(" hello ").trimLeft());
     assertEquals(fromString(" hello"), fromString(" hello ").trimRight());
 
+    assertEquals(EMPTY_UTF8, EMPTY_UTF8.trim());
     assertEquals(EMPTY_UTF8, fromString(" ").trim());
     assertEquals(EMPTY_UTF8, fromString(" ").trimLeft());
     assertEquals(EMPTY_UTF8, fromString(" ").trimRight());