fix: regexp_split fails in empty match pattern #12305
The implementation of Re2RegexpSplit has a bug: it fails to split the input string when the pattern is itself an empty string.
For example, regexp_split("abcd", "") is expected to return {"", "a", "b", "c", "d", ""}, but the current implementation throws an error instead.
This test case comes from Presto: https://github.com/prestodb/presto/blob/099bd42eba287b1ea25bf55404c7a18882e0f6d5/presto-main/src/test/java/com/facebook/presto/operator/scalar/AbstractTestRegexpFunctions.java#L231
See detailed description in #12304.
This PR fixes the bug in the Re2RegexpSplit implementation.
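For readers unfamiliar with the failure mode, here is a minimal sketch of the usual way a regex split loop handles zero-length matches. It is not the code in this PR: it uses std::regex instead of RE2 so that it is self-contained, and the function name regexpSplitSketch is hypothetical. The essential idea is that an empty match consumes no input, so the search position must be advanced by one character before searching again.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not the Velox/RE2 implementation): splits `input` on
// every match of `pattern`, including zero-length matches.
std::vector<std::string> regexpSplitSketch(
    const std::string& input,
    const std::regex& pattern) {
  std::vector<std::string> pieces;
  std::size_t pieceStart = 0;  // Start of the piece not yet emitted.
  std::size_t searchPos = 0;   // Where the next search begins.

  while (searchPos <= input.size()) {
    auto searchBegin =
        input.cbegin() + static_cast<std::ptrdiff_t>(searchPos);
    // When not at the start of the string, tell the engine a preceding
    // character exists so anchors are evaluated against the full string.
    auto flags = searchPos > 0 ? std::regex_constants::match_prev_avail
                               : std::regex_constants::match_default;
    std::smatch match;
    if (!std::regex_search(searchBegin, input.cend(), match, pattern, flags)) {
      break;
    }
    auto matchStart =
        static_cast<std::size_t>(match[0].first - input.cbegin());
    auto matchEnd =
        static_cast<std::size_t>(match[0].second - input.cbegin());

    // Emit the text between the previous match and this one.
    pieces.push_back(input.substr(pieceStart, matchStart - pieceStart));
    pieceStart = matchEnd;

    // An empty match does not consume input, so step forward one byte to
    // avoid matching at the same position forever. Code that must handle
    // UTF-8 would step one code point instead (assumption, not shown here).
    searchPos = matchEnd > matchStart ? matchEnd : matchEnd + 1;
  }

  // Whatever follows the last match is the final piece (possibly empty).
  pieces.push_back(input.substr(pieceStart));
  return pieces;
}
```

With an empty pattern, regexpSplitSketch("abcd", std::regex("")) yields {"", "a", "b", "c", "d", ""}, the result the Presto test expects. For other patterns that can match the empty string, this sketch is not guaranteed to match Presto's semantics exactly.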