Add support for word boundaries \b and \B #5479
Signed-off-by: Anthony Chang <antchang@nvidia.com>
A different situation here; can you add a unit test for the fallback to CPU when in split mode?
…pport-word-boundaries Signed-off-by: Anthony Chang <antchang@nvidia.com>
The tests fail with this case
I will need to look into whether we can support word boundaries surrounding string anchors. If not, we will need to fall back to CPU, similar to #5610.
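For context, here is a minimal Java sketch of how `java.util.regex` (the engine Spark uses on the CPU) treats word boundaries placed next to string anchors. The patterns and inputs are illustrative assumptions, not the failing test case mentioned above, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class BoundaryAnchorDemo {
    // Returns true if `regex` matches anywhere in `input` (CPU semantics).
    public static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "ab cd";
        // A word boundary exists at the start because 'a' is a word character.
        System.out.println(found("\\b^ab", input));
        // \B requires a non-boundary, which the start of "ab cd" is not.
        System.out.println(found("\\B^ab", input));
        // The end of the input is a boundary because 'd' is a word character.
        System.out.println(found("cd$\\b", input));
        System.out.println(found("cd$\\B", input));
    }
}
```

Any GPU implementation would need to agree with these CPU results for such combinations, or fall back to CPU.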
…pport-word-boundaries Signed-off-by: Anthony Chang <antchang@nvidia.com>
LGTM. I see that we already cover \b and \B in the Scala fuzz tests.
This should be for 22.08 now
Closes #4517, closes #4289
cuDF now supports word boundaries, so we no longer need to fall back to CPU for them. However, there are inconsistencies in cuDF's behavior (see #5478), so word boundaries are still disabled for GpuStringSplit.
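As a reference point for the CPU behavior the GPU path has to reproduce, the following Java sketch shows where `\b` and `\B` match in a short ASCII string and what splitting on `\b` yields with `java.util.regex`. The class name, helper method, and sample input are hypothetical and are not taken from this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // Collect the start offset of every (zero-width) match of `regex` in `input`.
    public static List<Integer> matchOffsets(String regex, String input) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            offsets.add(m.start());
        }
        return offsets;
    }

    public static void main(String[] args) {
        String input = "ab cd";
        // \b matches between a word character and a non-word character
        // (or the start/end of the input).
        System.out.println(matchOffsets("\\b", input));
        // \B matches every position that is not a word boundary.
        System.out.println(matchOffsets("\\B", input));
        // Splitting on a zero-width pattern is the split-mode case
        // that stays on the CPU in this PR.
        System.out.println(Arrays.toString(input.split("\\b")));
    }
}
```

For the input `"ab cd"`, `\b` matches at offsets 0, 2, 3 and 5, `\B` matches at offsets 1 and 4, and `split("\\b")` returns `"ab"`, `" "` and `"cd"`.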
Signed-off-by: Anthony Chang <antchang@nvidia.com>