[FEA] Regexp: Add support for word and non-word boundaries in regexp pattern #4289

Closed
andygrove opened this issue Dec 3, 2021 · 1 comment
Labels: cudf_dependency (An issue or PR with this label depends on a new feature in cudf), feature request (New feature or request)

Comments

andygrove (Contributor) commented Dec 3, 2021

Is your feature request related to a problem? Please describe.
We currently fall back to the CPU for regexp patterns that contain word (\b) or non-word (\B) boundaries.
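As a point of reference, here is a minimal sketch of the boundary rule for ASCII input: \b matches at a position where exactly one of the two neighbouring characters is a word character ([A-Za-z0-9_]), with positions outside the string treated as non-word, and \B matches at every other position. The class and method names are hypothetical, and non-ASCII input is out of scope for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the positions where \b (or \B) matches, ASCII input only.
class WordBoundaries {
    // ASCII word characters, i.e. [A-Za-z0-9_].
    static boolean isWordChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9') || c == '_';
    }

    // Checks every position 0..s.length(). A position is a word boundary when the
    // characters before and after it differ in "word-ness"; a missing neighbour
    // (start or end of the string) counts as a non-word character.
    // wordBoundary = true selects \b positions, false selects \B positions.
    static List<Integer> boundaries(String s, boolean wordBoundary) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i <= s.length(); i++) {
            boolean before = i > 0 && isWordChar(s.charAt(i - 1));
            boolean after = i < s.length() && isWordChar(s.charAt(i));
            boolean isBoundary = before != after;
            if (isBoundary == wordBoundary) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(boundaries("A\nB", true));  // [0, 1, 2, 3]
        System.out.println(boundaries("A\nB", false)); // []
        System.out.println(boundaries("AB", false));   // [1]
    }
}
```

Every position in "A\nB" is a word boundary, so a correct GPU implementation has to insert the replacement at four positions for \b and at none for \B.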

Here is one example of a difference between the CPU and GPU results for regexp_replace.

The test is effectively running regexp_replace("A\nB", "\b", "_REPLACE_").

CPU output:

_REPLACE_A_REPLACE_\n_REPLACE_B_REPLACE_

GPU output:

_REPLACE__REPLACE__REPLACE_A\nB
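On the CPU, Spark's regexp_replace is backed by java.util.regex, so the CPU output above can be approximated directly in Java. This is a minimal sketch that calls Matcher.replaceAll with `_REPLACE_` as the replacement (the string implied by the outputs above); the class name is hypothetical, and it assumes Spark's replacement behaves like a plain Matcher.replaceAll for this input. It also shows that \B has no match here.

```java
import java.util.regex.Pattern;

// Hypothetical demo: reproduces the CPU-side result with java.util.regex.
class BoundaryReplaceDemo {
    static String replaceAll(String input, String regex, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String input = "A\nB";
        String word = replaceAll(input, "\\b", "_REPLACE_");
        String nonWord = replaceAll(input, "\\B", "_REPLACE_");
        // Escape the newline so each result prints on one line, as in the issue.
        System.out.println(word.replace("\n", "\\n"));    // _REPLACE_A_REPLACE_\n_REPLACE_B_REPLACE_
        System.out.println(nonWord.replace("\n", "\\n")); // A\nB
    }
}
```

The empty \b match is found at positions 0, 1, 2 and 3, which is why the CPU result interleaves the replacement with every character, whereas the GPU result places all of its replacements at the start.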

Describe the solution you'd like
Support \b and \B on the GPU, with results consistent with Spark on the CPU.

Describe alternatives you've considered
None

Additional context
None

andygrove added the feature request and ? - Needs Triage labels Dec 3, 2021
Salonijain27 removed the ? - Needs Triage label Dec 7, 2021
andygrove self-assigned this Dec 7, 2021
andygrove modified the milestone: Nov 30 - Dec 10 Dec 7, 2021
andygrove (Contributor, Author) commented:
Related cuDF issue: rapidsai/cudf#9950

andygrove added the cudf_dependency label Dec 22, 2021
andygrove removed their assignment Jan 7, 2022
Projects: None yet
Development: No branches or pull requests
3 participants