Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

builtin,js: implement JS string.split_any #21612

Merged
merged 3 commits into from
Jun 8, 2024

Conversation

juan-db
Copy link
Contributor

@juan-db juan-db commented May 30, 2024

Implement string.split_any for the JS backend.

This PR fixes #21440.

@JalonSolov
Copy link
Contributor

JalonSolov commented May 31, 2024

In the string 2345, there is nothing after the 5, therefore nothing to split. The only splits would be between 2 and 3, 3 and 4, and 4 and 5.

The only thing after the 5, is the NUL at the end, which is not a "real" character.

@JalonSolov
Copy link
Contributor

Doing it like this is just as bad as the Go utils.TrimSplit() function... if you call that on an empty string, it does NOT return an empty array... it returns an array with one element, and that element is ''.

@juan-db
Copy link
Contributor Author

juan-db commented May 31, 2024

In the string 2345, there is nothing after the 5, therefore nothing to split. The only splits would be between 2 and 3, 3 and 4, and 4 and 5.

The only thing after the 5, is the NUL at the end, which is not a "real" character.

What is the V version splitting in the 1213.split_any('1') case then to give ['', '2', '3']? Nothing precedes the first character in the string.

@JalonSolov
Copy link
Contributor

JalonSolov commented May 31, 2024

It split with 1 as the delimiter. Since there was only an "empty string" before the first delimiter, that's all it gave.

Same as ';2;3'.split_any(';')

@juan-db
Copy link
Contributor Author

juan-db commented May 31, 2024

I understand that reasoning, but it feels like that's more driven by implementation result than actual intention.

The most common use case I know for split is for CSV values. In these cases, you expect it to split the string every time it encounters the delimiter.

E.g. if you had the following CSV:

person,start,end,role,earnings,manager
john,2015,,founder,1USD,

you'd want to reliably and conveniently extract each field.

split_any is a more niche version so I don't really know what the most common use case for it would be over regular split, but either way I believe the behavior should be consistent.

@JalonSolov
Copy link
Contributor

JalonSolov commented May 31, 2024

The difference is in the description of split_any...

Split a string using the chars in the delimiter string as delimiters chars.

Meaning, you can include several different delimiters, and it will split on all of them.

println('12;34:56^78,'.split_any(';:^,'))

output is

['12', '34', '56', '78']

Obviously contrived, but suppose you had a timestamp in a log file, and wanted to split the pieces... with split_any you could split them all in one call, instead of separate calls with each different separator, and trying to figure it out from there...

println('[2024-05-31_19:39:23] This is the log message.'.split_any('[-_:]'))

output is

['', '2024', '05', '31', '19', '39', '23', ' This is the log message.']

... and in this case, I'd have to agree... the '' as the first entry looks like a bug to me.

@juan-db
Copy link
Contributor Author

juan-db commented Jun 4, 2024

I'm not sure how to proceed in this case because the suggested behavior is contrary to all my experiences with other languages and isn't consistent with the regular split behavior where it overlaps in concept.

In rough pseudocode, this seems like the correct simplified example to me.

output := []
buffer := ''

for c in str:
    if c == delim
        // *any* time we encounter delimiter, *split the string*
        output << buffer
        buffer = ''
    else
        buffer += c

output << buffer // make sure everything after the last delimiter is included

// for the suggested solution, instead of the above, we would have to handle
// the end of the string as a special case
/*
if buffer.len != 0 {
    output << buffer
}
*/

return output

@juan-db
Copy link
Contributor Author

juan-db commented Jun 4, 2024

I don't hold much authority here, so I think either someone needs to decide it's more important to keep the existing V behavior, or we need more input on what the expected behavior should be.

@JalonSolov
Copy link
Contributor

The expected behavior is that all backends should produce exactly the same results.

If anything is changed, then all backends should be changed so that the above statement is still true.

@spytheman
Copy link
Member

spytheman commented Jun 4, 2024

Here is a plan of action:

  1. Make the JS version match the V version, so that the existing PR can be merged, without breaking existing code
  2. Make another PR, that changes the behavior of both the pure V and the JS versions to what you think is a more useful behavior (that would be a breaking change, but it will be caught by the CI, and we can test and discuss it extensively there).

@juan-db juan-db requested a review from spytheman June 6, 2024 16:46
Copy link
Member

@spytheman spytheman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work.

@spytheman spytheman merged commit da4afef into vlang:master Jun 8, 2024
76 checks passed
raw-bin pushed a commit to raw-bin/v that referenced this pull request Jul 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

backend js: no method string.split_any
3 participants