Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tr: Accept non UTF-8 arguments for sets #6563

Merged
merged 2 commits into from
Jul 18, 2024

Conversation

RenjiSann
Copy link
Contributor

Fixes #6343

@RenjiSann RenjiSann force-pushed the renji/tr-non-utf8 branch from a09e6f8 to 52d0431 Compare July 12, 2024 09:29
Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/tail/inotify-dir-recreate (fails in this run but passes in the 'main' branch)

@RenjiSann RenjiSann force-pushed the renji/tr-non-utf8 branch from 52d0431 to aac483e Compare July 12, 2024 09:59
@RenjiSann
Copy link
Contributor Author

I'm having an issue here.
In order to accept non UTF-8 sets, I use OsString, which at some points, I need to transform into bytes.

To do this, I can either use OsStr::as_encoded_bytes, which is cross-platform, but only available from Rust 1.74, so this would need a bump of MSRV from 1.70 to 1.74.

The other solution is to use OsStrExt::as_bytes which is available in Rust 1.70 but only for Unix and not for windows, so there wouldn't be support for windows I guess.

Is there another possible solution ? Which one shall we proceed with ? @tertsdiepraam

@RenjiSann RenjiSann changed the title tr: Accept non utf8 arguments for sets tr: Accept non UTF-8 arguments for sets Jul 12, 2024
Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/tail/inotify-dir-recreate (fails in this run but passes in the 'main' branch)

@sylvestre sylvestre requested a review from BenWiederhake July 13, 2024 08:19
@BenWiederhake
Copy link
Collaborator

That's an issue indeed.

It seems @sylvestre doesn't want to raise the MSRV, and I don't know enough on the topic to have an opinion. So let's not do that, for now.

This isn't the first time that uutils needs to convert an OsString to bytes, so I suggest taking any existing implementation (e.g. in cut: https://github.com/uutils/coreutils/pull/6037/files#diff-939c8f8ce2deb6faec07f4ce55505aec484179070d0c817da4f931a676c38cf4R355 ), moving it to some "canonical" place (perhaps src/uucore/src/lib/lib.rs?), and then just calling it. I hope this isn't too much work, and it would have the benefit of not copy-pasting the code all over the project.

So in short I'm suggesting to use OsStrExt::as_bytes, but without the code duplication.

@sylvestre
Copy link
Contributor

It seems @sylvestre doesn't want to raise the MSRV

wat ? :)

Copy link

GNU testsuite comparison:

Congrats! The gnu test tests/timeout/timeout is no longer failing!

@RenjiSann
Copy link
Contributor Author

I moved os_string_to_bytes from cut to the uucore crate (and transformed it into os_str_as_bytes in the meantime), and used it inside of tr.

Copy link
Collaborator

@BenWiederhake BenWiederhake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good, the test is fine, lovely!

CI is red for a reason, please fix those issues first:

@RenjiSann
Copy link
Contributor Author

My bad, thanks for noticing

@RenjiSann RenjiSann force-pushed the renji/tr-non-utf8 branch 2 times, most recently from 68305d2 to 8ec48d6 Compare July 16, 2024 20:02
Copy link
Collaborator

@BenWiederhake BenWiederhake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, thank you! Remaining CI failures are known issues: #6534 #6570 (both)

I hope I'm not being too nitpicky, but can you change the first commit's subject line to follow the convention a bit closer? I suggest:

  • cut: move os_string_as_bytes utility to uucore (mention util name)
  • tr: accept non utf8 arguments for sets (lowercase first letter)

Thanks for fixing tr! :)

@RenjiSann RenjiSann force-pushed the renji/tr-non-utf8 branch from 8ec48d6 to dac38bb Compare July 17, 2024 09:54
@RenjiSann
Copy link
Contributor Author

  • cut: move os_string_as_bytes utility to uucore (mention util name)
  • tr: accept non utf8 arguments for sets (lowercase first letter)

Thanks for fixing tr! :)

Done ! 😁

@BenWiederhake BenWiederhake merged commit 9ab7fa9 into uutils:main Jul 18, 2024
65 of 68 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tr: Should handle high input bytes in command-line
3 participants