System.Text.Ascii.Trim/TrimStart/TrimEnd methods include '\0' character/byte in trimming #104201

Closed
assumenothing opened this issue Jun 29, 2024 · 2 comments · Fixed by #105350


assumenothing commented Jun 29, 2024

Description

When using the publicly accessible System.Text.Ascii.Trim method (including TrimStart and TrimEnd), characters/bytes with a value of zero (character literal of '\0') will also be trimmed, even though it is not normally considered a white space character.

Reproduction Steps

string testString = "\0string\0";
Range trimRange = System.Text.Ascii.Trim(testString);
Console.WriteLine($"Trim Range = {trimRange}"); // prints "Trim Range = 1..7": both \0 chars fall outside the range, i.e. they are trimmed

Expected behavior

It is expected that a string starting or ending with '\0' characters should not be trimmed (to match behavior of other similar APIs like String.Trim and System.MemoryExtensions.Trim).
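
For comparison, here is a minimal sketch of the expected behavior using only `String.Trim` and `MemoryExtensions.Trim`. Both rely on `char.IsWhiteSpace`, which does not classify '\0' as white space, so the string keeps its full length:

```csharp
using System;

string testString = "\0string\0";

// String.Trim() only removes characters for which char.IsWhiteSpace returns true.
string trimmed = testString.Trim();
Console.WriteLine(trimmed.Length);          // 8: both '\0' chars are kept

// MemoryExtensions.Trim on a ReadOnlySpan<char> behaves the same way.
ReadOnlySpan<char> span = testString.AsSpan().Trim();
Console.WriteLine(span.Length);             // 8

Console.WriteLine(char.IsWhiteSpace('\0')); // False
```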

Actual behavior

The returned range excludes the leading and trailing '\0' characters, meaning they are trimmed as if they were white space.

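This is likely due to a mistake in the implementation, which assumes that subtracting one from an element value less than or equal to 0x20 never produces a negative shift count. For the value 0x00 the shift count becomes -1, and because C# uses only the low five bits of the shift count for 32-bit operands, 1U << -1 is evaluated as 1U << 31, which is the same bit that represents the space character (0x20). Alternatively, the algorithm may have been derived from one originally designed for a C-like language (which assumes that strings are always terminated with '\0' and that '\0' never appears within them).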

// Problem is that these statements result in identical values when
// used with the implementation's white space TrimMask test:
Console.WriteLine($"1U << (0x00 - 1) = 0x{1U << (0x00 - 1):x}"); // NUL ASCII code
Console.WriteLine($"1U << (0x20 - 1) = 0x{1U << (0x20 - 1):x}"); // Space ASCII code
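
To make the collision concrete, the following is a rough sketch of a bitmask-based white space test together with a guarded variant. The type `MaskSketch`, its members, and the exact mask layout are assumptions made for illustration. They are not copied from the runtime source, and the actual fix may look different:

```csharp
using System;

Console.WriteLine(MaskSketch.IsTrimmableUnguarded(0x00)); // True  (NUL collides with space)
Console.WriteLine(MaskSketch.IsTrimmableUnguarded(0x20)); // True  (space)
Console.WriteLine(MaskSketch.IsTrimmableGuarded(0x00));   // False (NUL is rejected up front)
Console.WriteLine(MaskSketch.IsTrimmableGuarded(0x20));   // True  (space is still trimmed)

// Hypothetical reconstruction of the technique described above.
static class MaskSketch
{
    // One bit per ASCII white space code, stored at position (code - 1).
    public const uint TrimMask =
          (1u << (0x09 - 1))  // horizontal tab
        | (1u << (0x0A - 1))  // line feed
        | (1u << (0x0B - 1))  // vertical tab
        | (1u << (0x0C - 1))  // form feed
        | (1u << (0x0D - 1))  // carriage return
        | (1u << (0x20 - 1)); // space

    // For value == 0 the shift count is -1, which C# masks to 31,
    // so NUL tests the same bit as the space character.
    public static bool IsTrimmableUnguarded(uint value) =>
        value <= 0x20 && (TrimMask & (1u << ((int)value - 1))) != 0;

    // Excluding zero before shifting avoids the negative shift count.
    public static bool IsTrimmableGuarded(uint value) =>
        value != 0 && value <= 0x20 && (TrimMask & (1u << ((int)value - 1))) != 0;
}
```

The guard costs one extra comparison per element. Another option under the same assumptions would be to index the mask by the code itself rather than by code minus one, which would need a 64-bit mask to cover 0x20.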

Regression?

No response

Known Workarounds

Avoid using the Ascii.Trim, Ascii.TrimStart, and Ascii.TrimEnd methods if '\0' characters should not be trimmed.
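
One possible stand-in is a small hand-written helper that returns a `Range` in the same way but never treats '\0' as trimmable. This is only a sketch: `AsciiTrimWorkaround` and its members are hypothetical names, and it assumes the intended white space set is tab through carriage return (0x09 to 0x0D) plus space (0x20):

```csharp
using System;

Console.WriteLine(AsciiTrimWorkaround.Trim("\0string\0")); // 0..8 ('\0' chars are kept)
Console.WriteLine(AsciiTrimWorkaround.Trim("  string\t")); // 2..8 (spaces and tab are trimmed)

// Hypothetical helper, not part of System.Text.Ascii.
static class AsciiTrimWorkaround
{
    // Assumed white space set: '\t', '\n', '\v', '\f', '\r' and ' '.
    private static bool IsAsciiWhiteSpace(char c) =>
        c == ' ' || (c >= '\t' && c <= '\r');

    public static Range Trim(ReadOnlySpan<char> value)
    {
        // Advance past leading white space.
        int start = 0;
        while (start < value.Length && IsAsciiWhiteSpace(value[start]))
        {
            start++;
        }

        // Walk back over trailing white space, never crossing start.
        int end = value.Length;
        while (end > start && IsAsciiWhiteSpace(value[end - 1]))
        {
            end--;
        }

        return start..end;
    }
}
```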

Configuration

.NET 8.0.6 x64 on Windows 10 (Console Application)

Other information

The most obvious solution here with the smallest impact is to simply add documentation indicating that the character/byte value zero will be included in the trimming. Otherwise the fix would involve making a breaking change to the API.

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Jun 29, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Jun 29, 2024
@stephentoub (Member)

cc: @adamsitnik

@stephentoub stephentoub added area-System.Text.Encoding bug and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Jun 30, 2024
Contributor

Tagging subscribers to this area: @dotnet/area-system-text-encoding
See info in area-owners.md if you want to be subscribed.

@stephentoub stephentoub added this to the 9.0.0 milestone Jun 30, 2024
@tarekgh tarekgh removed the untriaged New issue has not been triaged by the area owner label Jul 1, 2024
@stephentoub stephentoub self-assigned this Jul 23, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Aug 23, 2024