Consider handling UTF-16 files correctly #6123

preetpalS · 2022-10-17T03:19:50Z

Consider the repository: https://github.com/preetpalS/UTF-16-Code-Text

It has two identical C# source files, one encoded in UTF-8 and the other in UTF-16LE (Little-Endian).

The UTF-16LE file is misclassified as Smalltalk. This can be overridden in .gitattributes, so there is a workaround for that part.

The other issue is that code in UTF-16 encoded files is weighted differently in the language statistics, presumably because the encoding needs twice as many bytes to represent the same text. The file encoded in UTF-16LE is weighted at 66.8% and the identical file encoded in UTF-8 at 33.2%.
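To illustrate the presumed cause, here is a minimal Python sketch that assumes the weighting is proportional to file size in bytes (the post only presumes this). The sample text and the `byte_shares` helper are hypothetical and not taken from the linked repository.

```python
# Hypothetical ASCII-only sample; not the actual file from the repository.
source = (
    "using System;\n"
    "\n"
    "class Program\n"
    "{\n"
    '    static void Main() => Console.WriteLine("Hello");\n'
    "}\n"
)


def byte_shares(text: str) -> dict[str, float]:
    """Return each encoding's share (in percent) of the combined byte count."""
    sizes = {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" writes no byte order mark, so ASCII text is exactly 2 bytes per character.
        "utf-16-le": len(text.encode("utf-16-le")),
    }
    total = sum(sizes.values())
    return {name: round(100 * size / total, 1) for name, size in sizes.items()}


shares = byte_shares(source)
print(shares)
```

For ASCII-only text the UTF-16LE copy takes two thirds of the combined bytes, which is close to (though not exactly) the 66.8% / 33.2% split reported here.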

If UTF-16 files were handled correctly, the code weighting would be correct for these files, and they would not be mis-classified.

preetpalS · 2022-10-17T06:39:01Z

Author

Feel free to close this discussion: you can use the working-tree-encoding setting in .gitattributes to keep files encoded as UTF-16LE in your working tree while they are stored as UTF-8 in your Git repository.
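Conceptually, this setting makes Git re-encode the file on its way into and out of the repository. The Python sketch below only mimics that round trip to show why the stored copy ends up the same size as a plain UTF-8 file. The function names are hypothetical, Git does this conversion itself, and byte order marks are ignored here.

```python
def to_stored_bytes(working_tree_bytes: bytes) -> bytes:
    """Re-encode UTF-16LE working-tree bytes as UTF-8, as the repository copy would be."""
    return working_tree_bytes.decode("utf-16-le").encode("utf-8")


def to_working_tree_bytes(stored_bytes: bytes) -> bytes:
    """Re-encode the stored UTF-8 bytes back to UTF-16LE for the working tree."""
    return stored_bytes.decode("utf-8").encode("utf-16-le")


text = "class Program { }\n"           # hypothetical file content
working = text.encode("utf-16-le")     # what the editor sees on disk
stored = to_stored_bytes(working)      # what would be committed

# The round trip is lossless, and the stored copy matches a plain UTF-8 file.
round_trip = to_working_tree_bytes(stored)
print(round_trip == working, stored == text.encode("utf-8"))
```

Because the stored bytes are ordinary UTF-8, any tool that reads the repository contents sees the file at its UTF-8 size.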
