Set activeCodePage to UTF-8
Consider this program:

  #include <cstdio>

  int main() {
    // Up to C++17, u8"" has type const char[]; C++20 changes it to char8_t.
    const char *filename = u8"ディセント3.txt";
    auto fp = std::fopen(filename, "r");
    if (fp) {
      std::fclose(fp);
      return 0;
    } else {
      return 1;
    }
  }

If a file named ディセント3.txt exists, then will that program
successfully open it? The answer is: it depends.

filename is going to point to these bytes:

  Raw bytes: e3 83 87 e3 82 a3 e3 82 bb e3 83 b3 e3 83 88 33 2e 74 78 74 00
  Characters: ディセント3.txt␀

Internally, Windows uses UTF-16. When you call fopen(), Windows will
convert the filename parameter into UTF-16 [1]. If the program is run
with a UTF-8 Windows code page, then the above bytes will be correctly
interpreted as UTF-8 when being converted into UTF-16 [2]. The final
UTF-16 string will be this*:

  Raw bytes: ff fe c7 30 a3 30 bb 30 f3 30 c8 30 33 00 2e 00 74 00 78 00 74 00
  Characters: ディセント3.txt

On the other hand, if the program is run with code page 932, then the
original bytes will be incorrectly interpreted as code page 932 when
being converted into UTF-16. The final UTF-16 string will be this*:

  Raw bytes: ff fe 5d 7e fd ff 67 7e 63 ff 67 7e 7b ff 5d 7e 73 ff 5d 7e fd ff 33 00 2e 00 74 00 78 00 74 00
  Characters: 繝�繧」繧サ繝ウ繝�3.txt

In other words, if that program gets compiled on Windows with a UTF-8
execution character set, then it needs to be run with a UTF-8 Windows
code page. Otherwise, mojibake might happen.

*Unlike the UTF-8 string that filename points to, these UTF-16 strings
do not have a null terminator. This is because the Windows kernel
doesn’t use null-terminated strings for paths [3][4].

---

Before this commit, Descent 3 would pass UTF-8 to fopen() even when it
was run with a non-UTF-8 Windows code page [5]. This commit makes sure
that Descent 3 always runs with a UTF-8 Windows code page.

The Windows code page isn’t just used by fopen(). It also gets used by
many other functions in the Windows API [6]. I don’t know if Descent 3
uses any of those other functions, but if it does, then this commit will
also help make sure that those functions receive strings with the
correct character encoding. Descent 3 uses UTF-8 for strings by default
[7]. Making sure that Descent 3 uses UTF-8 everywhere will make
encoding-related mistakes less likely in the future.

Fixes DescentDevelopers#483.

[1]: <https://stackoverflow.com/a/7950569/7593853>
[2]: <https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/fopen-wfopen?view=msvc-170#remarks>
[3]: <https://stackoverflow.com/a/52372115/7593853>
[4]: <https://googleprojectzero.blogspot.com/2016/02/the-definitive-guide-on-win32-to-nt.html>
[5]: <DescentDevelopers#475 (comment)>
[6]: <https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page#-a-vs--w-apis>
[7]: adf58ec (Explicitly declare execution character set, 2024-07-07)
Jayman2000 committed Jul 17, 2024
1 parent adf58ec commit 5a19942
Showing 2 changed files with 17 additions and 1 deletion.
9 changes: 8 additions & 1 deletion Descent3/CMakeLists.txt
@@ -274,6 +274,13 @@ set(CPPS
 if(WIN32)
   set(PLATFORM_LIBS wsock32.lib winmm.lib)
   set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} /SAFESEH:NO /NODEFAULTLIB:LIBC")
+  set(MANIFEST ${CMAKE_CURRENT_BINARY_DIR}/Descent3.exe.manifest)
+  configure_file(
+    ${CMAKE_CURRENT_SOURCE_DIR}/Descent3.exe.manifest.in
+    ${MANIFEST}
+    @ONLY
+    NEWLINE_STYLE WIN32
+  )
 endif()

 if(UNIX AND NOT APPLE)
@@ -287,7 +294,7 @@ endif()

 file(GLOB_RECURSE INCS "../lib/*.h")

-add_executable(Descent3 WIN32 ${HEADERS} ${CPPS} ${INCS})
+add_executable(Descent3 WIN32 ${HEADERS} ${CPPS} ${INCS} ${MANIFEST})
 target_link_libraries(Descent3 PRIVATE
   2dlib AudioEncode bitmap cfile czip d3music dd_video ddebug ddio libmve libacm
   fix grtext manage mem misc model module movie stream_audio linux SDL2::SDL2
9 changes: 9 additions & 0 deletions Descent3/Descent3.exe.manifest.in
@@ -0,0 +1,9 @@
+<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
+<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
+  <assemblyIdentity type="win32" name="DescentDevelopers.Descent3.engine" version="@PROJECT_VERSION_MAJOR@.@PROJECT_VERSION_MINOR@.@PROJECT_VERSION_PATCH@.0" />
+  <application>
+    <windowsSettings>
+      <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
+    </windowsSettings>
+  </application>
+</assembly>
