“basename” does not support gbk encoding #48648

zpcpi · 2023-02-11T06:46:25Z

Julia Version 1.8.5
Commit 17cfb8e (2023-01-08 06:45 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: 8 × Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, haswell)
Threads: 1 on 8 virtual cores
Environment:
JULIA_EDITOR = code
JULIA_NUM_THREADS =

code
`path = "d:\\AppData\\Local\\Programs\\Julia-1.8.5\\bin\\libLLVM-13jl.dll"
basename(path) # "libLLVM-13jl.dll"

path = String(Base.libllvm_path()) # "d:\\Users\\\xc5\xf4\xb3\xcc\\AppData\\Local\\Programs\\Julia-1.8.5\\bin\\libLLVM-13jl.dll"
basename(path) # ERROR: type Nothing has no field captures
`

Output of Base.libllvm_path(), decoded as GBK:
"d:\\Users\\鹏程\\AppData\\Local\\Programs\\Julia-1.8.5\\bin\\libLLVM-13jl.dll"
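
To confirm that the failing path is not valid UTF-8, the bytes from the report can be checked directly. This is only a sketch: the path is a shortened form of the one above, and the variable names are made up for illustration. No particular result of `basename` is assumed, since it depends on the Julia version.

```julia
# Shortened form of the reported path; \xc5\xf4\xb3\xcc are the GBK bytes shown in the report.
gbk_path = "d:\\Users\\\xc5\xf4\xb3\xcc\\bin\\libLLVM-13jl.dll"
ascii_path = "d:\\Users\\me\\bin\\libLLVM-13jl.dll"

path_is_utf8 = isvalid(gbk_path)     # false: these bytes do not form valid UTF-8
ascii_is_utf8 = isvalid(ascii_path)  # true: pure ASCII is always valid UTF-8

# Capture the outcome instead of letting an exception escape.
# Whether this is a String or an error object depends on the Julia version.
outcome = try
    basename(gbk_path)
catch err
    err
end
```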

melonedo · 2023-02-11T06:57:47Z

Probably caused by #45126, since basename needn't support illegal encodings.

vtjnash · 2023-02-12T22:44:24Z

We may need to ensure UTF-8 mode is disabled for path regexes (or for all ASCII-only regexes that don't use character classes?)

vtjnash · 2023-02-14T21:47:16Z

fixed by #45127

Previously, we might try to interpret the random bytes in a path as UTF-8 and excluding \n, causing the regex match to fail or be incomplete in some cases. But those are valid in a path, so we want PCRE2 to treat them as transparent bytes. Accordingly, change r""a to specify all flags needed to interpret the values simply as ASCII.

Note, this would be breaking if someone was previously trying to match a Unicode character by `\u` while also disabling UCP matching of \w and \s, but that seems an odd specific choice to need.

    julia> match(r"\u03b1"a, "α")
    ERROR: PCRE compilation error: character code point value in \u.... sequence is too large at offset 6

(this would have previously worked). Note that explicitly starting the regex with (*UTF) or using a literal α in the regex would continue to work as before however.

Note that `s` (DOTALL) is a more efficient matcher (if the pattern contains `.`), as is `a`, so it is often preferable to set both when in doubt: http://man.he.net/man3/pcre2perform

Refs: #48648
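
The idea of treating path bytes as transparent can also be illustrated without any regex. The helper below is hypothetical (`bytes_basename` is not part of Base and is not how Base implements `basename`); it simply scans the raw code units for the last `/` or `\`. One caveat of this sketch: in GBK, the byte 0x5C can appear as the second byte of a two-byte character, so a plain byte scan can mis-split such names.

```julia
# Hypothetical helper: return everything after the last path separator,
# working on raw bytes so that invalid UTF-8 never matters.
function bytes_basename(path::AbstractString)
    bytes = codeunits(path)
    # Look for the last '/' or '\\' byte.
    idx = findlast(b -> b == UInt8('/') || b == UInt8('\\'), bytes)
    idx === nothing && return String(path)
    return String(bytes[idx+1:end])
end

# Shortened form of the path from the report.
gbk_path = "d:\\Users\\\xc5\xf4\xb3\xcc\\bin\\libLLVM-13jl.dll"
name = bytes_basename(gbk_path)
```

Unlike `basename`, this sketch does not handle drive prefixes or any other platform-specific rules.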

zpcpi mentioned this issue Feb 11, 2023

Fix Base.libllvm_path and jl_get_libllvm don't support non-ASCII characters in path on Windows (#45126) #45127

Merged

vtjnash added the bug and filesystem labels Feb 12, 2023

vtjnash closed this as completed Feb 14, 2023

vtjnash mentioned this issue Feb 15, 2023

ensure the path regexes will accept all valid paths #48686

Merged
