“basename” does not support gbk encoding #48648
Labels
bug
Indicates an unexpected problem or unintended behavior
filesystem
Underlying file system and functions that use it
Comments
Probably caused by #45126, since |
We may need to ensure UTF-8 mode is disabled for path regexes (or for all ASCII-only regexes which don't use character classes?)
Fixed by #45127
vtjnash added a commit that referenced this issue on Feb 15, 2023:
Previously, we might try to interpret the random bytes in a path as UTF-8 and exclude \n, causing the regex match to fail or be incomplete in some cases. But those are valid in a path, so we want PCRE2 to treat them as transparent bytes. Accordingly, change r""a to specify all flags needed to interpret the values simply as ASCII.

Note: this would be breaking if someone was previously trying to match a Unicode character by `\u` while also disabling UCP matching of \w and \s, but that seems an odd specific choice to need.

    julia> match(r"\u03b1"a, "α")
    ERROR: PCRE compilation error: character code point value in \u.... sequence is too large at offset 6

(This would have previously worked.) Explicitly starting the regex with (*UTF) or using a literal α in the regex would continue to work as before, however.

Note that `s` (DOTALL) is a more efficient matcher (if the pattern contains `.`), as is `a`, so it is often preferable to set both when in doubt: http://man.he.net/man3/pcre2perform

Refs: #48648
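The "excluding \n" part of the commit message can be illustrated with a small sketch. The pattern below is a hypothetical directory/file splitter written for this example, not the regex Base actually uses. Without the `s` flag, `.` refuses to cross a newline, so `match` returns `nothing`; accessing `.captures` on that result is what produces an error like the one reported in this issue.

```julia
# Hypothetical path-splitting pattern, for illustration only (not Base's regex).
name = "di\nr/file.txt"   # a newline is a legal byte in a POSIX path

# Default mode: `.` does not match '\n', so the anchored pattern cannot match.
strict = match(r"^(.*)/([^/]*)$", name)

# With `s` (DOTALL), `.` matches any character, including '\n'.
dotall = match(r"^(.*)/([^/]*)$"s, name)

println(strict === nothing)   # the non-DOTALL match fails
println(dotall.captures[2])   # file-name part from the DOTALL match
```

Code that does `match(...).captures` without first checking for `nothing` will throw whenever the pattern unexpectedly fails, which is why the flags on the path regexes matter.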
Julia Version 1.8.5
Commit 17cfb8e (2023-01-08 06:45 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: 8 × Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, haswell)
Threads: 1 on 8 virtual cores
Environment:
JULIA_EDITOR = code
JULIA_NUM_THREADS =
code
`path = "d:\\AppData\\Local\\Programs\\Julia-1.8.5\\bin\\libLLVM-13jl.dll"
basename(path) # "libLLVM-13jl.dll"
path = String(Base.libllvm_path()) # "d:\\Users\\\xc5\xf4\xb3\xcc\\AppData\\Local\\Programs\\Julia-1.8.5\\bin\\libLLVM-13jl.dll"
basename(path) # ERROR: type Nothing has no field captures
`
The value of Base.libllvm_path(), with its bytes decoded as GBK:
"d:\\Users\\鹏程\\AppData\\Local\\Programs\\Julia-1.8.5\\bin\\libLLVM-13jl.dll"
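For readers stuck on an affected version, one possible workaround is to split the path on raw bytes instead of going through a regex. The sketch below is an assumption-laden illustration: `bytes_basename` is a hypothetical helper, not part of Base, and it treats both `/` and `\` as separators. Note that in GBK the second byte of a two-byte character can itself be 0x5C (`\`), so a byte-wise split can cut such a name in the wrong place; the bytes in this report do not hit that case.

```julia
# Hypothetical helper (not in Base): take everything after the last
# '/' or '\\' byte, without interpreting the bytes as UTF-8.
function bytes_basename(path::AbstractString)
    units = codeunits(path)
    i = findlast(b -> b == UInt8('/') || b == UInt8('\\'), units)
    return i === nothing ? String(path) : String(units[i+1:end])
end

# The path from this report; the user-name bytes are GBK, not valid UTF-8.
path = "d:\\Users\\\xc5\xf4\xb3\xcc\\AppData\\Local\\Programs\\Julia-1.8.5\\bin\\libLLVM-13jl.dll"

println(isvalid(path))          # false: the string is not valid UTF-8
println(bytes_basename(path))   # libLLVM-13jl.dll
```

Working on `codeunits` avoids any dependence on how the regex engine treats invalid UTF-8, at the cost of ignoring the platform-specific rules that `basename` normally applies.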