`of_utf8`: add `Uchar.is_valid` to check the input #51
37b9c5d to 124f9c5
IIUC, the problem is that the ranges Instead of converting the characters in these ranges and checking their validity, is it possible to change the ranges in the pattern match to extract the invalid characters into separate match cases?
I think it is. Is it better than using
We currently have four ranges (
I'm not sure. Having a pattern match on the range and an additional validity check feels a bit strange: I would expect to see one or the other, but why check for the range if, in the end, the range contains invalid characters? On the other hand, having a final check will be more robust and prevent similar issues in the future. Unless someone else has an opinion on this (cc @emillon), I'll let you decide what's better here.
I think that there's something that's causing confusion but is not immediately clear from reading the code: on the one hand, there are byte ranges (what we're pattern matching on, the To make an analogy, decimal IPv4 addresses look like
So yes I think these changes are good.
124f9c5 to 3d33071
CHANGES:

* `Zed_utf8.next_error`: raise `Zed_utf8.Out_of_bounds` in case of invalid offset (@Lucccyo, ocaml-community/zed#52)
* `kill_next_word` should not raise `Out_of_bound` (@Lucccyo, ocaml-community/zed#55)
* `of_utf8`: add `Uchar.is_valid` to check the input (@Lucccyo, ocaml-community/zed#51)

According to the documentation (https://ocaml.org/p/zed/2.0.6/doc/Zed_string/index.html#val-of_utf8), `Zed_string.of_utf8` should not raise `Stdlib.Invalid_argument` if the input is not valid. `Uchar.is_valid` returns `false` if the value is bigger than `Uchar.max` (`U+10FFFF`) or belongs to the invalid Unicode range (from `U+D800` to `U+DFFF`). In this case, `Zed_string.of_utf8` raises `Zed_utf8.Invalid (input, "not a Unicode scalar value")`.
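
To make the description above concrete, here is a minimal sketch of the "final check" approach. It is not the code from this PR. The `Invalid` exception is a local stand-in for `Zed_utf8.Invalid`, and `code_of_3_bytes` and `checked_uchar` are hypothetical helpers introduced only for illustration. Only `Uchar.is_valid`, `Uchar.of_int` and `Uchar.to_int` come from the OCaml standard library.

```ocaml
(* Local stand-in for Zed_utf8.Invalid (assumption). The payload order follows
   the PR description, not necessarily the library's definition. *)
exception Invalid of string * string

(* Hypothetical helper: assemble the code point of a 3-byte sequence.
   It does not validate the bytes, which is what lets a surrogate through. *)
let code_of_3_bytes s =
  let b i = Char.code s.[i] in
  ((b 0 land 0x0F) lsl 12) lor ((b 1 land 0x3F) lsl 6) lor (b 2 land 0x3F)

(* Hypothetical final check: reject anything that is not a Unicode scalar
   value instead of letting Uchar.of_int raise Invalid_argument. *)
let checked_uchar input code =
  if Uchar.is_valid code then Uchar.of_int code
  else raise (Invalid (input, "not a Unicode scalar value"))

let () =
  (* Boundaries of the ranges rejected by Uchar.is_valid. *)
  assert (Uchar.is_valid 0xD7FF);
  assert (not (Uchar.is_valid 0xD800));
  assert (not (Uchar.is_valid 0xDFFF));
  assert (Uchar.is_valid 0xE000);
  assert (Uchar.is_valid 0x10FFFF);
  assert (not (Uchar.is_valid 0x110000));
  (* A well-formed 3-byte sequence: U+20AC. *)
  let euro = "\xE2\x82\xAC" in
  assert (Uchar.to_int (checked_uchar euro (code_of_3_bytes euro)) = 0x20AC);
  (* These bytes fit the 3-byte pattern but assemble to the surrogate U+D800. *)
  let surrogate = "\xED\xA0\x80" in
  assert (
    match checked_uchar surrogate (code_of_3_bytes surrogate) with
    | _ -> false
    | exception Invalid (_, msg) -> msg = "not a Unicode scalar value")
```

The second input shows why matching on byte ranges alone is not enough: the bytes look like a 3-byte sequence, yet the assembled value falls in `U+D800` to `U+DFFF`, so only a check on the resulting code point catches it.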