UTF-8 validation #322
Thinking about this more, I think it may make sense to have a return value that carries more information. Taking inspiration from Rust's
(typename $utf8_error
(record
(field $valid_up_to $size)
(field $incomplete bool)
(field $error_len $size)
)
)
...
(@interface func (export "validate")
(param $bytes (list u8))
(result $error (expected (error $utf8_error)))
)
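To make the record's fields concrete, here is a minimal Rust sketch of what a `validate` implementation could return. It assumes the Rust type alluded to above is `std::str::Utf8Error`, and it assumes `$incomplete` is meant to flag input that ends in the middle of a multi-byte sequence; neither is confirmed by the thread. The struct name `Utf8ErrorInfo` is hypothetical.

```rust
use std::str;

/// Hypothetical Rust-side mirror of the `$utf8_error` record sketched above.
#[derive(Debug, PartialEq, Eq)]
pub struct Utf8ErrorInfo {
    /// Number of leading bytes that are known to be valid UTF-8.
    pub valid_up_to: usize,
    /// Assumption: true when the input ends partway through a sequence.
    pub incomplete: bool,
    /// Length of the invalid sequence, or 0 when the input is incomplete.
    pub error_len: usize,
}

/// Validates `bytes` and reports where and how validation failed.
pub fn validate(bytes: &[u8]) -> Result<(), Utf8ErrorInfo> {
    match str::from_utf8(bytes) {
        Ok(_) => Ok(()),
        Err(e) => Err(Utf8ErrorInfo {
            valid_up_to: e.valid_up_to(),
            // `error_len()` is `None` when more input might complete the sequence.
            incomplete: e.error_len().is_none(),
            error_len: e.error_len().unwrap_or(0),
        }),
    }
}
```

With this shape, a caller that streams data can tell a truncated chunk (retry once more bytes arrive) apart from a genuinely invalid byte sequence (report or replace it).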
Hi, I'm a little bit confused by this. Why would a language's standard library call this instead of using the UTF-8 validation function they already have, and use for all other targets?
@jedisct1 Because of the way that the web works, file size is very important, and so relying on the browser's functionality means smaller file sizes.
I was mocking up a witx description for #322 and had a use for an `option` type. Of course, I could also just emit a variant manually, or reuse `expected` by omitting the error type, but `option` better conveys the sense that the `none` case doesn't necessarily represent an error.
UTF-8 is a very popular string encoding; for example, it's the encoding used by over 95% of all Web content. It's not uncommon for applications to need to do UTF-8 validation on their own strings, and since all WebAssembly VMs have UTF-8 validation logic built in as required by the spec, we should define a WASI API to let applications call into the VM's UTF-8 validation logic rather than having to bundle their own.
I'm picturing an API which takes a byte slice as input and returns a boolean value indicating whether it's valid or not. This is the minimum that WebAssembly engines themselves are required to have, and would be enough for, e.g., the use case of implementing a UTF-8 validity check for a WASI API implemented in wasm.
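As a rough illustration of that minimal shape, the sketch below shows a byte-slice-in, boolean-out check in Rust. The function name `is_valid_utf8` is hypothetical, and the body uses the Rust standard library as a stand-in for whatever validation logic the VM would actually expose.

```rust
/// Hypothetical stand-in for the proposed call: returns true if and only if
/// `bytes` is well-formed UTF-8.
pub fn is_valid_utf8(bytes: &[u8]) -> bool {
    // Delegates to the standard library validator; a WASI host would call
    // into its own built-in validation logic here instead.
    std::str::from_utf8(bytes).is_ok()
}
```

A boolean result keeps the interface small, at the cost of not telling the caller where validation failed.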
More elaborate APIs are possible, such as validation which returns the position where an error occurred, and possibly information about the error, but I think it makes sense to start with something simple. I won't have time to make an official proposal myself for a while, but I wanted to file this issue to see what others think!