UTF-8 validation #322

Open
sunfishcode opened this issue Oct 2, 2020 · 3 comments
Labels
feature-request Requests for new WASI APIs

Comments

@sunfishcode
Member

UTF-8 is a very popular string encoding; for example, it's the encoding used by over 95% of all Web content. It's not uncommon for applications to need to validate their own strings as UTF-8, and since all WebAssembly VMs have UTF-8 validation logic built in, as required by the spec, we should define a WASI API that lets applications call into the VM's UTF-8 validation logic rather than bundle their own.

I'm picturing an API which takes a byte slice as input and returns a boolean value indicating whether it's valid UTF-8. This is the minimum that WebAssembly engines themselves are required to have, and would be enough for, e.g., the use case of implementing a UTF-8 validity check for a WASI API implemented in wasm.
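
For illustration only, here is a minimal Rust sketch of the shape such a boolean check might take. The function name `utf8_is_valid` is hypothetical and not part of any WASI proposal; the body uses Rust's standard library as a stand-in for the validation logic the host VM would provide.

```rust
/// Hypothetical stand-in for the proposed call: returns true if `bytes`
/// is well-formed UTF-8. A real WASI import would delegate to the host;
/// this sketch models the behavior with `std::str::from_utf8`.
pub fn utf8_is_valid(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}
```

A caller would pass any byte slice, for example `utf8_is_valid(b"hello")`, and branch on the result.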

More elaborate APIs are possible, such as validation which returns the position where an error occurred, and possibly information about the error, but I think it makes sense to start with something simple. I won't have time to make an official proposal myself for a while, but I wanted to file this issue to see what others think!

@sunfishcode
Member Author

Thinking about this more, I think it may make sense to have a return value that carries more information.

Taking inspiration from Rust's str::from_utf8 and Utf8Error, I'm picturing an API like this:

(typename $utf8_error
    (record
        (field $valid_up_to $size)
        (field $incomplete bool)
        (field $error_len $size)
    )
)

...

(@interface func (export "validate")
    (param $bytes (list u8))
    (result $error (expected (error $utf8_error)))
)
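
As a rough sketch of how the record above could map onto the Rust types it is modeled on, the following mirrors `$utf8_error` as a struct and fills it from `std::str::Utf8Error`. The struct name `Utf8ValidationError` is hypothetical, and the mapping of `$incomplete` and `$error_len` is an assumption, since the issue does not define them: here `incomplete` is set when Rust's `error_len()` returns `None` (the input ended mid-sequence), and `error_len` falls back to 0 in that case.

```rust
use std::str;

/// Hypothetical Rust mirror of the `$utf8_error` record sketched above.
#[derive(Debug, PartialEq, Eq)]
pub struct Utf8ValidationError {
    /// Number of leading bytes that are known to be valid UTF-8.
    pub valid_up_to: usize,
    /// Assumed meaning: the input ended in the middle of a sequence.
    pub incomplete: bool,
    /// Assumed meaning: length of the invalid sequence, or 0 if incomplete.
    pub error_len: usize,
}

/// Sketch of `validate`, using the standard library in place of the host.
pub fn validate(bytes: &[u8]) -> Result<(), Utf8ValidationError> {
    match str::from_utf8(bytes) {
        Ok(_) => Ok(()),
        Err(e) => Err(Utf8ValidationError {
            valid_up_to: e.valid_up_to(),
            incomplete: e.error_len().is_none(),
            error_len: e.error_len().unwrap_or(0),
        }),
    }
}
```

With this mapping, a caller can distinguish a truncated buffer (wait for more bytes and retry) from a genuinely invalid sequence (skip `error_len` bytes or reject the input).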

@sunfishcode added the feature-request label Mar 27, 2021

@jedisct1
Member

Hi,

I'm a little bit confused by this.

Why would a language's standard library call this instead of using the UTF-8 validation function it already has and uses for all other targets?

@Pauan

Pauan commented Mar 27, 2021

@jedisct1 Because of the way that the web works, file size is very important, and so relying on the browser's functionality means smaller file sizes.

sunfishcode added a commit that referenced this issue Mar 29, 2021
I was mocking up a witx description for #322 and had a use for an
`option` type. Of course, I could also just emit a variant manually,
or reuse `expected` by omitting the error type, but `option` better
conveys the sense that the `none` case doesn't necessarily represent
an error.
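
The distinction drawn in the commit message can be illustrated with Rust's `Option`, which plays the same role as the witx `option` type. This is only a sketch: `first_invalid_offset` is a hypothetical helper, not something taken from the commit or from any WASI interface. It returns `None` when the input is valid, so the `None` case signals the absence of a problem rather than an error.

```rust
/// Hypothetical helper: byte offset of the first invalid or incomplete
/// UTF-8 sequence, or `None` when the whole slice is valid.
pub fn first_invalid_offset(bytes: &[u8]) -> Option<usize> {
    std::str::from_utf8(bytes).err().map(|e| e.valid_up_to())
}
```

Expressing the same result as an `expected` with the error type omitted would work mechanically, but it would label the "nothing found" case as a failure.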

3 participants