Stable reference to the encodings without a full library #64

Open
lifthrasiir opened this issue Nov 22, 2014 · 1 comment

@lifthrasiir
Owner

Rust-encoding is (and will continue to be) big, and some libraries may want to provide a reference to an encoding without the direct transcoding facility. emk/rust-uchardet#1 is one example, but there may be more (for example, a quoted-printable decoder may not want to transcode by itself but may still want to signal the encoding).

A possible solution is to have a small crate (encoding-label would be a fine name) that has an enum representing all available encodings and a function to convert a label to that enum. encoding_from_whatwg_label would then depend on that function. Since Cargo does not allow multiple versions of the same crate to be linked together, this is safe: both crates see the same enum type, so a function in the main crate that converts the enum to the actual EncodingRef cannot fail or panic.
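As a rough illustration of the small crate described above, a minimal sketch might look like the following. The names `EncodingLabel`, `label_from_whatwg`, and `name` are hypothetical and not part of rust-encoding, and only a handful of encodings and WHATWG labels are listed to keep it short. It assumes the WHATWG lookup rule of trimming ASCII whitespace and matching ASCII case-insensitively.

```rust
/// Hypothetical enum naming the supported encodings without pulling in
/// any transcoding tables. Only a few variants are shown here.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum EncodingLabel {
    Utf8,
    Windows1252,
    EucKr,
    ShiftJis,
}

impl EncodingLabel {
    /// Canonical name of the encoding. The match is exhaustive, so this
    /// cannot fail; a conversion to the real encoding object in the main
    /// crate could be written the same way.
    pub fn name(self) -> &'static str {
        match self {
            EncodingLabel::Utf8 => "utf-8",
            EncodingLabel::Windows1252 => "windows-1252",
            EncodingLabel::EucKr => "euc-kr",
            EncodingLabel::ShiftJis => "shift_jis",
        }
    }
}

/// Hypothetical lookup: converts a WHATWG-style label to the enum.
/// Returns `None` for labels this (deliberately partial) table does not know.
pub fn label_from_whatwg(label: &str) -> Option<EncodingLabel> {
    // Strip leading and trailing ASCII whitespace (tab, LF, FF, CR, space).
    let trimmed =
        label.trim_matches(|c: char| matches!(c, '\t' | '\n' | '\x0C' | '\r' | ' '));
    // Labels are compared ASCII case-insensitively.
    match trimmed.to_ascii_lowercase().as_str() {
        "utf-8" | "utf8" | "unicode-1-1-utf-8" => Some(EncodingLabel::Utf8),
        "windows-1252" | "latin1" | "iso-8859-1" | "ascii" | "us-ascii" => {
            Some(EncodingLabel::Windows1252)
        }
        "euc-kr" | "korean" | "windows-949" => Some(EncodingLabel::EucKr),
        "shift_jis" | "sjis" | "windows-31j" => Some(EncodingLabel::ShiftJis),
        _ => None,
    }
}
```

A library such as a charset detector could then return `EncodingLabel` values and depend only on this small crate, leaving the actual transcoding to whoever links the full library.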

I still want to explore some other possibilities and actual use cases. This is a good-to-have feature but not a blocker.

@emk

emk commented Nov 22, 2014

Ah, nice idea. A standalone enum with encoding labels sounds great. It wouldn't hurt to include some functions to look up encodings by name, if the list isn't too huge.

I'm probably going to create a uchardet-encoding library with full support for encoding, including a function which just turns an arbitrary byte buffer into a String. One of my use cases for this is substudy, which wants to read in arbitrary subtitle *.srt files and automagically normalize everything to UTF-8 without bugging the user. This kind of transparent encoding conversion can be really handy sometimes, especially when combining legacy data from multilingual sources. In practice, there's a surprisingly wide range of encodings where automatic encoding detection will do the Right Thing on its own.
