Stable reference to the encodings without a full library #64

lifthrasiir · 2014-11-22T20:01:27Z

Rust-encoding is (and will continue to be) big, and some libraries may want to provide a reference to an encoding without the actual transcoding facility. emk/rust-uchardet#1 is one example, but there may be more (for example, a quoted-printable decoder may not want to transcode by itself but may still want to signal the encoding).

A possible solution is to have a small crate (encoding-label would be fine) that has an enum representing all available encodings and a function to convert a label to that enum. encoding_from_whatwg_label will then depend on that function. Since Cargo does not allow multiple versions of the same crate to be linked together, the enum and the main crate always agree on the set of encodings, so a function in the main crate that converts the enum to the actual EncodingRef cannot fail or panic.
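A minimal sketch of what such a crate might contain, for illustration only. The names `EncodingLabel`, `label_from_whatwg_label` and `whatwg_name` are hypothetical and not part of rust-encoding, only four encodings are listed, and each one carries just a subset of its WHATWG labels. The exhaustive `match` in `whatwg_name` stands in for the enum-to-EncodingRef conversion in the main crate: because every variant is covered, there is no failure path.

```rust
/// Hypothetical enum for a small `encoding-label` crate.
/// A real crate would list every supported encoding.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum EncodingLabel {
    Utf8,
    Windows1252,
    EucKr,
    ShiftJis,
}

/// Converts a WHATWG label to the enum (hypothetical function).
/// Leading and trailing ASCII whitespace is ignored and the match is
/// ASCII case-insensitive, as in the WHATWG Encoding Standard.
pub fn label_from_whatwg_label(label: &str) -> Option<EncodingLabel> {
    let normalized = label
        .trim_matches(|c: char| matches!(c, '\t' | '\n' | '\x0C' | '\r' | ' '))
        .to_ascii_lowercase();
    match normalized.as_str() {
        "utf-8" | "utf8" | "unicode-1-1-utf-8" => Some(EncodingLabel::Utf8),
        "windows-1252" | "latin1" | "iso-8859-1" | "ascii" | "us-ascii" => {
            Some(EncodingLabel::Windows1252)
        }
        "euc-kr" | "korean" | "ks_c_5601-1987" | "windows-949" => Some(EncodingLabel::EucKr),
        "shift_jis" | "shift-jis" | "sjis" | "ms_kanji" | "windows-31j" | "x-sjis" => {
            Some(EncodingLabel::ShiftJis)
        }
        _ => None,
    }
}

impl EncodingLabel {
    /// Stand-in for the main crate's enum-to-EncodingRef conversion.
    /// The match is exhaustive, so this cannot fail or panic.
    pub fn whatwg_name(self) -> &'static str {
        match self {
            EncodingLabel::Utf8 => "UTF-8",
            EncodingLabel::Windows1252 => "windows-1252",
            EncodingLabel::EucKr => "EUC-KR",
            EncodingLabel::ShiftJis => "Shift_JIS",
        }
    }
}
```

A library such as a charset detector could then return `EncodingLabel` values and depend only on this small crate, leaving the actual transcoding to whoever links the full library.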

I still want to explore some other possibilities and actual use cases. This is a good-to-have feature but not a blocker.


emk · 2014-11-22T20:21:03Z

Ah, nice idea. A standalone enum with encoding labels sounds great. It wouldn't hurt to include some functions to look up encodings by name, if the list isn't too huge.

I'm probably going to create a uchardet-encoding library with full support for encoding, including a function which just turns an arbitrary byte buffer into a String. One of my use cases for this is substudy, which wants to read in arbitrary subtitle *.srt files, and automagically normalize everything to UTF-8 without bugging the user. This kind of transparent encoding conversion can be really handy sometimes, especially when combining legacy data from multilingual sources. In practice, there's a surprisingly wide range of encodings where automatic encoding detection will do the Right Thing without bugging the user.
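As a rough illustration of the "arbitrary byte buffer into a String" shape, here is a sketch that uses only the standard library. It does not use uchardet or rust-encoding: the detection step is replaced by a deliberately naive rule (valid UTF-8 is kept, anything else is assumed to be ISO-8859-1), and the function name `normalize_to_utf8` is hypothetical. A real implementation would call a detector and a full transcoder at the marked points.

```rust
/// Hypothetical helper: turns an arbitrary byte buffer into a `String`.
/// Assumption: input that is not valid UTF-8 is treated as ISO-8859-1.
/// This is a stand-in for real detection, not what uchardet does.
pub fn normalize_to_utf8(bytes: &[u8]) -> String {
    // Drop a leading UTF-8 byte order mark if present.
    let bytes = if bytes.starts_with(&[0xEF, 0xBB, 0xBF]) {
        &bytes[3..]
    } else {
        bytes
    };

    // "Detection" step: a real library would ask a detector here.
    match std::str::from_utf8(bytes) {
        // Already UTF-8: copy it as-is.
        Ok(text) => text.to_owned(),
        // "Transcoding" step: each ISO-8859-1 byte maps directly to
        // the Unicode code point with the same value.
        Err(_) => bytes.iter().map(|&b| b as char).collect(),
    }
}
```

With this rule, both `b"caf\xC3\xA9"` (UTF-8) and `b"caf\xE9"` (ISO-8859-1) come out as the same `String`, which is the kind of transparent normalization described above, only for two encodings instead of many.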

lifthrasiir added this to the 0.4 ("1.0" minus the language stability) milestone Nov 22, 2014

lifthrasiir mentioned this issue Nov 22, 2014

Should rust-uchardet depend on rust-encoding? emk/rust-uchardet#1

Closed

lifthrasiir mentioned this issue Feb 10, 2015

feat(headers): add AcceptCharset header hyperium/hyper#300

Closed
