Make CharsetDetector adhere to WHATWG recommendation #47

Closed
kngenie opened this issue Nov 17, 2014 · 3 comments

Comments

@kngenie
Member

kngenie commented Nov 17, 2014

CharsetDetector fails to detect the correct character encoding when the META tag declares charset=UTF-16 but the content is actually UTF-8. This happens because CharsetDetector gives the META tag higher priority than the charset detected from the content. Reimplement CharsetDetector following the WHATWG recommendation: http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#encoding-sniffing-algorithm
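
To make the failure case concrete, here is a minimal Python sketch of the META override described in the WHATWG prescan: when a META tag declares a UTF-16 encoding, the declared charset is replaced with UTF-8, since a META tag that can be read byte-by-byte as ASCII cannot really be UTF-16. This is an illustration only, not the CharsetDetector code. The function name `charset_from_meta`, the regular expression, and the 1024-byte scan window are assumptions made for this example, and the regex is a much simplified stand-in for the prescan the spec defines.

```python
import re

# Hypothetical, simplified stand-in for the spec's META prescan.
# It only looks for "charset=<name>" inside a <meta ...> tag.
_META_CHARSET = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?\s*([A-Za-z0-9_\-]+)',
    re.IGNORECASE,
)


def charset_from_meta(head: bytes):
    """Return the charset declared in a META tag, or None if there is none.

    A declared UTF-16 encoding is overridden with UTF-8, following the
    WHATWG prescan rule. The 1024-byte window is an assumption.
    """
    match = _META_CHARSET.search(head[:1024])
    if match is None:
        return None
    name = match.group(1).decode("ascii").lower()
    if name in ("utf-16", "utf-16be", "utf-16le"):
        # The tag was readable as ASCII-compatible bytes, so the
        # document cannot actually be UTF-16.
        return "utf-8"
    return name


page = b'<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-16"></head>'
print(charset_from_meta(page))
```

With this rule in place, the page from the bug report (META says UTF-16, bytes are UTF-8) resolves to UTF-8 instead of being decoded as UTF-16.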

Known internally as ARI-3933.

@kngenie
Member Author

kngenie commented Nov 17, 2014

In commit deed711, CharsetDetector was refactored and two steps from the WHATWG recommendation that had been missing were added (overriding the META tag and use of the BOM). Bug fixes are in deed711, and additional unit tests were added in 9524744. I suggest that RotatingCharsetDetector now be deprecated.
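
For readers unfamiliar with the BOM step, the following Python sketch shows the general idea: a byte order mark at the start of the content, if present, is trusted ahead of any META declaration. It is not taken from the commits above. The names `charset_from_bom` and `pick_charset` and the UTF-8 fallback are hypothetical, and the ordering reflects my reading of the WHATWG encoding sniffing algorithm rather than a statement about how CharsetDetector is implemented.

```python
# Byte order marks recognised by the WHATWG sniffing algorithm.
BOMS = (
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16be"),
    (b"\xff\xfe", "utf-16le"),
)


def charset_from_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def pick_charset(data: bytes, meta_charset=None, fallback="utf-8"):
    """Hypothetical helper: prefer the BOM, then META, then a fallback.

    The fallback value is an assumption for this example only.
    """
    return charset_from_bom(data) or meta_charset or fallback


# A UTF-8 BOM wins even if the META tag claimed UTF-16.
print(pick_charset(b"\xef\xbb\xbf<html>", meta_charset="utf-16"))
```

Keeping the BOM check separate from the META handling makes each step easy to cover with its own unit test.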

@kngenie
Member Author

kngenie commented Nov 19, 2014

Maintenance branch: charset-detector

@kngenie
Member Author

kngenie commented Jan 19, 2015

Merged to iipc/master in 1dbd4f3.

kngenie closed this as completed Jan 19, 2015