Bytes vs. characters and "cookie charset" #15
Comments
I don't think this conclusion is warranted. I would instead say:
That would be ideal for newly-built sites, but will cause unrecoverable data corruption if an existing site is migrating to the new API, especially if they still need to use
Also, I agree that USVString is the right script-level interface to the feature; by "raw UTF-8" I meant that the USVString-using interface should use UTF-8 encoding when serializing/deserializing cookies so that it is compatible on the server side with the existing behavior of most modern browsers.
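For illustration, here is a minimal sketch of the UTF-8 serialize/deserialize step described above, assuming `TextEncoder` and `TextDecoder` are available. The helper names `serializeCookieValue` and `deserializeCookieValue` are hypothetical and not part of any proposed API; the sketch only shows that a USVString-style value round-trips through UTF-8 bytes, and that a lone surrogate is replaced with U+FFFD on the way in.

```typescript
const encoder = new TextEncoder();          // always encodes as UTF-8
const decoder = new TextDecoder("utf-8");

// Hypothetical helper: script-level value -> bytes as they would sit in the cookie jar.
// TextEncoder takes a USVString, so lone surrogates become U+FFFD before encoding.
function serializeCookieValue(value: string): Uint8Array {
  return encoder.encode(value);
}

// Hypothetical helper: cookie jar bytes -> script-level value.
function deserializeCookieValue(bytes: Uint8Array): string {
  return decoder.decode(bytes);
}

const original = "na\u00EFve \u2615";       // "naïve ☕"
const roundTripped = deserializeCookieValue(serializeCookieValue(original));
console.log(roundTripped === original);     // well-formed strings survive the round trip

// A lone surrogate is not representable in UTF-8; it comes back as U+FFFD.
const damaged = deserializeCookieValue(serializeCookieValue("a\uD800b"));
console.log(damaged === "a\uFFFDb");
```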
It would mean that existing sites cannot use the new API if they have stored non-ASCII data with the old API, it's true. But given that it's impossible to store such data in a portable way today, that seems fine.
Right, but it also breaks interoperation between document.cookie/meta h-e=s-c and the async API on the same site. There would be no way to round-trip data in IE between the two interfaces.
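A rough sketch of the round-trip failure being described, under a loud assumption: ISO-8859-1 is used here only as a stand-in for a single-byte "ANSI" codepage (real system codepages differ), and `decodeLatin1` and `encodeLatin1Lossy` are hypothetical helpers, not browser APIs. It shows what happens when one interface writes UTF-8 bytes and the other interprets the same bytes with a single-byte codepage, and what a silent lossy write looks like.

```typescript
// Stand-in single-byte codepage: ISO-8859-1, where byte N maps to U+00NN.
function decodeLatin1(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

// Lossy write: anything the codepage cannot represent silently becomes "?".
function encodeLatin1Lossy(value: string): Uint8Array {
  const bytes = new Uint8Array(value.length);
  for (let i = 0; i < value.length; i++) {
    const unit = value.charCodeAt(i);
    bytes[i] = unit <= 0xff ? unit : 0x3f; // 0x3f is "?"
  }
  return bytes;
}

// Written as UTF-8 by one interface, read back through the codepage by the other.
const misread = decodeLatin1(new TextEncoder().encode("caf\u00E9")); // "café" in
console.log(misread); // "cafÃ©"

// Written through the codepage: the original text is unrecoverable.
const lossy = decodeLatin1(encodeLatin1Lossy("\u65E5\u672C\u8A9E")); // "日本語" in
console.log(lossy); // "???"
```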
That's fair. But the correct way to fix that is to make those features interoperable, instead of adding a new set of APIs that extend that non-interoperability and continue to behave differently in all browsers.
Agreed. I guess a possibly-better resolution would be for
I agree with @domenic. Fix the core problem, don't paper over it with additional APIs.
@adrianba @aliams Any idea how best to reach cross-browser interoperability on cookie charset? It looks like other than IE/Edge, most modern browsers use UTF-8 for this; see https://inikulin.github.io/cookie-compat/#CHARSET0001 (and nearby cases) for data and whatwg/html#804 for context.
Also, Safari seems to truncate at the first non-ASCII byte! The explainer currently mandates UTF-8 interpretation for bytes for predictable interoperation and affordable internationalization (in terms of byte count in the cookie jar and in terms of complexity). I'd love to hear your thoughts on it, and would be happy to address any outstanding issues (perhaps in pending pull request #17?)
Closing this issue for now, but I'm happy to reopen this discussion if browser implementations are not ready for consistent UTF-8 handling for cookies or would like to have a more detailed discussion of how to change this behavior without breaking apps.
Most modern browsers assume UTF-8 when exposing cookie data to scripts and `<meta http-equiv=set-cookie ...>`, but IE and Edge instead use the system locale's "ANSI" codepage (with silent lossy conversion on write), causing a lack of interoperability in practice. The cookie jar itself seems to be byte-oriented and eight-bit-clean in all modern browsers.

URL-encoding or Base64 armoring is possible but adds a lot of overhead (`encodeURIComponent` and `escape` inflate characters up to 3x, base64 by about 1.33x), decreases readability and debuggability (often the data is user-entered, and users can look at it with browser cookie jar inspectors), and, in the case of base64, has no built-in codec in IE. Length inflation also runs up against cookie length and per-domain cookie jar size caps.

As a result, sites storing non-ASCII data (often user input) in cookies either need to deal with some degree of cross-browser incompatibility or need to use an ugly and inefficient workaround. On the server side, guessing based on User-Agent sniffing combined with approximation based on IP geolocation, Accept-Language analysis, and/or the script-provided, IE-specific `navigator.systemLanguage` is the best hope for portably encoding/decoding cookies which will be shared with scripts and/or set in HTML.

Given all this, I think it would be nice to have the new async cookies API allow easy use of raw UTF-8 in all browsers but also provide a way to read and write cookies in the browser's default "cookie charset" as well as raw bytes.
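
To make the overhead point above concrete, here is a small sketch that compares the stored size of one sample value under each encoding. It assumes `TextEncoder` and `btoa` are available; `cookieValueSizes` and `base64OfUtf8` are hypothetical helper names, and the numbers apply to this sample input only.

```typescript
const utf8 = new TextEncoder();

// Base64 of the UTF-8 bytes. btoa only accepts code units <= 0xFF,
// so the bytes are first packed into a "binary string".
function base64OfUtf8(value: string): string {
  const bytes = utf8.encode(value);
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Size of the same value under each storage strategy, in bytes/ASCII characters.
function cookieValueSizes(value: string) {
  return {
    rawUtf8: utf8.encode(value).length,
    uriComponent: encodeURIComponent(value).length,
    escape: escape(value).length, // legacy function, shown only for comparison
    base64: base64OfUtf8(value).length,
  };
}

const sizes = cookieValueSizes("\u65E5\u672C\u8A9E"); // "日本語"
console.log(sizes);
// rawUtf8: 9, uriComponent: 27, escape: 18, base64: 12
```

For this three-character input, percent-encoding triples the raw UTF-8 size, which eats into per-cookie and per-domain limits much faster than storing the UTF-8 bytes directly.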