Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Saving binary string in sessionStorage in IE produces and Error #11

Closed
AugustePop opened this issue Jul 17, 2017 · 10 comments
Closed

Saving binary string in sessionStorage in IE produces and Error #11

AugustePop opened this issue Jul 17, 2017 · 10 comments

Comments

@AugustePop
Copy link

AugustePop commented Jul 17, 2017

The BinaryString encoding may produce string containing 0-coded character, namely "\0". IE will throw an error when we try to save the string directly in to sessionStorage.

var str = String.fromCharCode(0);
/* Error thrown in following line */
sessionStorage.setItem('test', str);

I ran a simple test against IE sessionStorage, and find out only char with code 0 will result in an exception.

@AugustePop
Copy link
Author

I uploaded a patch creating another encoding (BinaryStringC) for IE compatibility.

This encoding is basically copied entirely from BinaryString while saving the string with code point + 1, and basing the end character on 0x8002 instead of 0x8000.

0001-add-BinaryStringC-encoding-for-IE-compatibility.patch.txt

@rotemdan
Copy link
Owner

rotemdan commented Jul 31, 2017

Hi, thanks for reporting and I apologize for the late response (I took some time to consider this). I was not aware of this issue but could verify it in IE11 (Win10).

The BinaryString encoding was initially added because a different string compression library, namely lzstring supported it. In retrospect (despite the size reduction) I don't think it has a significant practical advantage over Base64 (especially when it comes to debugging and human readability, embedding in JSON etc.) and would rather have never introduced it in the first place.

I'm not sure if adding another, slightly different variant of the encoding would be a great idea (even through an option). I would rather recommend to use Base64 instead and simply deprecate the BinaryString encoding altogether (though still support it for backwards compatibility).

(strangely, despite having relatively thorough tests, I didn't cover the possibility of that particular character. I hope today I would've designed the tests better)

@rotemdan
Copy link
Owner

rotemdan commented Jul 31, 2017

I've added deprecation warnings for the BinaryString encoding to the reference. I don't think I'd remove support for it from the code though, at least in the foreseeable future (several years or so).

I have considered to add an additional variant to overcome the issue. The problem is that due to backwards compatibility I can't make it the default behavior for BinaryString so adding something like BinaryString2 could add some confusion (to a name that was somewhat confusing in the first place - since node.js has a different encoding with a somewhat similar name).

It is still possible to publish the alternative encoding through this or a separate library. I will consider it in the future. Finding a better name would probably be a good idea. Say something like "Binary15AddOne". (where "AddOne" represents the fact that it starts with 1 instead of 0).

Another concern is the possibility of some other compatibility edge case that hasn't been found yet, and adding yet another variant seems even less likely.

@AugustePop
Copy link
Author

After further thought, this null character problem can be coped with totally in user space.

Detecting browser feature is an easy

try {
  sessionStorage.setItem('test', '\0');
} catch (e) {
  /* browser can not save null character */
}

and converting a string to and from an encoding suitable for sessionStorage can be done simply by

/* to */
str.replace(/\0/g, '\u8002');

/* from */
str.replace(/\u8002/g, '\0');

So, we don't need the new encoding at all. Maybe documenting the null problem somewhere is a better option.

Please don't deprecate BinaryString encoding. 6-bit-per-character base64 encoding has an theoretically asymptotic 0.4 efficiency compared to 15-bit-per-character BinaryString encoding. This means a 100k long BinaryString encoded string would roughly be 250k long in base64 encoding. This as a 150% increase in size. For applications that need to save the compressed string in sessionStorage/localStorage, considering the storage's low capacity at 5M, base64 encoding is very wasteful and with no or very little performance gain.

I have also tried the other library mentioned above. Compared to the other library, lzutf8 is much faster for large strings, e.g. string of length 10M, and has the ability of leveraging browser web worker to compress data in another thread. Thank you for the great work.

@rotemdan
Copy link
Owner

rotemdan commented Aug 3, 2017

I have considered this solution as well, but thought it might also be confusing and error-prone. The problem is I've used \8000 and \8001 as special markers to signify the evenness/oddness of the length of the sequence. I'm afraid people would make the mistake of using one of those characters instead of as, say, \8002 (as you suggested) as a substitute for \0000.

Another problem is that the way the library is built it contains many places that encourage the user to simply set an output/input encoding with a string descriptor like "BinaryString" without giving much thought to it.

It would be difficult to "elegantly" mark in the documentation that when using IE one should add str.replace(/\0/g, '\u8002'); after every point the library outputs a value using this encoding and str.replace(/\u8002/g, '\0') before passing it to a function expecting this encoding.

Since the central use case of the encoding is to work in LocalStorage/SessionStorage I feel that in almost every case that it is used it should make it possible to be stored in the IE browser, so in almost every place that someone uses BinaryString without the modification would be a mistake that would cause their code to be broken in IE. That's the reason I decided to deprecate it.

It is not difficult to add another encoding that would avoid the issue. The only current problem I see is naming. I need to find a better name that wouldn't sound similar to the existing one. Maybe even something like WebStorageBinaryString ("WebStorage" being the generic name for LocalStorage/SessionStorage).

@rotemdan
Copy link
Owner

rotemdan commented Aug 3, 2017

A similar issue seems to also exists in WebSQL with browsers like older Safari/Chrome:

https://github.com/nolanlawson/state-of-binary-data-in-the-browser#safariios

The funny thing I've known about this article for years. But for some reason forgot about it in this context.
(edit: now I see it was published in 2015, about a year after I initially published the library)

Also, I need a significantly better name, for many other reasons as well (including conflict with a different idea of 'binary string' - which signifies simply having an 8-bit binary value at each character).

@hunkydoryrepair
Copy link

Deprecating BinaryString makes sense if you provide a replacement that is compatible with webstorage. Would need to be valid UTF strings. UTF16 makes sense as that is how IE will store it anyway. I like the name WebStorageString or CompactString

@rotemdan
Copy link
Owner

rotemdan commented Mar 15, 2018

I had plans to add a new encoding that avoids the problematic '0' codepoint. Names like WideBinaryString, BinaryString15 or StorageBinaryString might work.

The thing is, despite appearing relatively simple, this requires touching some relatively old code, and writing some new tests, in order to support mostly legacy browsers which are becoming so outdated many casual websites won't even run on them anymore (Bootstrap 4.0 now requires FlexBox, for example, which is not supported by Android browsers older than 5.0. Even IE11 is now almost 4.5 years old.)

More generally, this library (which was initially written in 2014-2015, if I recall correctly) is becoming less and less ideal choice for compression in the browser. With WebAssembly it is now possible to port more standard algorithms like gzip, deflate, Brotli, and Zopfli, (first 3 are now supported in all major browsers for server responses), and possibly get equal or better speed, and certainly better compression ratio (Moreover, since WebAssembly compiled code doesn't use garbage collection it might get significantly better performance when compressing many short strings - which this library currently does relatively slowly).

I will consider going back to work on this feature If I find the time, as it may be useful for other compression formats. I do feel that the effort would better put to port more standardized algorithms (I am also considering to, possibly someday, rewrite lzutf8 in C/C++ and port it back through WebAssembly/EmScripten, though I'm not sure how practically useful the result would be).

@rotemdan
Copy link
Owner

rotemdan commented Mar 15, 2018

@hunkydoryrepair

I've decided to implement the secondary encoding with the name StorageBinaryString. I'm currently working it. It will be released soon, along with version 0.5.0 of the library with some updated dependencies.

@rotemdan
Copy link
Owner

rotemdan commented Mar 15, 2018

Fixed by commit a431413.

The patch also includes a specialized test to verify the modified encoding does work with Web Storage on IE (the same test fails with the old "BinaryString" encoding).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants