Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IRC to Discord: ISO 8859-1 encoded umlauts (at least ä, ö) are corrupted #237

Closed
redfellow opened this issue May 12, 2017 · 14 comments
Closed

Comments

@redfellow
Copy link

Some ISO 8859-1 output from old IRC clients gets pushed into Discord like this:

[3:44 PM] BOT IRC: <User> ny l�htis l�mmitt��n saunan

@redfellow
Copy link
Author

Any plans to do something to this? It does look quite nasty and it's difficult to get people to change their encodings as most IRC clients are capable of reading messages both in ISO 8859-1 or UTF-8 in the same channel, heuristically autodetecting which encoding is used.

"It's worked for years, you are the only one who complains."

@Throne3d
Copy link
Collaborator

I'm not particularly sure where the issue would be arising in the process but it's plausible it'd be in our IRC library, which unfortunately doesn't appear to be very well maintained and as such it may be a while before it's fixed and would be an upstream issue. It might not be that it's a problem with node-irc, but if it is, it'd go with the list of other problems associated with that library. I can't currently investigate as I'm busy but I'll try to remember to do so in about a week or two, if nobody else does first?

@Throne3d
Copy link
Collaborator

Throne3d commented Jul 1, 2017

I don't suppose you could give me a server this happens on, or a client that would produce this problem? It might help track it down. It also seems like a thing to be fixed in the upstream library, irc-upd, which I've now taken over.

@redfellow
Copy link
Author

Sure I can. I'll need to ask around a bit first, but I suspect mIRC prior to 7.0 will work right out the bat. I'll update with a reproductible testcase asap.

@Throne3d
Copy link
Collaborator

It looks like the upstream library actually converts these (and displays them) just fine if you add "encoding": "utf-8" to the ircOptions object (and have iconv and node-icu-charset-detector installed). At least, it did when I tried it and mIRC with Unicode mode disabled (and the characters appeared corrupted when I removed the encoding option from the config).

Could you try that and get back?

@redfellow
Copy link
Author

redfellow commented Aug 13, 2017

Ubuntu Server 14.04:

sudo apt-get -y install libicu-dev
npm install -g iconv
npm install -g node-icu-charset-detector
npm install

Bot is running with utf-8 enabled under ircOptions. Will report if I still see errors.

@Merri
Copy link

Merri commented Aug 14, 2017

Fixes the issue. Should be the default setting. (Edit: and I'm tired atm so I go with few words.)

@redfellow
Copy link
Author

Fixed.

The three affected users my channel had are no longer pushing � characters into Discord.

@Throne3d
Copy link
Collaborator

If it's the default setting it seems likely to cause some problems with how often it fails to install. I'll suggest a note in the README, instead, then users can set it up themselves?

(libicu-dev fixes it on Ubuntu, and Arch Linux needs icu, but I had trouble getting it on Windows last I tried. It seems like a bad idea to have it be default on platforms when they need to install specific packages first else it throws errors.)

@Merri
Copy link

Merri commented Aug 14, 2017

The problem hits on all non-English speaking channels so for them there is no option: they must have icu anyway. Thus instead I'd work on README (or add a separate documentation) so that it has good instructions on how to check if icu is installed and how to install it if it is missing. Also, the software could check itself whether icu exists and show a message with link to documentation, and set the default encoding value to false when icu is missing.

@Throne3d
Copy link
Collaborator

It only affects those that don't use Unicode encoding, so shouldn't (I think) affect eg Korean speakers.

I was going to add a section to the README to check out the irc-upd documentation on how to install, then add a section there about how to install icu on whatever platform you're using (or point to another piece of documentation that explains that). Checking for the existence of icu could also be done, in which case it should probably be done in discord-irc since it has the specific need to convert to UTF-8 for Discord (whereas the IRC library does not necessarily need to do that).

@Merri
Copy link

Merri commented Aug 14, 2017

Yes, icu check should be on discord-irc's side. However irc-upd could expose a helper function that does the check (so that other implementations could easily use it).

@Merri
Copy link

Merri commented Aug 14, 2017

While slightly offtopic, here is an example on how much friction there still is to move everything to UTF-8 on IRC: irssi/irssi#671

I was slightly surprised to see that there really is a Japanese server that serves everything in ISO-2022-JP, including channels and nicknames.

Throne3d added a commit to Throne3d/discord-irc that referenced this issue Aug 14, 2017
@Throne3d
Copy link
Collaborator

Throne3d commented Aug 14, 2017

I've started doing a thing here and here which should allow for this.

(It bugs me that more things don't default to UTF-8. Oh well. :P)

Edit: Merged into irc-upd master, yet to push a version or PR here.

Throne3d added a commit to Throne3d/discord-irc that referenced this issue Aug 21, 2017
Throne3d added a commit to Throne3d/discord-irc that referenced this issue Sep 4, 2017
Throne3d added a commit to Throne3d/discord-irc that referenced this issue Sep 28, 2017
Throne3d added a commit to Throne3d/discord-irc that referenced this issue Sep 30, 2017
ekmartin pushed a commit that referenced this issue Sep 30, 2017
* Add a note about how to install optional charset converter dependencies

Fixes part of #252 and #237.

* Default ircOptions.encoding to utf-8 if node-irc can convert encodings

Also warn when started if the IRC library cannot convert between
encodings, in case users rely on this behavior.

* Improve config check and warning message around encodings
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants