Add IDNA support and integrate with DNS lookup #2543
Changes from 11 commits
d8073f9
e7a17cb
473ff65
96ca486
64a7fda
86ac0d7
19ab143
52dcf74
2c03615
856b094
66c3b7d
3864216
8ed0777
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
require "spec" | ||
require "uri/punycode" | ||
|
||
describe URI::Punycode do | ||
[ | ||
{"3年B組金八先生", "3B-ww4c5e180e575a65lsy2b"}, | ||
{"安室奈美恵-with-SUPER-MONKEYS", "-with-SUPER-MONKEYS-pc58ag80a8qai00g7n9n"}, | ||
{"Hello-Another-Way-それぞれの場所", "Hello-Another-Way--fc4qua05auwb3674vfr0b"}, | ||
{"ひとつ屋根の下2", "2-u9tlzr9756bt3uc0v"}, | ||
{"MajiでKoiする5秒前", "MajiKoi5-783gue6qz075azm5e"}, | ||
{"パフィーdeルンバ", "de-jg4avhby1noc0d"}, | ||
{"そのスピードで", "d9juau41awczczp"}, | ||
{"Hello-Another-Way-それぞれ", "Hello-Another-Way--fc4qua97gba"}, | ||
].each do |example| | ||
dec, enc = example | ||
|
||
it "encodes #{dec} to #{enc}" do | ||
URI::Punycode.encode(dec).should eq enc | ||
end | ||
|
||
it "decodes #{enc} to #{dec}" do | ||
URI::Punycode.decode(enc).should eq dec | ||
end | ||
end | ||
|
||
it "translate to ascii only host name" do | ||
URI::Punycode.to_ascii("test.テスト.テスト").should eq "test.xn--zckzah.xn--zckzah" | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,181 @@ | ||
# `Punycode` provides an interface for IDNA encoding (RFC 5890), | ||
# using the Punycode algorithm defined in RFC 3492. | ||
# | ||
# Implementation based on Mathias Bynens' `punycode.js` project: | ||
# https://github.com/bestiejs/punycode.js/ | ||
# | ||
# RFC 3492: | ||
# Punycode, an ASCII encoding of Unicode for use in host names | ||
# https://www.ietf.org/rfc/rfc3492.txt | ||
# | ||
# RFC 5890: | ||
# Internationalized Domain Names for Applications (IDNA) | ||
# https://www.ietf.org/rfc/rfc5890.txt | ||
class URI | ||
class Punycode | ||
private BASE = 36 | ||
private TMIN = 1 | ||
private TMAX = 26 | ||
private SKEW = 38 | ||
private DAMP = 700 | ||
private INITIAL_BIAS = 72 | ||
private INITIAL_N = 128 | ||
|
||
private DELIMITER = '-' | ||
|
||
private BASE36 = "abcdefghijklmnopqrstuvwxyz0123456789" | ||
|
||
private def self.adapt(delta, numpoints, firsttime) | ||
delta /= firsttime ? DAMP : 2 | ||
delta += delta / numpoints | ||
k = 0 | ||
while delta > ((BASE - TMIN) * TMAX) / 2 | ||
delta /= BASE - TMIN | ||
k += BASE | ||
end | ||
k + (((BASE - TMIN + 1) * delta) / (delta + SKEW)) | ||
end | ||
|
||
def self.encode(string) | ||
String.build { |io| encode string, io } | ||
end | ||
|
||
def self.encode(string, io) | ||
h = 0 | ||
It would be great if these variables could be more descriptive. What does
I can't describe this correctly, because I wrote this two years ago. Sorry 🙇. But |
||
all = true | ||
others = [] of Char | ||
|
||
string.each_char do |c| | ||
if c < '\u0080' | ||
h += 1 | ||
io << c | ||
all = false | ||
else | ||
others.push c | ||
end | ||
end | ||
|
||
return if others.empty? | ||
others.sort! | ||
io << DELIMITER unless all | ||
I don't think you need
Yes, you are right. |
||
|
||
delta = 0_u32 | ||
n = INITIAL_N | ||
bias = INITIAL_BIAS | ||
firsttime = true | ||
prev = nil | ||
|
||
h += 1 | ||
You could probably set the value to |
||
others.each do |m| | ||
next if m == prev | ||
prev = m | ||
|
||
raise Exception.new("Overflow: input needs wider integers to process") if m.ord - n > (Int32::MAX - delta) / h | ||
delta += (m.ord - n) * h | ||
n = m.ord + 1 | ||
|
||
string.each_char do |c| | ||
if c < m | ||
raise Exception.new("Overflow: input needs wider integers to process") if delta > Int32::MAX - 1 | ||
delta += 1 | ||
elsif c == m | ||
q = delta | ||
k = BASE | ||
loop do | ||
t = k <= bias ? TMIN : k >= bias + TMAX ? TMAX : k - bias | ||
break if q < t | ||
io << BASE36[t + ((q - t) % (BASE - t))] | ||
q = (q - t) / (BASE - t) | ||
k += BASE | ||
end | ||
io << BASE36[q] | ||
|
||
bias = adapt delta, h, firsttime | ||
delta = 0 | ||
h += 1 | ||
firsttime = false | ||
end | ||
end | ||
delta += 1 | ||
end | ||
end | ||
|
||
def self.decode(string) | ||
if delim = string.rindex(DELIMITER) | ||
You could just replace this entire conditional with:
rest, _, output = string.rpartition(DELIMITER)
output = output.chars
# and later loop over `rest.each_char`:
rest.each_char do |c|
Good, but correct code is:
|
||
output = string[0...delim].chars | ||
delim += 1 | ||
else | ||
output = [] of Char | ||
delim = 0 | ||
end | ||
|
||
n = INITIAL_N | ||
bias = INITIAL_BIAS | ||
i = 0 | ||
init = true | ||
w = oldi = k = 0 | ||
|
||
string[delim..-1].each_char do |c| | ||
if init | ||
w = 1 | ||
oldi = i | ||
k = BASE | ||
init = false | ||
end | ||
|
||
digit = case c | ||
when .ascii_lowercase? | ||
c.ord - 0x61 | ||
when .ascii_uppercase? | ||
c.ord - 0x41 | ||
when .ascii_number? | ||
c.ord - 0x30 + 26 | ||
else | ||
raise ArgumentError.new("Invalid input") | ||
end | ||
|
||
i += digit * w | ||
t = k <= bias ? TMIN : k >= bias + TMAX ? TMAX : k - bias | ||
|
||
unless digit < t | ||
w *= BASE - t | ||
k += BASE | ||
else | ||
outsize = output.size + 1 | ||
bias = adapt i - oldi, outsize, oldi == 0 | ||
n += i / outsize | ||
i %= outsize | ||
output.insert i, n.chr | ||
i += 1 | ||
init = true | ||
end | ||
end | ||
|
||
raise ArgumentError.new("Invalid input") unless init | ||
|
||
output.join | ||
end | ||
|
||
def self.to_ascii(string) | ||
return string if string.ascii_only? | ||
|
||
String.build do |io| | ||
first = true | ||
string.split('.') do |part| | ||
unless first | ||
io << "." | ||
end | ||
|
||
if part.ascii_only? | ||
io << part | ||
else | ||
io << "xn--" | ||
encode part, io | ||
end | ||
|
||
first = false | ||
end | ||
end | ||
end | ||
end | ||
end |
Just a minor nit: methods receiving an optional `IO` should take it as the first argument. That keeps both signatures similar if additional arguments are added later. Even if that's unlikely, I'd recommend sticking with this strategy.
`URI.encode` also accepts a `String` as the first argument and an `IO` as the second. I think it's no problem.
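The argument-order convention discussed in the comments above can be sketched as follows. This is only a minimal illustration, assuming a hypothetical `LabelSketch` module with a hypothetical `upcase_label` method. Neither is part of this PR or of Crystal's standard library. The overload that writes to an `IO` takes the `IO` first, and the overload that returns a `String` delegates to it through `String.build`.

```crystal
# Hypothetical module for illustration only; not part of the PR.
module LabelSketch
  # IO-first overload: writes the transformed label to `io`.
  def self.upcase_label(io : IO, label : String) : Nil
    io << label.upcase
  end

  # Convenience overload: builds a String by delegating to the IO-first one.
  def self.upcase_label(label : String) : String
    String.build { |io| upcase_label(io, label) }
  end
end

puts LabelSketch.upcase_label("test")
```

With this layout, a later optional argument can be appended to both overloads without the `IO` changing position.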