Add IDNA support and integrate with DNS lookup #2543
Merged
13 commits
d8073f9 Add punycode support and integrate with DNS lookup (makenowjust)
e7a17cb Add punycode support and integrate with DNS lookup (makenowjust)
473ff65 Add top level documentation (epergo)
96ca486 Remove custom Exception class and use ArgumentError/Exception (epergo)
64a7fda Refactor complex ternary operator into if/else (epergo)
86ac0d7 Move punycode to uri namespace (epergo)
19ab143 Remove char array version `encode` method (makenowjust)
52dcf74 Replace `split('.').each do` to `split('.') do` (makenowjust)
2c03615 Refactor complex if/else into case (makenowjust)
856b094 Mark constants as 'private' (makenowjust)
66c3b7d Use string.each_char with block instead of iterator (makenowjust)
3864216 Refactor using rpartition instead of rsplit (makenowjust)
8ed0777 Refactor about variable 'h' (makenowjust)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
require "spec" | ||
require "uri/punycode" | ||
|
||
describe URI::Punycode do | ||
[ | ||
{"3年B組金八先生", "3B-ww4c5e180e575a65lsy2b"}, | ||
{"安室奈美恵-with-SUPER-MONKEYS", "-with-SUPER-MONKEYS-pc58ag80a8qai00g7n9n"}, | ||
{"Hello-Another-Way-それぞれの場所", "Hello-Another-Way--fc4qua05auwb3674vfr0b"}, | ||
{"ひとつ屋根の下2", "2-u9tlzr9756bt3uc0v"}, | ||
{"MajiでKoiする5秒前", "MajiKoi5-783gue6qz075azm5e"}, | ||
{"パフィーdeルンバ", "de-jg4avhby1noc0d"}, | ||
{"そのスピードで", "d9juau41awczczp"}, | ||
{"Hello-Another-Way-それぞれ", "Hello-Another-Way--fc4qua97gba"}, | ||
].each do |example| | ||
dec, enc = example | ||
|
||
it "encodes #{dec} to #{enc}" do | ||
URI::Punycode.encode(dec).should eq enc | ||
end | ||
|
||
it "decodes #{enc} to #{dec}" do | ||
URI::Punycode.decode(enc).should eq dec | ||
end | ||
end | ||
|
||
it "translate to ascii only host name" do | ||
URI::Punycode.to_ascii("test.テスト.テスト").should eq "test.xn--zckzah.xn--zckzah" | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,173 @@ | ||
# `Punycode` provides an interface for IDNA encoding (RFC 5890), | ||
# which is defined in RFC 3492 | ||
# | ||
# Implementation based on Mathias Bynens' `punycode.js` project | ||
# https://github.com/bestiejs/punycode.js/ | ||
# | ||
# RFC 3492: | ||
# Method to use non-ascii characters as host name of URI | ||
# https://www.ietf.org/rfc/rfc3492.txt | ||
# | ||
# RFC 5890: | ||
# Internationalized Domain Names for Applications | ||
# https://www.ietf.org/rfc/rfc5890.txt | ||
class URI | ||
class Punycode | ||
private BASE = 36 | ||
private TMIN = 1 | ||
private TMAX = 26 | ||
private SKEW = 38 | ||
private DAMP = 700 | ||
private INITIAL_BIAS = 72 | ||
private INITIAL_N = 128 | ||
|
||
private DELIMITER = '-' | ||
|
||
private BASE36 = "abcdefghijklmnopqrstuvwxyz0123456789" | ||
|
||
private def self.adapt(delta, numpoints, firsttime) | ||
delta /= firsttime ? DAMP : 2 | ||
delta += delta / numpoints | ||
k = 0 | ||
while delta > ((BASE - TMIN) * TMAX) / 2 | ||
delta /= BASE - TMIN | ||
k += BASE | ||
end | ||
k + (((BASE - TMIN + 1) * delta) / (delta + SKEW)) | ||
end | ||
|
||
def self.encode(string) | ||
String.build { |io| encode string, io } | ||
end | ||
|
||
def self.encode(string, io) | ||
others = [] of Char | ||
|
||
string.each_char do |c| | ||
if c < '\u0080' | ||
io << c | ||
else | ||
others.push c | ||
end | ||
end | ||
|
||
return if others.empty? | ||
others.sort! | ||
|
||
h = string.size - others.size + 1 | ||
delta = 0_u32 | ||
n = INITIAL_N | ||
bias = INITIAL_BIAS | ||
firsttime = true | ||
prev = nil | ||
|
||
io << DELIMITER if h > 1 | ||
|
||
others.each do |m| | ||
next if m == prev | ||
prev = m | ||
|
||
raise Exception.new("Overflow: input needs wider integers to process") if m.ord - n > (Int32::MAX - delta) / h | ||
delta += (m.ord - n) * h | ||
n = m.ord + 1 | ||
|
||
string.each_char do |c| | ||
if c < m | ||
raise Exception.new("Overflow: input needs wider integers to process") if delta > Int32::MAX - 1 | ||
delta += 1 | ||
elsif c == m | ||
q = delta | ||
k = BASE | ||
loop do | ||
t = k <= bias ? TMIN : k >= bias + TMAX ? TMAX : k - bias | ||
break if q < t | ||
io << BASE36[t + ((q - t) % (BASE - t))] | ||
q = (q - t) / (BASE - t) | ||
k += BASE | ||
end | ||
io << BASE36[q] | ||
|
||
bias = adapt delta, h, firsttime | ||
delta = 0 | ||
h += 1 | ||
firsttime = false | ||
end | ||
end | ||
delta += 1 | ||
end | ||
end | ||
|
||
def self.decode(string) | ||
output, _, rest = string.rpartition(DELIMITER) | ||
output = output.chars | ||
|
||
n = INITIAL_N | ||
bias = INITIAL_BIAS | ||
i = 0 | ||
init = true | ||
w = oldi = k = 0 | ||
|
||
rest.each_char do |c| | ||
if init | ||
w = 1 | ||
oldi = i | ||
k = BASE | ||
init = false | ||
end | ||
|
||
digit = case c | ||
when .ascii_lowercase? | ||
c.ord - 0x61 | ||
when .ascii_uppercase? | ||
c.ord - 0x41 | ||
when .ascii_number? | ||
c.ord - 0x30 + 26 | ||
else | ||
raise ArgumentError.new("Invalid input") | ||
end | ||
|
||
i += digit * w | ||
t = k <= bias ? TMIN : k >= bias + TMAX ? TMAX : k - bias | ||
|
||
unless digit < t | ||
w *= BASE - t | ||
k += BASE | ||
else | ||
outsize = output.size + 1 | ||
bias = adapt i - oldi, outsize, oldi == 0 | ||
n += i / outsize | ||
i %= outsize | ||
output.insert i, n.chr | ||
i += 1 | ||
init = true | ||
end | ||
end | ||
|
||
raise ArgumentError.new("Invalid input") unless init | ||
|
||
output.join | ||
end | ||
|
||
def self.to_ascii(string) | ||
return string if string.ascii_only? | ||
|
||
String.build do |io| | ||
first = true | ||
string.split('.') do |part| | ||
unless first | ||
io << "." | ||
end | ||
|
||
if part.ascii_only? | ||
io << part | ||
else | ||
io << "xn--" | ||
encode part, io | ||
end | ||
|
||
first = false | ||
end | ||
end | ||
end | ||
end | ||
end |
Just a minor nit: methods receiving an optional `IO` should take it as the first argument. This makes it easier to keep both signatures similar if additional arguments are added later. Even if that's unlikely, I'd recommend sticking with this strategy.

`URI.encode` also accepts a `String` as its first argument and an `IO` as its second, so I think this is not a problem.
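
For reference, the IO-first convention discussed in this thread could look roughly like the sketch below. `LabelDemo` and `shout` are hypothetical names made up for illustration; they are not part of this PR or of the standard library, and the PR itself keeps the `encode(string, io)` order.

```crystal
# Hypothetical module for illustration only; not part of the PR.
module LabelDemo
  # IO-first overload: streams the upcased characters into `io`.
  def self.shout(io : IO, string : String) : Nil
    string.each_char { |c| io << c.upcase }
  end

  # Convenience overload: builds and returns a new String by
  # delegating to the IO-first overload.
  def self.shout(string : String) : String
    String.build { |io| shout io, string }
  end
end

puts LabelDemo.shout("abc")

# Writing into an existing IO avoids allocating an intermediate String.
buffer = IO::Memory.new
LabelDemo.shout buffer, "xyz"
puts buffer.to_s
```

With the `IO` in the leading position, any extra trailing arguments added later would line up in both overloads, which is the point the first comment makes.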