Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hex2bytes for Vector{UInt8} #23267

Merged
merged 18 commits into from
Aug 22, 2017
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions base/exports.jl
Original file line number Diff line number Diff line change
Expand Up @@ -762,6 +762,7 @@ export
graphemes,
hex,
hex2bytes,
hex2bytes!,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will need to be added to the stdlib doc index somewhere to show up in the rendered docs https://github.com/JuliaLang/julia/blob/master/CONTRIBUTING.md#adding-a-new-docstring-to-base

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

ind2chr,
info,
is_assigned_char,
Expand Down
94 changes: 93 additions & 1 deletion base/strings/util.jl
Original file line number Diff line number Diff line change
Expand Up @@ -463,12 +463,104 @@ function hex2bytes(s::AbstractString)
return a
end

"""
hex2bytes(s::Vector{UInt8})

Convert the hexadecimal bytes array to its binary representation. Returns an
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

returns -> return

`Array{UInt8,1}`, i.e. an array of bytes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps use Vector{UInt8} to match the above signature?

"""
@inline function hex2bytes(s::Vector{UInt8})
d = Vector{UInt8}(div(length(s), 2))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably error here if the length is not even?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed as the underlying method will throw exception. Per @StefanKarpinski all the interfaces are changed to AbstractVector. Many of comments may not be relevant in that case.

hex2bytes!(d, s)
return d
end

"""
hex2bytes!(d::Vector{UInt8}, s::Vector{UInt8}, n::Int=length(s))

Convert the first `n` hexadecimal bytes array to its binary representation. The
results are populated into a destination array. The function returns the number of bytes
copied into the destination array. The size of destination array must be at least half
of the `n` parameter.

# Examples
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

```jldoctest

julia> s = UInt8["01abEF"...]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no indentation here and below

6-element Array{UInt8,1}:
0x30
0x31
0x61
0x62
0x45
0x46

julia> d =zeros(UInt8, 3)
3-element Array{UInt8,1}:
0x00
0x00
0x00

julia> hex2bytes!(d, s, 6)
3

julia> d
3-element Array{UInt8,1}:
0x01
0xab
0xef
```
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use and example here:


# Examples
```jldoctest
julia> hex2bytes(...)
end
```

Copy link
Member

@stevengj stevengj Aug 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hex2bytes now has an example, moved from hex2bytes!.

Typically, when we have both an in-place and out-of-place version of a function, we only give examples in the simpler out-of-place version.

function hex2bytes!(d::Vector{UInt8}, s::Vector{UInt8}, n::Int=length(s))
if isodd(n)
throw(ArgumentError("Input data length should be even"))
end

if n > length(s)
throw(ArgumentError("Input data length should not exceed array length"))
end

len2 = div(n, 2)

if length(d) < len2
throw(ArgumentError("Destination data buffer should be sufficiently large"))
end

i = j = 0
# This line is important as this ensures computation happens on word boundary and
# not byte boundary. Byte boundary computation can be almost 10 times slower.
n = c1 = c2 = UInt(0)
for j = 1:len2
num = 0
@inbounds c1 = UInt(s[i+=1])
@inbounds c2 = UInt(s[i+=1])
num = number_from_hex(c1)
num <<= 4
num += number_from_hex(c2)
@inbounds d[j] = (num & 0xFF)
end
return j
end

@inline function number_from_hex(c::UInt)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this UInt and not UInt8?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it seems like this function should return a UInt8 as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-word boundary computation can be 10 times slower or that's what was observed.

DIGIT_ZERO = UInt('0')
DIGIT_NINE = UInt('9')
LATIN_UPPER_A = UInt('A')
LATIN_UPPER_F = UInt('F')
LATIN_A = UInt('a')
LATIN_F = UInt('f')
Copy link
Member

@stevengj stevengj Aug 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think defining these constants adds much to the clarity (on the contrary, it makes the code longer and hence harder to read); it would be more compact and at least as readable if you used UInt8('0') etcetera directly in the code.

Copy link
Contributor Author

@sambitdash sambitdash Aug 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is again an issue of style and personal choice. Some people find balancing parenthesis an unnecessary overhead in code. Secondly, it gives the flexibility of changing the datatypes for someone at one place than carrying out the changes in code blocks. I could change all datatypes to UInt8 very easily if I have code written this way.


return (DIGIT_ZERO <= c <= DIGIT_NINE) ? c - DIGIT_ZERO :
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 nested tertiaries is a bit hard to reason about. Perhaps something like

DIGIT_ZERO    <= c <= DIGIT_NINE    && return c - DIGIT_ZERO
LATIN_UPPER_A <= c <= LATIN_UPPER_F && return c - LATIN_UPPER_A + 10
LATIN_A       <= c <= LATIN_F       && return c - LATIN_A + 10
throw(ArgumentError("not a hexadecimal number: '$(Char(c))'"))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a matter of personal preference and choice. A few lines above in hex2bytes(AbstractVector) has 3 nested tertiaries. Hence ignoring this comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a matter of personal preference and choice. A few lines above in hex2bytes(AbstractVector) has 3 nested tertiaries. Hence ignoring this comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we can fix it up at some other point, doesn't need to block this PR.

(LATIN_UPPER_A <= c <= LATIN_UPPER_F) ? c - LATIN_UPPER_A + 10 :
(LATIN_A <= c <= LATIN_F) ? c - LATIN_A + 10 :
throw(ArgumentError("Not a hexadecimal number"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps "$c is not a hexadecimal number"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

end

"""
bytes2hex(bin_arr::Array{UInt8, 1}) -> String

Convert an array of bytes to its hexadecimal representation.
All characters are in lower-case.

it
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

# Examples
```jldoctest
julia> a = hex(12345)
Expand Down
21 changes: 21 additions & 0 deletions test/strings/util.jl
Original file line number Diff line number Diff line change
Expand Up @@ -247,3 +247,24 @@ bin_val = hex2bytes("07bf")

#non-hex characters
@test_throws ArgumentError hex2bytes("0123456789abcdefABCDEFGH")

function test_23161()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps rewrite as a @testset?

arr = UInt8["0123456789abcdefABCDEF"...]
arr1 = Vector{UInt8}(11)
@test hex2bytes!(arr1, arr) == 11
@test "0123456789abcdefabcdef" == bytes2hex(arr1)
@test hex2bytes("0123456789abcdefABCDEF") == hex2bytes(arr)
@test hex2bytes!(arr1, UInt8[""...]) == 0
@test hex2bytes(UInt8[""...]) == UInt8[]

# odd size
@test_throws ArgumentError hex2bytes(UInt8["0123456789abcdefABCDEF0"...])

# Input array size smaller than the number of bytes to be converted.
@test_throws ArgumentError hex2bytes!(arr1, arr, 24)

#non-hex characters
@test_throws ArgumentError hex2bytes(UInt8["0123456789abcdefABCDEFGH"...])
end

test_23161()