-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hex2bytes for Vector{UInt8} #23267
hex2bytes for Vector{UInt8} #23267
Changes from 3 commits
82e337d
1d1c0f2
2165807
3db1d84
3312ac0
cae96cd
b8a48af
7d6dd61
ce6abc9
1fc8eb4
c60e85d
4c2f17a
d6f2582
5c9fbd5
6f5fe83
8cd51b5
71c7d60
f09607a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -762,6 +762,7 @@ export | |
graphemes, | ||
hex, | ||
hex2bytes, | ||
hex2bytes!, | ||
ind2chr, | ||
info, | ||
is_assigned_char, | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -463,12 +463,104 @@ function hex2bytes(s::AbstractString) | |
return a | ||
end | ||
|
||
""" | ||
hex2bytes(s::Vector{UInt8}) | ||
|
||
Convert the hexadecimal bytes array to its binary representation. Returns an | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
`Array{UInt8,1}`, i.e. an array of bytes. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Perhaps use |
||
""" | ||
@inline function hex2bytes(s::Vector{UInt8}) | ||
d = Vector{UInt8}(div(length(s), 2)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Probably error here if the length is not even? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not needed as the underlying method will throw exception. Per @StefanKarpinski all the interfaces are changed to AbstractVector. Many of comments may not be relevant in that case. |
||
hex2bytes!(d, s) | ||
return d | ||
end | ||
|
||
""" | ||
hex2bytes!(d::Vector{UInt8}, s::Vector{UInt8}, n::Int=length(s)) | ||
|
||
Convert the first `n` hexadecimal bytes array to its binary representation. The | ||
results are populated into a destination array. The function returns the number of bytes | ||
copied into the destination array. The size of destination array must be at least half | ||
of the `n` parameter. | ||
|
||
# Examples | ||
``` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
julia> s = UInt8["01abEF"...] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. no indentation here and below |
||
6-element Array{UInt8,1}: | ||
0x30 | ||
0x31 | ||
0x61 | ||
0x62 | ||
0x45 | ||
0x46 | ||
|
||
julia> d =zeros(UInt8, 3) | ||
3-element Array{UInt8,1}: | ||
0x00 | ||
0x00 | ||
0x00 | ||
|
||
julia> hex2bytes!(d, s, 6) | ||
3 | ||
|
||
julia> d | ||
3-element Array{UInt8,1}: | ||
0x01 | ||
0xab | ||
0xef | ||
``` | ||
""" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could use and example here:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Typically, when we have both an in-place and out-of-place version of a function, we only give examples in the simpler out-of-place version. |
||
function hex2bytes!(d::Vector{UInt8}, s::Vector{UInt8}, n::Int=length(s)) | ||
if isodd(n) | ||
throw(ArgumentError("Input data length should be even")) | ||
end | ||
|
||
if n > length(s) | ||
throw(ArgumentError("Input data length should not exceed array length")) | ||
end | ||
|
||
len2 = div(n, 2) | ||
|
||
if length(d) < len2 | ||
throw(ArgumentError("Destination data buffer should be sufficiently large")) | ||
end | ||
|
||
i = j = 0 | ||
# This line is important as this ensures computation happens on word boundary and | ||
# not byte boundary. Byte boundary computation can be almost 10 times slower. | ||
n = c1 = c2 = UInt(0) | ||
for j = 1:len2 | ||
num = 0 | ||
@inbounds c1 = UInt(s[i+=1]) | ||
@inbounds c2 = UInt(s[i+=1]) | ||
num = number_from_hex(c1) | ||
num <<= 4 | ||
num += number_from_hex(c2) | ||
@inbounds d[j] = (num & 0xFF) | ||
end | ||
return j | ||
end | ||
|
||
@inline function number_from_hex(c::UInt) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why is this There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And it seems like this function should return a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Non-word boundary computation can be 10 times slower or that's what was observed. |
||
DIGIT_ZERO = UInt('0') | ||
DIGIT_NINE = UInt('9') | ||
LATIN_UPPER_A = UInt('A') | ||
LATIN_UPPER_F = UInt('F') | ||
LATIN_A = UInt('a') | ||
LATIN_F = UInt('f') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think defining these constants adds much to the clarity (on the contrary, it makes the code longer and hence harder to read); it would be more compact and at least as readable if you used There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is again an issue of style and personal choice. Some people find balancing parenthesis an unnecessary overhead in code. Secondly, it gives the flexibility of changing the datatypes for someone at one place than carrying out the changes in code blocks. I could change all datatypes to UInt8 very easily if I have code written this way. |
||
|
||
return (DIGIT_ZERO <= c <= DIGIT_NINE) ? c - DIGIT_ZERO : | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 3 nested tertiaries is a bit hard to reason about. Perhaps something like DIGIT_ZERO <= c <= DIGIT_NINE && return c - DIGIT_ZERO
LATIN_UPPER_A <= c <= LATIN_UPPER_F && return c - LATIN_UPPER_A + 10
LATIN_A <= c <= LATIN_F && return c - LATIN_A + 10
throw(ArgumentError("not a hexadecimal number: '$(Char(c))'")) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a matter of personal preference and choice. A few lines above in hex2bytes(AbstractVector) has 3 nested tertiaries. Hence ignoring this comment. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a matter of personal preference and choice. A few lines above in hex2bytes(AbstractVector) has 3 nested tertiaries. Hence ignoring this comment. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, we can fix it up at some other point, doesn't need to block this PR. |
||
(LATIN_UPPER_A <= c <= LATIN_UPPER_F) ? c - LATIN_UPPER_A + 10 : | ||
(LATIN_A <= c <= LATIN_F) ? c - LATIN_A + 10 : | ||
throw(ArgumentError("Not a hexadecimal number")) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. perhaps There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done |
||
end | ||
|
||
""" | ||
bytes2hex(bin_arr::Array{UInt8, 1}) -> String | ||
|
||
Convert an array of bytes to its hexadecimal representation. | ||
All characters are in lower-case. | ||
|
||
it | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ? |
||
# Examples | ||
```jldoctest | ||
julia> a = hex(12345) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -247,3 +247,24 @@ bin_val = hex2bytes("07bf") | |
|
||
#non-hex characters | ||
@test_throws ArgumentError hex2bytes("0123456789abcdefABCDEFGH") | ||
|
||
function test_23161() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. perhaps rewrite as a |
||
arr = UInt8["0123456789abcdefABCDEF"...] | ||
arr1 = Vector{UInt8}(11) | ||
@test hex2bytes!(arr1, arr) == 11 | ||
@test "0123456789abcdefabcdef" == bytes2hex(arr1) | ||
@test hex2bytes("0123456789abcdefABCDEF") == hex2bytes(arr) | ||
@test hex2bytes!(arr1, UInt8[""...]) == 0 | ||
@test hex2bytes(UInt8[""...]) == UInt8[] | ||
|
||
# odd size | ||
@test_throws ArgumentError hex2bytes(UInt8["0123456789abcdefABCDEF0"...]) | ||
|
||
# Input array size smaller than the number of bytes to be converted. | ||
@test_throws ArgumentError hex2bytes!(arr1, arr, 24) | ||
|
||
#non-hex characters | ||
@test_throws ArgumentError hex2bytes(UInt8["0123456789abcdefABCDEFGH"...]) | ||
end | ||
|
||
test_23161() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will need to be added to the stdlib doc index somewhere to show up in the rendered docs https://github.com/JuliaLang/julia/blob/master/CONTRIBUTING.md#adding-a-new-docstring-to-base
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.