hex2bytes for Vector{UInt8} #23267
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -762,6 +762,7 @@ export | |
graphemes, | ||
hex, | ||
hex2bytes, | ||
hex2bytes!, | ||
ind2chr, | ||
info, | ||
is_assigned_char, | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -463,12 +463,88 @@ function hex2bytes(s::AbstractString) | |
return a | ||
end | ||
|
||
""" | ||
hex2bytes(s::AbstractVector{UInt8}) | ||
|
||
Convert the hexadecimal bytes array to its binary representation. Returns | ||
`Vector{UInt8}`, i.e. a vector of bytes. | ||
This is pretty unclear about what a "hexadecimal bytes vector" and "its binary representation" are. I would say something like:
Change incorporated.
||
""" | ||
Could use an example here:
Typically, when we have both an in-place and an out-of-place version of a function, we only give examples in the simpler out-of-place version.
||
@inline function hex2bytes(s::AbstractVector{UInt8}) | ||
d = zeros(UInt8, div(endof(s), 2)) | ||
return hex2bytes!(d, s) | ||
end | ||
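For orientation, here is a short usage sketch of the out-of-place and in-place pair discussed in this review. It assumes a Julia 1.x release in which `Base` provides `hex2bytes` and `hex2bytes!` for byte-vector input; the diff itself targets an older Julia and differs in detail.

```julia
# Assumption: Julia 1.x, where Base.hex2bytes and Base.hex2bytes! accept byte vectors.
s = Vector{UInt8}("01abEF")          # ASCII codes of the hex digits

out = hex2bytes(s)                   # out-of-place: allocates a new Vector{UInt8}

buf = zeros(UInt8, length(s) ÷ 2)    # destination must be half as long as the input
ret = hex2bytes!(buf, s)             # in-place: fills buf and returns it
```

The in-place form is the one to reach for when the destination buffer is reused across calls.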
|
||
""" | ||
hex2bytes!(d::AbstractVector{UInt8}, s::AbstractVector{UInt8}) | ||
|
||
Convert the hexadecimal bytes vector to its binary representation. The results are | ||
populated into a destination vector. The function returns the destination vector. | ||
|
||
# Examples | ||
```jldoctest | ||
julia> s = UInt8["01abEF"...] | ||
The slightly more verbose way would be
||
6-element Array{UInt8,1}: | ||
0x30 | ||
0x31 | ||
0x61 | ||
0x62 | ||
0x45 | ||
0x46 | ||
|
||
julia> d =zeros(UInt8, 3) | ||
Missing space.
Fixed in last check-in.
||
3-element Array{UInt8,1}: | ||
0x00 | ||
0x00 | ||
0x00 | ||
|
||
julia> hex2bytes!(d, s) | ||
This returns
Example updated.
Also
Done.
||
3-element Array{UInt8,1}: | ||
0x01 | ||
0xab | ||
0xef | ||
``` | ||
""" | ||
function hex2bytes!(d::AbstractVector{UInt8}, s::AbstractVector{UInt8}) | ||
i, j = start(s), 0 | ||
# This line is important as this ensures computation happens in word boundary and not | ||
# byte boundary. Boundary computation can be almost 10 times slower | ||
n::UInt = 0 | ||
c1::UInt = 0 | ||
c2::UInt = 0 | ||
Why do you need explicit type declarations here? Why do you need to initialize
Because we want to force computation onto word boundaries.
||
while !done(s, i) | ||
n = 0 | ||
Why initialize
||
c1, i = next(s, i) | ||
done(s, i) && throw(ArgumentError( | ||
"string length must be even: length($(repr(s))) == $(length(s))")) | ||
Can this check be done on entry? It should probably not print the string; the length is enough.
Checking at the beginning could mean multiple passes over the buffer, since an AbstractVector gives no guarantee that its size is precomputed; that depends on implementation details. Is there a significant benefit to a precheck? Also, the conversion cannot be done without a full scan because of the nature of the algorithm used. Hence my personal preference is to react only on failure rather than check the input up front. Since these are low-level functions, my normal approach would be to normalize or sanitize the data, making the array length even by adding an extra zero and bounding the data between "0x0-0xf" with proper filters, rather than throwing an exception on failure. But that's a separate discussion.
||
c2, i = next(s, i) | ||
n = number_from_hex(c1) | ||
n <<= 4 | ||
n += number_from_hex(c2) | ||
d[j+=1] = (n & 0xFF) | ||
Seems like it would be clearer to ditch the
I don't see why the
The whole purpose of introducing the UInt8 method was not to deteriorate performance significantly relative to the hex2bytes(s::AbstractString) method, and in the end it beats that method's performance by at least 10-15%. I think performance benchmarking should be carried out before making any changes to the code for style or coding practices. At this point I will not be able to invest further effort in this. Compilers typically identify and optimize some of these things automatically, but byte-boundary computation did not seem to be one of them. The ideal code for this method would take 4 bytes at a time (sanitizing the input rather than error-checking it) and be optimized for SIMD, as this is supposed to be a very low-level function for IO operations. Hence I am ignoring this comment unless there is supporting benchmark data.
@stevengj ideally your code and my code should generate the same code with most modern compilers. With the code I have written, (n & 0xff) was guidance to the compiler not to create an
||
end | ||
resize!(d, j) | ||
This won't work with arbitrary
Dependency on length removed. The array should not be printed as it can be large.
julia> x = Vector{UInt8}(10); resize!(view(x, 1:10), 8)
ERROR: MethodError: no method matching resize!(::SubArray{UInt8,1,Array{UInt8,1},Tuple{UnitRange{Int64}},true}, ::Int64)
Closest candidates are:
  resize!(::Array{T,1} where T, ::Integer) at array.jl:1020
  resize!(::BitArray{1}, ::Integer) at bitarray.jl:836
||
return d | ||
end | ||
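The review points on this function (check the length on entry, keep the input out of the error message, avoid `resize!` so that views work as destinations) can be sketched as below. This is an illustration in Julia 1.x syntax, not the code that was merged; `hexval` and `hex2bytes_sketch!` are hypothetical names chosen to avoid clashing with `Base`, and no performance claim is made.

```julia
# Hypothetical helper: value (0..15) of one ASCII hex digit, or an ArgumentError.
function hexval(c::UInt8)
    UInt8('0') <= c <= UInt8('9') && return c - UInt8('0')
    UInt8('A') <= c <= UInt8('F') && return c - UInt8('A') + 0x0a
    UInt8('a') <= c <= UInt8('f') && return c - UInt8('a') + 0x0a
    throw(ArgumentError("not a hexadecimal digit: $(repr(Char(c)))"))
end

# Hypothetical in-place conversion: validates lengths up front and never resizes d,
# so d may be any AbstractVector{UInt8}, including a view.
function hex2bytes_sketch!(d::AbstractVector{UInt8}, s::AbstractVector{UInt8})
    n = length(s)
    iseven(n) || throw(ArgumentError("length of hex input must be even, got $n"))
    length(d) == n >> 1 ||
        throw(ArgumentError("destination must have length $(n >> 1), got $(length(d))"))
    is, id = firstindex(s), firstindex(d)
    for k in 0:(n >> 1) - 1
        hi = hexval(s[is + 2k])        # high nibble
        lo = hexval(s[is + 2k + 1])    # low nibble
        d[id + k] = (hi << 4) | lo
    end
    return d
end

# Usage: write two bytes into the middle of a larger buffer through a view.
buf = zeros(UInt8, 4)
hex2bytes_sketch!(view(buf, 2:3), Vector{UInt8}("ff10"))
```

Because the destination length is checked instead of adjusted, the `MethodError` from `resize!` on a `SubArray` shown in the thread cannot occur here.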
|
||
@inline function number_from_hex(c::UInt) | ||
Why is this
And it seems like this function should return a
Non-word-boundary computation can be 10 times slower, or that's what was observed.
||
DIGIT_ZERO = UInt('0') | ||
DIGIT_NINE = UInt('9') | ||
LATIN_UPPER_A = UInt('A') | ||
LATIN_UPPER_F = UInt('F') | ||
LATIN_A = UInt('a') | ||
LATIN_F = UInt('f') | ||
I don't think defining these constants adds much to the clarity (on the contrary, it makes the code longer and hence harder to read); it would be more compact and at least as readable if you used
This is again an issue of style and personal choice. Some people find balancing parentheses an unnecessary overhead in code. Secondly, it gives the flexibility to change the datatypes in one place rather than throughout the code blocks. I could change all datatypes to UInt8 very easily if the code is written this way.
||
|
||
return (DIGIT_ZERO <= c <= DIGIT_NINE) ? c - DIGIT_ZERO : | ||
Three nested ternaries are a bit hard to reason about. Perhaps something like
    DIGIT_ZERO <= c <= DIGIT_NINE && return c - DIGIT_ZERO
    LATIN_UPPER_A <= c <= LATIN_UPPER_F && return c - LATIN_UPPER_A + 10
    LATIN_A <= c <= LATIN_F && return c - LATIN_A + 10
    throw(ArgumentError("not a hexadecimal number: '$(Char(c))'"))
This is a matter of personal preference and choice. A few lines above, hex2bytes(AbstractVector) has 3 nested ternaries. Hence ignoring this comment.
Yeah, we can fix it up at some other point, doesn't need to block this PR.
||
(LATIN_UPPER_A <= c <= LATIN_UPPER_F) ? c - LATIN_UPPER_A + 10 : | ||
(LATIN_A <= c <= LATIN_F) ? c - LATIN_A + 10 : | ||
throw(ArgumentError("Not a hexadecimal number")) | ||
perhaps
Done.
||
end | ||
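The thread disagrees on whether the digit lookup should operate on `UInt` or `UInt8`. A small sketch (Julia 1.x, with hypothetical names `from_hex_word`, `from_hex_byte` and `try_hex`) can at least confirm that the two variants agree on every possible input byte. It says nothing about speed, which is the part of the question that needs benchmark data.

```julia
# Word-sized variant, mirroring the shape of number_from_hex(c::UInt) in the diff.
function from_hex_word(c::UInt)
    UInt('0') <= c <= UInt('9') && return c - UInt('0')
    UInt('A') <= c <= UInt('F') && return c - UInt('A') + 10
    UInt('a') <= c <= UInt('f') && return c - UInt('a') + 10
    throw(ArgumentError("not a hexadecimal number"))
end

# Byte-sized variant, as the reviewers suggest.
function from_hex_byte(c::UInt8)
    UInt8('0') <= c <= UInt8('9') && return c - UInt8('0')
    UInt8('A') <= c <= UInt8('F') && return c - UInt8('A') + 0x0a
    UInt8('a') <= c <= UInt8('f') && return c - UInt8('a') + 0x0a
    throw(ArgumentError("not a hexadecimal number"))
end

# Return f(x), or nothing if f rejects x with an ArgumentError.
function try_hex(f, x)
    try
        return f(x)
    catch err
        err isa ArgumentError || rethrow()
        return nothing
    end
end

# Compare both variants on all 256 byte values.
variants_agree = all(try_hex(from_hex_word, UInt(b)) == try_hex(from_hex_byte, b)
                     for b in 0x00:0xff)
```

With equivalence established, the choice between the two comes down to measured performance and readability.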
|
||
""" | ||
bytes2hex(bin_arr::Array{UInt8, 1}) -> String | ||
|
||
Convert an array of bytes to its hexadecimal representation. | ||
All characters are in lower-case. | ||
|
||
# Examples | ||
```jldoctest | ||
julia> a = hex(12345) | ||
|
This will need to be added to the stdlib doc index somewhere to show up in the rendered docs: https://github.com/JuliaLang/julia/blob/master/CONTRIBUTING.md#adding-a-new-docstring-to-base
Done.