Add initial support for H5Dchunk_iter #1031
This looks promising.

julia> using HDF5, Printf

julia> struct ChunkInfo{N}
           offset::NTuple{N, Int}
           filter_mask::UInt32
           addr::UInt64
           size::UInt64
       end

julia> Base.show(io::IO, ::MIME"text/plain", ci::ChunkInfo) = print(io, @sprintf("%10s", ci.offset), "\t", ci.filter_mask, "\t", ci.addr, "\t", ci.size)

julia> h5open("simplechunked2d.h5","w", libver_bounds=v"1.8", meta_block_size=4096) do h5f
           d = create_dataset(h5f, "test", UInt8, (256,256), alloc_time=:early, chunk=(16,16))
           for (i, ci) in enumerate(CartesianIndices((16, 16)))
               # Number the chunks
               d[(ci[1]-1)*16 .+ (1:16), (ci[2]-1)*16 .+ (1:16)] = ones(UInt8, 16, 16) * (i-1)
           end
       end;

julia> out = h5open("simplechunked2d.h5") do h5f
           h5d = h5f["test"]
           info = ChunkInfo{2}[]
           # Collect the metadata of every chunk via the H5Dchunk_iter callback
           HDF5.API.h5d_chunk_iter(h5d,0) do offset, filter_mask, addr, size
               push!(info, ChunkInfo{2}(unsafe_load(Ptr{Tuple{Int, Int}}(offset)), filter_mask, addr, size))
               return nothing
           end
           return info
       end
256-element Vector{ChunkInfo{2}}:
(0, 0) 0 4096 256
(0, 16) 0 4352 256
(0, 32) 0 4608 256
(0, 48) 0 4864 256
(0, 64) 0 5120 256
(0, 80) 0 5376 256
(0, 96) 0 5632 256
(0, 112) 0 5888 256
(0, 128) 0 6144 256
(0, 144) 0 6400 256
(0, 160) 0 6656 256
(0, 176) 0 6912 256
(0, 192) 0 7168 256
(0, 208) 0 7424 256
(0, 224) 0 7680 256
(0, 240) 0 7936 256
(16, 0) 0 8192 256
(16, 16) 0 8448 256
(16, 32) 0 8704 256
(16, 48) 0 8960 256
(16, 64) 0 9216 256
(16, 80) 0 9472 256
(16, 96) 0 9728 256
(16, 112) 0 9984 256
⋮
(224, 128) 0 76912 256
(224, 144) 0 77168 256
(224, 160) 0 77424 256
(224, 176) 0 77680 256
(224, 192) 0 82032 256
(224, 208) 0 82288 256
(224, 224) 0 82544 256
(224, 240) 0 82800 256
(240, 0) 0 83056 256
(240, 16) 0 83312 256
(240, 32) 0 83568 256
(240, 48) 0 83824 256
(240, 64) 0 84080 256
(240, 80) 0 84336 256
(240, 96) 0 84592 256
(240, 112) 0 84848 256
(240, 128) 0 85104 256
(240, 144) 0 85360 256
(240, 160) 0 85616 256
(240, 176) 0 85872 256
(240, 192) 0 86128 256
(240, 208) 0 86384 256
(240, 224) 0 86640 256
(240, 240) 0 86896 256

julia> open("simplechunked2d.h5") do f
           seek(f, out[60].addr)
           data = unique(read(f, 256))
           println(only(data))
       end
59

julia> open("simplechunked2d.h5") do f
           for info in out
               seek(f, info.addr)
               data = unique(read(f, 256))
               print(only(data), " ")
           end
       end
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255
I had to rework the tests a bit to get them to pass.
Upstream packaging efforts seem to be proceeding for HDF5 1.14.0:
@simonbyrne @musm could you take a look at this? These changes allow tests to pass with HDF5 1.14.0, and I've implemented some basic functionality for
src/api/helpers.jl
Outdated
    isnothing(retval) && return H5_ITER_CONT
    return H5_iter_t(retval)
For the other iterators we require the functions to return ints or `true`/`false`. Does it make sense to allow `nothing` as well?
This iterator can return 0, 1 or -1.
https://docs.hdfgroup.org/hdf5/develop/_h5_dpublic_8h.html#a7a2008481c5cef463cbd0943a7042609
Returning `nothing` is quite common in Julia though. Translating `nothing` as "continue iterating" might be reasonable in other contexts as well.
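To make the trade-off concrete, here is a minimal sketch of what such a translation could look like. The constant and function names (`ITER_ERROR`, `ITER_CONT`, `ITER_STOP`, `to_iter_code`) are hypothetical stand-ins and not part of HDF5.jl; only the numeric values (-1, 0, 1) follow the C enum `H5_iter_t`.

```julia
# Hypothetical stand-ins for the H5_iter_t values of the HDF5 C library.
const ITER_ERROR = Cint(-1)  # abort iteration with an error
const ITER_CONT  = Cint(0)   # continue iterating
const ITER_STOP  = Cint(1)   # stop iterating successfully

# Hypothetical helper: normalize whatever a Julia callback returned
# into an integer code that a C iterator understands.
to_iter_code(::Nothing) = ITER_CONT                    # `nothing` means "keep going"
to_iter_code(b::Bool) = b ? ITER_STOP : ITER_CONT      # `true` halts, `false` continues
to_iter_code(i::Integer) = Cint(i)                     # pass integers through unchanged

# A callback that ends with a bare `return nothing` keeps the iteration going.
callback = chunk -> nothing
@assert to_iter_code(callback(:chunk)) == ITER_CONT
```

Dispatching on the return type keeps the strict integer/Bool convention available while treating `nothing` as the permissive default. Whether that leniency is desirable is exactly the question raised in this thread.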
These functions are a bit weird though, and it does break the convention we use elsewhere, e.g.
Lines 55 to 104 in 9a90e4e
function h5a_iterate_helper(
    loc_id::hid_t, attr_name::Ptr{Cchar}, ainfo::Ptr{H5A_info_t}, @nospecialize(data::Any)
)::herr_t
    f, err_ref = data
    try
        return herr_t(f(loc_id, attr_name, ainfo))
    catch err
        err_ref[] = err
        return herr_t(-1)
    end
end

"""
    h5a_iterate(f, loc_id, idx_type, order, idx = 0) -> hsize_t

Executes [`h5a_iterate`](@ref h5a_iterate(::hid_t, ::Cint, ::Cint,
::Ptr{hsize_t}, ::Ptr{Cvoid}, ::Ptr{Cvoid})) with the user-provided callback
function `f`, returning the index where iteration ends.

The callback function must correspond to the signature
```
    f(loc::HDF5.API.hid_t, name::Ptr{Cchar}, info::Ptr{HDF5.API.H5A_info_t}) -> Union{Bool, Integer}
```
where a negative return value halts iteration abnormally (triggering an error),
a `true` or a positive value halts iteration successfully, and `false` or zero
continues iteration.

# Examples
```julia-repl
julia> HDF5.API.h5a_iterate(obj, HDF5.API.H5_INDEX_NAME, HDF5.API.H5_ITER_INC) do loc, name, info
           println(unsafe_string(name))
           return false
       end
```
"""
function h5a_iterate(@nospecialize(f), obj_id, idx_type, order, idx=0)
    err_ref = Ref{Any}(nothing)
    idxref = Ref{hsize_t}(idx)
    fptr = @cfunction(h5a_iterate_helper, herr_t, (hid_t, Ptr{Cchar}, Ptr{H5A_info_t}, Any))
    try
        h5a_iterate(obj_id, idx_type, order, idxref, fptr, (f, err_ref))
    catch h5err
        jlerr = err_ref[]
        if !isnothing(jlerr)
            rethrow(jlerr)
        end
        rethrow(h5err)
    end
    return idxref[]
end
Since these are really internal-only functions, I think the extra verbosity is ok.
Fixed in
494b613
src/datasets.jl
Outdated
    addr::API.haddr_t
    size::API.hsize_t
end
# Base.show(io::IO, ::MIME"text/plain", ci::ChunkInfo) = print(io, @sprintf("%10s", ci.offset), "\t", ci.filter_mask, "\t", ci.addr, "\t", ci.size)
remove comment?
src/datasets.jl
Outdated
end

"""
    HDF5.get_all_chunk_info(dataset, [dxpl])
add to docs/src/datasets.md
        info,
        ChunkInfo{N}(unsafe_load(Ptr{NTuple{N,Int}}(offset)), filter_mask, addr, size)
    )
    return HDF5.API.H5_ITER_CONT
    return HDF5.API.H5_ITER_CONT
    return false
I did some performance testing and the scaling is better with the
Yeah, I've noticed the iter funcs are much more efficient in other cases. It's a shame, since they're kind of clunky.
Is there a way to turn these into a Julia iterator?
Not that I can think of.
What if we launch a task to call the HDF5 iterate function and then wait to take from a Channel? When we get back to Julia in the callback, we push the iterable information into the channel and iterate. We may need a second channel for the callback to wait on and possibly receive a signal to stop iteration if the iterable object gets garbage collected.
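A rough sketch of the Channel idea, in plain Julia without HDF5: `callback_iterate` is a hypothetical stand-in for a C-style callback iterator such as `h5d_chunk_iter`, and `channel_iterate` is likewise an illustrative name. Whether it is safe to yield to another task from inside a real C callback is an open question that this sketch does not address, and it omits the second "stop" channel mentioned above.

```julia
# Hypothetical stand-in for a callback-driven iterator: calls f(item) for
# each item and stops early as soon as f returns a nonzero value.
function callback_iterate(f, items)
    for item in items
        ret = f(item)
        ret == 0 || return ret
    end
    return 0
end

# Wrap the callback API in a Channel. The do-block runs in its own task,
# and with a buffer size of 0 each put! blocks until the consumer takes
# the value, so the producer only advances when the consumer asks for more.
function channel_iterate(items)
    return Channel{eltype(items)}(0) do ch
        callback_iterate(items) do item
            put!(ch, item)
            return 0  # tell the callback iterator to continue
        end
    end
end

# The Channel is closed when the producer task finishes, which ends the loop.
for x in channel_iterate(10:10:30)
    println(x)
end
```

If the consumer abandons the loop early, the producer task stays blocked in `put!`. That is the situation the second channel, or closing the channel from the consumer side, would have to handle.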
@musm please review and merge when ready.
9d95a80
to
9f3de93
Compare
I rebased this on top of #1054
Co-authored-by: Simon Byrne <simonbyrne@gmail.com>
c3799e5
to
c19f096
Compare
* Fix _typed_load fast path for Julia 1.10 (currently nightly) (#1075)
* Fix _typed_load fast path for Julia 1.10 (currently nightly) by thresholding on `1.10.0-DEV.1390`
* Use `Libc.memcpy` after that threshold
* Add initial support for H5Dchunk_iter (#1031)
* Add initial support for H5Dchunk_iter
* Implement h5d_chunk_iter_helper
* Implement HDF5.get_all_chunk_info
* Make tests pass via HDF5 1.14.0
* Apply formatting
* Test filters with filter_mask via H5Dchunk_iter
* Require functions to return an integer
* Provide index based chunk iteration, rename to HDF5.get_chunk_info_all
* Fix formatting
* Fix documentation
* Fix documentation
* Improve testing
* Always define _get_chunk_info_all_by_iter for documenter
* Update src/datasets.jl
  Co-authored-by: Simon Byrne <simonbyrne@gmail.com>
* Precompile get_chunk_info_all implementations before benchmarking
* Fix documentation
* Fix tests
* Formatting
---------
Co-authored-by: Simon Byrne <simonbyrne@gmail.com>
* Simplify formatter check (#1078)
  This will display the diff, and return a non-zero exit code if there are changes
* Fix #1083 maxdims -> max_dims in dataspace documentation (#1085)
* Update light and dark logos for readme (#1087)
* Fixup readme logo links and readme style (#1088)
* Fixup readme logo links and readme style
* Tweak
* Tweak logo centering
* Upload curves directly instead of fonts for logo independence (#1089)
* Allow create_dataset to take a Type and Dataspace, Fix #1084 (#1086)
* Fix #1083 maxdims -> max_dims in dataspace documentation
* Fix #1084, create_dataset with Type and Dataspace
  Also expand definitions for dataspace with two positional arguments. Pin Blosc_jll to 1.21.2 if AVX2 is not detected.
  See Blosc/c-blosc#371
* Formatter
* Attempt to fix documentation (#1091)
* doc typo: attributes(parent), not attribute(parent) (#1095)
* Fix tests for Julia 1.3
* Fix tests for Julia 1.3, Windows SZIP is broken again
* Use static if block to fix 1.3
* Fix version and formatting
* Fix tests
* Bump version to 0.16.16
* Formatting
---------
Co-authored-by: Simon Byrne <simonbyrne@gmail.com>
Co-authored-by: Mustafa M <mus-m@outlook.com>
Co-authored-by: Steven G. Johnson <stevenj@mit.edu>
This adds support for `H5Dchunk_iter`, which allows efficient access to chunk information. This is particularly useful when trying to obtain all the chunk information for a dataset, since it scales significantly better than querying the chunk information by index. This also adds basic HDF5 1.14 support, and the documentation is improved along the way.