Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix HDF5 and NWB get_frames behavior #174

Merged
merged 19 commits into from
Feb 1, 2024
Merged

Conversation

bendichter
Copy link
Contributor

@bendichter bendichter commented Jul 8, 2022

  • fix get_frames. It was previously assuming contiguous frames
  • improve get_video. It was previously reading frame-by-frame, which is not optimal for HDF5
    fix hdf5 dataset reader #173

* fix get_frames. It was previously assuming contiguous frames
* improve get_video. It was previously reading frame-by-frame, which is not optimal for HDF5
@bendichter bendichter requested a review from h-mayorquin July 8, 2022 13:53
Copy link
Collaborator

@h-mayorquin h-mayorquin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so I just checked this and I am thinking on why the tests fail. I am playing with the test data to understand the errors. The following gives an error and it should not:

from pathlib import Path
from roiextractors import Hdf5ImagingExtractor

file_path = Path("/home/heberto/ophys_testing_data/imaging_datasets/hdf5/demoMovie.hdf5")

extractor = Hdf5ImagingExtractor(file_path=file_path)
extractor._video[[0, 1], :, :, 0]  # Note this is what get frames does

Note that the extractor is of type DatasetViewh5py.

The error is:

TypeError                                 Traceback (most recent call last)
/home/heberto/development/roiextractors/dev_notebook.ipynb Cell 5 in <cell line: 7>()
      [4](vscode-notebook-cell:/home/heberto/development/roiextractors/dev_notebook.ipynb#ch0000117?line=3) file_path = Path("/home/heberto/ophys_testing_data/imaging_datasets/hdf5/demoMovie.hdf5")
      [6](vscode-notebook-cell:/home/heberto/development/roiextractors/dev_notebook.ipynb#ch0000117?line=5) extractor = Hdf5ImagingExtractor(file_path=file_path)
----> [7](vscode-notebook-cell:/home/heberto/development/roiextractors/dev_notebook.ipynb#ch0000117?line=6) extractor._video[[0, 1], :, :, 0]

File ~/miniconda3/envs/roi_extractors_env/lib/python3.9/site-packages/lazy_ops/lazy_loading.py:174, in DatasetView.__getitem__(self, new_slice)
    172     self._lazy_slice_call = False
    173     return DatasetView(self.dataset, (key_reinit, self._int_index), self.axis_order)
--> 174 return DatasetView(self.dataset, (key_reinit, self._int_index), self.axis_order).dsetread()

File ~/miniconda3/envs/roi_extractors_env/lib/python3.9/site-packages/lazy_ops/lazy_loading.py:208, in DatasetView.dsetread(self)
    205 reversed_axis_order_read = sorted(range(len(self.axis_order)), key=lambda i: self.axis_order[i])
    206 axis_order_read = sorted(range(len(self.axis_order)), key=lambda i: reversed_axis_order_read[i])
--> 208 return self.dataset[reversed_slice_key].transpose(axis_order_read)

File h5py/_objects.pyx:54, in h5py._objects.with_phil.wrapper()

File h5py/_objects.pyx:55, in h5py._objects.with_phil.wrapper()

File ~/miniconda3/envs/roi_extractors_env/lib/python3.9/site-packages/h5py/_hl/dataset.py:783, in Dataset.__getitem__(self, args, new_dtype)
    778     return arr
    780 # === Everything else ===================
    781 
...
File h5py/_selector.pyx:272, in h5py._selector.Selector.make_selection()

File h5py/_selector.pyx:213, in h5py._selector.Selector.apply_args()

TypeError: not all arguments converted during string formatting

But this works just fine with slicing in h5py as the following returns the expected result.

extractor._file[extractor._mov_field][0, [0, 1], :, :]  # This is imitating what get frames does before remapping axis.

This does not produce an error, so maybe there is an error/bug in lazy ops? what do you think @bendichter?

@h-mayorquin
Copy link
Collaborator

As an aside, when loaded with h5py the testing data type is characters (u2):

from pathlib import Path
import h5py

file_path = Path("/home/heberto/ophys_testing_data/imaging_datasets/hdf5/demoMovie.hdf5")

hdf5_file = h5py.File(file_path, "r")
hdf5_file["mov"]

Outputs:

<HDF5 dataset "mov": shape (1, 2000, 60, 80), type ">u2">

Which I think might be related to the error above. Something about implicit conversion of types that h5py handles and lazy ops does not? Not really sure.

@h-mayorquin h-mayorquin mentioned this pull request Jul 19, 2022
@pauladkisson
Copy link
Member

Checking in on this old PR. Since get_frames is defined in ImagingExtractor, we should be able to just not defined it for hdf5 as long as we define the get_video function appropriately. Then this extractor will inherit the correct get_frames implementation.

@pauladkisson
Copy link
Member

Checking in on this old PR. Since get_frames is defined in ImagingExtractor, we should be able to just not defined it for hdf5 as long as we define the get_video function appropriately. Then this extractor will inherit the correct get_frames implementation.

On second thought, trying to use the base imaging extractor implementation runs into problems because the old extractors squeeze and some of the new ones do not. So, for this PR I just fixed hdf5's version.

@CodyCBakerPhD
Copy link
Member

TODO: add non-contiguous slice test for HDF5 just to make sure the current fix works as expected, then should be good to go

@CodyCBakerPhD CodyCBakerPhD changed the title for HDF5: Fix HDF5 get_frames behavior Feb 1, 2024
@CodyCBakerPhD
Copy link
Member

Then a follow-up will fix the heterogeneity in squeeze behavior to favor singletons and remove duck typing for index selection

Copy link

codecov bot commented Feb 1, 2024

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (c5dedb2) 79.17% compared to head (f929120) 79.38%.
Report is 5 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #174      +/-   ##
==========================================
+ Coverage   79.17%   79.38%   +0.20%     
==========================================
  Files          39       39              
  Lines        3054     3070      +16     
==========================================
+ Hits         2418     2437      +19     
+ Misses        636      633       -3     
Flag Coverage Δ
unittests 79.38% <96.96%> (+0.20%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
src/roiextractors/__init__.py 72.72% <ø> (ø)
src/roiextractors/example_datasets/__init__.py 100.00% <ø> (ø)
src/roiextractors/example_datasets/toy_example.py 6.66% <ø> (ø)
src/roiextractors/extraction_tools.py 61.81% <ø> (ø)
src/roiextractors/extractorlist.py 100.00% <ø> (ø)
src/roiextractors/extractors/caiman/__init__.py 100.00% <ø> (ø)
...s/extractors/caiman/caimansegmentationextractor.py 84.31% <ø> (ø)
...actors/extractors/hdf5imagingextractor/__init__.py 100.00% <ø> (ø)
...ctors/hdf5imagingextractor/hdf5imagingextractor.py 84.88% <100.00%> (+2.53%) ⬆️
...extractors/extractors/memmapextractors/__init__.py 100.00% <ø> (ø)
... and 27 more

@pauladkisson
Copy link
Member

NWBImagingExtractor had the same problem as hdf5, but it is fixed now, and tested so this should be good to merge.

@pauladkisson pauladkisson changed the title Fix HDF5 get_frames behavior Fix HDF5 and NWB get_frames behavior Feb 1, 2024
@CodyCBakerPhD CodyCBakerPhD merged commit 4c75116 into main Feb 1, 2024
15 checks passed
@CodyCBakerPhD CodyCBakerPhD deleted the fix_hdf5_get_frames branch February 1, 2024 20:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

hdf5 dataset reader
4 participants