pandas hdf5 file can not be read. #37

Closed
kbruegge opened this issue Nov 6, 2017 · 7 comments

kbruegge commented Nov 6, 2017

Calling

	klaas_apply_cuts sst_selection.yml train.csv some_training_subset.hdf --hdf-style pandas -k events

then trying to train a model on that new file using

	klaas_train_energy_regressor config_regressor.yaml some_training_subset.hdf predictions_sst.hdf model_sst.pkl

yields

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/fact/io.py", line 204, in read_data
    df = pd.read_hdf(file_path, key=key, columns=columns, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/pandas/io/pytables.py", line 372, in read_hdf
    return store.select(key, auto_close=auto_close, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/pandas/io/pytables.py", line 742, in select
    return it.get_result()
  File "/usr/local/lib/python3.6/site-packages/pandas/io/pytables.py", line 1449, in get_result
    results = self.func(self.start, self.stop, where)
  File "/usr/local/lib/python3.6/site-packages/pandas/io/pytables.py", line 735, in func
    columns=columns, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/pandas/io/pytables.py", line 2871, in read
    kwargs = self.validate_read(kwargs)
  File "/usr/local/lib/python3.6/site-packages/pandas/io/pytables.py", line 2405, in validate_read
    raise TypeError("cannot pass a column specification when reading "
TypeError: cannot pass a column specification when reading a Fixed format store. this store must be selected in its entirety

So are pandas HDF5 files still supported or not?
And do we even want to support these?

kbruegge added the bug label Nov 6, 2017

maxnoe commented Nov 6, 2017

A pandas HDF5 file needs to be stored with format='table' for column selection to work. But maybe we should catch that exception and try again without columns.
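
A minimal sketch of the "catch that exception and try again" idea, not the code in fact.io. The names `read_columns` and `reader` are hypothetical; `reader` is only there so the fallback can be exercised without a real file, and it defaults to `pandas.read_hdf`. The check on the message text assumes pandas keeps the wording shown in the traceback above.

```python
import pandas as pd


def read_columns(path, key, columns=None, reader=pd.read_hdf):
    """Read `columns` from an HDF5 store, falling back to a full read.

    Hypothetical helper: `reader` must accept (path, key=..., columns=...)
    like pandas.read_hdf does.
    """
    try:
        # Works directly for stores written with format='table'.
        return reader(path, key=key, columns=columns)
    except TypeError as err:
        # Only handle the fixed-format complaint; re-raise anything else.
        if columns is None or "Fixed format" not in str(err):
            raise
        # Fixed-format stores must be read in their entirety,
        # so load everything and subset in memory.
        df = reader(path, key=key)
        return df[list(columns)]
```

With this, something like `read_columns("some_training_subset.hdf", key="events", columns=["some_column"])` (the column name is a placeholder) would work for both store formats. The cost is that for fixed-format files the whole table is loaded into memory before the columns are selected.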


kbruegge commented Nov 6, 2017

Should this be the default, maybe? Any disadvantages?


maxnoe commented Nov 6, 2017

From the docs:

The examples above show storing using put, which write the HDF5 to PyTables in a fixed array format, called the fixed format. These types of stores are not appendable once written (though you can simply remove them and rewrite). Nor are they queryable; they must be retrieved in their entirety. They also do not support dataframes with non-unique column names. The fixed format stores offer very fast writing and slightly faster reading than table stores. This format is specified by default when using put or to_hdf or by format='fixed' or format='f'
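
To make the difference described in the docs concrete, here is a small sketch that writes the same DataFrame once in the default fixed format and once with `format="table"`, then checks whether a column subset can be read back. The helper names, file names, and the toy columns are made up for illustration; it assumes PyTables is installed.

```python
import os
import tempfile

import pandas as pd


def column_selection_works(path, key, columns):
    """Return True if pandas can read just `columns` from the store at `path`."""
    try:
        pd.read_hdf(path, key=key, columns=columns)
    except TypeError:
        # Raised for fixed-format stores, which must be read in their entirety.
        return False
    return True


def compare_formats(directory):
    """Write one fixed and one table store and test column selection on both."""
    # Made-up stand-in for an event table.
    df = pd.DataFrame({"energy": [1.0, 2.0, 3.0], "width": [10, 20, 30]})

    fixed_path = os.path.join(directory, "fixed.hdf")
    table_path = os.path.join(directory, "table.hdf")

    # Without a format argument, to_hdf writes the fixed format.
    df.to_hdf(fixed_path, key="events", mode="w")
    # The table format is what makes column selection possible.
    df.to_hdf(table_path, key="events", mode="w", format="table")

    return {
        "fixed": column_selection_works(fixed_path, "events", ["energy"]),
        "table": column_selection_works(table_path, "events", ["energy"]),
    }
```

Calling `compare_formats(tempfile.mkdtemp())` should report that column selection fails for the fixed store and succeeds for the table store, which is the behaviour behind the traceback in this issue.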


kbruegge commented Nov 7, 2017

So? Do we keep supporting pandas HDF or not?


kbruegge commented Nov 7, 2017

So I think that maybe we shouldn't. We have a definition of how the HDF5 files should look for our event representation. No need to support pandas HDF5, in my opinion.


maxnoe commented Nov 7, 2017

I tend to agree


maxnoe commented Jun 22, 2018

We decided to not support this anymore

maxnoe closed this as completed Jun 22, 2018