-
Is the schema the same amongst the different files?
-
Just to add a bit of info on how the arrow files were generated: I made an Elasticsearch query and stored the results as a pandas dataframe. This was repeated for different ES query time periods. It was done and completed first using a separate script, before I tried to open the arrow files to read them into a vaex dataframe.
-
Strange. Could you make a reproducible issue: generate some data, export it, and see how long that takes for you, so we can try the same?
-
Thanks for the suggestion in another thread to try exporting the arrow files to hdf5. I tried that, and I can now open a file in less than 300ms, and the memory usage seems to be minimal too. I'll convert all my arrow files to hdf5 then.
-
Hi,
I have multiple `.arrow` files, each about 1GB (the total file size is larger than my RAM). I tried to open all of them using `vaex.open_many()` to read them into a single dataframe, and saw that memory usage kept increasing and that it was taking longer than I expected. So I tried opening just a single file with `vaex.open()`.
What I noticed was that it takes about 4-5 seconds to open the file, and the free memory (as reported in the `free` column of `free -h`) kept decreasing until it was ~1GB lower. I thought that when opening arrow files, vaex would use memory mapping and thus wouldn't actually use up so much memory, and that it would also be faster. Is my understanding correct, or am I doing something wrong?
ETA: Based on the documentation, I thought the file would open instantly. If I time the cell using `%time`, it does return in microseconds, but the cell continues to run for a few seconds, as shown by `%%time`.
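For reference, the memory-mapping expectation itself can be checked independently of vaex with nothing but the standard library: mapping a file is near-instant, and pages only hit memory as they are touched. The file name and size below are arbitrary:

```python
import mmap
import os
import tempfile

# Create a 64 MB file without writing any data
# (sparse on most filesystems, so it reads back as zeros).
path = os.path.join(tempfile.mkdtemp(), "big.bin")
with open(path, "wb") as f:
    f.truncate(64 * 1024 * 1024)

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # The map itself is effectively instant; disk reads and resident
    # memory only accrue for the pages actually accessed:
    first = mm[0]   # touches the first page
    last = mm[-1]   # touches the last page
    mm.close()
```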