
Best way to process results from Elasticsearch #1827

Answered by yohplala
hjazz6 asked this question in Q&A

Hello, a pandas dataframe (or a NumPy array, or similar) lives in RAM; a file on disk does not. Your current approach (splitting the results into chunks but keeping every chunk in RAM as a pandas dataframe or NumPy array) uses just as much memory as loading everything into a single pandas dataframe.

A vaex dataframe obtained by converting a pandas one also takes RAM (the "same" C array, or an equivalent, is still in memory).
A vaex dataframe obtained by reading an HDF5 or Arrow file does not: when you process it, vaex works through the data in small chunks, so the full dataset never has to sit in RAM.
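
To make the idea concrete, here is a minimal sketch of the "write to a file, then process the file in small pieces" pattern, using only the Python standard library. It is not vaex code: `fetch_pages` is a hypothetical stand-in for paging through the Elasticsearch results, and CSV is used only to keep the example dependency-free. With vaex you would presumably write the batches to an HDF5 or Arrow file instead and open that file with `vaex.open`.

```python
import csv
import os
import tempfile
from typing import Dict, Iterable, Iterator, List


def fetch_pages(n_pages: int, page_size: int) -> Iterator[List[Dict[str, float]]]:
    """Hypothetical stand-in for paging through Elasticsearch results."""
    for page in range(n_pages):
        start = page * page_size
        yield [{"id": i, "value": i * 0.5} for i in range(start, start + page_size)]


def dump_pages_to_csv(pages: Iterable[List[Dict[str, float]]], path: str) -> int:
    """Write every page to one CSV file; only one page is in memory at a time."""
    n_rows = 0
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "value"])
        writer.writeheader()
        for page in pages:
            writer.writerows(page)
            n_rows += len(page)
    return n_rows


def mean_from_file(path: str, column: str) -> float:
    """Stream the file row by row, so the full dataset is never loaded."""
    total, count = 0.0, 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            total += float(row[column])
            count += 1
    return total / count


# Assumed sizes for illustration: 10 pages of 1000 rows each.
path = os.path.join(tempfile.mkdtemp(), "results.csv")
rows_written = dump_pages_to_csv(fetch_pages(n_pages=10, page_size=1000), path)
mean_value = mean_from_file(path, "value")
```

The key point is that no list of chunks is ever accumulated: each page is written out and then dropped, and the later computation reads the file back a little at a time.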
Bests

Answer selected by hjazz6