Use scipy.sparse for histogram storage. #294

Merged: 2 commits merged on May 9, 2024


@delucchi-cmu (Contributor) commented May 8, 2024

Closes #154 by using scipy.sparse to represent the HEALPix histogram.
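A minimal sketch of the idea, assuming a histogram is a vector of row counts per HEALPix pixel at one fixed order (so `12 * 4**order` entries). The helper `sparse_histogram` and the pixel IDs below are hypothetical and are not the hipscat-import API. It stores the counts in a `scipy.sparse.csr_matrix`, so only occupied pixels take up space, and partial histograms (for example, one per input file) combine by ordinary addition.

```python
import numpy as np
from scipy import sparse


def sparse_histogram(pixel_ids, order):
    """Count rows per HEALPix pixel at `order` as a 1 x npix sparse row.

    Hypothetical helper for illustration only; not the hipscat-import API.
    """
    npix = 12 * 4**order  # total number of HEALPix pixels at this order
    pixels, counts = np.unique(np.asarray(pixel_ids, dtype=np.int64), return_counts=True)
    rows = np.zeros(len(pixels), dtype=np.int64)  # everything lives in row 0
    return sparse.csr_matrix(
        (counts.astype(np.int64), (rows, pixels)), shape=(1, npix)
    )


# Two partial histograms (e.g. from two input files) combine by addition.
part_a = sparse_histogram([5, 5, 17, 42], order=2)
part_b = sparse_histogram([17, 100], order=2)
total = part_a + part_b

# Expand to a dense vector only when a full array is actually needed.
dense = total.toarray()[0]
print(dense[[5, 17, 42, 100]])  # -> [2 2 1 1]
print(total.nnz)  # -> 4
```

Only four non-zero entries are stored for a 192-pixel histogram; at the high orders used for real catalogs, the gap between stored entries and total pixels is far larger.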

With a dense NumPy array on disk, the intermediate histograms for TIC take 8.5 GB. With a sparse array, they take 48 MB (roughly 180x less disk usage). There is no discernible performance penalty.
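To show where the disk savings come from, the sketch below writes one synthetic, mostly empty histogram both ways: `np.save` for the dense array and `scipy.sparse.save_npz` for the sparse one. The order, the number of occupied pixels, and the counts are made-up assumptions, not the TIC data, so the printed byte counts will not match the figures above.

```python
import io

import numpy as np
from scipy import sparse

# Synthetic example (assumption, not TIC): a histogram at HEALPix order 8
# where only 1,000 of the 786,432 pixels are non-empty.
order = 8
npix = 12 * 4**order
rng = np.random.default_rng(seed=0)
occupied = rng.choice(npix, size=1000, replace=False)
dense = np.zeros(npix, dtype=np.int64)
dense[occupied] = rng.integers(1, 10_000, size=1000)

# Dense storage: every pixel is written, empty or not.
dense_buf = io.BytesIO()
np.save(dense_buf, dense)

# Sparse storage: only non-zero pixel indices and their counts are written.
sparse_buf = io.BytesIO()
sparse.save_npz(sparse_buf, sparse.csr_matrix(dense.reshape(1, -1)))

dense_bytes = len(dense_buf.getvalue())
sparse_bytes = len(sparse_buf.getvalue())
print(f"dense: {dense_bytes} bytes, sparse: {sparse_bytes} bytes")

# Round-trip check: loading the sparse file recovers the same counts.
sparse_buf.seek(0)
restored = sparse.load_npz(sparse_buf).toarray()[0]
assert np.array_equal(restored, dense)
```

The dense file grows with the total number of pixels at the chosen order, while the sparse file grows only with the number of non-empty pixels, which is why mostly empty sky histograms shrink so much.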


codecov bot commented May 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.52%. Comparing base (7a5c17c) to head (ad989ba).

@@            Coverage Diff             @@
##             main     #294      +/-   ##
==========================================
+ Coverage   99.36%   99.52%   +0.16%     
==========================================
  Files          24       25       +1     
  Lines        1260     1273      +13     
==========================================
+ Hits         1252     1267      +15     
+ Misses          8        6       -2     


@delucchi-cmu requested a review from @jeremykubica on May 8, 2024 18:49

@jeremykubica (Contributor) left a comment


Looks good. I added a bunch of optional comments and thoughts, but nothing blocking.

src/hipscat_import/catalog/resume_plan.py (outdated, resolved)

@delucchi-cmu merged commit e6a1ade into main on May 9, 2024
9 checks passed
@delucchi-cmu deleted the issue/154/sparse branch on May 9, 2024 16:26
Development: this pull request closes the issue "Sparse representation of intermediate histograms on catalog import".
2 participants