[dump source] query all symbols in one batch, grouped dumps by symbol #36

Merged
merged 1 commit into from
Jan 3, 2024

dmnsn7
Contributor

@dmnsn7 dmnsn7 commented Jan 3, 2024

What changed and why?

  • Offline profiling showed that the SQL queries are time-consuming. This PR issues the queries for all symbols in one batch to save time.
  • DataFrame.groupby is much faster than filtering with df[df.foo == bar] repeatedly, once per symbol.
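
A minimal sketch of the two ideas above, not the code from this PR. It assumes a hypothetical `prices` table with `symbol`, `date`, and `close` columns, uses an in-memory SQLite database as a stand-in for the real data source, and the helper names `load_all_symbols` and `split_by_symbol` are made up for illustration.

```python
import sqlite3

import pandas as pd


def load_all_symbols(conn, symbols):
    """Fetch rows for every symbol with one query instead of one query per symbol."""
    # One "?" placeholder per symbol keeps the query parameterized.
    placeholders = ",".join("?" for _ in symbols)
    query = (
        "SELECT symbol, date, close FROM prices "
        f"WHERE symbol IN ({placeholders}) ORDER BY symbol, date"
    )
    return pd.read_sql_query(query, conn, params=list(symbols))


def split_by_symbol(df):
    """Split the combined frame in a single groupby pass.

    Gives the same per-symbol frames as df[df.symbol == s] evaluated for each
    symbol, but without rescanning the whole frame once per symbol.
    """
    return {symbol: group for symbol, group in df.groupby("symbol", sort=False)}


# Hypothetical sample data; the real schema and source are not shown in this PR.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (symbol TEXT, date TEXT, close REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [
        ("AAA", "2024-01-02", 10.0),
        ("AAA", "2024-01-03", 10.5),
        ("BBB", "2024-01-02", 20.0),
        ("CCC", "2024-01-02", 30.0),
    ],
)

frames = split_by_symbol(load_all_symbols(conn, ["AAA", "BBB"]))
for symbol, frame in frames.items():
    # A per-symbol dump (for example frame.to_csv(...)) would go here.
    print(symbol, len(frame))
```

With a very large symbol list, the number of placeholders may exceed the database's parameter limit, so the batch might need to be split into a few chunks rather than sent as a single query.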

How is it tested, and caveats

  • Tested offline: the scripts ran successfully with no errors, and the output CSV files look good.
  • The script now takes ~150 s to finish on my machine (32g32c, 13900K).
  • Peak memory usage is ~18 GB, as observed in htop.

Owner

@chenditc chenditc left a comment

Thank you! That helps unblock the GitHub Action!

@chenditc chenditc merged commit 0570eb2 into chenditc:main Jan 3, 2024
@dmnsn7
Contributor Author

dmnsn7 commented Jan 3, 2024

It all looks good on my side now! Thank you! 🤝
