-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SPE-126 Migrate zerion_{chain}.trades
tables
#4437
Conversation
alias = alias('trades') | ||
tags=['dunesql'] | ||
,schema = 'zerion_arbitrum' | ||
,alias = alias('trades') | ||
,partition_by = ['block_date'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can actually drop partitioning on all these tables because they are too small to benefit from it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sg, I will remove the partitioning now.
Btw, any colour on best practices for partitioning? It would be cool to have some ballpark numbers for:
- minimum amount of entries where a dataset would benefit from partitioning
- some range or sweetspot numbers for partition sizes
All I know is that block_month
has proven to be a lot better for partitioning than block_date
in some places apparently, but the size of those partitions will vary greatly based on the dataset. If it's a gas-level spell (lots of entries), maybe weekly is better for example 🤷♂️
Without much knowledge on how it looks in the backend, it's hard to assess when it's useful/less
Same thing applies for incremental_predicates
if you have any tips there on when it should/shoudn't be used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is a great question. block_month
is generally preferable to block_date
but in most cases you could probably start without it all together. Some examples:
- Big tables:
fees_traces
or solana tables ->block_date
(these tend to be > 1TB) - Medium tables: dex.trades ->
block_month
(>50GB) - Small tables: this one -> no need for partitioning
We should use incremental predicates for big and some medium tables but smaller ones shouldn't really need it.
It can also be a bit difficult for you to know how big a table is since we don't really have this data exposed anywhere but trying to give some intuition about this above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah ok that's the rough idea I had in mind, still very much a guessing game though.
Isn't there a way to have some hardcoded rules for those to be handled automatically by spellbook based on the number of rows in a table (or whatever other criteria)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately not at the moment, I don't think it's easy to dynamically decide on partitioning since sometimes we need to add an extra column for it to work for example.
@@ -1,6 +1,7 @@ | |||
{{ config( | |||
alias = alias('trades') | |||
,partition_by = ['block_date'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why remove partitioning here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvm just saw Alan's comment
zerion_{chain}.trades
tableszerion_{chain}.trades
tables
Thank you for contributing to Spellbook!
Please refer to the top of the
readme
in the root of Spellbook to learn how to contribute to Spellbook on DuneSQL.