Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPE-200 restructuring dex.trades with a macro approach #4533

Merged
merged 142 commits into from
Nov 9, 2023

Conversation

Hosuke
Copy link
Collaborator

@Hosuke Hosuke commented Oct 5, 2023

Things to discuss:

  1. Requirement of Macros/Models: Do we need macros implemented in macros/models/_sector/dex/platforms/uniswap_v2_forked_base_trades.sql?

  2. Consistency with Proposed Ideas: I would like to confirm whether our current implementation aligns with the ideas presented in the [proposal] sector-level spells redesign #4637 and Sector spell design: how to organize the dbt lineage #4625

@jeff-dude
Copy link
Member

@Hosuke (or anyone else interested) this should be ready for review again. i'm comfortable with the current format, with a few minor caveats:

  • do we need to revisit the tests and see why counts are off? we should compare counts of these CI table build to the actual dex.trades with filters as needed
  • naming standards on macros
  • adding new columns (though i'd like to do that in a follow up PR i think)

@grkhr
Copy link
Contributor

grkhr commented Nov 7, 2023

@jeff-dude as far as I understood you are materializing 3 steps:

  1. every {project}{version}{blockchain}.stg_trades
  2. dex.stg_trades (which contains all of the tables above with duplicates_rank filter)
  3. dex.trades (_beta) (which is enriched with metadata)

Is this intended? As step 1 and step 2 are the same data which will be stored twice

@jeff-dude
Copy link
Member

jeff-dude commented Nov 7, 2023

@jeff-dude as far as I understood you are materializing 3 steps:

  1. every {project}{version}{blockchain}.stg_trades
  2. dex.stg_trades (which contains all of the tables above with duplicates_rank filter)
  3. dex.trades (_beta) (which is enriched with metadata)

Is this intended? As step 1 and step 2 are the same data which will be stored twice

let's give it a shot with step 2 as a view, see how performance compares (will push in a sec)

edit:
logs prior to changing to view

22:07:30  18 of 19 START sql incremental model test_schema.git_dunesql_b3c17dea_dex_stg_trades  [RUN]
22:10:07  18 of 19 OK created sql incremental model test_schema.git_dunesql_b3c17dea_dex_stg_trades  [SUCCESS in 156.93s]
22:10:07  19 of 19 START sql incremental model test_schema.git_dunesql_b3c17dea_dex_trades_beta  [RUN]
22:11:34  19 of 19 OK created sql incremental model test_schema.git_dunesql_b3c17dea_dex_trades_beta  [SUCCESS in 87.21s]
22:13:35  11 of 12 START sql incremental model test_schema.git_dunesql_b3c17dea_dex_stg_trades  [RUN]
22:14:50  11 of 12 OK created sql incremental model test_schema.git_dunesql_b3c17dea_dex_stg_trades  [SUCCESS in 74.86s]
22:14:50  12 of 12 START sql incremental model test_schema.git_dunesql_b3c17dea_dex_trades_beta  [RUN]
22:22:20  12 of 12 OK created sql incremental model test_schema.git_dunesql_b3c17dea_dex_trades_beta  [SUCCESS in 450.01s]

edit 2:
seems like minor performance hit when it's a view, ~450s --> ~500s runtime after changing

@Hosuke
Copy link
Collaborator Author

Hosuke commented Nov 8, 2023

@Hosuke (or anyone else interested) this should be ready for review again. i'm comfortable with the current format, with a few minor caveats:

  • do we need to revisit the tests and see why counts are off? we should compare counts of these CI table build to the actual dex.trades with filters as needed

The diff in counts may due to the latest refresh time. If we add an ending time filter, like where block_time < TIMESTAMP '2023-11-01', we may have closer results.

  • naming standards on macros
  • adding new columns (though i'd like to do that in a follow up PR i think)

We may consider if we can add tx_index in dex.trades and add_tx_from_and_to macro.

@jeff-dude
Copy link
Member

@Hosuke (or anyone else interested) this should be ready for review again. i'm comfortable with the current format, with a few minor caveats:

  • do we need to revisit the tests and see why counts are off? we should compare counts of these CI table build to the actual dex.trades with filters as needed

The diff in counts may due to the latest refresh time. If we add an ending time filter, like where block_time < TIMESTAMP '2023-11-01', we may have closer results.

  • naming standards on macros
  • adding new columns (though i'd like to do that in a follow up PR i think)

We may consider if we can add tx_index in dex.trades and add_tx_from_and_to macro.

i think we can feel confident in row counts / raw amount totals:
image

as for tx_index, i'd like to add in follow up PR that is cleaner, since we will need to change that existing macro and how it's used a bit. i may even rename it, since it's now pulling more columns. i don't know the full impact of that, so don't want to introduce here.

@jeff-dude
Copy link
Member

CI tests are looking good too 🔥

run times were improved by adding incremental predicates on the target & then also adding in a missing incremental filter in the enrich_dex_trades macro, now incremental looks much more reasonable

AND p_sold.blockchain = base.blockchain
{% if is_incremental() %}
WHERE
{{ incremental_predicate('base.block_time') }}
Copy link
Contributor

@grkhr grkhr Nov 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you moved this to the upper level it might improve incremental performance more. i mean:

with 

...

stg_cte as (
	select * from {{ stg_trades }}
	{% if is_incremental() %}
		where {{ incremental_predicate('base.block_time') }}
	{% endif %}
)

...

from stg_cte
left join 

...

filtering data as early as possible before joins almost always helps.

P.S: also — base :/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i can give it a shot, but not sure we'll gain much from what i've seen before.

ugh, why did they have to name their blockchain base, that's such a useful word for aliases and such 🥲

Copy link
Contributor

@grkhr grkhr Nov 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i hope Stargate won't release their STG chain someday haha

Copy link
Contributor

@grkhr grkhr Nov 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure we'll gain much from what i've seen before

dex_trades_beta [SUCCESS in 14.06s]dex_trades_beta [SUCCESS in 9.90s]
not much, but still around 1.5x which can easily become x2-x3 on a larger dataset i think

(but it can be just bias)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeff-dude yep it's just bias. Trino's optimizer is pretty smart.

image image

Anyway good to know!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow which tool are you using here, seems pretty helpful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Hosuke explain (format graphviz) select * from ... → any graphviz online tool

@jeff-dude
Copy link
Member

jeff-dude commented Nov 9, 2023

thank you again to all who provided feedback and helped push this beta version of the new dex.trades design 🙏

while everything looked good on the logic within, there was some back and forth on naming standards across PR comments, dm's & internal convo. i've pushed some final commits with what we're sticking with for now:

  • back to base_ prefix on upstream building block spells. while having a base blockchain causes some confusion, this still seems most logical between all the options (stage can be confusing, either different env altogether or a staging table which usually means truncate/load to prep for an incremental load downstream). blockchain name will always be in the schema, and base_ prefix will always be part of table name, so that is our best way to quickly differentiate.
  • stick to uniswap_ name in generic dex macros, but change fork to compatible

i plan to merge this soon, then i can revisit the linked gh issue and summarize our findings within this PR. we can also use that issue to continue conversation as we test out the beta and obtain more feedback.

@jeff-dude jeff-dude merged commit 71ad1a1 into duneanalytics:main Nov 9, 2023
@github-actions github-actions bot locked and limited conversation to collaborators Nov 9, 2023
@Hosuke Hosuke deleted the restructing-dex-dev branch February 2, 2024 09:20
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
dbt: dex covers the DEX dbt subproject WIP work in progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants