Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DuneSQL Migration: new macros, new dunesql CI, and translated + legacy evms_blocks model #3570

Merged
merged 41 commits into from
Jun 27, 2023

Conversation

couralex6
Copy link
Contributor

Trino macro prior to trino migration

.github/workflows/dbt_slim_ci.yml Outdated Show resolved Hide resolved
macros/dune/adapters.sql Outdated Show resolved Hide resolved
macros/dune/source.sql Outdated Show resolved Hide resolved
models/uniswap/ethereum/uniswap_ethereum_sources.yml Outdated Show resolved Hide resolved
models/uniswap/ethereum/uniswap_v1_ethereum_trades.sql Outdated Show resolved Hide resolved
models/uniswap/ethereum/uniswap_v1_ethereum_trades.sql Outdated Show resolved Hide resolved
@jeff-dude jeff-dude added WIP work in progress dune team created by dune team labels Jun 21, 2023
@couralex6 couralex6 changed the title [WIP] Trino macros Translating evms_blocks to test dual ci spark-trino Jun 21, 2023
@a-monteiro a-monteiro force-pushed the trino-macros branch 2 times, most recently from dec3bc2 to 13a8682 Compare June 22, 2023 09:33

- name: dbt test incremental model(s) if applicable
if: env.INC_MODEL_COUNT > 0
run: "dbt test --select state:modified,config.materialized:incremental --exclude tag:prod_exclude --defer --state ."
run: "dbt test $PROFILE --select state:modified,config.materialized:incremental,$TAG --exclude $EXCLUDEtag:prod_exclude --defer --state ."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
run: "dbt test $PROFILE --select state:modified,config.materialized:incremental,$TAG --exclude $EXCLUDEtag:prod_exclude --defer --state ."
run: "dbt test $PROFILE --select state:modified,config.materialized:incremental,$TAG --exclude ${EXCLUDE}tag:prod_exclude --defer --state ."

macros/expose_spells.sql Outdated Show resolved Hide resolved
macros/mark_as_spell.sql Outdated Show resolved Hide resolved
.github/workflows/commit_manifest.yml Show resolved Hide resolved
macros/dune/schema.sql Outdated Show resolved Hide resolved
@jeff-dude jeff-dude added ready-for-merging and removed WIP work in progress labels Jun 23, 2023
@couralex6 couralex6 changed the title Translating evms_blocks to test dual ci spark-trino DuneSQL Migration: new macros, new dunesql CI, and translated + legacy evms_blocks model Jun 23, 2023
@jeff-dude
Copy link
Member

taking a few notes here to prep for merge, mostly for myself to follow:

  • modifying existing macros, especially mark_as_spell but also expose_spells to a lesser extent, typically make dbt think an entire full refresh is needed, as they impact all spells -- therefore, i'd recommend we disable the state:modified step in our orchestration for this merge, so we avoid unnecessary full refresh on all 1k+ spells
  • we should monitor PRs closely after merge to ensure commit_manifest workflow runs as expected with engine profiles and env variables added -- no real takeaway here other than be cognizant of PR activity after to ensure we see results as expected, as it'll impact all PR CI tests due to manifest comparisons and running only needed models per PR
  • ensure DBT_ENV_CUSTOM_ENV_S3_BUCKET is set as expected in dbt cloud for new trino project
  • ensure we educate users on how to leverage new alias config property across engines
  • is this our bucket for both dev PRs and prod?

@couralex6 couralex6 merged commit e5fc627 into main Jun 27, 2023
@couralex6 couralex6 deleted the trino-macros branch June 27, 2023 17:57
@jeff-dude
Copy link
Member

jeff-dude commented Jun 27, 2023

notes post-merge

critical path

  • commit manifest action failed due to permissions bug, access denied on dunesql run
  • dbt cloud orchestration is adding prod_ prefix to schema names when it should not
    • this has been fixed, the prod value for target in dbt cloud hourly incremental job was missing, it is now added and schemas write as expected

not critical path, but maybe nice-to-have:

  • for the matrix engine, have both jobs (spark and dunesql) run simultaneously rather than sequential?
    • should we actually only be running one or the other? if tag:dunesql is applied, only run dunesql ci? if no tag, only run spark? otherwise, won't we see one or the other fail each time on syntax errors?
    • example PR here. it's editing existing spark spells, but runs dunesql ci test as it's first in the matrix engine

potential clean-up:

  • drop the prod_evms.blocks view created in new hive metastore, since it was incorrectly built and added the prod_ prefix? i belive it will persist without manual drop

@dot2dotseurat
Copy link
Collaborator

regarding permissions bug, this is the PR from the s3 permissions I initially set up on the original manifest bucket. have we made a similar pr for the new bucket manifest-spellbook-dunesql?

@hildobby
Copy link
Collaborator

hildobby commented Jun 28, 2023

Hey @couralex6, I'm looking into adding Celo to the tables in the evms sector. What are the reasons for the creation of evms_blocks_legacy.sql as a separate file from evms_blocks.sql? Seems like the added 'dunesql' tag and the altered alias are the only two differences. What is the purpose of this change and why this specific model? Which file should I build on top of now?

Screenshot 2023-06-28 at 16 27 54

@jeff-dude
Copy link
Member

Hey @couralex6, I'm looking into adding Celo to the tables in the evms sector. What are the reasons for the creation of evms_blocks_legacy.sql as a separate file from evms_blocks.sql? Seems like the added 'dunesql' tag and the altered alias are the only two differences. What is the purpose of this change and why this specific model? Which file should I build on top of now?

Screenshot 2023-06-28 at 16 27 54

we are still working on an updated contribution guide that will include most of these details for yourself and others. if you're looking to get ahead, here are some answers:

  • _legacy.sql files will maintain the spark logic and continue to run on spark engine, so the data maintains in sync until we deprecate spark completely. also within legacy file will be updated alias property in config, to match new alias macro
  • the translated to dunesql files will not contain _legacy and add a few things to help orchestration and table naming. anything written in dunesql syntax will need that tag for all orchestration needs. the alias macro (i.e. what the property you're referring to calls) has been edited to accept more params, including legacy flag, to help name tables with or without the legacy suffix.
  • this specific model was chosen since it's a view and quick to run in CI tests and was top of mind for me :)
  • build on both files for the time being, hopefully that will be short lived process to edit both

we will ensure to look closely in review on new PRs, then can use findings to help improve contribution guide

@hildobby
Copy link
Collaborator

Hey @couralex6, I'm looking into adding Celo to the tables in the evms sector. What are the reasons for the creation of evms_blocks_legacy.sql as a separate file from evms_blocks.sql? Seems like the added 'dunesql' tag and the altered alias are the only two differences. What is the purpose of this change and why this specific model? Which file should I build on top of now?
Screenshot 2023-06-28 at 16 27 54

we are still working on an updated contribution guide that will include most of these details for yourself and others. if you're looking to get ahead, here are some answers:

  • _legacy.sql files will maintain the spark logic and continue to run on spark engine, so the data maintains in sync until we deprecate spark completely. also within legacy file will be updated alias property in config, to match new alias macro
  • the translated to dunesql files will not contain _legacy and add a few things to help orchestration and table naming. anything written in dunesql syntax will need that tag for all orchestration needs. the alias macro (i.e. what the property you're referring to calls) has been edited to accept more params, including legacy flag, to help name tables with or without the legacy suffix.
  • this specific model was chosen since it's a view and quick to run in CI tests and was top of mind for me :)
  • build on both files for the time being, hopefully that will be short lived process to edit both

we will ensure to look closely in review on new PRs, then can use findings to help improve contribution guide

Ah alright makes sense! I made a PR to add Celo but looks like it passed tests without needing me to touch anything new, lmk if there's something new that ci tests dont account for! #3640

@jeff-dude
Copy link
Member

a PR to add Celo but looks like it passed tests without needing me to touc

will follow up on that PR when i can 🙏

olgafetisova pushed a commit to olgafetisova/spellbook that referenced this pull request Nov 14, 2023
…y evms_blocks model (duneanalytics#3570)

* create schema macro for trino

* trino create table

* Adds a translated spell to test trino selector

* changes tag name

* Adds exclude tag names from CI workflow definition

* Adds missing comma

* Run alter table on every run

* Adds delta_prod as database for source

* DBT macro to override source database

* Only add delta_prod to source if we are running on trino

* Updates expose spell and mark as spell macros

* Jonas' comments

* revert uniswap changes

* Adds dbt trino to pipfile

* alter view fix

* adds dunesql tag to evms_blocks model

* use tags instead of env var in macros to detect dunesql models

* Adds legacy evms_block file which contains the original spark model

* Add dbt matrix strategy for spark and dunesql

* Add alias macro

* Use alias macro in evms blocks

* Fix select/excludes

* Run sequentially

* Fix exclude tag

* Test fix for broken spark selector

* fixing bug where compile on spark had an empty select flag

* Refactor dbt compile

* Updates commit manifest action to generate both dunsql and spark manifests

The existing logic for spark manifest is unchanged, we still upload the spark manifest
to the same location. We run the same logic to upload the dunesql manifest to a different
s3 manifest.

* Download manifest from correct location for dunesql runs

* simplifies copy manifest logic

* only run dunesql

* Fix commit manifest target location

* dunesql_check CI fix

* Only select state:new,tag:dunesql

* Move comma inside of $TAG. state:modified != state:modified,

* Removes 'or True' from expose_spell and mark_as_spell macros prior to merging

* Test CI run on different s3 location

* Revert "Test CI run on different s3 location"

This reverts commit 17dab72.

* prod s3 bucket

* trailing /

* Prod location again

---------

Co-authored-by: André Monteiro <andre@dune.com>
Co-authored-by: Jonas Irgens Kylling <jonas@dune.xyz>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dune team created by dune team ready-for-merging
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants