[ADAP-379] [Feature] Improve startup performance #528
Comments
It's a fair question whether it's faster to run `create schema if not exists` unconditionally than to list the existing schemas first. I think that would look like a change to how the adapter creates schemas at the start of a run. (An older issue in a similar vein, since it relates to the caching queries that run at startup, which tend to be among the more expensive: #83)
FYI: in my tests, it was about an order of magnitude faster to run the `create schema if not exists` statement directly.
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue, or else it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.
Is this your first time submitting a feature request?
Describe the feature
dbt is fairly slow in general with Snowflake (especially on small datasets), and this is particularly annoying at startup. This is largely because dbt issues a lot of metadata queries, and metadata queries are orders of magnitude slower in Snowflake than "direct" queries, even when those direct queries would need to look at the same metadata internally. I assume this comes down to how Snowflake implements things: they chose to make explicit queries against INFORMATION_SCHEMA much slower.
Take, for example, this dbt run:
26s of total runtime, 2s of "useful" runtime. Now, that was an extreme case, but it always takes a few seconds just to start. Why? Here's a hint (I modified dbt to print the queries as they are issued):
The first thing it does is create_schemas(): to see which schemas already exist, it lists every schema in the database. We can have A TON of schemas in a database, and that generally works fine with Snowflake, but it makes dbt startup slow (and if we ever had more than 10k schemas, which is not completely unimaginable, dbt would break down).
The thing is, it's completely avoidable: create_schemas() could simply issue CREATE SCHEMA IF NOT EXISTS. There's no need to use "reflection" and list all the schemas just to find out whether the one we're interested in exists.
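A minimal sketch of the two approaches, using snowflake-connector-python directly rather than the actual dbt adapter code; the connection parameters, database, and schema names are placeholders, and the "list first" query is only illustrative of the reflection-style check:

```python
import snowflake.connector

# Placeholder credentials and object names - not from the issue.
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
    database="MY_DB",
)
cur = conn.cursor()

# Reflection-style check (illustrative): list every schema in the database,
# then create the one we need only if it is missing. This metadata query
# scales with the number of schemas and pays Snowflake's INFORMATION_SCHEMA
# latency on every startup.
cur.execute("select schema_name from MY_DB.information_schema.schemata")
existing = {row[0].upper() for row in cur.fetchall()}
if "ANALYTICS" not in existing:
    cur.execute("create schema MY_DB.ANALYTICS")

# Proposed alternative: a single idempotent statement, no reflection needed.
cur.execute("create schema if not exists MY_DB.ANALYTICS")
```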
Describe alternatives you've considered
No response
Who will this benefit?
Everybody using dbt with Snowflake for development - it would make dbt builds much faster, especially for small datasets.
Are you interested in contributing this feature?
Maybe, but I don't have a lot of time right now.
Anything else?
In general, it's much faster in Snowflake to just attempt to execute a query than to query the information schema explicitly. Unfortunately, from a quick glance, the dbt implementation is heavily skewed towards querying the information schema, so I don't think this is easily fixable in dbt-snowflake alone (and I'm not sure anybody would want to "fix"/change the approach in dbt-core).
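To make the "attempt the query instead of asking the information schema" point concrete, here is a rough timing sketch (not from the issue); the database, schema, and table names are placeholders, and the actual numbers will vary by account and warehouse:

```python
import time

import snowflake.connector
from snowflake.connector.errors import ProgrammingError

# Placeholder credentials and object names - not from the issue.
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
)
cur = conn.cursor()

def timed(label, fn):
    start = time.perf_counter()
    fn()
    print(f"{label}: {time.perf_counter() - start:.2f}s")

# Style 1: ask INFORMATION_SCHEMA whether the table exists.
timed("information_schema lookup", lambda: cur.execute(
    "select 1 from MY_DB.information_schema.tables "
    "where table_schema = 'ANALYTICS' and table_name = 'ORDERS'"
).fetchall())

# Style 2: just try to touch the table and treat "does not exist" as the answer.
def try_direct():
    try:
        cur.execute("select 1 from MY_DB.ANALYTICS.ORDERS limit 1")
    except ProgrammingError:
        pass  # object does not exist (or is not accessible)

timed("direct attempt", try_direct)
```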