SQL Endpoint only supports `merge` incremental strategy [and still doesn't yet] #138
resolves #133
Disable `insert_overwrite` for the SQL Endpoint

In order for the `insert_overwrite` incremental strategy to work as expected, we set two session properties:

- `spark.sql.sources.partitionOverwriteMode = DYNAMIC`, so that Spark will dynamically determine which partitions have new data and replace only those, leaving in place any partitions without new data. If the table is not partitioned, Spark will replace the entire table, equivalent to an atomic `truncate` + `insert`.
- `spark.sql.hive.convertMetastoreParquet = false` (docs): I honestly don't remember the exact reasons, but this has been here since the earliest days of incremental models in dbt-spark (c13b20a).
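To make that concrete, here is a rough sketch of the SQL this amounts to at runtime; the table, column, and staging-relation names are made up for illustration and are not the exact statements dbt-spark generates:

```sql
-- Session properties set before the incremental insert (illustrative only).
set spark.sql.sources.partitionOverwriteMode = DYNAMIC;
set spark.sql.hive.convertMetastoreParquet = false;

-- With dynamic partition overwrite, only the partitions that appear in the
-- incoming rows are replaced; partitions with no new data are left in place.
insert overwrite table analytics.page_views
partition (event_date)
select user_id, page, event_date
from page_views__dbt_tmp;
```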
Several weeks ago, we found that the new SQL Endpoints started returning errors when dbt tried to run `set` statements. Following discussion with our contacts at Databricks, we found out that this support was never intended.

This PR therefore:
- Allows the `set` statements to run IFF the incremental strategy is `insert_overwrite`
- Disables the `insert_overwrite` strategy on the SQL Endpoint
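Roughly, that gating could look like the following Jinja sketch inside the incremental materialization; the variable name and structure here are illustrative assumptions, not the exact code in this PR:

```sql
{#-- Illustrative only: emit the session-level `set` statements
    only when the configured strategy is insert_overwrite. --#}
{% if strategy == 'insert_overwrite' %}
  set spark.sql.sources.partitionOverwriteMode = DYNAMIC;
  set spark.sql.hive.convertMetastoreParquet = false;
{% endif %}
```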
This may feel a bit silly, given that `insert_overwrite` is the default and `merge` requires two additional configs. Should we change the defaults depending on the connection type? I.e., default to `incremental_strategy: merge` and `file_format: delta` (instead of `parquet`) if the user has an ODBC connection to Databricks.
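For reference, this is roughly what a user targeting the SQL Endpoint has to configure explicitly today; the model, column, and ref names are hypothetical:

```sql
-- models/page_views.sql (hypothetical model)
{{
  config(
    materialized='incremental',
    incremental_strategy='merge',
    file_format='delta',
    unique_key='id'
  )
}}

select * from {{ ref('stg_page_views') }}
```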
Whereas `merge` should work... soon

There's one other issue that currently prevents incremental models, even `merge`-strategy ones, from running on the SQL Analytics endpoint: `create temp view` is not yet supported. Crucially, in this case, Databricks intends to eventually support it.
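For context, the `merge` flow stages the new rows in a temporary view before merging them into the target table, roughly along these lines; the relation and column names below are made up and this is a simplified sketch, not the exact SQL dbt generates:

```sql
-- This staging step is what currently fails on the SQL Analytics endpoint,
-- because CREATE TEMPORARY VIEW is not yet supported there.
create temporary view page_views__dbt_tmp as
  select * from new_page_views;

-- The merge itself, matching on the configured unique key.
merge into analytics.page_views as target
using page_views__dbt_tmp as source
on target.id = source.id
when matched then update set *
when not matched then insert *;
```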
Checklist

- I have updated the `CHANGELOG.md` and added information about my change to the "dbt next" section.