bugfix/disable-search-service #7

Merged
merged 6 commits into from
Oct 28, 2024
8 changes: 7 additions & 1 deletion CHANGELOG.md
@@ -1,4 +1,10 @@
- # dbt_unified_rag v0.1.0
+ # dbt_unified_rag v0.1.0-a2
+
+ ## Bug Fixes
+ - For Snowflake destinations, we have removed the post-hook from the `rag__unified_document` model, which generated the `rag__unified_search` Cortex Search Service.
+ - While the Search Service worked when deployed locally, issues were identified when deploying and running via Fivetran Quickstart. To ensure Snowflake users can still take advantage of the `rag__unified_document` end model, we have removed the search service from execution until we can verify it works as expected on all supported orchestration methods.
+
+ # dbt_unified_rag v0.1.0-a1

This is the initial release of the Unified RAG dbt package!

11 changes: 2 additions & 9 deletions README.md
@@ -15,7 +15,7 @@
## What does this dbt package do?

<!--section="unified_rag_transformation_model"-->
- The main focus of this dbt package is to generate an end model and [Cortex Search Service](https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-search/cortex-search-overview) (for Snowflake destinations only) which contains the below relevant unstructured document data to be used for Retrieval Augmented Generation (RAG) applications leveraging Large Language Models (LLMs):
+ The main focus of this dbt package is to generate an end model which contains the below relevant unstructured document data to be used for Retrieval Augmented Generation (RAG) applications leveraging Large Language Models (LLMs):
- [HubSpot](https://fivetran.com/docs/connectors/applications/hubspot): Deals
- [Jira](https://fivetran.com/docs/connectors/applications/jira): Issues
- [Zendesk](https://fivetran.com/docs/connectors/applications/zendesk): Tickets
@@ -26,12 +26,6 @@ The following table provides a detailed list of all models materialized within t
| **Table** | **Description** |
| ------------------------- | ------------------------------------------------------------------------------------------------------------------ |
| [rag__unified_document](https://fivetran.github.io/dbt_unified_rag/#!/model/model.unified_rag.rag__unified_document) | Each record represents a chunk of text prepared for semantic-search and additional fields for use in LLM workflows. |
-
- Additionally, for **Snowflake** destinations, a [Cortex Search Service](https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-search/cortex-search-overview) will be generated as a result of this data model. The Cortex Search Service uses the results of the `rag__unified_document` and enables Snowflake users to take advantage of low-latency, high quality "fuzzy" search over their data for use in RAG applications leveraging LLMs. See the below table for details.
-
- | **Snowflake Cortex Search Service** | **Description** |
- | ------------------------- | ------------------------------------------------------------------------------------------------------------------ |
- | [rag__unified_search](https://github.com/fivetran/dbt_unified_rag/blob/main/macros/search_generation.sql) | Generates a Snowflake Cortex Search service via the [search_generation](https://github.com/fivetran/dbt_unified_rag/blob/main/macros/search_generation.sql) macro as a post-hook for Snowflake destinations. This Cortex Search Service is currently configured with a target lag of 1 day. **Please be aware that this search service will refresh automatically once a day even outside of this data model execution.** To understand more about the Cortex Search Service, you can run `SHOW CORTEX SEARCH SERVICES` in the respective Snowflake database.schema which the `rag__unified_document` is materialized. See [here](https://docs.snowflake.com/en/sql-reference/commands-cortex-search) for other relevant commands to use for understanding the nature of the Search Service, and [here](https://docs.snowflake.com/en/sql-reference/functions/search_preview-snowflake-cortex) for helpful commands to use when leveraging the results of the Cortex Search Service in your LLM applications. |
<!--section-end-->

## How do I use the dbt package?
@@ -44,7 +38,6 @@ To use this dbt package, you must have the following:
- [Jira](https://fivetran.com/docs/connectors/applications/jira)
- [Zendesk Support](https://fivetran.com/docs/connectors/applications/zendesk)
- A **Snowflake**, **BigQuery**, **Databricks**, or **PostgreSQL** destination.
- - Please note, the Cortex Search Service will only be generated for Snowflake destinations.
- Redshift destinations are not currently supported due to the stringent character limitations within string datatypes. If you would like Redshift destinations to be supported, please comment within our logged [Feature Request](https://github.com/fivetran/dbt_unified_rag/issues/3).

### Step 2: Install the package
@@ -53,7 +46,7 @@ Include the following package_display_name package version in your `packages.yml
```yml
packages:
  - package: fivetran/unified_rag
-   version: 0.1.0-a1
+   version: 0.1.0-a2
```

### Step 3: Define database and schema variables
3 changes: 1 addition & 2 deletions models/rag__unified_document.sql
@@ -6,8 +6,7 @@
cluster_by = ['unique_id'],
unique_key='unique_id',
incremental_strategy = 'insert_overwrite' if target.type in ('bigquery', 'databricks', 'spark') else 'delete+insert',
- file_format='delta' if unified_rag.is_databricks_sql_warehouse() else 'parquet',
- post_hook=["{{ unified_rag.search_generation(this,'rag__unified_search') }}"] if target.type == 'snowflake' else []
+ file_format='delta' if unified_rag.is_databricks_sql_warehouse() else 'parquet'
)
}}
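
To make the effect of this config change concrete, the inline conditionals visible in the hunk can be mirrored in plain Python. This is only a sketch under stated assumptions: `resolve_model_config` and `legacy_post_hooks` are hypothetical helper names that do not exist in the package, the boolean argument stands in for the `unified_rag.is_databricks_sql_warehouse()` macro, and only the config keys shown in this hunk are modeled.

```python
# Hypothetical illustration only: these helpers are not part of dbt_unified_rag.
# They mirror the inline Jinja conditionals shown in the config block above.

SEARCH_HOOK = "{{ unified_rag.search_generation(this,'rag__unified_search') }}"


def resolve_model_config(target_type, is_databricks_sql_warehouse=False):
    """Config keys visible in the hunk, as they resolve after this change."""
    return {
        "cluster_by": ["unique_id"],
        "unique_key": "unique_id",
        "incremental_strategy": (
            "insert_overwrite"
            if target_type in ("bigquery", "databricks", "spark")
            else "delete+insert"
        ),
        # The flag is an assumed stand-in for unified_rag.is_databricks_sql_warehouse().
        "file_format": "delta" if is_databricks_sql_warehouse else "parquet",
    }


def legacy_post_hooks(target_type):
    """The post_hook value the removed line produced before this change."""
    return [SEARCH_HOOK] if target_type == "snowflake" else []


if __name__ == "__main__":
    for target in ("snowflake", "bigquery", "databricks", "postgres"):
        print(target, resolve_model_config(target), legacy_post_hooks(target))
```

Before the change, only Snowflake targets received a non-empty post-hook list; afterwards no `post_hook` key is set for any target, so the model builds without creating the search service.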
