feat(query): Add support for single embedding retrieval with PGVector #406

Merged
merged 18 commits into from
Dec 4, 2024

Conversation

shamb0
Contributor

@shamb0 shamb0 commented Oct 22, 2024

Part 2 of PR: swiftide/pull/392

  • Added retrieval functionality to pgvector.
  • Verified indexing and querying through unit tests and example demo applications. Integration works as expected.

- Prototype pipeline for loading, chunking, enhancing, embedding, and storing markdown content in pgvector.

Signed-off-by: shamb0 <r.raajey@gmail.com>
- added Postgres test_util,
- completed unit tests for persist and retrieval

Signed-off-by: shamb0 <r.raajey@gmail.com>
Signed-off-by: shamb0 <r.raajey@gmail.com>
@shamb0 shamb0 force-pushed the feat/indexing-retrieval-pgvector branch from 81ef43b to a9e2a9e Compare November 21, 2024 16:48
@shamb0 shamb0 marked this pull request as ready for review November 21, 2024 16:49
@shamb0
Contributor Author

shamb0 commented Nov 21, 2024

Hi @timonv

I’m reaching out regarding some unit test failures related to the retrieval process. Here are the specific scenarios that are failing:

failures:  
    pgvector::tests::test_persist_nodes::both_mode_with_metadata  
    pgvector::tests::test_persist_nodes::both_mode_without_metadata  
    pgvector::tests::test_persist_nodes::perfield_mode_with_metadata  
    pgvector::tests::test_persist_nodes::perfield_mode_without_metadata  

The failures seem to stem from the complexity of managing multiple columns of vector embeddings. The current implementation of Query<T> has limitations when querying individual vector embedding columns.

Key Issue

To ensure proper functionality, both the Embedding value and its corresponding EmbeddedField type must be specified in the query. However, it seems some field variables are missing these options:

  • Query<T>::embedding
  • Query<T>::sparse_embedding

To reproduce the issue for the above scenarios, set the use_adv_embedding_query flag to false (it is set to true by default). This should replicate the failures.
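
To make the limitation concrete, here is a minimal sketch of keeping each query embedding together with the field it targets. Everything in it is an assumption for illustration: `QueryEmbeddings` is a hypothetical type, `EmbeddedField` is a simplified stand-in rather than Swiftide's actual definition, and the column naming scheme is invented.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the field an embedding was computed from (hypothetical).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum EmbeddedField {
    Combined,
    Chunk,
    Metadata(String),
}

impl EmbeddedField {
    /// Column name a storage backend might use for this field (assumed naming scheme).
    pub fn column_name(&self) -> String {
        match self {
            EmbeddedField::Combined => "vector_combined".to_string(),
            EmbeddedField::Chunk => "vector_chunk".to_string(),
            EmbeddedField::Metadata(name) => {
                format!("vector_metadata_{}", name.to_lowercase())
            }
        }
    }
}

/// Hypothetical query state that stores every embedding under its field.
#[derive(Debug, Default)]
pub struct QueryEmbeddings {
    by_field: HashMap<EmbeddedField, Vec<f32>>,
}

impl QueryEmbeddings {
    /// Register the embedding computed for one field.
    pub fn insert(&mut self, field: EmbeddedField, embedding: Vec<f32>) {
        self.by_field.insert(field, embedding);
    }

    /// Look up the embedding to use when searching one specific vector column.
    pub fn for_field(&self, field: &EmbeddedField) -> Option<&[f32]> {
        self.by_field.get(field).map(Vec::as_slice)
    }
}
```

With a structure like this, a retriever can ask for the embedding that matches the column it is about to search, instead of guessing from a single unnamed `embedding` value.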

Temporary Workaround

As a temporary measure, I introduced a new type, Query<T>::adv_embedding, to highlight the limitation. However, this is not intended as a final solution.

Request for Guidance

I noticed your comment in the code:

// TODO: How would this work when doing a rollup query?  

It seems you’ve already been considering ways to consolidate query options. Could you please share your thoughts on how we might address this scenario effectively?

Thank you for your time and assistance! I’m looking forward to your insights.

@timonv
Member

timonv commented Nov 22, 2024

Hey @shamb0,

Interesting, I'll get back to this in more depth ASAP. As you pointed out, the query pipeline currently has a limitation when retrieving from multiple sources. Originally, the idea was for search strategies to provide a sane default to get started quickly. Perhaps that can still hold, i.e. SimilaritySingleEmbedding is meant for when there is a single embedding. Storage layers can implement multiple strategies, so perhaps a MultipleNamedEmbeddings strategy would be a good addition here. I have a hunch adv_embedding isn't needed, as you pointed out, but I'd like to come back to that in more depth.

Zooming out and looking at the bigger picture and future architecture, I'm tempted to tackle the issue with strategies as much as possible. For instance, this was some mock coding I did recently on fusion search (multiple retrievers):

let fusion_retriever = FusionRetriever::builder()
    .add_retriever(SimpleSimilaritySearch::default(), Qdrant::default())
    .add_retriever(FullTextSearch::default(), Lancedb::default())
    .build();
    
impl<S: SearchStrategy> Retrieve<S> for FusionRetriever {
    fn retrieve(_strat, query) {
        // Mock code: tokio::join! takes a fixed list of futures, so use join_all for an iterator
        let documents = futures::future::join_all(
            self.retrievers.iter().map(|(strat, retriever)| retriever.retrieve(strat, query)),
        ).await;

        query.retrieved_documents(documents)
    }
}

// And then in a pipeline
query::Pipeline::default()
    ...
    .then_retrieve(fusion_retriever)
    .then(Reranker::builder().top_n(10).bm25().build()?)
    .query("How can I use the query pipeline in Swiftide?")
    .await?;

From my perspective, it's relatively easy to implement, and an architecture like this could nicely enable other methods of retrieval as well (e.g. parent lookup, routed retrieval, other wild combinations). However, I do feel it introduces some odd abstractions. Then again, if it's only for advanced usage, and there are sane, easier defaults to start with, perhaps it's not that bad. What are your thoughts on this?
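
The fan-out idea in the mock code can be sketched with the standard library alone. This is a simplified, synchronous illustration under stated assumptions: the `Retriever` trait, `KeywordRetriever`, and the deduplicating merge are hypothetical and are not Swiftide's `Retrieve` trait, and a real implementation would presumably be async and run the retrievers concurrently.

```rust
use std::collections::HashSet;

/// Hypothetical, synchronous stand-in for a retriever bound to a search strategy.
pub trait Retriever {
    fn retrieve(&self, query: &str) -> Vec<String>;
}

/// Toy retriever that returns stored documents containing the query text.
pub struct KeywordRetriever {
    pub documents: Vec<String>,
}

impl Retriever for KeywordRetriever {
    fn retrieve(&self, query: &str) -> Vec<String> {
        let needle = query.to_lowercase();
        self.documents
            .iter()
            .filter(|doc| doc.to_lowercase().contains(&needle))
            .cloned()
            .collect()
    }
}

/// Fans a query out to every registered retriever and merges the results.
#[derive(Default)]
pub struct FusionRetriever {
    retrievers: Vec<Box<dyn Retriever>>,
}

impl FusionRetriever {
    /// Builder-style registration, mirroring `add_retriever` in the mock code.
    pub fn add_retriever(mut self, retriever: impl Retriever + 'static) -> Self {
        self.retrievers.push(Box::new(retriever));
        self
    }
}

impl Retriever for FusionRetriever {
    fn retrieve(&self, query: &str) -> Vec<String> {
        let mut seen = HashSet::new();
        let mut merged = Vec::new();
        for retriever in &self.retrievers {
            for doc in retriever.retrieve(query) {
                // Keep the first occurrence of each document, in retrieval order.
                if seen.insert(doc.clone()) {
                    merged.push(doc);
                }
            }
        }
        merged
    }
}
```

Because `FusionRetriever` implements the same trait as its members, it can be nested or passed anywhere a single retriever is expected, which is the property that makes routed or layered retrieval possible.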

My vision on the query pipeline (discussion greatly appreciated):

  • Search strategies offer a sane default to get started quickly
  • Retrieved documents should be first-class citizens in the pipeline: you get a big pile, and they can be modified, mangled, and filtered by follow-up steps, so they are stored in the query itself rather than in the state.
  • Pre-retrieval transformations turn the query into something that suits the search strategy and retrieval methods; post-retrieval steps work towards answer generation.
  • In general, I prefer to offer nice defaults and not abstract away too much from the underlying API. If a user wants to do something more complicated, the underlying query API should be available.
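
As a rough illustration of documents living in the query itself, here is a minimal typestate sketch. The `Pending` and `Retrieved` markers, the field names, and `filter_documents` are assumptions made for this example; only the method name `retrieved_documents` is taken from the mock code above, and none of this claims to match Swiftide's real `Query` type.

```rust
use std::marker::PhantomData;

/// Marker: the query has not been sent to a retriever yet.
pub struct Pending;
/// Marker: documents have been attached to the query.
pub struct Retrieved;

/// Hypothetical query that carries its retrieved documents with it.
pub struct Query<State> {
    pub text: String,
    pub documents: Vec<String>,
    state: PhantomData<State>,
}

impl Query<Pending> {
    pub fn new(text: impl Into<String>) -> Self {
        Query {
            text: text.into(),
            documents: Vec::new(),
            state: PhantomData,
        }
    }

    /// Attaching documents moves the query into the `Retrieved` state.
    pub fn retrieved_documents(self, documents: Vec<String>) -> Query<Retrieved> {
        Query {
            text: self.text,
            documents,
            state: PhantomData,
        }
    }
}

impl Query<Retrieved> {
    /// A post-retrieval step: keep only the documents that satisfy `keep`.
    pub fn filter_documents(mut self, keep: impl Fn(&str) -> bool) -> Self {
        self.documents.retain(|doc| keep(doc.as_str()));
        self
    }
}
```

The compiler then rejects a post-retrieval step on a query that has no documents yet, while follow-up steps remain free to reshape the pile.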

// TODO: How would this work when doing a rollup query?

One use case: you might want to generate subquestions for a query, embed each individually, retrieve for each individually (perhaps even routed to different retrievers), then consolidate the documents, rerank, and put everything back together in a single query. I think there is a nice, low-level architectural solution to this, but I haven't given it much thought yet.
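
A rollup along these lines might look roughly like the sketch below. It is only an outline under assumptions: `rollup_retrieve` is a hypothetical function, and the `retrieve` and `score` callbacks stand in for a real retriever and reranker. Embedding and routing are left out.

```rust
use std::collections::BTreeSet;

/// Retrieve per subquestion, consolidate the documents, rerank, and keep the best.
/// `retrieve` and `score` are hypothetical callbacks, not Swiftide APIs.
pub fn rollup_retrieve(
    subquestions: &[&str],
    retrieve: impl Fn(&str) -> Vec<String>,
    score: impl Fn(&str) -> f32,
    top_n: usize,
) -> Vec<String> {
    // Consolidate: collect every document once, regardless of which
    // subquestion returned it.
    let mut unique: BTreeSet<String> = BTreeSet::new();
    for &subquestion in subquestions {
        unique.extend(retrieve(subquestion));
    }

    // Rerank: highest score first, ties broken alphabetically so the
    // result is deterministic.
    let mut ranked: Vec<(f32, String)> = unique
        .into_iter()
        .map(|doc| (score(doc.as_str()), doc))
        .collect();
    ranked.sort_by(|a, b| b.0.total_cmp(&a.0).then_with(|| a.1.cmp(&b.1)));

    ranked.into_iter().take(top_n).map(|(_, doc)| doc).collect()
}
```

The consolidated list would then be attached to the original query, so later steps see one query with one set of documents.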

I'm mostly working on agents right now, so I don't really have the time to make large architectural changes.

Perhaps it's an idea to focus on one search strategy at a time? What are your thoughts on this?

@timonv
Member

timonv commented Nov 22, 2024

@shamb0 You're not on our Discord, right? Might be a bit faster to discuss things there! Love the effort you're putting into this.

@shamb0
Contributor Author

shamb0 commented Nov 22, 2024

@shamb0 You're not on our Discord right? Might be a bit faster to discuss things there! Love the effort you're putting into this.

Sure @timonv, I'm joining now 😄 👍🏾 .

@shamb0
Contributor Author

shamb0 commented Nov 23, 2024

Hi @timonv,

Thank you for sharing the architectural overview and your thoughts on FusionRetriever. The concept is fascinating and shows great potential for enhancing the query pipeline. The ability to integrate multiple search strategies from diverse sources and seamlessly feed the retrieved documents into post-retrieval stages is both flexible and robust. I also appreciate your emphasis on maintaining simplicity through sane defaults while enabling advanced use cases.

I’m interested in the use case you shared:

"One use case might involve generating subquestions from a query, embedding each individually, retrieving for each (potentially routed to different retrievers), consolidating the results, reranking them, and assembling everything back into a single query."

If your schedule is currently occupied with higher-priority tasks, would you be open to allowing me to prototype this concept? I’d love to start exploring the FusionRetriever architecture, using your suggested use case as the foundation. While I may need further guidance on the rerank process, I feel confident in tackling the rest.

Looking forward to your thoughts and feedback!

@shamb0
Contributor Author

shamb0 commented Nov 25, 2024

Hi @timonv,

I've resolved the issue using the HybridSearch strategy.
The unit and integration tests now pass, and the PR is ready for a first round of review.

Thank you

@shamb0 shamb0 force-pushed the feat/indexing-retrieval-pgvector branch 3 times, most recently from 266a62f to c810932 Compare November 25, 2024 11:34
Signed-off-by: shamb0 <r.raajey@gmail.com>
@shamb0 shamb0 force-pushed the feat/indexing-retrieval-pgvector branch from c810932 to 9a32ee9 Compare November 25, 2024 11:50
Member

@timonv timonv left a comment

Nice, this works!

Hybrid Search is a specific search method in RAG that combines keyword search with vector search. It's very effective, but this is not it. See e.g. https://towardsdatascience.com/how-to-use-hybrid-search-for-better-llm-rag-retrieval-032f66810ebe
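
For readers unfamiliar with the term, one common way to combine a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). The sketch below is a generic illustration of that technique, not something this PR or Swiftide implements; the function name and the choice of RRF are assumptions, and `k = 60` is just a conventional constant.

```rust
use std::collections::HashMap;

/// Fuse several ranked result lists with reciprocal rank fusion (RRF).
/// Each document scores the sum of 1 / (k + rank) over the lists it
/// appears in, where rank starts at 1 for the best hit.
pub fn reciprocal_rank_fusion(rankings: &[Vec<&str>], k: f64) -> Vec<String> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        for (index, doc) in ranking.iter().enumerate() {
            *scores.entry(*doc).or_insert(0.0) += 1.0 / (k + (index + 1) as f64);
        }
    }

    // Highest fused score first; ties broken alphabetically for determinism.
    let mut fused: Vec<(&str, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(b.0)));

    fused.into_iter().map(|(doc, _)| doc.to_string()).collect()
}
```

A document that appears near the top of both lists outranks one that is first in only a single list, which is the behaviour that distinguishes hybrid search from querying several embedding columns.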

I'd suggest either sticking to the single embedding, creating a SimilarityMultipleNamedEmbeddings strategy, or going lazy (not so lazy, it's not trivial) and adding a Custom strategy that takes a sqlx query, or a closure that returns a sqlx query with the nodes yielded. That would open a lot of doors for fully controlling the search process. I'd suggest starting with the small thing first (single embedding), and if you're still interested, I'm excited to see what you come up with.
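
The closure-based Custom strategy could be sketched as follows. To stay self-contained, this version returns the SQL text as a `String` instead of a sqlx query, so it only illustrates the shape of the idea. `CustomStrategy`, `from_query`, the column names `chunk` and `vector_combined`, and the table name in the usage note are all hypothetical; `<=>` is pgvector's cosine distance operator.

```rust
/// Hypothetical custom strategy: the caller supplies a closure that builds
/// the SQL text for a given table and result limit.
pub struct CustomStrategy<F>
where
    F: Fn(&str, usize) -> String,
{
    build_sql: F,
}

impl<F> CustomStrategy<F>
where
    F: Fn(&str, usize) -> String,
{
    pub fn from_query(build_sql: F) -> Self {
        Self { build_sql }
    }

    /// Render the SQL. The query embedding is expected to be bound as `$1`
    /// when the statement is executed.
    pub fn sql(&self, table: &str, top_k: usize) -> String {
        (self.build_sql)(table, top_k)
    }
}

/// Example builder ordering rows by cosine distance to the bound embedding.
/// The table name is interpolated directly, so it must come from trusted
/// configuration, never from user input.
pub fn cosine_similarity_sql(table: &str, top_k: usize) -> String {
    format!("SELECT id, chunk FROM {table} ORDER BY vector_combined <=> $1 LIMIT {top_k}")
}
```

Calling `CustomStrategy::from_query(cosine_similarity_sql).sql("swiftide_rag", 5)` renders the statement for a table named `swiftide_rag`; swapping in a different closure changes the search without touching the storage layer.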

fastembed.clone(),
pgv_storage.clone(),
question.to_string(),
).await?;
Member

There is also a query_all, should be no need for a loop :-)

@shamb0
Contributor Author

shamb0 commented Nov 29, 2024

Closing this in favor of implementing the suggested alternative search strategies.

@shamb0 shamb0 closed this Nov 29, 2024
@timonv
Member

timonv commented Nov 29, 2024

We could still merge the single similarity, that looks great @shamb0

@shamb0
Contributor Author

shamb0 commented Nov 29, 2024

We could still merge the single similarity, that looks great @shamb0

Thanks, @timonv! Apologies for overlooking your message earlier. I'll keep only the 'Single Similarity' search integration for this merge.

@shamb0 shamb0 reopened this Nov 29, 2024
dependabot bot and others added 6 commits November 30, 2024 12:28
Bumps [thiserror](https://github.com/dtolnay/thiserror) from 1.0.68 to
1.0.69.
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps the minor group with 2 updates:
[tracing](https://github.com/tokio-rs/tracing) and
[spider](https://github.com/spider-rs/spider).

Updates `tracing` from 0.1.40 to 0.1.41
Updates `spider` from 2.13.71 to 2.13.78
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…his context.

Signed-off-by: shamb0 <r.raajey@gmail.com>
…his context.

Signed-off-by: shamb0 <r.raajey@gmail.com>
@shamb0
Contributor Author

shamb0 commented Nov 30, 2024

Hi @timonv, I've completed the integration of the retrieval feature using the Single Similarity search strategy.

I plan to integrate the test cases below as part of the SimilarityMultipleNamedEmbeddings search strategy implementation.

pgvector::tests::test_persist_nodes::both_mode_with_metadata
pgvector::tests::test_persist_nodes::both_mode_without_metadata
pgvector::tests::test_persist_nodes::perfield_mode_with_metadata
pgvector::tests::test_persist_nodes::perfield_mode_without_metadata

Please review and share any feedback you may have on the PR.

Member

@timonv timonv left a comment

@shamb0 Awesome, code looks great!

Since it's full circle now, could you add a very simple index and query integration test in swiftide/tests? After that, looks good to merge. Nice job!

timonv and others added 4 commits December 1, 2024 20:27
Signed-off-by: shamb0 <r.raajey@gmail.com>
Signed-off-by: shamb0 <r.raajey@gmail.com>
…his context.

Signed-off-by: shamb0 <r.raajey@gmail.com>
@shamb0
Contributor Author

shamb0 commented Dec 1, 2024

Thanks, @timonv, for the reminder about the integration test. I’ve implemented two test scenarios inspired by the qdrant and lancedb tests. Could you review them to ensure they align with the integration test scope and requirements? Let me know if any enhancements are needed.

Signed-off-by: shamb0 <r.raajey@gmail.com>
Signed-off-by: shamb0 <r.raajey@gmail.com>
Member

@timonv timonv left a comment

Awesome! Looks great and thanks again for the contribution and hard work. I'll try to get the pipelines fixed and merge it asap 🎉

@shamb0
Contributor Author

shamb0 commented Dec 2, 2024

Awesome! Looks great and thanks again for the contribution and hard work. I'll try to get the pipelines fixed and merge it asap 🎉

Thanks @timonv

@timonv timonv changed the title feat: Add indexing and query pipeline support for pgvector integration feat(query): Add support for single embedding retrieval with PGVector Dec 4, 2024
@timonv timonv merged commit 3751f49 into bosun-ai:master Dec 4, 2024
10 checks passed
timonv pushed a commit that referenced this pull request Dec 11, 2024
## 🤖 New release
* `swiftide`: 0.14.3 -> 0.14.4 (✓ API compatible changes)
* `swiftide-agents`: 0.14.3 -> 0.14.4
* `swiftide-core`: 0.14.3 -> 0.14.4 (✓ API compatible changes)
* `swiftide-macros`: 0.14.3 -> 0.14.4
* `swiftide-integrations`: 0.14.3 -> 0.14.4 (✓ API compatible changes)
* `swiftide-indexing`: 0.14.3 -> 0.14.4 (✓ API compatible changes)
* `swiftide-query`: 0.14.3 -> 0.14.4 (✓ API compatible changes)

<details><summary><i><b>Changelog</b></i></summary><p>

## `swiftide`
<blockquote>

##
[0.14.4](v0.14.3...v0.14.4)
- 2024-12-11

### New features

- [7211559](7211559) *(agents)* **EXPERIMENTAL** Agents in Swiftide (#463)

````text
Agents are coming to Swiftide! We are still ironing out all the kinks,
  while we make it ready for a proper release. You can already experiment
  with agents, see the rustdocs for documentation, and an example in
  `/examples`, and feel free to contact us via github or discord. Better
  documentation, examples, and tutorials are coming soon.

  Run completions in a loop, define tools with two handy macros, customize
  the agent by hooking in on lifecycle events, and much more.

  Besides documentation, expect a big release for what we build this for
  soon! 🎉
````

- [3751f49](3751f49) *(query)* Add support for single embedding retrieval with PGVector (#406)

### Miscellaneous

- [5ce4d21](5ce4d21) Clippy and deps fixes for 1.83 (#467)


**Full Changelog**: 0.14.3...0.14.4
</blockquote>


</p></details>

---
This PR was generated with [release-plz](https://github.com/release-plz/release-plz/).
shamb0 pushed a commit to shamb0/swiftide that referenced this pull request Dec 14, 2024
shamb0 pushed a commit to shamb0/swiftide that referenced this pull request Dec 17, 2024
shamb0 pushed a commit to shamb0/swiftide that referenced this pull request Dec 20, 2024
shamb0 pushed a commit to shamb0/swiftide that referenced this pull request Dec 20, 2024
shamb0 pushed a commit to shamb0/swiftide that referenced this pull request Dec 20, 2024