MemoryVectorStore does not support Document id property #6577

Closed
5 tasks done
dschenkelman opened this issue Aug 20, 2024 · 1 comment · Fixed by #6572
Labels
auto:bug Related to a bug, vulnerability, unexpected error with an existing feature

Comments

@dschenkelman
Contributor

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain.js documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain.js rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

The following code:

import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { Document } from "@langchain/core/documents";
import { OpenAIEmbeddings } from "@langchain/openai";
import dotenv from "dotenv"; 

dotenv.config();

const embeddings = new OpenAIEmbeddings();

const docs = [
  new Document({
    metadata: { id: '1' },
    pageContent: 'Document related to test',
    id: '1'
  })
];


// Build the store from the documents, then retrieve through it and print the results.
const ms = await MemoryVectorStore.fromDocuments(docs, embeddings);
const results = await ms.asRetriever().invoke('test');
console.log(results);

Prints:

[
  Document {
    pageContent: 'Document related to test',
    metadata: { id: '1' },
    id: undefined
  }
]

Note that id is undefined, even though the stored document was created with id: '1'.

Error Message and Stack Trace (if applicable)

No response

Description

When Documents with an id field are stored in MemoryVectorStore and later retrieved, the id field comes back as undefined instead of the stored value.
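
On versions that still show this behavior, one possible workaround is to copy the ID back from metadata after retrieval, since the output above shows that metadata does survive the round trip. The sketch below is only an illustration under that assumption: DocLike and restoreIds are hypothetical names, not part of LangChain.js, and the helper returns plain objects rather than Document instances.

```typescript
// Hypothetical minimal shape; the real Document class lives in @langchain/core/documents.
interface DocLike {
  pageContent: string;
  metadata: Record<string, unknown>;
  id?: string;
}

// Hypothetical helper: fill in a missing `id` from `metadata.id` when that value is a string.
// Documents that already have an `id` are returned unchanged.
function restoreIds(docs: DocLike[]): DocLike[] {
  return docs.map((doc) => {
    const metaId = doc.metadata["id"];
    if (doc.id === undefined && typeof metaId === "string") {
      return { ...doc, id: metaId };
    }
    return doc;
  });
}

// Shaped like the retriever output shown above: metadata is intact, id is missing.
const retrieved: DocLike[] = [
  { pageContent: "Document related to test", metadata: { id: "1" } },
];

const restored = restoreIds(retrieved);
console.log(restored[0]?.id); // prints "1"
```

This only helps when the ID was also written into metadata before storing, as in the example code above.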

System Info

All

@dosubot dosubot bot added the auto:bug Related to a bug, vulnerability, unexpected error with an existing feature label Aug 20, 2024

dosubot bot commented Aug 20, 2024

@dschenkelman Hello there! I'm here to assist you with any bugs, questions, or contributions you have regarding this repository.

The id field of a Document becomes undefined when storing and retrieving it using MemoryVectorStore in LangChain.js because the idKey used to store and retrieve the document IDs is set to "doc_id" by default. If the documents do not have a metadata field with this key, the IDs will not be correctly stored or retrieved.

To fix this, ensure that the documents have the correct metadata field for the ID, or set the idKey to match the metadata field used in your documents. For example, if your documents use "id" as the metadata key for IDs, initialize the MultiVectorRetriever with idKey: "id":

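import { MultiVectorRetriever } from "langchain/retrievers/multi_vector";
import { InMemoryStore } from "@langchain/core/stores";

const retriever = new MultiVectorRetriever({
  // MemoryVectorStore needs an embeddings instance, e.g. the one from the example code
  vectorstore: new MemoryVectorStore(embeddings),
  // MultiVectorRetriever also needs a byteStore or docstore for the parent documents
  byteStore: new InMemoryStore<Uint8Array>(),
  idKey: "id",
  childK: 20,
  parentK: 5,
});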

Additionally, ensure that the metadata field includes the correct ID key when saving context to maintain the association between document IDs and their content [1][2][3][4].

