Adding one document to an existing index #1443

Closed
Hazoom opened this issue Feb 8, 2023 · 5 comments

Hazoom commented Feb 8, 2023

Hi,

I have an index with many documents that I built, and I want to add one document to it.
The naive approach would of course be to re-run the index command on all my documents together with the new one, but my system runs in a production environment and I cannot afford that because it takes too much time.

Could you please assist?

@duongkstn

Same issue. Any advice?

e-tornike commented Feb 18, 2023

I tried the following, but it seems to overwrite the index at each run:

>>> from pyserini.index.lucene import LuceneIndexer, IndexReader
>>> # Run 1: add document 0, then check the index stats.
>>> indexer = LuceneIndexer("index")
>>> indexer.add("{'id': '0', 'contents': 'Document 0'}")
>>> indexer.close()
>>> reader = IndexReader("index")
>>> print(reader.stats())
{'total_terms': 2, 'documents': 1, 'non_empty_documents': 1, 'unique_terms': 2}

>>> # Run 2: reopen the same path and add document 1; stats still report 1 document.
>>> indexer = LuceneIndexer("index")
>>> indexer.add("{'id': '1', 'contents': 'Document 1'}")
>>> indexer.close()
>>> reader = IndexReader("index")
>>> print(reader.stats())
{'total_terms': 2, 'documents': 1, 'non_empty_documents': 1, 'unique_terms': 2}

>>> # Run 3: add both documents in one session; stats report 2 documents.
>>> indexer = LuceneIndexer("index")
>>> indexer.add("{'id': '0', 'contents': 'Document 0'}")
>>> indexer.add("{'id': '1', 'contents': 'Document 1'}")
>>> indexer.close()
>>> reader = IndexReader("index")
>>> print(reader.stats())
{'total_terms': 4, 'documents': 2, 'non_empty_documents': 2, 'unique_terms': 3}

Based on: #1344

lintool (Member) commented Feb 18, 2023

Hi all, thanks for all your interest! If this is a feature that many people want... then we'll be happy to prioritize!

lintool added a commit to castorini/anserini that referenced this issue Feb 23, 2023
lintool (Member) commented Feb 23, 2023

Hi @Hazoom @duongkstn @e-tornike I think this is what you're looking for? #1451
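
For readers landing on this thread, here is a minimal sketch of what appending to an existing index might look like. It is not taken from #1451 itself: it assumes that later Pyserini releases expose an `append` keyword on `LuceneIndexer` and an `add_doc_raw` method for raw JSON strings, so please check the linked PR and the current docs for the actual interface. The `append_documents` helper and its `indexer_cls` parameter are hypothetical names introduced only for this illustration.

```python
import json


def append_documents(index_dir, docs, indexer_cls=None):
    """Add `docs` (dicts with 'id' and 'contents') to the index at `index_dir`.

    `indexer_cls` is a hypothetical hook for this sketch, so the logic can be
    exercised without a JVM; by default Pyserini's LuceneIndexer is used.
    """
    if indexer_cls is None:
        # Imported lazily because Pyserini needs a working JVM.
        from pyserini.index.lucene import LuceneIndexer as indexer_cls

    # ASSUMPTION: append=True opens the existing index for appending instead
    # of recreating it (the overwrite behaviour reported earlier in the thread).
    indexer = indexer_cls(index_dir, append=True)
    count = 0
    try:
        for doc in docs:
            # json.dumps produces valid, double-quoted JSON for the raw API.
            indexer.add_doc_raw(json.dumps(doc))
            count += 1
    finally:
        # Always close the indexer so the new documents are written out.
        indexer.close()
    return count
```

Under those assumptions, `append_documents("index", [{"id": "1", "contents": "Document 1"}])` would add a single document without re-indexing the rest of the collection. Comparing `IndexReader("index").stats()` before and after is a quick way to confirm that the document count actually grew.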

@e-tornike

Awesome! Thanks a lot for the quick implementation!

lintool added a commit that referenced this issue Feb 23, 2023
lintool closed this as completed Feb 23, 2023
thongnt99 pushed a commit to thongnt99/anserini-lsr that referenced this issue Mar 3, 2023