Subdocuments while Indexing #1372

Answered by lintool
vrdn-23 asked this question in Q&A

Hi @vrdn-23 - you'll have to chop up the documents yourself into passages (i.e., create a variant corpus and index that). Number the ids like "doc1#0", "doc1#1", "doc1#2"... so that each passage id is the parent document id, a "#", and the passage number. Then you can use the max passage feature in Pyserini, which scores each document by its best-scoring passage. See https://castorini.github.io/pyserini/2cr/msmarco-v1-doc.html for example invocations, under the "doc segmented" conditions.
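
To make the chopping step concrete, here is a minimal sketch, not anything taken from Pyserini itself. It assumes a sliding window over whitespace-separated words and a JSONL corpus with `id` and `contents` fields. The function names, the `window` and `stride` values, and the output path are all hypothetical choices for illustration. The `max_passage` helper only illustrates the idea of keeping the best passage score per document; it is not Pyserini's implementation.

```python
import json


def split_into_passages(doc_id, text, window=100, stride=50):
    """Yield (passage_id, passage_text) pairs from a sliding word window.

    Passage ids follow the "docid#n" convention: doc1#0, doc1#1, ...
    The window and stride sizes are illustrative assumptions.
    """
    words = text.split()
    if not words:
        return
    start = 0
    index = 0
    while True:
        chunk = words[start:start + window]
        yield f"{doc_id}#{index}", " ".join(chunk)
        if start + window >= len(words):
            break  # the last window reached the end of the document
        start += stride
        index += 1


def write_passage_corpus(docs, path, window=100, stride=50):
    """Write (doc_id, text) pairs as one JSON object per passage, one per line."""
    with open(path, "w", encoding="utf-8") as f:
        for doc_id, text in docs:
            for pid, passage in split_into_passages(doc_id, text, window, stride):
                f.write(json.dumps({"id": pid, "contents": passage}) + "\n")


def max_passage(hits, delimiter="#"):
    """Collapse (passage_id, score) hits to the best score per parent document."""
    best = {}
    for pid, score in hits:
        doc_id = pid.rsplit(delimiter, 1)[0]  # strip the trailing passage number
        if doc_id not in best or score > best[doc_id]:
            best[doc_id] = score
    # Highest-scoring documents first.
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)
```

You would then index the resulting JSONL file as a separate corpus. As far as I know, the "doc segmented" invocations on the linked page pass `--max-passage` together with `--max-passage-hits` to `pyserini.search.lucene`, but check that page for the exact commands rather than relying on this note.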

Answer selected by vrdn-23