Is there a way to handle or split documents into subdocuments or passages using Pyserini? I can see that during search there is an option to return the passage with the max score. Could someone point me to documentation or examples for handling subdocuments or passages of a really long document?
Hi @vrdn-23 - you'll have to chop the documents up into passages yourself (i.e., create a variant corpus and index that). Number the ids like "doc1#0", "doc1#1", "doc1#2", ... and then you can use the max passage feature in Pyserini. See https://castorini.github.io/pyserini/2cr/msmarco-v1-doc.html for example invocations under the "doc segmented" conditions.
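A minimal sketch of both halves of that workflow, assuming a simple word-based sliding window (the window of 10 words and stride of 5 are illustrative only; pick sizes that suit your documents). It writes passages as JSON lines with `id` and `contents` fields, the shape Pyserini's `JsonCollection` indexes, using ids of the form `<docid>#<passage index>`. The `max_passage` function is a hypothetical stand-in that shows what max passage (MaxP) aggregation does conceptually, keeping each document's best-scoring passage; it is not Pyserini's own code. All helper names here (`segment`, `to_passage_records`, `write_jsonl`, `max_passage`) are made up for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Iterable, Iterator, List, Tuple


def segment(text: str, window: int = 10, stride: int = 5) -> List[str]:
    """Split text into overlapping passages of `window` words, advancing `stride` words each step."""
    if window < 1 or stride < 1:
        raise ValueError("window and stride must be positive")
    words = text.split()
    passages: List[str] = []
    for start in range(0, len(words), stride):
        passages.append(" ".join(words[start:start + window]))
        # Stop once this window already reaches the end of the document.
        if start + window >= len(words):
            break
    return passages


def to_passage_records(
    docs: Iterable[Tuple[str, str]], window: int = 10, stride: int = 5
) -> Iterator[Dict[str, str]]:
    """Yield JsonCollection-style records with ids like 'doc1#0', 'doc1#1', ..."""
    for docid, text in docs:
        for i, passage in enumerate(segment(text, window, stride)):
            yield {"id": f"{docid}#{i}", "contents": passage}


def write_jsonl(records: Iterable[Dict[str, str]], path: Path) -> None:
    """Write one JSON object per line, creating the parent directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")


def max_passage(hits: Iterable[Tuple[str, float]], delimiter: str = "#") -> List[Tuple[str, float]]:
    """Collapse passage-level hits to documents, keeping each document's highest passage score."""
    best: Dict[str, float] = {}
    for passage_id, score in hits:
        # rsplit keeps any '#' that is part of the original document id.
        docid = passage_id.rsplit(delimiter, 1)[0]
        if docid not in best or score > best[docid]:
            best[docid] = score
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    corpus = [("doc1", " ".join(f"w{i}" for i in range(23))), ("doc2", "a short document")]
    records = list(to_passage_records(corpus))
    print([r["id"] for r in records])  # ['doc1#0', 'doc1#1', 'doc1#2', 'doc1#3', 'doc2#0']

    with tempfile.TemporaryDirectory() as tmp:
        write_jsonl(records, Path(tmp) / "docs.jsonl")

    # Pretend these came back from searching the passage index.
    hits = [("doc1#2", 7.1), ("doc2#0", 6.5), ("doc1#0", 5.0)]
    print(max_passage(hits))  # [('doc1', 7.1), ('doc2', 6.5)]
```

In practice you would write the JSONL file into its own directory, index that directory with Pyserini as a `JsonCollection`, and let the search side do the aggregation through its max passage options rather than a hand-rolled function like the one above; the linked page shows the exact invocations.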