Commit 388bbac6 authored by Jimmy Lin

MS MARCO passage index

parent 11a9efd7
This index was generated on 11/17/2019 (November 17, 2019) at the following commit:
commit 0ed488a7cc117737df7f5beaa80c16d43691e145 (HEAD -> master, origin/master, origin/HEAD)
Author: Gin <bazinga931212@gmail.com>
Date: Sun Nov 17 19:08:26 2019 -0500
It was built with the following command:
sh target/appassembler/bin/IndexCollection -collection JsonCollection -input /tuna1/collections/msmarco/passage/ \
-index index-msmarco-passage-20191117-0ed488 -generator LuceneDocumentGenerator -threads 9 -storeRawDocs
Note that this index was designed to be used in Colab, so we've tried to keep the size as small as possible.
Specifically, positions are *not* indexed (so no phrase queries) and document vectors are *not* indexed (so no query expansion).
However, the original collection *is* stored, so the passages can be fetched and fed to further downstream reranking components.
index-msmarco-passage-20191117-0ed488.tar.gz MD5 checksum = 3c2ef64ee6d0ee8e317adcb341b92e28
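Since the index is meant to be downloaded into Colab, it can be worth checking the tarball against the MD5 checksum above before unpacking it. The following is a minimal sketch of such a check using only the Python standard library. The function names md5_of_file and verify_index_tarball are hypothetical helpers introduced here for illustration, and the sketch assumes the tarball has already been downloaded to a local path.

```python
import hashlib

# Expected digest, copied from the checksum line above.
EXPECTED_MD5 = "3c2ef64ee6d0ee8e317adcb341b92e28"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`.

    The file is read in chunks so that a large tarball does not
    have to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_index_tarball(path, expected=EXPECTED_MD5):
    """Return True if the MD5 digest of `path` matches `expected`.

    Hypothetical helper; the comparison is case-insensitive on `expected`.
    """
    return md5_of_file(path) == expected.lower()
```

Calling verify_index_tarball("index-msmarco-passage-20191117-0ed488.tar.gz") on the downloaded file should return True only if the download is intact. MD5 is adequate here as a guard against corrupted or truncated downloads, not as a defense against deliberate tampering.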