diff --git a/README.md b/README.md
index eec8b45531661f2ff9e190bfe7af4380349534ef..0f3a40f489e1a81ddf4b53c0b6576f4ec10acf80 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,21 @@
-# doc2query-data
+# doc2query Data
+
+The repo contains data for the doc2query family of document expansion models.
+The basic idea is to train a model that, when given an input document, generates questions that the document might answer (or more broadly, queries for which the document might be relevant).
+These predicted questions (or queries) are then appended to the original documents, which are then indexed as before.
+
+## docTTTTTquery (with T5)
+
+Models and data from [`https://github.com/castorini/docTTTTTquery/`](https://github.com/castorini/docTTTTTquery/):
+
+File | Size | MD5 | Download
+:----|-----:|:----|:-----
+`doc_query_pairs.train.tsv` | 197 MB | `aa673014f93d43837ca4525b9a33422c` | [[Download](https://git.uwaterloo.ca/jimmylin/doc2query-data/raw/master/T5-passage/doc_query_pairs.train.tsv)]
+`queries.dev.small.tsv` | 283 KB | `41e980d881317a4a323129d482e9f5e5` | [[Download](https://git.uwaterloo.ca/jimmylin/doc2query-data/raw/master/T5-passage/queries.dev.small.tsv)]
+`qrels.dev.small.tsv` | 140 KB | `38a80559a561707ac2ec0f150ecd1e8a` | [[Download](https://git.uwaterloo.ca/jimmylin/doc2query-data/raw/master/T5-passage/qrels.dev.small.tsv)]
+`collection.tar.gz` | 987 MB | `87dd01826da3e2ad45447ba5af577628` | [[Download](https://git.uwaterloo.ca/jimmylin/doc2query-data/raw/master/T5-passage/collection.tar.gz)]
+`predicted_queries_topk_sampling.zip` | 7.9 GB | `8bb33ac317e76385d5047322db9b9c34` | [[Download](https://git.uwaterloo.ca/jimmylin/doc2query-data/raw/master/T5-passage/predicted_queries_topk_sampling.zip)]
+`run.dev.small.tsv` | 133 MB | `d6c09a6606a5ed9f1a300c258e1930b2` | [[Download](https://git.uwaterloo.ca/jimmylin/doc2query-data/raw/master/T5-passage/run.dev.small.tsv)]
+`t5-base.zip` | 357 MB | `881d3ca87c307b3eac05fae855c79014` | [[Download](https://git.uwaterloo.ca/jimmylin/doc2query-data/raw/master/T5-passage/t5-base.zip)]
+`t5-large.zip` | 1.2 GB | `21c7e625210b0ae872679bc36ed92d44` | [[Download](https://git.uwaterloo.ca/jimmylin/doc2query-data/raw/master/T5-passage/t5-large.zip)]
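
Since the table publishes an MD5 checksum for each file, a download can be checked against it before use. The following is a minimal sketch of one way to do that with Python's standard `hashlib`; it is not part of the repository. The helper names `md5_of_file` and `verify_download` and the `EXPECTED_MD5` mapping are hypothetical, and the mapping holds only a subset of the table's rows for illustration.

```python
import hashlib
from pathlib import Path

# Checksums copied from the table above (a subset, for illustration).
# The dictionary name and layout are assumptions of this sketch.
EXPECTED_MD5 = {
    "queries.dev.small.tsv": "41e980d881317a4a323129d482e9f5e5",
    "qrels.dev.small.tsv": "38a80559a561707ac2ec0f150ecd1e8a",
    "collection.tar.gz": "87dd01826da3e2ad45447ba5af577628",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat for the multi-gigabyte archives.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the entry for its file name."""
    name = Path(path).name
    if name not in expected:
        raise KeyError(f"no known checksum for {name!r}")
    return md5_of_file(path) == expected[name]
```

For example, after downloading `collection.tar.gz` into the current directory, `verify_download("collection.tar.gz")` should return `True` if the file arrived intact. MD5 here only guards against corrupted or truncated transfers; it is not a defense against deliberate tampering.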