data #43

wangxinzhe123 · 2022-03-28T14:56:29Z

Because I want to run this code with other data sets, how can I get .run and .pair files similar to those in /data?

seanmacavaney · 2022-03-28T18:32:34Z

Hi, the format of these files is described here: https://github.com/Georgetown-IR-Lab/cedr#getting-started

In short, training pairs are sampled from lines like [query-id] [doc-id] and run files are the standard TREC run format: [query-id] 0 [doc-id] [rank] [score] [runtag]. The latter can be the output of various retrieval systems, and the former can just be sampled from run files (depending on what you want to train with).
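As a rough illustration of the two formats described above, here is a minimal sketch that parses TREC run lines and derives `[query-id] [doc-id]` pairs from the top-ranked results. The helper names (`read_run`, `run_to_pairs`), the `top_k` cutoff, the tab separator, and the example IDs are all assumptions made for illustration, not part of CEDR; how you actually sample training pairs depends on your setup.

```python
import io
from collections import defaultdict


def read_run(lines):
    """Parse TREC run lines: [query-id] 0 [doc-id] [rank] [score] [runtag]."""
    run = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        qid, _, docid, rank, score, _ = parts
        run[qid].append((int(rank), docid, float(score)))
    for qid in run:
        run[qid].sort()  # order each query's results by rank
    return run


def run_to_pairs(run, top_k=None):
    """Hypothetical sampling strategy: keep the top_k results of each query."""
    pairs = []
    for qid, ranked in run.items():
        for _rank, docid, _score in ranked[:top_k]:
            pairs.append((qid, docid))
    return pairs


# Made-up example data; real run files come from a retrieval system.
example_run = """\
301 0 DOC-A 1 12.5 bm25
301 0 DOC-B 2 11.0 bm25
302 0 DOC-C 1 9.3 bm25
"""

run = read_run(io.StringIO(example_run))
pairs = run_to_pairs(run, top_k=1)

# One pair per line; the tab separator here is an assumption.
pair_lines = [f"{qid}\t{docid}" for qid, docid in pairs]
```

To produce an actual file, `pair_lines` can be joined with newlines and written to disk under whatever name your experiment expects.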

wangxinzhe123 · 2022-03-29T03:21:15Z

Do the .run and .pair files need to be built manually, or are they generated automatically by running some program?

cmacdonald · 2022-03-29T10:50:15Z

There is also an integration plugin for CEDR using PyTerrier - see
https://github.com/terrierteam/pyterrier_bert#cedr-usage
(though it's a little more dated compared to other PyTerrier plugins now)

seanmacavaney · 2022-03-29T16:53:19Z

@wangxinzhe123 -- ultimately how you construct these files depends on your experimental setup. The main questions are:

What results do you want CEDR to re-rank?
What data do you want CEDR to sample as training data?

wangxinzhe123 · 2022-03-31T05:39:41Z

Excuse me, can you provide the index file containing the IndriBuildIndex parameters?

seanmacavaney · 2022-03-31T07:46:57Z

That again depends on what experiment you're running -- especially since you mention that you're running it with different datasets.

Since you brought up Indri, here's documentation on it: https://sourceforge.net/p/lemur/wiki/IndriBuildIndex%20Parameters/

I'm not very familiar with Indri, however. I'm happy to help out using PyTerrier though -- especially if you provide some details on what you're trying to do. Here's the documentation on indexing: https://pyterrier.readthedocs.io/en/latest/terrier-indexing.html
