lintool/Zambezi: Real-time indexer and search engine

Query File

A query file must have the following format:

<first_line> .=. <Number of queries:integer>
<line> .=. <query id: integer> <query length: integer> <query: text>
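As an illustration of the layout above, here is a minimal Python sketch that renders a query file. The function name `format_query_file` is hypothetical and not part of Zambezi. The sketch assumes that "query length" means the number of terms in the query and that fields are separated by single spaces; neither detail is spelled out above, so check both against your setup.

```python
def format_query_file(queries):
    """Render queries as text in the query-file layout.

    `queries` is a list of (query_id, terms) pairs, where `terms` is a
    list of already-preprocessed term strings.
    Assumption: "query length" is the number of terms in the query.
    Assumption: fields are separated by single spaces.
    """
    # First line: the number of queries.
    lines = [str(len(queries))]
    # One line per query: id, length, then the query text.
    for query_id, terms in queries:
        lines.append(f"{query_id} {len(terms)} {' '.join(terms)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [(1, ["real", "time", "search"]), (2, ["inverted", "index"])]
    print(format_query_file(sample), end="")
```

Writing the returned string to a file gives something that can be passed as `<query-path>` to the retrieval driver, under the assumptions stated above.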

Building an Inverted Index

The input to the indexer must be a set of text files (gzipped or raw), where each line contains one document in the following format:

<document_id: integer> \t <document: text>

Note that you must perform any necessary preprocessing (e.g., parsing, stopping, stemming) before running the indexer: it reads only parsed documents and performs no stopping or stemming itself.
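Since preprocessing happens outside the indexer, a small script can produce the input files. The following Python sketch is one possible way to do it, not Zambezi's own tooling: `preprocess` and `write_documents` are hypothetical helpers, the stopword list is a tiny illustrative one, and stemming is left out entirely.

```python
import gzip
import re

# Illustrative stopword list only; substitute the list your pipeline uses.
STOPWORDS = frozenset({"a", "an", "and", "of", "the", "to"})


def preprocess(text):
    """Lowercase, split on non-alphanumeric characters, drop stopwords.

    Assumption: this simple tokenisation is adequate for your collection.
    No stemming is performed here.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(t for t in tokens if t not in STOPWORDS)


def write_documents(path, documents):
    """Write (doc_id, text) pairs as gzipped lines of '<doc_id>\\t<text>'."""
    with gzip.open(path, "wt", encoding="utf-8") as out:
        for doc_id, text in documents:
            out.write(f"{doc_id}\t{preprocess(text)}\n")


if __name__ == "__main__":
    # "docs.gz" is a hypothetical output path.
    write_documents("docs.gz", [(1, "The quick fox"), (2, "An index of terms")])
```

Because the tokeniser keeps only alphanumeric runs, the document text can never contain a tab or newline, so each document stays on a single line.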

To run the indexer:

out/driver/indexer -index <output-index-root-path> [-positional | -tf]
-mb <maximum-buffer-length-in-number-of-blocks> -input <input-paths>

Note that -input must be the last argument, and that <input-paths> is a list of files.
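When the indexer is launched from a script, the argument-order rule is easy to get wrong. The sketch below builds the argument list with `-input` always placed last. `indexer_command` is a hypothetical helper, the flags are taken from the usage line above, and the buffer length and paths in the example are placeholders rather than recommended values.

```python
def indexer_command(index_root, input_paths, max_blocks, mode=None,
                    binary="out/driver/indexer"):
    """Build the indexer argument list, keeping -input as the last flag.

    `mode` is None, "positional", or "tf", matching the optional
    [-positional | -tf] switch in the usage line.
    """
    if mode not in (None, "positional", "tf"):
        raise ValueError("mode must be None, 'positional', or 'tf'")
    if not input_paths:
        raise ValueError("at least one input path is required")

    cmd = [binary, "-index", index_root]
    if mode is not None:
        cmd.append(f"-{mode}")
    cmd += ["-mb", str(max_blocks)]
    # -input goes last, followed by every input file.
    cmd += ["-input", *input_paths]
    return cmd


if __name__ == "__main__":
    # Placeholder paths and buffer length, for illustration only.
    print(" ".join(indexer_command("index", ["docs.gz"], 4, mode="positional")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the driver has been built.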

You can create a contiguous index as follows:

out/driver/buildContiguous -input <input-index-root-path> -output <output-index-root-path>

Retrieval

To run retrieval:

out/driver/retrieval -index <index-root-path> -query <query-path> -algorithm <SvS|WAND>
[-hits <hits>] [-output <output-path>]

If -output is included, the output is stored at <output-path>.
