PIM Lucene

Apache Lucene is a high-performance, full-featured text search engine library written in Java.

PIM-Lucene is a project to extend Lucene so that specific queries can be offloaded to UPMEM's PIM (Processing In Memory) hardware.

UPMEM is a French company offering a PIM product that can accelerate data-intensive applications. The PIM hardware is a DIMM module in which each memory chip embeds small processors with fast access to the memory bank. More information about UPMEM is available on the company website and in UPMEM's SDK documentation.

Our goal is to create a non-intrusive extension of the Lucene code base that provides an option to use PIM for specific queries (or parts of queries) without impacting Lucene's performance or functionality. When the PIM extension is used, the standard Lucene index is still created, and an additional PIM-specific index is created and stored in the PIM system. The PimIndexWriter object is the new interface for writing the Lucene index augmented with the PIM index.

The first query being ported to PIM is the phrase query. A PimPhraseQuery object can be used in place of a PhraseQuery object to execute the query on PIM. When a PimPhraseQuery is used, the system may or may not actually execute the query on PIM (for example, depending on PIM system availability and on the PIM load versus the CPU load).
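The "may or may not execute on PIM" behavior can be pictured as a small dispatch decision made per query. The following is only a minimal sketch under stated assumptions: the class `PimDispatchSketch`, the `Backend` enum, the `choose` and `execute` methods, and the load values in the range 0 to 1 are all hypothetical names and metrics invented for illustration. They are not part of the PIM-Lucene or Lucene APIs, and the real selection policy may differ.

```java
import java.util.function.Supplier;

/** Hypothetical sketch: none of these names come from the PIM-Lucene code base. */
final class PimDispatchSketch {

    enum Backend { PIM, CPU }

    /**
     * Chooses a backend for one query.
     * pimAvailable: whether the PIM system can accept work at all.
     * pimLoad, cpuLoad: utilization estimates in [0, 1] (an assumed metric).
     */
    static Backend choose(boolean pimAvailable, double pimLoad, double cpuLoad) {
        if (!pimAvailable) {
            return Backend.CPU; // PIM unreachable: fall back to the standard CPU path
        }
        // Prefer PIM unless it is busier than the CPU; ties go to PIM.
        return pimLoad <= cpuLoad ? Backend.PIM : Backend.CPU;
    }

    /** Runs the query on the chosen backend; both paths are assumed to return equivalent results. */
    static <T> T execute(boolean pimAvailable, double pimLoad, double cpuLoad,
                         Supplier<T> pimPath, Supplier<T> cpuPath) {
        return choose(pimAvailable, pimLoad, cpuLoad) == Backend.PIM
                ? pimPath.get()
                : cpuPath.get();
    }

    public static void main(String[] args) {
        System.out.println(choose(true, 0.2, 0.7));  // PIM is less loaded than the CPU
        System.out.println(choose(true, 0.9, 0.3));  // PIM is busier than the CPU
        System.out.println(choose(false, 0.0, 1.0)); // PIM is unavailable
    }
}
```

The point of the sketch is that the caller sees the same result type whichever path runs, which is what lets a PIM-backed query stand in for a standard one without changing the surrounding search code.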

Project Status

This project is currently under development. The implementation of PimPhraseQuery is functional, and its current performance (QPS) compared to standard Lucene is reported in the Benchmarks section. The next step is to improve the computation of the score's lower bound in order to reduce the work imbalance between the PIM cores.

Building

Basic steps:

Install OpenJDK 17 or 18.
Clone PIM Lucene's git repository.
Run git submodule update --init.
Make sure cunit is installed on your system (sudo apt install libcunit1-dev).
Run the Gradle launcher script (gradlew).

We'll assume that you know how to get and set up the JDK. If you don't, we suggest starting at https://jdk.java.net/ and learning more about Java before returning to this README.

Benchmarks

Benchmarking Setup

The machine used has the following characteristics:

The dataset is the English Wikipedia dataset, and the query set consists of 1036 phrase queries extracted from the luceneutil repository. The setup and details of the benchmarks are found here. Both standard Lucene and PIM-Lucene are run on the same server.

Results

The speedup in throughput (QPS) for various numbers of search threads and top docs is as follows:

