DBSCAN

My multi-threaded, GPU-enabled implementation of Density-Based Spatial Clustering of Applications with Noise.

The GPU implementation has achieved >20x speedup against de facto SKLearn; multi-threaded CPU implementation is 40% faster than SKLearn with one thread, and scalable with multiple threads.

Requirements

C++17; GCC 7.5; CUDA 10.2; Thrust; pthread.

How to build

For the first time: git submodule update --init --recursive

Build main

cmake -DCMAKE_BUILD_TYPE=None -Bbuild -H.
- For cpu-main
  - set environment variable AVX=1 to enable AVX;
  - set environment variable BIT_ADJ=1 if on average, each vertex has more than |V|/64 number of neighbours.
- For gpu-main
  - Modify gpu/CMakeLists.txt, change the architecture code to fit your hardware.
cmake --build build --target cpu-main/gpu-main

Build tests

cmake -DCMAKE_BUILD_TYPE=Debug -Bbuild -H.
- For cpu-test set AVX=1 and BIT_ADJ=1 correspondingly.
- For gpu-test modify gpu/CMakeLists.txt correspondingly.
cmake --build build --target cpu-test/gpu-test

How to run

Existing test data under test_inputs/.

[optional] Generate an example using the Python script: python3 generate_dateset.py --n-samples=20000 --cluster-std=3.0
- 20,000 points with 3.0 stddiv;
[optional] Cluster the input and visualize it using Sklearn: python3 dbscan.py --input-name=test_input.txt --eps=0.1 --min-pts=12
- 0.1 radius and 12 neighbour points for clustering.

CPU algorithm

./build/bin/cpu-main --input=<path_to_input> --eps=<eps> --min-pts=<P>.
- Append --print to see the cluster ids.
- Append --num-threads=K to speed up the processing.

GPU algorithm

./build/bin/gpu-main --input=<path_to_input> --eps=<eps> --min-pts=<P>.
- Append --print to see the cluster ids.

Test data

The query parameters are selected to emulate 10% Noises.

test_input_20k.txt is generated using --cluster-std=2.2 --n-samples=20000, with 4 clusters. Should use --eps=0.15 --min-pts=180 to query.

test_input_50k.txt is generated using --cluster-std=2.2 --n-samples=50000, with 20 clusters. Should use --eps=0.09 --min-pts=110 to query.

test_input_100k.txt is generated using --cluster-std=2.2 --n-samples=100000, with 20 clusters. Should use --eps=0.07 --min-pts=100 to query.

test_input_200k.txt is generated using --cluster-std=2.2 --n-samples=200000, with 20 clusters. Should use --eps=0.05 --min-pts=100 to query.

test_input_300k.txt is generated using --cluster-std=2.2 --n-samples=300000, with 20 clusters. Should use --eps=0.044 --min-pts=100 to query.

test_input_400k.txt is generated using --cluster-std=2.2 --n-samples=400000, with 20 clusters. Should use --eps=0.04 --min-pts=105 to query.

test_input_500k.txt is generated using --cluster-std=2.2 --n-samples=500000, with 20 clusters. Should use --eps=0.035 --min-pts=115 to query.

test_input_600k.txt is generated using --cluster-std=2.2 --n-samples=600000, with 20 clusters. Should use --eps=0.035 --min-pts=120 to query.

test_input_700k.txt is generated using --cluster-std=2.2 --n-samples=600000, with 20 clusters. Should use --eps=0.033 --min-pts=122 to query.

test_input_800k.txt is generated using --cluster-std=2.2 --n-samples=800000, with 20 clusters. Should use --eps=0.03 --min-pts=120 to query.

Name		Name	Last commit message	Last commit date
Latest commit History 143 Commits
cpu		cpu
gpu		gpu
gtest		gtest
shared/DBSCAN		shared/DBSCAN
test_inputs		test_inputs
third_party		third_party
.clang-format		.clang-format
.gitattributes		.gitattributes
.gitignore		.gitignore
.gitmodules		.gitmodules
CMakeLists.txt		CMakeLists.txt
CPU_scalability.png		CPU_scalability.png
CPUvsSKL.png		CPUvsSKL.png
GPUvsSKL.png		GPUvsSKL.png
LICENSE		LICENSE
README.md		README.md
dbscan.py		dbscan.py
generate_dateset.py		generate_dateset.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DBSCAN

Requirements

How to build

Build main

Build tests

How to run

CPU algorithm

GPU algorithm

Test data

About

Releases

Packages

Languages

License

superliuxz/DBSCAN

Folders and files

Latest commit

History

Repository files navigation

DBSCAN

Requirements

How to build

Build main

Build tests

How to run

CPU algorithm

GPU algorithm

Test data

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages