Skip to content

Sentimentron/DBSCAN-Cluster

Folders and files

NameName
Last commit message
Last commit date

Latest commit

7249c63 · Jul 19, 2014

History

60 Commits
May 11, 2014
Jul 19, 2014
May 11, 2014
May 11, 2014
Jan 29, 2014
Jul 19, 2014
Apr 12, 2014
Apr 14, 2014
Jan 27, 2014
Jan 27, 2014
Apr 13, 2014
Apr 13, 2014
Jul 19, 2014
Jan 29, 2014
Jan 22, 2014
May 11, 2014
May 11, 2014
Apr 12, 2014
Jul 19, 2014
Apr 14, 2014
Jan 26, 2014
Jan 26, 2014
May 11, 2014
Jan 22, 2014
Jan 22, 2014
Jan 22, 2014
Jan 22, 2014
Jan 22, 2014
Jan 22, 2014
Jan 22, 2014
Jan 25, 2014
Jan 25, 2014
Apr 12, 2014
Dec 30, 2013
Dec 30, 2013
Dec 29, 2013
Jan 23, 2014
Dec 30, 2013
Dec 30, 2013
Dec 30, 2013
Jan 25, 2014
Jan 25, 2014
Jan 25, 2014
Jan 27, 2014
Jan 25, 2014
Jan 25, 2014

Repository files navigation

DBSCAN-Cluster

DBSCAN implementation in C. Uses a quadtree datastructure to handle very large, sparse, binary feature spaces. Implements Jaccard distance as the default distance metric (neighbours.c).

Building and running the tests

  1. Use cmake . to generate the build files.
  2. Run make to build the library, applications and Python SWIG wrapper (if applicable)
  3. Run make test to run the tests.

Using the Python wrapper

See test.py for an example.

Create the quadtree

  import pydbscan
  tree = pydbscan.create_quadtree(8, 8)
  • Its recommended to have height and width as the same power of two.

Insert points into the quadtree

  pydbscan.quadtree_insert(tree, 0, 1) # Sets a 1 for x = 0, y = 1
  • This function returns 0 if the insert did not succeed (e.g. the point was out of range or was already set).
  • For most typical applications, the document is the first argument, the label is the second.
  • It's recommended to count documents from zero.

Cluster

  pydbscan.pyDBSCAN(tree, 6, 0.67, 2)
  • The first argument is the number of documents input into the quadtree.
  • The second is the epsilon value (points are considered part of another's neighbourhood if their distance is less than 1 - epsilon).
  • The third argument is the minimum number of points needed to make a cluster.

Using the C API

  • quadtree_init allocates and initialises the quadtree (arguments must be one less than a powers of two).
  • quadtree_insert adds a document-label pair in the quadtree, and returns a non-zero value if the insert succeeded.
  • DBSCAN takes a quadtree as a reference, an array of unsigned integers of length of the size of the documents, the number of documents, the epsilon value,the minimum points, and a pointer to a neighbourhood distance function ** neighbours_search implements neighborhood filtering via the Jaccard index.

About

DBSCAN implementation in C

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published