adelaneh/py_stringsimjoin

 
 

py_stringsimjoin

This project seeks to build a Python software package that provides scalable implementations of string similarity joins over two tables for commonly used similarity measures, such as Jaccard, Dice, cosine, overlap, overlap coefficient, and edit distance. The package is free, open-source, and BSD-licensed.
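
To make the idea of a string similarity join concrete, here is a minimal sketch in plain Python of what a Jaccard join over two tables computes. It is a naive nested-loop illustration, not the package's API or its scalable algorithm; the function names (`tokenize`, `jaccard`, `naive_jaccard_join`), the whitespace tokenizer, and the sample tables are all assumptions made for this example.

```python
def tokenize(text):
    """Split a string into a set of lowercase, whitespace-delimited tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / float(len(a | b))


def naive_jaccard_join(left, right, left_key, right_key,
                       left_attr, right_attr, threshold):
    """Return (left id, right id, score) for every pair with score >= threshold.

    Compares every left row with every right row, so the cost grows with
    len(left) * len(right). A scalable implementation would avoid this.
    """
    # Tokenize the right table once instead of once per left row.
    right_tokens = [(row[right_key], tokenize(row[right_attr])) for row in right]
    matches = []
    for lrow in left:
        ltoks = tokenize(lrow[left_attr])
        for rid, rtoks in right_tokens:
            score = jaccard(ltoks, rtoks)
            if score >= threshold:
                matches.append((lrow[left_key], rid, score))
    return matches


# Hypothetical sample tables, represented as lists of dicts.
table_a = [{"id": 1, "name": "apple inc"},
           {"id": 2, "name": "microsoft corporation"}]
table_b = [{"id": 10, "name": "apple inc usa"},
           {"id": 11, "name": "microsoft corp"}]

pairs = naive_jaccard_join(table_a, table_b, "id", "id", "name", "name", 0.5)
```

With a threshold of 0.5, only the pair (1, 10) qualifies: "apple inc" and "apple inc usa" share two of three distinct tokens, while "microsoft corporation" and "microsoft corp" share only one of three. The quadratic comparison cost of this sketch is the reason a scalable implementation matters for large tables.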

Important links

Dependencies

py_stringsimjoin has been tested on Python 2.7, Python 3.3, Python 3.4 and Python 3.5.

The required dependencies to build the package are pandas 0.16.0 or higher, py_stringmatching 0.2.1 or higher, joblib, pyprind and six.

Platforms

py_stringsimjoin has been tested on Linux, OS X and Windows.
