[REVIEW] - Kmeans added #11

Merged: 12 commits, Nov 1, 2018
README.md: 43 changes (23 additions, 20 deletions)
```diff
@@ -1,40 +1,43 @@
 # cuML (v0.1 Alpha)

-Machine learning is a fundamental capability of RAPIDS. cuML is a suite of libraries that implements a machine learning algorithms within the RAPIDS data science ecosystem. cuML enables data scientists, researchers, and software engineers to run traditional ML tasks on GPUs without going into the details of CUDA programming.
+Machine learning is a fundamental capability of RAPIDS. cuML is a suite of libraries that implements a machine learning algorithms within the RAPIDS data science ecosystem. cuML enables data scientists, researchers, and software engineers to run traditional ML tasks on GPUs without going into the details of CUDA programming.

 The cuML repository contains:

 1. ***python***: Python based GPU Dataframe (GDF) machine learning package that takes [cuDF](https://github.com/rapidsai/cudf-alpha) dataframes as input. cuML connects the data to C++/CUDA based cuML and ml-prims libraries without ever leaving GPU memory.

-2. ***cuML***: C++/CUDA machine learning algorithms. This library currently includes the following five algorithms;
-a. Single GPU Truncated Singular Value Decomposition (tSVD),
-b. Single GPU Principal Component Analysis (PCA),
-c. Single GPU Density-based Spatial Clustering of Applications with Noise (DBSCAN),
-d. Single GPU Kalman Filtering,
-e. Multi-GPU K-Means Clustering.
+2. ***cuML***: C++/CUDA machine learning algorithms. This library currently includes the following six algorithms;
+a) Single GPU Truncated Singular Value Decomposition (tSVD),
+b) Single GPU Principal Component Analysis (PCA),
+c) Single GPU Density-based Spatial Clustering of Applications with Noise (DBSCAN),
+d) Single GPU Kalman Filtering,
+e) Multi-GPU K-Means Clustering,
+f) Multi-GPU K-Nearest Neighbors (Uses [Faiss](https://github.com/facebookresearch/faiss)).

 3. ***ml-prims***: Low level machine learning primitives used in cuML. ml-prims is comprised of the following components;
-a. Linear Algebra,
-b. Statistics,
-c. Basic Matrix Operations,
-d. Distance Functions,
-e. Random Number Generation.
+a) Linear Algebra,
+b) Statistics,
+c) Basic Matrix Operations,
+d) Distance Functions,
+e) Random Number Generation.

 #### Available Algorithms for version 0.1alpha:

-- Truncated Singular Value Decomposition (tSVD)
+- Truncated Singular Value Decomposition (tSVD).

-- Principal Component Analysis (PCA)
+- Principal Component Analysis (PCA).

-- Density-based spatial clustering of applications with noise (DBSCAN)
+- Density-based spatial clustering of applications with noise (DBSCAN).

-Upcoming algorithms for version 0.1:
+- K-Means Clustering.

+- K-Nearest Neighbors (Requires [Faiss](https://github.com/facebookresearch/faiss) installation to use).
+
-- K-Means Clustering
+Upcoming algorithms for version 0.1:

-- Kalman Filter
+- Kalman Filter.

-More ML algorithms in cuML and more ML primitives in ml-prims are being added currently. Example notebooks are provided in the python folder to test the functionality and performance of this v0.1 alpha version. Goals for future versions include more algorithms and multi-gpu versions of the algorithms and primitives.
+More ML algorithms in cuML and more ML primitives in ml-prims are being added currently. Example notebooks are provided in the python folder to test the functionality and performance of this v0.1 alpha version. Goals for future versions include more algorithms and multi-gpu versions of the algorithms and primitives.

 The installation option provided currently consists on building from source. Upcoming versions will add `pip` and `conda` options, along docker containers. They will be available in the coming weeks.
```

```diff
@@ -48,7 +51,7 @@ To use cuML, it must be cloned and built in an environment that already has the
 List of dependencies:

 1. zlib
-2. cmake (>= 3.8, version 3.11.4 is recommended and there are issues with version 3.12)
+2. cmake for gtests (>= 3.8, version 3.11.4 is recommended and there are issues with version 3.12)
 3. CUDA (>= 9.0)
 4. Cython (>= 0.28)
 5. gcc (>=5.4.0)
```
cuML/CMakeLists.txt: 1 change (1 addition, 0 deletions)
```diff
@@ -94,6 +94,7 @@ add_executable(ml_test
 test/pca_test.cu
 test/tsvd_test.cu
 test/dbscan_test.cu
+test/kmeans_test.cu
 )

 target_link_libraries(ml_test
```
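The PR title and the new `test/kmeans_test.cu` test target concern K-Means clustering. As background only, here is a minimal single-threaded sketch of Lloyd's algorithm (assign each point to its nearest centroid, then move each centroid to the mean of its assigned points) in plain Python. It is not cuML's multi-GPU implementation and does not use the cuML API; the `kmeans` function, its parameters, and the random initialisation are illustrative assumptions.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Hypothetical illustrative helper: Lloyd's K-Means (not the cuML API).

    points: list of equal-length numeric tuples.
    Returns (centroids, labels); labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Initialise centroids with k distinct input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Stop once no point changes cluster (centroids are already the means).
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels
```

With `pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]` and `k=2`, the sketch places the first two points in one cluster and the last two in the other, with centroids `[0.0, 0.5]` and `[10.0, 10.5]`; which cluster gets index 0 depends on the random initialisation. A practical version would use smarter seeding such as k-means++ and vectorised distance computations.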