Final project
Course 10324, Parallel and Distributed Computation
2017 FALL Semester
This is the documentation of the final project assignment as part of “Parallel and Distributed Computation (10324)” course.
The assignment was to implement and parallelize the K-Means algorithm.
My solution is based on the “Simplified K-Means algorithm” given to us in the assignment file.
Given a set of points in 2-dimensional space.
Initial position (xi, yi) and velocity (vxi, vyi) are known for each point Pi. Its position at the given time t can be calculated as follows:
xi(t) = xi + t*vxi
yi(t) = yi + t*vyi
Implement simplified K-Means algorithm to find K clusters. Find a first occurrence during given time interval [0, T] when a system of K clusters has a Quality Measure q that is less than given value QM.
- Choose first K points as a cluster centers.
- Group points around the given cluster centers - for each point define a center that is most close to the point.
- Recalculate the cluster centers (average of all points in the cluster)
- Check the termination condition – no points move to other clusters or maximum iteration LIMIT was made.
- Repeat from 2 till the termination condition fulfills.
- Evaluate the Quality of the clusters found. The Quality is equal to an average of diameters of the cluster divided by distance to other clusters. For example, in case of k = 3 the quality is equal:
q = (d1/D12 + d1/D13 + d2/D21 + d2/D23 + d3/D31 + d3/D32) / 6
where di is a diameter of cluster i and Dij is a distance between centers of cluster i and cluster j.
You will be supplied with the following data
- N - number of points.
- K - number of clusters to find.
- LIMIT – the maximum number of iterations for K-MEAN algorithm.
- QM – quality measure to stop.
- T – defines the end of time interval [0, T].
- dT – defines moments t = n*dT, n = {0, 1, 2, …, T/dT} for which calculate the clusters and the quality.
- Coordinates and Velocities of all points.
The first line of the file contains N K T dT LIMIT QM.
Next lines are Initial Positions and Velocities of the points
For example:
5000 4 30 0.1 2000 7.3
2.3 4. 5 6. 55 -2.3
76.2 -3.56 50.0 12
…
45.23 20 -167.1 98
The output file contains information on the found clusters with the moment when the Quality Measure QM is reached for first time.
For example:
First occurrence at t = 24.5 with q = 6.9
Centers of the clusters:
1.123 34
-5.3 17.01
33.56 -23
14.1 98
The solution was parallelized with MPI, OpenMP and CUDA.
Parallelization Steps:
(P – Number of Processes in MPI)
(N – Number of Points)
- First, each process gets his equal share of points.
Process 0 gets to deal with the remaining (N % P) points. - Each process uses his cuda device to define each point’s cluster based on their distances.
- Each process uses OpenMP technology to add each point to its corresponding cluster.
- Process 0 gets the gathered information about the points and clusters back from all P processes.
- Process 0 send all the gathered points to his cuda device to update their current location in time.
- Process 0 uses OpenMP technology to calculate both the diameters of the clusters and the quality of clusters found.
-
At first, when I solved the problem I tried to do as many actions as possible in the process’s cuda device, alas I found out that any calculation that holds the GPU’s threads for too long are not possible.
-
So, I decided that the cuda device will deal with simple calculations rather than dealing with a large number of iterations for each thread.
A simple example might be assigning a point its current location, this action is simple, each thread needs to make a simple multiplication and its done. -
Large calculations that can take a bit of time were made with OpenMP.
The general case’s complexity for my solution is:
O (N * K * I * (T/dT))
N – Number of points.
K – Number of clusters to find.
I – Number of iterations in each delta time (DT) iteration.
T/dT – Number of time iterations.
-
MPI – MPICH2 Installed.
Notes: Make sure you include the MPI “lib” and “include” folders as part of your project properties. -
OpenMP – Enable Open MP Support to your project.
-
CUDA Installed.
Notes: It is very recommended to open an empty CUDA project and to copy the files of my project into it, I’ve used CUDA 9.1 (should work just fine on other versions as well).