SKU-Clustering

In this notebook I cluster eCommerce items by their names. The data comes from an outdoor apparel brand's catalog. The goal is to use the item names to find similar items and group them together. For example, every t-shirt should end up in the same t-shirt group.

The steps to accomplish this goal will be:

1. Clean the data so that only the item name remains (pandas)
2. Transform the corpus into vector space using tf-idf (scikit-learn)
3. Calculate the cosine distance between each pair of documents as a measure of similarity (scikit-learn)
4. Run hierarchical clustering and draw a dendrogram (SciPy)
5. Cluster the documents with k-means (scikit-learn)
6. Reduce the dimensionality with multidimensional scaling (MDS)
7. Plot the clusters (matplotlib)
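The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the item names are made up, the cluster count of 2 is chosen only to fit the toy data, and average linkage is my own choice for the hierarchical step because it works directly on cosine distances. MDS is run on the dense tf-idf rows with its default Euclidean dissimilarity; since `TfidfVectorizer` L2-normalizes rows by default, that ordering of distances agrees with cosine distance.

```python
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import MDS
from sklearn.metrics.pairwise import cosine_distances

# Hypothetical item names for illustration, not rows from the real catalog.
names = [
    "organic cotton t-shirt",
    "graphic logo t-shirt",
    "striped pocket t-shirt",
    "waterproof rain jacket",
    "insulated down jacket",
    "packable rain jacket",
]

# Step 2: tf-idf matrix with one row per name and one column per term.
tfidf = TfidfVectorizer().fit_transform(names)

# Step 3: pairwise cosine distance (1 - cosine similarity), shape (6, 6).
dist = cosine_distances(tfidf)

# Step 4: hierarchical clustering on the condensed distance vector.
# Average linkage is an assumption made for this sketch.
condensed = squareform(dist, checks=False)
Z = linkage(condensed, method="average")
tree = dendrogram(Z, labels=names, no_plot=True)  # drop no_plot to draw it
hier_labels = fcluster(Z, t=2, criterion="maxclust")

# Step 5: k-means directly on the tf-idf vectors (k=2 fits the toy data only).
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(tfidf)

# Step 6: MDS down to two dimensions for plotting.
coords = MDS(n_components=2, n_init=4, random_state=0).fit_transform(tfidf.toarray())
```

For step 7, `coords[:, 0]` and `coords[:, 1]` can be passed to a matplotlib scatter plot and colored by `km_labels`. On real data the number of clusters has to be chosen by inspection, for example by reading the dendrogram.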

The dataset consists of 500 actual SKUs from an outdoor apparel brand's product catalog downloaded from Kaggle (https://www.kaggle.com/cclark/product-item-data).
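As a rough sketch of the cleaning step, the snippet below pulls a short item name out of a longer description field with pandas. The column names `id` and `description`, the `" - "` separator between name and description, and the two rows themselves are all assumptions made for illustration; check them against the real `sample-data.csv` before reusing this.

```python
import io

import pandas as pd

# Hypothetical stand-in for sample-data.csv. The real file's columns and
# layout may differ; replace `raw` with the path to the actual CSV.
raw = io.StringIO(
    "id,description\n"
    '1,"Alpine jacket - A made-up description of the jacket."\n'
    '2,"Trail t-shirt - A made-up description of the shirt."\n'
)
df = pd.read_csv(raw)

# Assumption: the item name is everything before the first " - ".
df["name"] = (
    df["description"]
    .str.split(" - ", n=1)
    .str[0]
    .str.strip()
    .str.lower()
)

# Keep only the names; this list is the corpus for the later steps.
names = df["name"].tolist()
```

Lowercasing here is optional, since scikit-learn's `TfidfVectorizer` lowercases text by default.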

I used http://brandonrose.org/clustering as a reference for this project. Brandon Rose's blog has many interesting projects with clear explanations.

moyphilip/SKU-Clustering