======= point-of-sales-proof-of-concept

Proof of Concept for processing transaction logs using Hadoop MapReduce, Hive and Luigi.

The 'pos' package contains all the code for processing a point of sales transaction log. Calculating the most popular product category per user, together with the revenue per quarter per user.
The preprocessing.py contains the pipeline used for cleaning the input data.
The workflow.py file contains the actual data processing related to the problem statement.
The setup.py module contains all the package dependencies.
The client.cfg file contains Luigi configuration.

Make sure to have the transaction log files present in the /input/transactions_raw directory on HDFS, together with the 'products' catalog in a MySQL database of which the credentials are configured in settings.py.

The schema for the MySQL product catalog looks like this:

CREATE TABLE products (
  id INT(32) PRIMARY KEY,
  name VARCHAR(255),
  category VARCHAR(255),
  price DOUBLE
);

INSERT INTO products VALUES 
(1, 'Banana', 'Fruit', 1.00),
(2, 'Apple', 'Fruit', 0.85),
(3, 'Raspberry', 'Fruit', 1.65),
(4, 'Chocolate sprinkles', 'Decoration', 1.00),
.
.
(9998, 'Windscreen wipers', 'Car accessories', 33.00),
(9999, 'Rear windscreen wipers', 'Car accessories', 35.00);

The format for the /input/transactions_raw files is as follows (they are assumed to be GZIP compressed):

DateþOrderIDþProductIDþUserIDþAccMngrIDþQuantity
2013-01-01þ10000þ1þ1þ1þ10
2013-01-01þ10000þ2þ1þ1þ5
2013-01-01þ10001þ1þ2þ1þ1
.
.
.
2013-12-31þ99992þ9999þ666þ1þ10

After installing Hive, Hadoop MapReduce (YARN) and Luigi, and adapting the client.cfg, start with:

python workflow.py Main --local-scheduler

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
pos		pos
LICENSE		LICENSE
README.md		README.md
client.cfg		client.cfg
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

======= point-of-sales-proof-of-concept

About

Releases

Packages

Languages

License

stefanvanwouw/point-of-sales-proof-of-concept

Folders and files

Latest commit

History

Repository files navigation

======= point-of-sales-proof-of-concept

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages