This repository has been archived by the owner on Jan 5, 2021. It is now read-only.

vmware-archive/retail-demo-xd

VMware has ended active development of this project; this repository will no longer be updated.


xd-demo with Pivotal HD Retail data
===================================

Contributors

Demo User Story
We want to ingest real-time orders from our POS system directly into HDFS via an HTTP POST, storing each order as a pipe-delimited record. A sample post looks like:

Fields: Customer ID, Order ID, Order Amount, Store ID

Good post:

    curl -d "{\"orderid\":\"123\",\"storeid\":\"456\",\"customerid\":\"789\",\"orderamount\":\"5000.01\"}" http://localhost:8000

Bad post:

    curl -d "{\"orderid\":\"BAD_DATA\",\"storeid\":\"456\",\"customerid\":\"789\",\"orderamount\":\"5000.01\"}" http://localhost:8000

Dream state in HDFS, with HAWQ and in-memory query:

    123|456|789|5000.01
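
The mapping from the JSON post to the pipe-delimited record can be sketched in Python. This is only an illustration, not the demo's actual Spring XD transform: the function name `to_pipe_delimited` is hypothetical, and the field order is an assumption inferred from the sample record `123|456|789|5000.01`.

```python
import json

# Assumed field order, inferred from the sample record 123|456|789|5000.01.
FIELD_ORDER = ("orderid", "storeid", "customerid", "orderamount")


def to_pipe_delimited(payload: str) -> str:
    """Convert one JSON order (as posted via curl) into a pipe-delimited record."""
    order = json.loads(payload)
    # A missing field raises KeyError, so incomplete orders are not silently written.
    return "|".join(str(order[name]) for name in FIELD_ORDER)


sample = '{"orderid":"123","storeid":"456","customerid":"789","orderamount":"5000.01"}'
print(to_pipe_delimited(sample))  # prints 123|456|789|5000.01
```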

We are going to reuse some past integration work, and we need to transform and filter the POS data before ingesting it into Hadoop. The HTTP stream accepts JSON-formatted key/value pairs of order data. Some orders contain bad data, and we need to filter these records out before persisting to HDFS. After landing the data in Hadoop, we would like to run SQL analytics on the orders to see if they match known fraudulent orders from the past. Hive is not an option because it provides neither fast enough response times nor full ANSI SQL compliance. We want to run a logistic regression model on all orders to feed our real-time fraud detection applications, which aim to catch criminals before they leave the store. The logistic regression model needs to be retrained periodically via a scheduled process. The in-memory fraud data store needs to be flushed on a configurable interval, and HDFS files need to be archived via a scheduled process.
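
As a rough illustration of the filtering step, the sketch below rejects orders like the "Bad Post" sample. The validation rule (IDs must be integers, the amount must parse as a number) is an assumption based on the `BAD_DATA` example; the demo's real filter lives in the Spring XD stream definitions and may apply different checks. The name `is_valid_order` is hypothetical.

```python
import json


def is_valid_order(payload: str) -> bool:
    """Return True if the payload is a JSON order whose ID fields are integers
    and whose amount is numeric (an assumed rule, not the demo's own)."""
    try:
        order = json.loads(payload)
        for name in ("orderid", "storeid", "customerid"):
            int(order[name])
        float(order["orderamount"])
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, a missing field, or a non-numeric value.
        return False
    return True


good = '{"orderid":"123","storeid":"456","customerid":"789","orderamount":"5000.01"}'
bad = '{"orderid":"BAD_DATA","storeid":"456","customerid":"789","orderamount":"5000.01"}'
print(is_valid_order(good), is_valid_order(bad))  # prints True False
```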

In order to get this running with Pivotal HD

  1. Start a Pivotal HD instance. Optionally, run the "pivotal-samples" data labs to populate the retail_demo DB with HAWQ tables and data. The "pivotal-samples" GitHub project is located at:

    https://github.com/PivotalHD/pivotal-samples

  2. Download and install the latest Spring XD binary. The project is located at:

    http://projects.spring.io/spring-xd/

  3. Update your Spring XD Hadoop config to reflect your HDFS address. The file is located at either $SPRING_XD/conf/hadoop.properties or $SPRING_XD/xd/config/hadoop.properties, depending on your Spring XD layout:

    fs.default.name=hdfs://my-hadoop:8020

  4. Open config.py and fill in a value for each property. This is required for connectivity to Pivotal HD and SQLFire.
  5. In a terminal window, run the install script. It copies (via scp) the Python demo scripts to the Pivotal HD and SQLFire VMs, and copies the Spring XD scripts, lib jars, modules, and sink config:
    ./install.py
  6. Run the three Spring XD runtimes (Redis, admin, container) in terminal windows:
    sudo sysctl -w net.inet.tcp.msl=1000
    $SPRING_XD/redis/bin/redis-server
    $SPRING_XD/xd/bin/xd-admin --hadoopDistro phd1
    $SPRING_XD/xd/bin/xd-container --hadoopDistro phd1
  7. Run the Spring XD Shell in a terminal window:
    $SPRING_XD/shell/bin/spring-xd-shell --hadoopDistro phd1
  8. In the Spring XD Shell, create the Hadoop ingest, the Pivotal HD analytics tap, and the SQLFire sink:
    script --file ../../xd/cmd/create-all.cmds
  9. [PIVOTALHD TERMINAL] Open an ssh session to your Pivotal HD VM and run this script. You must do this before starting the data stream.
    ./demo.py setup_hdfs
  10. In a terminal window, run send_data.py to start a data stream simulation:
    ./send_data.py
  11. [SQLFIRE TERMINAL] Verify that SQLFire is receiving only a small subset of orders:
    ./demo.py query
  12. In the Spring XD Shell, re-run the batch jobs (this should delete the SQLFire data, populate the HAWQ tables, and retrain the analytic model):
    script --file ../../xd/cmd/deploy-batch.cmds
  13. In the Spring XD Shell, reset the richgauge taps to 0:
    script --file ../../xd/cmd/reset-taps.cmds
  14. [PIVOTALHD TERMINAL] Run a PXF and HAWQ query:
    ./demo.py query_hawq
  15. Install DbVisualizer (http://www.dbvis.com/) and run queries through a JDBC client GUI. You will need to add a new "Cache" driver JAR for SQLFire. To connect remotely, you will need to modify '/data/1/hawq_master/gpseg-1/pg_hba.conf' in your Pivotal HD VM.
  16. [PIVOTALHD TERMINAL] Restart Pivotal HD via the stop/start scripts:
    /home/gpadmin/stop_all.sh;
    /home/gpadmin/start_all.sh;
  17. In the Spring XD Shell, remove all streams and taps from Spring XD (this does not delete any data):
    script --file ../../xd/cmd/destroy-all.cmds
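
The data stream simulation in the steps above can be pictured with a short Python sketch. This is not the repository's send_data.py, whose contents are not shown here. It only assumes the endpoint and JSON fields from the sample curl commands; the helper names (`make_order`, `build_request`, `send_orders`) and the random value ranges are hypothetical.

```python
import json
import random
import urllib.request

ENDPOINT = "http://localhost:8000"  # URL taken from the sample curl commands


def make_order(order_id: int, rng: random.Random) -> dict:
    """Build one simulated order; the value ranges are arbitrary assumptions."""
    return {
        "orderid": str(order_id),
        "storeid": str(rng.randint(1, 999)),
        "customerid": str(rng.randint(1, 99999)),
        "orderamount": f"{rng.uniform(1, 10000):.2f}",
    }


def build_request(order: dict, url: str = ENDPOINT) -> urllib.request.Request:
    """Wrap an order in an HTTP POST, mirroring `curl -d '<json>' <url>`."""
    body = json.dumps(order).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def send_orders(count: int, seed: int = 0) -> None:
    """POST `count` simulated orders to the HTTP source of the ingest stream."""
    rng = random.Random(seed)
    for order_id in range(1, count + 1):
        request = build_request(make_order(order_id, rng))
        with urllib.request.urlopen(request, timeout=5) as response:
            response.read()
```

Calling `send_orders(10)` would post ten orders, provided the ingest stream's HTTP source is already listening on port 8000. The functions are kept separate so that the payload can be inspected without a running stream.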

xd-demo-client

  1. Update app.properties (src/main/webapps/WEB-INF/classes) to reflect the IP addresses of your SQLFire environment
  2. Open a terminal and build the WAR with Maven
    mvn install
  3. Copy the WAR file to a working tc Server or Tomcat server
  4. The application will be available at: http://localhost:8080/xd-demo-client/resources/index.html