Study and exercise material for deploying data science and machine learning models on GCP. All content here is exercise material taken from the book Data Science on the Google Cloud Platform by Valliappa Lakshmanan (2017). Some of the code has been modified so that it can run on a local machine.
For the complete repository, please refer to: https://github.com/GoogleCloudPlatform/data-science-on-gcp
The exercises include:
- 01_ingest
  - Ingest data from an external source, in this case BTS flight data.
  - Deploy the ingestion app using Flask (see the sketch after this list).
  - Containerize it as a Docker container.
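A minimal sketch of what the ingestion app might look like, assuming a Flask endpoint that downloads one month of BTS on-time data. The URL template, route, and paths here are illustrative placeholders, not the book's exact code; the same script is what would be packaged into the Docker image:

```python
# Hypothetical ingestion app: fetch one month of BTS data and unzip it locally.
import os
import zipfile
import urllib.request

from flask import Flask, jsonify

app = Flask(__name__)

# Placeholder BTS download URL template -- check the BTS site for the real one.
BTS_URL = ("https://transtats.bts.gov/PREZIP/"
           "On_Time_Reporting_Carrier_On_Time_Performance_1987_present_{year}_{month}.zip")

@app.route("/ingest/<int:year>/<int:month>")
def ingest(year: int, month: int):
    """Download one month of BTS on-time data and unzip it to a local directory."""
    url = BTS_URL.format(year=year, month=month)
    zip_path = f"/tmp/bts_{year}_{month:02d}.zip"
    urllib.request.urlretrieve(url, zip_path)          # fetch the archive
    with zipfile.ZipFile(zip_path) as z:
        z.extractall(f"/tmp/bts_{year}_{month:02d}")   # unpack the CSV inside
    return jsonify({"status": "ok", "source": url})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```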
- 02_streaming
  - Create a streaming transformation using Apache Beam on a local file (sketched below).
  - Move the local transformation to Google Dataflow and run it there.
  - Simulate the streaming data and publish it using Google Pub/Sub (see the second sketch below).
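A minimal Apache Beam sketch, assuming a local CSV input: read the file, pull out a couple of fields, and write the result back out. Field positions and file names are placeholders. Swapping DirectRunner for DataflowRunner (plus `--project`, `--region`, and `--temp_location` options) is what moves the same pipeline onto Google Dataflow:

```python
# Local Beam pipeline: read CSV -> parse -> format -> write.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def parse_line(line):
    """Split a CSV row and keep two example fields, e.g. (airport, dep_delay)."""
    fields = line.split(",")
    return fields[0], fields[1]

opts = PipelineOptions(runner="DirectRunner")  # local run; use DataflowRunner on GCP

with beam.Pipeline(options=opts) as p:
    (p
     | "Read" >> beam.io.ReadFromText("flights_sample.csv", skip_header_lines=1)
     | "Parse" >> beam.Map(parse_line)
     | "Format" >> beam.Map(lambda kv: f"{kv[0]},{kv[1]}")
     | "Write" >> beam.io.WriteToText("flights_out"))
```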
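And a rough sketch of the stream simulation: replay records from a local file and publish each one to a Pub/Sub topic. The project and topic IDs are placeholders; this needs the google-cloud-pubsub package and GCP credentials:

```python
# Replay a local file as a simulated stream into Pub/Sub.
import time
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "flights-sim")  # hypothetical IDs

with open("flights_sample.csv") as f:
    next(f)  # skip the header row
    for line in f:
        publisher.publish(topic_path, line.strip().encode("utf-8"))
        time.sleep(0.1)  # crude pacing to mimic real-time arrival
```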
- 03_pyspark
  - Create a simple Bayes model using Apache Spark (sketched below).
  - Shell command to convert the notebook to a .py file using nbconvert (see the second sketch below).
  - The result can also be run on Google Dataproc; please check the source's GitHub.
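A minimal PySpark sketch of fitting a Bayes model. This uses Spark ML's NaiveBayes estimator (Gaussian variant, which tolerates negative feature values) as a stand-in for the book's approach; the column names and input file are placeholders:

```python
# Fit a Naive Bayes classifier on a couple of flight features.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import NaiveBayes

spark = SparkSession.builder.appName("bayes-demo").getOrCreate()

df = spark.read.csv("flights_sample.csv", header=True, inferSchema=True)

# Spark ML expects a single 'features' vector column and a numeric 'label'.
assembler = VectorAssembler(inputCols=["dep_delay", "distance"], outputCol="features")
train = assembler.transform(df).withColumnRenamed("arr_late", "label")

model = NaiveBayes(modelType="gaussian").fit(train)
model.transform(train).select("label", "prediction").show(5)
```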
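The notebook-to-script conversion can also be done from Python via nbconvert's API, which is equivalent to the shell command `jupyter nbconvert --to script notebook.ipynb`; the notebook name here is a placeholder:

```python
# Convert a notebook to a plain .py script using nbconvert's Python API.
from nbconvert import PythonExporter

source, _meta = PythonExporter().from_filename("bayes_model.ipynb")
with open("bayes_model.py", "w") as f:
    f.write(source)
```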
- 04_sparkml
  - Create a logistic regression model using PySpark's MLlib (sketched below).
  - Evaluate the model manually.
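A sketch of training a logistic regression with PySpark's ML library and evaluating it "manually" by counting outcomes ourselves instead of using a built-in Evaluator. Column names and the input file are placeholders:

```python
# Train a logistic regression and compute accuracy by hand.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("logreg-demo").getOrCreate()

df = spark.read.csv("flights_sample.csv", header=True, inferSchema=True)
data = (VectorAssembler(inputCols=["dep_delay", "distance", "taxi_out"],
                        outputCol="features")
        .transform(df)
        .withColumnRenamed("on_time", "label"))

train, test = data.randomSplit([0.8, 0.2], seed=42)
model = LogisticRegression().fit(train)

# Manual evaluation: tally correct predictions on the held-out split.
pred = model.transform(test)
total = pred.count()
correct = pred.filter(pred.label == pred.prediction).count()
print(f"accuracy = {correct / total:.3f}")
```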
- 05_mlopstf
  - A program that trains the model and saves it to local disk (sketched below).
  - A program that uses the saved model to make a prediction on input data (see the second sketch below).
  - Both are written with the TensorFlow library.
  - Both programs are already containerized in Docker.
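A compact sketch of what the training program does, assuming TensorFlow 2.x (where `model.save` to a directory writes the SavedModel format). The architecture, file paths, and toy data are placeholders:

```python
# Train a small Keras model and save it to local disk.
import numpy as np
import tensorflow as tf

# Toy stand-in for the real training data: 4 features, binary label.
X = np.random.rand(1000, 4).astype("float32")
y = (X.sum(axis=1) > 2.0).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32)

model.save("saved_model/flight_model")  # SavedModel directory on local disk
```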
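And the matching prediction program: load the saved model and score one input record. The path and input shape follow the training sketch above:

```python
# Load the saved model and make a single prediction.
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("saved_model/flight_model")
sample = np.array([[0.5, 0.1, 0.9, 0.7]], dtype="float32")  # one input record
print(model.predict(sample))  # predicted probability of the positive class
```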