CLOUDY automates the execution of experiments on Google Cloud. It creates VM instances and buckets, installs dependencies, runs Python scripts, and handles resource cleanup.
The workflow of CLOUDY comprises the following steps:
- The script launch.sh prepares a VM instance according to the options specified in the config.json file.
- The script setup.sh is executed in the VM to install dependencies and run the indicated Python script.
- The output is saved to an existing bucket, or a new one is created as required.
- The instance is automatically deleted once its execution has finished.
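As a rough illustration of the first step, the sketch below shows how the fields of config.json could map onto a `gcloud compute instances create` call. This is an assumption about how launch.sh might be organized, not its actual contents: the helper name `build_create_command` is hypothetical, the service account value is a placeholder, and delivering setup.sh as a startup script is only one possible way to run it on the VM. The function only builds the argument list; it does not contact Google Cloud.

```python
def build_create_command(cfg):
    """Return a gcloud argument list for creating the VM described by cfg."""
    return [
        "gcloud", "compute", "instances", "create", cfg["INSTANCE_NAME"],
        f"--zone={cfg['ZONE']}",
        f"--machine-type={cfg['MACHINE_TYPE']}",
        f"--image-family={cfg['IMAGE_FAMILY']}",
        f"--image-project={cfg['IMAGE_PROJECT']}",
        f"--service-account={cfg['SERVICE_ACCOUNT']}",
        # Assumption: the setup script is passed to the VM as a startup script.
        f"--metadata-from-file=startup-script={cfg['SETUP_SCRIPT']}",
    ]


# Example values taken from the sample configuration; the service account
# below is a placeholder, not a real account.
example = {
    "INSTANCE_NAME": "vm",
    "ZONE": "europe-southwest1-b",
    "MACHINE_TYPE": "n2-standard-2",
    "IMAGE_FAMILY": "ubuntu-2004-lts",
    "IMAGE_PROJECT": "ubuntu-os-cloud",
    "SERVICE_ACCOUNT": "my-sa@my-project.iam.gserviceaccount.com",
    "SETUP_SCRIPT": "setup.sh",
}

# Print the command instead of executing it, so the sketch is safe to run.
print(" ".join(build_create_command(example)))
```

Building the command as a list rather than a single string avoids quoting problems if it is later passed to `subprocess.run`.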
This project consists of the following scripts:
- launch.sh: creates a VM instance on Google Cloud according to the configuration defined in config.json. It also downloads and copies your repository to the VM instance.
- setup.sh: runs on the VM instance. Installs dependencies, runs your Python script, and saves the results to a Google Cloud bucket, creating it if necessary.
- clean.sh: cleans up all VM instances and buckets on Google Cloud.
- Makefile: enables the execution of the scripts through simple commands.
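The "create the bucket if necessary" behaviour can be pictured with a check-then-create pattern. The sketch below is hypothetical and is not taken from setup.sh: the function `ensure_bucket` and the stand-in runner `fake_run_missing` are illustrative names, and the use of `gsutil ls -b` to probe for the bucket and `gsutil mb -l` to create it is an assumption about one reasonable approach. The demo at the end uses the stand-in runner, so nothing is executed against Google Cloud.

```python
import subprocess
from types import SimpleNamespace


def ensure_bucket(name, location, run=subprocess.run):
    """Create gs://<name> if it does not exist. Return True if it was created."""
    url = f"gs://{name}"
    # `gsutil ls -b` exits with a non-zero status when the bucket cannot be listed.
    probe = run(["gsutil", "ls", "-b", url], capture_output=True)
    if probe.returncode == 0:
        return False  # bucket already exists, nothing to do
    run(["gsutil", "mb", "-l", location, url], check=True)
    return True


def fake_run_missing(cmd, **kwargs):
    """Dry-run stand-in: pretends the bucket is missing and creation succeeds."""
    print("would run:", " ".join(cmd))
    return SimpleNamespace(returncode=1 if cmd[1] == "ls" else 0)


# Dry run with the sample bucket name and location from config.json.
created = ensure_bucket("bucket", "eu", run=fake_run_missing)
print("created:", created)
```

Passing the runner as a parameter keeps the logic testable without credentials; calling `ensure_bucket("bucket", "eu")` with the default runner would invoke the real `gsutil`.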
- Prerequisites
First, create a service account on GCP with the required permissions for Compute Engine and Cloud Storage (e.g., storage administrator, compute instances administrator).
Then, install the following dependencies:
- Google Cloud SDK: required to interact with Google Cloud from the command line.
- jq: used to read the JSON configuration file.
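Before launching anything, it can help to confirm that both command-line tools are reachable. The following is a minimal sketch, assuming the Google Cloud SDK exposes the `gcloud` executable and jq exposes `jq` on your PATH; the helper `missing_tools` is a hypothetical name and is not part of CLOUDY.

```python
import shutil

# Executable names assumed for the two prerequisites.
REQUIRED_TOOLS = ("gcloud", "jq")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing from PATH:", ", ".join(missing))
else:
    print("All prerequisites found.")
```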
- Edit config.json
Define your custom configuration in the config.json file, located in the root directory of the project. For example:
{
  "INSTANCE_NAME": "vm",
  "BUCKET_NAME": "bucket",
  "REPO_URL": "https://github.com/manjavacas/cloudy.git",
  "SCRIPT_PATH": "foo/foo.py",
  "SCRIPT_ARGS": "cloudy",
  "DEPENDENCIES": "numpy pandas",
  "SERVICE_ACCOUNT": "[email protected]",
  "SETUP_SCRIPT": "setup.sh",
  "MACHINE_TYPE": "n2-standard-2",
  "ZONE": "europe-southwest1-b",
  "IMAGE_FAMILY": "ubuntu-2004-lts",
  "IMAGE_PROJECT": "ubuntu-os-cloud",
  "BUCKET_ZONE": "eu"
}
The main options to edit are:
- INSTANCE_NAME and BUCKET_NAME: identifiers for the created instance and bucket.
- REPO_URL: the repository to clone. This is where the code you want to execute is located.
- SCRIPT_PATH and SCRIPT_ARGS: path to the Python script you want to execute in the repository, along with its input arguments.
- DEPENDENCIES: dependencies required to run the Python script.
- SERVICE_ACCOUNT: GCP service account to be used. It must have the necessary permissions.
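A quick sanity check of the configuration can catch a missing key before any cloud resource is created. The sketch below is not part of CLOUDY: the helpers `load_config` and `check_config` are hypothetical, and treating every key from the sample configuration as required is an assumption, since the project may accept defaults for some of them.

```python
import json

# Assumption: every key that appears in the sample config.json is required.
REQUIRED_KEYS = {
    "INSTANCE_NAME", "BUCKET_NAME", "REPO_URL", "SCRIPT_PATH", "SCRIPT_ARGS",
    "DEPENDENCIES", "SERVICE_ACCOUNT", "SETUP_SCRIPT", "MACHINE_TYPE", "ZONE",
    "IMAGE_FAMILY", "IMAGE_PROJECT", "BUCKET_ZONE",
}


def load_config(path="config.json"):
    """Read the JSON configuration file into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def check_config(cfg):
    """Return the required keys that are absent from cfg, sorted by name."""
    return sorted(REQUIRED_KEYS - set(cfg))


# Demo on an in-memory, deliberately incomplete configuration.
partial = {"INSTANCE_NAME": "vm", "BUCKET_NAME": "bucket"}
print("Missing keys:", check_config(partial))
```

To check the real file, run `check_config(load_config())` from the project root.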
- Run CLOUDY
a. Using Makefile
- To launch a VM instance, run:
$ make launch
- To clean up all VM instances and buckets, run:
$ make clean
- To delete VM instances and buckets and then relaunch, run:
$ make reset
b. Using cloudy.py
Alternatively, you can use the Python script cloudy.py for the same operations:
$ python cloudy.py launch
$ python cloudy.py clean
$ python cloudy.py reset
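To make the relationship between the three operations concrete, here is a hypothetical sketch of how a command-line entry point could dispatch them. It is not the actual source of cloudy.py: the `STEPS` table and the `plan` function are illustrative, and the mapping of `launch` to launch.sh and `clean` to clean.sh is an assumption based on the script descriptions above. The only behaviour taken directly from this document is that `reset` cleans up first and then relaunches.

```python
import argparse

# Assumed mapping from each operation to the scripts it would run, in order.
STEPS = {
    "launch": ["launch.sh"],
    "clean": ["clean.sh"],
    "reset": ["clean.sh", "launch.sh"],  # delete everything, then relaunch
}


def plan(argv):
    """Parse the command line and return the ordered list of scripts to run."""
    parser = argparse.ArgumentParser(prog="cloudy.py")
    parser.add_argument("action", choices=sorted(STEPS))
    args = parser.parse_args(argv)
    return STEPS[args.action]


# Show the plan for `reset` without executing any script.
print(plan(["reset"]))  # prints ['clean.sh', 'launch.sh']
```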