Baby Ray is a minimal yet fully-functioning (fault-tolerance included!) implementation of Ray in Go.
Currently a work in progress. By the end of this project, you should be able to simulate a full Ray cluster using Docker Compose and launch CPU jobs using Python, using exactly the same API exposed by the real Ray Core library.
To run unit tests, either push or just run act locally.
To run integration tests:
make all
Then,
./scripts/run_tests.sh
This script will spin up a Baby Ray cluster and have the first worker node run a suite of Pytest tests (tests/integration/test_cluster.py
).
make all
docker-compose up
Separate terminal:
./log_into_driver.sh
You can automatically deploy the entire cluster (workers, GCS, global scheduler nodes) by following these instructions.
First, build everything:
make all
This will generate the Go and Python gRPC stub files, compile the Go server code, and build the Docker images we need to deploy the cluster.
Next, spin up the cluster:
docker-compose up
Now, your cluster should be running! Let's test out if a node can talk to another node. Note that these commands will now be run-specific, since Docker containers are ID'd randomly. First, list the container ID's of our cluster with docker ps
. We'll get something like:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c20387bbd534 ray-node "/bin/sh -c './go/bi…" 3 seconds ago Up 2 seconds 0.0.0.0:50004->50004/tcp babyray_worker3_1
dce69f377b01 ray-node "/bin/sh -c ./go/bin…" 3 seconds ago Up 2 seconds 0.0.0.0:50001->50001/tcp babyray_global_scheduler_1
e06583f90acd ray-node "/bin/sh -c './go/bi…" 3 seconds ago Up 2 seconds 0.0.0.0:50002->50002/tcp babyray_worker1_1
1d2559b0bb90 ray-node "/bin/sh -c './go/bi…" 3 seconds ago Up 2 seconds 0.0.0.0:50003->50003/tcp babyray_worker2_1
c3781cf7bbce ray-node "/bin/sh -c './go/bi…" 3 seconds ago Up 2 seconds 0.0.0.0:50000->50000/tcp babyray_gcs_1
Now, pick one of those container ID's---here we pick worker 1---and run an interactive shell on it:
docker exec -it e06583f90acd /bin/bash
Then, run the command ping node0
and you'll get something like:
docker exec -it e06583f90acd /bin/bash
root@e06583f90acd:/app# ping node0
PING node0 (172.24.0.2) 56(84) bytes of data.
64 bytes from babyray_gcs_1.babyray_mynetwork (172.24.0.2): icmp_seq=1 ttl=64 time=3.11 ms
64 bytes from babyray_gcs_1.babyray_mynetwork (172.24.0.2): icmp_seq=2 ttl=64 time=0.499 ms
64 bytes from babyray_gcs_1.babyray_mynetwork (172.24.0.2): icmp_seq=3 ttl=64 time=0.470 ms
64 bytes from babyray_gcs_1.babyray_mynetwork (172.24.0.2): icmp_seq=4 ttl=64 time=0.348 ms
64 bytes from babyray_gcs_1.babyray_mynetwork (172.24.0.2): icmp_seq=5 ttl=64 time=0.224 ms
64 bytes from babyray_gcs_1.babyray_mynetwork (172.24.0.2): icmp_seq=6 ttl=64 time=0.268 ms
64 bytes from babyray_gcs_1.babyray_mynetwork (172.24.0.2): icmp_seq=7 ttl=64 time=0.173 ms
Packets are flowing between worker 1 and the GCS!
Reset by doing ^C in the docker-compose up
session and then running docker-compose down
to make sure all the containers are killed and the network is deleted.
To run a single Ray node, just do ./run.sh
. This will put you into a shell, and then you can run something like:
./go/bin/worker
and this will just launch the "worker" Go service. You should see something like:
root@fc8df146bfc1:/app# ./go/bin/worker
2024/04/27 00:55:59 server listening at [::]:50002
Remember to do ^D to exit the Docker container interactive session.
Install everything you need first.
brew install go
brew install protobuf
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
echo 'export PATH="$PATH:$(go env GOPATH)/bin"' >> ~/.zshrc
source ~/.zshrc
pip install grpcio-tools
Then, run:
make all
This will compile all the Go server files and generate the Python/Go gRPC stub code.
Then, run
./go/bin/driver
This will start up the driver server.
Finally, run:
python3 python/babyray/client.py
This will run the Python client that will talk to the driver server.
See here in this file for how to load DNS name + port number for a particular service in Go (in this case, GCS function table).
Note: that code is untested, so it may not work, but should give you something to work off of.
Set up some global naming standards so that we can all follow along.
We assign each node a non-negative integer:
- 0: GCS
- 1: global scheduler
- 2 and above: worker nodes
We take advantage of DNS to avoid having to address nodes by their IP addresses. Specifically, name each container a separate name, "nodeX" where "X" is the node's integer ID. For example, "node0" for GCS.
Standard Docker allows us to do this: create a network with docker network create mynetwork
and then any time you do docker run
you add --network mynetwork --network-alias myname
(replace myname
with the DNS name). Also works in Docker Compose. So for now, just assume these names are there and Go's gRPC library will handle DNS resolution + the actual communication under the hood.
We also decide on port numbers for services. For local worker nodes:
- 50000: local object store
- 50001: local scheduler
- 50002: 0th worker
- 50003: 1st worker
- 50004: 2nd worker
- ...
For GCS:
- 50000: function table service
- 50001: object table service
For the global scheduler: port 50000.
All of this information is kept in this config file, so pull from it in your code.
Go to the python/
directory and do python3 -m pip install -e .
(or just pip install -e .
, depending on whether you like the Python executable that your pip is attached to).
Then, go to some arbitrary place on your computer, and you can run this script with Python.
import babyray
babyray.init() # initialize in the same way as the real Ray does
@babyray.remote
def f(x):
return x * x
futures = [f.remote(i) for i in range(4)]
print(babyray.get(futures)) # [0, 1, 4, 9]
It doesn't work right now because it hits an error when it tries to query the local scheduler (which doesn't exist), but this will start working once we implement all the Go stuff.
file structure is kinda like:
babyray/
│
├── docs/ # Documentation files
│ ├── setup.md
│ └── usage.md
│
├── python/ # Python client package
│ ├── babyray/ # Source code for the babyray package
│ │ ├── __init__.py
│ │ ├── client.py # gRPC client implementation
│ │ └── utils.py # Helper functions and utilities
│ │
│ ├── tests/ # Unit tests for Python code
│ │ ├── __init__.py
│ │ ├── test_client.py
│ │ └── test_utils.py
│ │
│ ├── setup.py # Setup script for the babyray package
│ └── requirements.txt # Python dependencies
│
├── go/ # Go server-side implementation
│ ├── cmd/ # Main applications for the project
│ │ ├── server/ # Server application entry point
│ │ │ └── main.go
│ │ └── client/ # Example Go client, if applicable
│ │ └── main.go
│ │
│ ├── pkg/ # Library code that's ok to use by external applications
│ │ ├── objectstore/ # Implementation of the object store
│ │ │ ├── server.go # Server-side implementations
│ │ │ └── client.go # Client-side implementations (if needed)
│ │ └── core/ # Core shared libraries
│ │
│ ├── internal/ # Private application and library code
│ │ └── config/ # Configuration related functionality
│ │
│ └── go.mod # Go module definitions
│ └── go.sum # Go module checksums
│
├── proto/ # Protocol Buffers definitions
│ ├── ray.proto # gRPC service definitions
│ └── Makefile # Automate proto generation for Python and Go
│
├── scripts/ # Scripts for various build, run, or deployment tasks
│ ├── deploy.sh
│ └── setup_env.sh
│
├── .gitignore # Specifies intentionally untracked files to ignore
├── LICENSE # License file
└── README.md # Project overview and setup instructions
thanks to chatgpt for this. https://chat.openai.com/share/65e914c1-a030-45c7-9019-e7647d9707c2
If you'd like to run go test
individually, you should do the following:
cd babyray
pwd
Then, on MacOS:
nano ~/.zshrc
add the following: export PROJECT_ROOT=/path/to/your/project
source ~/.zshrc
This ensures that your Go code has access to the app_config.yaml
file.