Document choosing cluster options #799

Open · wants to merge 1 commit into master
23 changes: 23 additions & 0 deletions docs/cloud.rst
@@ -454,6 +454,29 @@ or shut it down, use the `gateway` object.
cluster.close()


Choosing Cluster Options
^^^^^^^^^^^^^^^^^^^^^^^^

Your workload might constrain the choice of how much memory your workers need.
For example, if some stage of your computation requires loading in 5 arrays of
3GB each, then you'd need *at least* 15GB of memory on your worker nodes.
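
A quick way to sanity-check that kind of estimate is to look at the ``nbytes``
of the objects you plan to load (a minimal sketch, assuming you're working with
xarray; the file name here is only illustrative):

.. code-block:: python

    import xarray as xr

    ds = xr.open_dataset("example.nc")  # hypothetical dataset

    # Total size in GB if this dataset were loaded fully into memory.
    # Compare this against the memory you request per worker.
    print(ds.nbytes / 1e9)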

That said, certain values for the cores / memory per worker will work better on
Pangeo's Kubernetes cluster than others.

At the end of the day, Pangeo launches Dask worker *pods* on our Kubernetes cluster.
Each of these worker pods is scheduled on a Kubernetes *node*: a physical machine
with some CPU and memory capacity. Depending on your per-worker CPU and memory requests,
we may be able to pack more than one Dask worker *pod* onto each *node*, leading
to better cluster utilization (and potentially more total workers for you).

At the moment, our nodes have 4 CPUs and 26124 Mi of memory. So you want to
avoid requesting something like 3 CPUs, or anywhere from ~13GB to 26GB of memory,
per worker: only one such worker pod fits on each node, leaving the rest of the
node's capacity unused. If you're performing a large computation, and *if your
workload allows for it*, request less than half of the physical machine's memory
per worker (in practice, less than 11GB of memory per worker, to leave room for
the other Kubernetes pods that are scheduled on each node too).
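
You can set these per-worker resources when creating a cluster through Dask
Gateway's cluster options. Below is a minimal sketch; the exact option names
(such as ``worker_cores`` and ``worker_memory``) depend on how the deployment
is configured, so inspect the object returned by ``gateway.cluster_options()``
to see what's actually available:

.. code-block:: python

    from dask_gateway import Gateway

    gateway = Gateway()
    options = gateway.cluster_options()

    # Assumed option names; check `options` for the ones your deployment exposes.
    # Two workers of this size fit on a 4-CPU node with ~26GB of memory.
    options.worker_cores = 2
    options.worker_memory = 10  # GB per worker

    cluster = gateway.new_cluster(options)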

Environment variables on the cluster
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
