Installation

Note

For Macs, macOS >= 10.15 is required to install SkyPilot. Apple Silicon-based devices (e.g. Apple M1) must run pip uninstall grpcio; conda install -c conda-forge grpcio=1.43.0 prior to installing SkyPilot.
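
Before installing, it can help to confirm that the interpreter and platform match the requirements on this page (3.7 <= python <= 3.11, plus the grpcio step on Apple Silicon). The following is a minimal sketch, not part of SkyPilot: the helper names ``py_supported`` and ``needs_grpcio_fix`` are hypothetical, and it assumes ``python3`` is the interpreter of the environment you plan to install into.

```shell
# Hypothetical helpers; not part of SkyPilot.

# Succeeds if the given major/minor version satisfies 3.7 <= python <= 3.11.
py_supported() {
  [ "$1" -eq 3 ] && [ "$2" -ge 7 ] && [ "$2" -le 11 ]
}

# Succeeds on Apple Silicon Macs (kernel "Darwin", machine "arm64").
needs_grpcio_fix() {
  [ "$1" = "Darwin" ] && [ "$2" = "arm64" ]
}

if command -v python3 >/dev/null 2>&1; then
  # Prints e.g. "3 10"; left unquoted below so it splits into two arguments.
  py_ver="$(python3 -c 'import sys; print(sys.version_info[0], sys.version_info[1])')"
  if py_supported $py_ver; then
    echo "Python version OK: $py_ver"
  else
    echo "Unsupported Python version: $py_ver"
  fi
else
  echo "python3 not found on PATH"
fi

if needs_grpcio_fix "$(uname -s)" "$(uname -m)"; then
  echo "Apple Silicon detected. Before installing SkyPilot, run:"
  echo "  pip uninstall grpcio; conda install -c conda-forge grpcio=1.43.0"
fi
```

A shell running under Rosetta reports ``x86_64`` from ``uname -m``, so the second check may not fire there.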

Install SkyPilot using pip:

.. tab-set::

    .. tab-item:: Latest Release
        :sync: latest-release-tab

        .. code-block:: shell

          # Recommended: use a new conda env to avoid package conflicts.
          # SkyPilot requires 3.7 <= python <= 3.11.
          conda create -y -n sky python=3.10
          conda activate sky

          # Choose your cloud:

          pip install "skypilot[aws]"
          pip install "skypilot[gcp]"
          pip install "skypilot[azure]"
          pip install "skypilot[oci]"
          pip install "skypilot[lambda]"
          pip install "skypilot[runpod]"
          pip install "skypilot[fluidstack]"
          pip install "skypilot[paperspace]"
          pip install "skypilot[cudo]"
          pip install "skypilot[ibm]"
          pip install "skypilot[scp]"
          pip install "skypilot[vsphere]"
          pip install "skypilot[kubernetes]"
          pip install "skypilot[all]"


    .. tab-item:: Nightly
        :sync: nightly-tab

        .. code-block:: shell

          # Recommended: use a new conda env to avoid package conflicts.
          # SkyPilot requires 3.7 <= python <= 3.11.
          conda create -y -n sky python=3.10
          conda activate sky

          # Choose your cloud:

          pip install "skypilot-nightly[aws]"
          pip install "skypilot-nightly[gcp]"
          pip install "skypilot-nightly[azure]"
          pip install "skypilot-nightly[oci]"
          pip install "skypilot-nightly[lambda]"
          pip install "skypilot-nightly[runpod]"
          pip install "skypilot-nightly[fluidstack]"
          pip install "skypilot-nightly[paperspace]"
          pip install "skypilot-nightly[do]"
          pip install "skypilot-nightly[cudo]"
          pip install "skypilot-nightly[ibm]"
          pip install "skypilot-nightly[scp]"
          pip install "skypilot-nightly[vsphere]"
          pip install "skypilot-nightly[kubernetes]"
          pip install "skypilot-nightly[all]"


    .. tab-item:: From Source
        :sync: from-source-tab

        .. code-block:: shell

          # Recommended: use a new conda env to avoid package conflicts.
          # SkyPilot requires 3.7 <= python <= 3.11.
          conda create -y -n sky python=3.10
          conda activate sky

          git clone https://github.com/skypilot-org/skypilot.git
          cd skypilot

          # Choose your cloud:

          pip install -e ".[aws]"
          pip install -e ".[gcp]"
          pip install -e ".[azure]"
          pip install -e ".[oci]"
          pip install -e ".[lambda]"
          pip install -e ".[runpod]"
          pip install -e ".[fluidstack]"
          pip install -e ".[paperspace]"
          pip install -e ".[cudo]"
          pip install -e ".[ibm]"
          pip install -e ".[scp]"
          pip install -e ".[vsphere]"
          pip install -e ".[kubernetes]"
          pip install -e ".[all]"

To use more than one cloud, combine the pip extras:

.. tab-set::

    .. tab-item:: Latest Release
        :sync: latest-release-tab

        .. code-block:: shell

          pip install -U "skypilot[aws,gcp]"

    .. tab-item:: Nightly
        :sync: nightly-tab

        .. code-block:: shell

          pip install -U "skypilot-nightly[aws,gcp]"

    .. tab-item:: From Source
        :sync: from-source-tab

        .. code-block:: shell

          pip install -e ".[aws,gcp]"

Alternatively, we provide a :ref:`Docker image <docker-image>` as a quick way to try out SkyPilot.

Verifying cloud access

After installation, run sky check to verify that credentials are correctly set up:

sky check

This will produce a summary like:

Checking credentials to enable clouds for SkyPilot.
  AWS: enabled
  GCP: enabled
  Azure: enabled
  OCI: enabled
  Lambda: enabled
  RunPod: enabled
  Paperspace: enabled
  Fluidstack: enabled
  Cudo: enabled
  IBM: enabled
  SCP: enabled
  vSphere: enabled
  Cloudflare (for R2 object store): enabled
  Kubernetes: enabled

SkyPilot will use only the enabled clouds to run tasks. To change this, configure cloud credentials, and run sky check.

If any cloud's credentials or dependencies are missing, sky check will output hints on how to resolve them. You can also refer to the cloud setup section :ref:`below <cloud-account-setup>`.
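
When many clouds are configured, it may be convenient to filter the summary down to the clouds that still need attention. The sketch below is not part of SkyPilot; ``list_disabled`` is a hypothetical helper. It assumes that a cloud without working credentials is reported as ``<Cloud>: disabled``, which is an assumption, since only the ``enabled`` form is shown above.

```shell
# Hypothetical helper; not part of SkyPilot.
# Reads a `sky check` summary on stdin and prints the name of every cloud
# whose status is "disabled". ASSUMPTION: such clouds are reported as
# "  <Cloud>: disabled"; only the "enabled" form appears in this page.
list_disabled() {
  sed -n 's/^[[:space:]]*\([^:]*\): disabled.*$/\1/p'
}

# Demo on a sample summary. Real usage would be: sky check | list_disabled
list_disabled <<'EOF'
Checking credentials to enable clouds for SkyPilot.
  AWS: enabled
  GCP: disabled
  Cloudflare (for R2 object store): enabled
  Kubernetes: disabled
EOF
```

On the sample input this prints ``GCP`` and ``Kubernetes``. If the real output contains terminal color codes, the pattern would need to be adjusted.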

Tip

If your clouds show enabled --- |:tada:| |:tada:| Congratulations! |:tada:| |:tada:| You can now head over to :ref:`Quickstart <quickstart>` to get started with SkyPilot.

Tip

To check credentials only for specific clouds, pass the clouds as arguments: sky check aws gcp

Cloud account setup

SkyPilot currently supports these cloud providers: AWS, GCP, Azure, OCI, Lambda Cloud, RunPod, Fluidstack, Paperspace, Cudo, IBM, SCP, VMware vSphere and Cloudflare (for R2 object store).

If you already have cloud access set up on your local machine, run sky check to :ref:`verify that SkyPilot can properly access your enabled clouds<verify-cloud-access>`.

Otherwise, configure access to at least one cloud using the following guides.

Amazon Web Services (AWS)

To get the AWS access key required by aws configure, please go to the AWS IAM Management Console and click on the "Access keys" dropdown (detailed instructions here). The Default region name [None]: and Default output format [None]: fields are optional and can be left blank to choose defaults.

# Install boto
pip install boto3

# Configure your AWS credentials
aws configure

To use AWS IAM Identity Center (AWS SSO), see :ref:`here<aws-sso>` for instructions.

Optional: To create a new AWS user with minimal permissions for SkyPilot, see :ref:`AWS User Creation <cloud-permissions-aws>`.

Google Cloud Platform (GCP)

conda install -c conda-forge google-cloud-sdk

gcloud init

# Run this if you don't have a credentials file.
# This will generate ~/.config/gcloud/application_default_credentials.json.
gcloud auth application-default login

Tip

If you are using multiple GCP projects, list all the projects with gcloud projects list and activate one with gcloud config set project <PROJECT_ID> (see GCP docs).

.. dropdown:: Common GCP installation errors

    Here are some commonly encountered errors and their fixes:

    * ``RemoveError: 'requests' is a dependency of conda and cannot be removed from conda's operating environment`` when running :code:`conda install -c conda-forge google-cloud-sdk` --- run :code:`conda update --force conda` first and rerun the command.
    * ``Authorization Error (Error 400: invalid_request)`` with the url generated by :code:`gcloud auth login` --- install the latest version of the `Google Cloud SDK <https://cloud.google.com/sdk/docs/install>`_ (e.g., with :code:`conda install -c conda-forge google-cloud-sdk`) on your local machine (which opened the browser) and rerun the command.

Optional: To create and use a long-lived service account on your local machine, see :ref:`here<gcp-service-account>`.

Optional: To create a new GCP user with minimal permissions for SkyPilot, see :ref:`GCP User Creation <cloud-permissions-gcp>`.

Azure

# Login
az login
# Set the subscription to use
az account set -s <subscription_id>

Hint: run az account subscription list to get a list of subscription IDs under your account.

Oracle Cloud Infrastructure (OCI)

To access Oracle Cloud Infrastructure (OCI), set up the credentials by following this guide. After completing the steps in the guide, the ~/.oci folder should contain the following files:

~/.oci/config
~/.oci/oci_api_key.pem

The ~/.oci/config file should contain the following fields:

[DEFAULT]
user=ocid1.user.oc1..aaaaaaaa
fingerprint=aa:bb:cc:dd:ee:ff:gg:hh:ii:jj:kk:ll:mm:nn:oo:pp
tenancy=ocid1.tenancy.oc1..aaaaaaaa
region=us-sanjose-1
# Note that we should avoid using full home path for the key_file configuration, e.g. use ~/.oci instead of /home/username/.oci
key_file=~/.oci/oci_api_key.pem
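
The field list above can be checked mechanically. The following is a minimal sketch, not part of SkyPilot or the OCI CLI; ``check_oci_config`` is a hypothetical helper that looks only for the five fields in the example and for a ``key_file`` that starts with a full home path.

```shell
# Hypothetical helper; not part of SkyPilot or the OCI CLI.
# check_oci_config FILE: report fields from the example above that are
# missing, and warn if key_file uses a full home path.
check_oci_config() {
  cfg="$1"
  status=0
  for field in user fingerprint tenancy region key_file; do
    if ! grep -q "^${field}[[:space:]]*=" "$cfg"; then
      echo "missing field: ${field}"
      status=1
    fi
  done
  if grep -Eq '^key_file[[:space:]]*=[[:space:]]*/(home|Users)/' "$cfg"; then
    echo "key_file uses a full home path; prefer ~/.oci/..."
    status=1
  fi
  return "$status"
}

# Run against the default location if it exists.
if [ -f "$HOME/.oci/config" ] && check_oci_config "$HOME/.oci/config"; then
  echo "OCI config looks complete"
fi
```

The check does not validate the values themselves, for example whether the fingerprint matches the key.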

By default, the provisioned nodes will be in the root compartment. To use a compartment other than root, create or edit ~/.sky/config.yaml and add the compartment's OCID as follows:

oci:
  default:
    compartment_ocid: ocid1.compartment.oc1..aaaaaaaa......

Lambda Cloud

Lambda Cloud is a cloud provider offering low-cost GPUs. To configure Lambda Cloud access, go to the API Keys page on your Lambda console to generate a key and then add it to ~/.lambda_cloud/lambda_keys:

mkdir -p ~/.lambda_cloud
echo "api_key = <your_api_key_here>" > ~/.lambda_cloud/lambda_keys
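
Because this file holds a secret, you may want to make it readable only by your own user. This is general practice rather than a stated SkyPilot requirement. A minimal sketch, where ``write_secret`` is a hypothetical helper:

```shell
# Hypothetical helper; not part of SkyPilot.
# write_secret FILE CONTENT: create FILE (and its parent directory if needed)
# with access for the current user only, then write CONTENT to it.
write_secret() {
  (
    umask 077                      # new files: 0600, new directories: 0700
    mkdir -p "$(dirname "$1")"
    rm -f "$1"                     # drop any old file so the umask applies
    printf '%s\n' "$2" > "$1"
  )
}

# Usage (replace the placeholder with your key):
# write_secret "$HOME/.lambda_cloud/lambda_keys" "api_key = <your_api_key_here>"
```

The ``umask`` call runs in a subshell, so it does not change the permissions of files created later in the same session.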

Paperspace

Paperspace is a cloud provider that provides access to GPU-accelerated VMs. To configure Paperspace access, follow these instructions to generate an API key. Add the API key with:

mkdir -p ~/.paperspace
echo '{"api_key": "<your_api_key_here>"}' > ~/.paperspace/config.json

RunPod

RunPod is a specialized AI cloud provider that offers low-cost GPUs. To configure RunPod access, go to the Settings page on your RunPod console and generate an API key. Then, run:

pip install "runpod>=1.5.1"
runpod config

Fluidstack

Fluidstack is a cloud provider offering low-cost GPUs. To configure Fluidstack access, go to the Home page on your Fluidstack console to generate an API key and then add the API key to ~/.fluidstack/api_key:

mkdir -p ~/.fluidstack
echo "your_api_key_here" > ~/.fluidstack/api_key

Cudo Compute

Cudo Compute provides low-cost GPUs powered by green energy.

  1. Create a billing account.

  2. Create a project.

  3. Create an API Key.

  4. Download and install the cudoctl command-line tool.

  5. Run cudoctl init:

    cudoctl init
      ✔ api key: my-api-key
      ✔ project: my-project
      ✔ billing account: my-billing-account
      ✔ context: default
      config file saved ~/.config/cudo/cudo.yml
    
    pip install "cudo-compute>=0.1.10"

If you want to use SkyPilot with a different Cudo Compute account or project, run cudoctl init again.

IBM

To access IBM's VPC service, store the following fields in ~/.ibm/credentials.yaml:

iam_api_key: <user_personal_api_key>
resource_group_id: <resource_group_user_is_a_member_of>

Note

Stock images currently do not provide ML tools out of the box. Create private images with the necessary tools (e.g., CUDA) by following the IBM segment in this documentation.

To access IBM's Cloud Object Storage (COS), append the following fields to the credentials file:

access_key_id: <access_key_id>
secret_access_key: <secret_key_id>

To get access_key_id and secret_access_key use the IBM web console:

  1. Create/Select a COS instance from the web console.
  2. From the "Service Credentials" tab, click "New Credential" and toggle "Include HMAC Credential".
  3. Copy "secret_access_key" and "access_key_id" to the credentials file.

Finally, install rclone via: curl https://rclone.org/install.sh | sudo bash
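
Putting the VPC and COS fields together, the complete credentials file has four entries. The sketch below writes a template with the placeholders used on this page; ``write_ibm_credentials`` is a hypothetical helper, not part of SkyPilot, and the placeholders must be replaced with real values before use.

```shell
# Hypothetical helper; not part of SkyPilot.
# write_ibm_credentials FILE: write a template containing the two VPC fields
# and the two COS fields described above. Values are placeholders to edit.
write_ibm_credentials() {
  mkdir -p "$(dirname "$1")"
  cat > "$1" <<'EOF'
iam_api_key: <user_personal_api_key>
resource_group_id: <resource_group_user_is_a_member_of>
access_key_id: <access_key_id>
secret_access_key: <secret_key_id>
EOF
}

# Usage (overwrites any existing file):
# write_ibm_credentials "$HOME/.ibm/credentials.yaml"
```

If you only need VPC access, the last two lines can be omitted.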

Note

sky check does not reflect IBM COS's enabled status. IBM: enabled only guarantees that IBM VM instances are enabled.

Samsung Cloud Platform (SCP)

Samsung Cloud Platform (SCP) provides cloud services optimized for enterprise customers. You can learn more about SCP here.

To configure SCP access, you need access keys and the ID of the project in which your tasks will run. Go to the Access Key Management page on your SCP console to generate the access keys, and the Project Overview page for the project ID. Then, add them to ~/.scp/scp_credential by running:

# Create directory if required
mkdir -p ~/.scp
# Add the lines for "access_key", "secret_key", and "project_id" to scp_credential file
echo "access_key = <your_access_key>" >> ~/.scp/scp_credential
echo "secret_key = <your_secret_key>" >> ~/.scp/scp_credential
echo "project_id = <your_project_id>" >> ~/.scp/scp_credential
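
Note that the commands above append with ``>>``, so running them a second time leaves duplicate lines in the file. If you expect to rotate keys, a replace-or-append variant may be easier to re-run. A minimal sketch, where ``set_credential`` is a hypothetical helper and not part of SkyPilot:

```shell
# Hypothetical helper; not part of SkyPilot.
# set_credential FILE KEY VALUE: replace an existing "KEY = ..." line or
# append a new one, so re-running does not leave duplicate entries.
set_credential() {
  file="$1"; key="$2"; value="$3"
  mkdir -p "$(dirname "$file")"
  touch "$file"
  # Keep every line except a previous definition of KEY, then add the new one.
  # grep exits non-zero when nothing is kept, which is fine here.
  grep -v "^${key} = " "$file" > "${file}.tmp" || true
  printf '%s = %s\n' "$key" "$value" >> "${file}.tmp"
  mv "${file}.tmp" "$file"
}

# Usage (replace the placeholders with your values):
# set_credential "$HOME/.scp/scp_credential" access_key "<your_access_key>"
# set_credential "$HOME/.scp/scp_credential" secret_key "<your_secret_key>"
# set_credential "$HOME/.scp/scp_credential" project_id "<your_project_id>"
```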

Note

Multi-node clusters are currently not supported on SCP.

VMware vSphere

To configure VMware vSphere access, store the vSphere credentials in ~/.vsphere/credential.yaml:

mkdir -p ~/.vsphere
touch ~/.vsphere/credential.yaml

Here is an example of configuration within the credential file:

vcenters:
  - name: <your_vsphere_server_ip_01>
    username: <your_vsphere_user_name>
    password: <your_vsphere_user_passwd>
    skip_verification: true # If your vCenter has a valid certificate, change this to 'false'
    # Clusters that can be used by SkyPilot:
    #   [] means all the clusters in the vSphere can be used by SkyPilot
    # Instead, you can specify the clusters in a list:
    # clusters:
    #   - name: <your_vsphere_cluster_name1>
    #   - name: <your_vsphere_cluster_name2>
    clusters: []
  # If you are configuring only one vSphere instance, omit the following line.
  - name: <your_vsphere_server_ip_02>
    username: <your_vsphere_user_name>
    password: <your_vsphere_user_passwd>
    skip_verification: true
    clusters: []

After configuring the vSphere credentials, ensure that the necessary preparations for vSphere are completed. Please refer to this guide for more information: :ref:`Cloud Preparation for vSphere <cloud-prepare-vsphere>`

Cloudflare R2

Cloudflare offers R2, an S3-compatible object store without any egress charges. SkyPilot can download/upload data to R2 buckets and mount them as a local filesystem on clusters launched by SkyPilot. To set up R2 support, run:

# Install boto
pip install boto3
# Configure your R2 credentials
AWS_SHARED_CREDENTIALS_FILE=~/.cloudflare/r2.credentials aws configure --profile r2

In the prompt, enter your R2 Access Key ID and Secret Access Key (see instructions to generate R2 credentials). Select auto for the default region and json for the default output format.

AWS Access Key ID [None]: <access_key_id>
AWS Secret Access Key [None]: <access_key_secret>
Default region name [None]: auto
Default output format [None]: json

Next, get your Account ID from your R2 dashboard and store it in ~/.cloudflare/accountid with:

mkdir -p ~/.cloudflare
echo <YOUR_ACCOUNT_ID_HERE> > ~/.cloudflare/accountid
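
After both steps, ~/.cloudflare should hold a credentials file with an ``[r2]`` profile section and a non-empty account ID file. The following sketch checks for exactly that and nothing more; ``check_r2_setup`` is a hypothetical helper, not part of SkyPilot, and it does not verify that the keys are valid.

```shell
# Hypothetical helper; not part of SkyPilot.
# check_r2_setup DIR: verify that DIR (normally ~/.cloudflare) holds an
# "r2" profile in r2.credentials and a non-empty accountid file.
check_r2_setup() {
  dir="$1"
  if ! grep -q '^\[r2\]' "$dir/r2.credentials" 2>/dev/null; then
    echo "no [r2] profile in $dir/r2.credentials"
    return 1
  fi
  if [ ! -s "$dir/accountid" ]; then
    echo "missing or empty $dir/accountid"
    return 1
  fi
  echo "R2 credentials and account ID found"
}

# Usage:
# check_r2_setup "$HOME/.cloudflare"
```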

Note

Support for R2 is in beta. Please report any issues on GitHub or reach out to us on Slack.

Kubernetes

SkyPilot can also run tasks on on-prem or cloud-hosted Kubernetes clusters (e.g., EKS, GKE). The only requirement is a valid kubeconfig at ~/.kube/config.

# Place your kubeconfig at ~/.kube/config
mkdir -p ~/.kube
cp /path/to/kubeconfig ~/.kube/config
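
The ``cp`` above overwrites any kubeconfig already at ~/.kube/config. If you have an existing one, a variant that saves a backup first may be safer. A minimal sketch, where ``install_kubeconfig`` is a hypothetical helper and not part of SkyPilot or kubectl:

```shell
# Hypothetical helper; not part of SkyPilot or kubectl.
# install_kubeconfig SRC [DEST]: copy SRC to DEST (default ~/.kube/config),
# first saving any existing DEST as DEST.bak so it is not lost.
install_kubeconfig() {
  src="$1"
  dest="${2:-$HOME/.kube/config}"
  mkdir -p "$(dirname "$dest")"
  if [ -f "$dest" ]; then
    cp "$dest" "$dest.bak"
    echo "existing kubeconfig saved to $dest.bak"
  fi
  cp "$src" "$dest"
}

# Usage:
# install_kubeconfig /path/to/kubeconfig
```

Only one backup is kept; a second run replaces the earlier ``.bak`` file.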

See :ref:`SkyPilot on Kubernetes <kubernetes-overview>` for more.

Requesting quotas for first time users

If your cloud account has not been used to launch instances before, the respective quotas are likely set to zero or a low limit. This is especially true for GPU instances.

Please follow :ref:`Requesting Quota Increase <quota>` to check quotas and request quota increases before proceeding.

Quick alternative: trying in Docker

As a quick alternative to installing SkyPilot on your laptop, we also provide a Docker image with the SkyPilot main branch automatically cloned. You can simply run:

# NOTE: '--platform linux/amd64' is needed for Apple silicon Macs
docker run --platform linux/amd64 \
  -td --rm --name sky \
  -v "$HOME/.sky:/root/.sky:rw" \
  -v "$HOME/.aws:/root/.aws:rw" \
  -v "$HOME/.config/gcloud:/root/.config/gcloud:rw" \
  berkeleyskypilot/skypilot

docker exec -it sky /bin/bash

If your cloud CLIs are already set up, your credentials (AWS and GCP) will be mounted to the container and you can proceed to :ref:`Quickstart <quickstart>`. Otherwise, you can follow the instructions in :ref:`Cloud account setup <cloud-account-setup>` inside the container to set up your cloud accounts.

Once you are done experimenting with SkyPilot, remember to delete any clusters and storage resources you may have created using the following commands:

# Run inside the container:
sky down -a -y
sky storage delete -a -y

Finally, you can stop the container with:

docker stop sky

See more details about the dev container image berkeleyskypilot/skypilot-nightly here.

Enabling shell completion

SkyPilot supports shell completion for Bash (version 4.4 and up), Zsh, and Fish. This is only available for click versions 8.0 and up (use pip install click==8.0.4 to install).
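
Both version requirements can be checked before enabling completion. The sketch below is not part of SkyPilot; ``version_at_least`` is a hypothetical helper that compares only the major and minor components of a version string.

```shell
# Hypothetical helper; not part of SkyPilot.
# version_at_least VERSION MAJOR MINOR: succeeds if VERSION (e.g. "8.0.4")
# is at least MAJOR.MINOR. Only the first two numeric components are compared.
version_at_least() {
  major="${1%%.*}"
  case "$1" in
    *.*) rest="${1#*.}"; minor="${rest%%[!0-9]*}" ;;
    *)   minor=0 ;;
  esac
  [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "${minor:-0}" -ge "$3" ]; }
}

# Bash 4.4 or newer is needed for Bash completion.
if [ -n "${BASH_VERSION:-}" ]; then
  if version_at_least "$BASH_VERSION" 4 4; then
    echo "bash $BASH_VERSION: OK for completion"
  else
    echo "bash $BASH_VERSION: too old for completion"
  fi
fi

# click 8.0 or newer is needed. Pass the installed version, which
# `pip show click` reports on its "Version:" line, for example:
# version_at_least "8.0.4" 8 0 && echo "click OK"
```

The Bash check only runs when the script itself is executed by Bash; under Zsh or Fish only the click requirement applies.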

To enable shell completion after installing SkyPilot, you will need to modify your shell configuration. SkyPilot automates this process using the --install-shell-completion option, which you should call using the appropriate shell name or auto:

sky --install-shell-completion auto
# sky --install-shell-completion zsh
# sky --install-shell-completion bash
# sky --install-shell-completion fish

Shell completion may perform poorly on certain shells and machines. If you experience any issues after installation, you can use the --uninstall-shell-completion option to uninstall it, which you should similarly call using the appropriate shell name or auto:

sky --uninstall-shell-completion auto
# sky --uninstall-shell-completion zsh
# sky --uninstall-shell-completion bash
# sky --uninstall-shell-completion fish