Improvements to OSS builds and the Release Process (#1627)
Summary:
- Add support for CUDA 11.8 in the OSS builds
- Annotate the package wheels with Python 3.9 and 3.10 support classifiers
- Update `setup.py` to auto-derive the package version from git information (namely tags), allowing fast tag-and-release
- Check for the actual presence of NVIDIA drivers in OSS builds, and error out with a friendly message instead of the cryptic `RuntimeError: No such operator fbgemm::jagged_2d_to_dense` error when `fbgemm_gpu` is installed and loaded on a system that has a GPU but no GPU drivers installed
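The driver check in the last bullet is implemented in bash (`print_gpu_info` in `setup_env.bash`, switched on by `ENFORCE_NVIDIA_GPU`). A minimal Python sketch of the same decision logic, assuming that a successful `nvidia-smi` run is an adequate proxy for an installed driver; `check_nvidia_driver` and `MissingNvidiaDriverError` are hypothetical names, not part of FBGEMM_GPU:

```python
import os
import shutil
import subprocess


class MissingNvidiaDriverError(RuntimeError):
    """Hypothetical error raised when a GPU job runs without a usable driver."""


def check_nvidia_driver(enforce: bool) -> bool:
    """Return True if nvidia-smi runs successfully.

    When enforce is True, raise a readable error instead of returning False,
    mirroring the `return 1` branch of print_gpu_info.
    """
    ok = False
    nvidia_smi = shutil.which("nvidia-smi")
    if nvidia_smi is not None:
        try:
            # nvidia-smi exits non-zero when no driver or GPU is reachable
            result = subprocess.run(
                [nvidia_smi], capture_output=True, text=True, timeout=60
            )
            ok = result.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            ok = False
    if enforce and not ok:
        raise MissingNvidiaDriverError(
            "NVIDIA driver is required, but does not appear to have been "
            "installed. This will cause FBGEMM_GPU installation to fail!"
        )
    return ok


if __name__ == "__main__":
    # Like `[[ "${ENFORCE_NVIDIA_GPU}" ]]`: any non-empty value enforces the check
    enforce = bool(os.environ.get("ENFORCE_NVIDIA_GPU"))
    print("NVIDIA driver usable:", check_nvidia_driver(enforce))
```

Failing early with a targeted message keeps the actual cause (no driver) from surfacing later as an unrelated-looking missing-operator error.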

Pull Request resolved: #1627

Reviewed By: brad-mengchi, shintaro-iwasaki

Differential Revision: D43868995

Pulled By: q10

fbshipit-source-id: 7843622c8415df847fc1c25775f084875e1324b6
q10 authored and facebook-github-bot committed Mar 8, 2023
1 parent 936ec59 commit fd4ac90
Showing 10 changed files with 105 additions and 75 deletions.
47 changes: 31 additions & 16 deletions .github/scripts/setup_env.bash
@@ -14,6 +14,7 @@ print_exec () {
echo "+ $*"
echo ""
"$@"
echo ""
}

exec_with_retries () {
@@ -238,6 +239,30 @@ free_disk_space () {
# Info Functions
################################################################################

print_gpu_info () {
echo "################################################################################"
echo "[INFO] Check GPU info ..."
install_system_packages lshw
print_exec sudo lshw -C display

echo "################################################################################"
echo "[INFO] Check NVIDIA GPU info ..."

if [[ "${ENFORCE_NVIDIA_GPU}" ]]; then
# Ensure that nvidia-smi is available and returns GPU entries
if ! nvidia-smi; then
echo "[CHECK] NVIDIA driver is required, but does not appear to have been installed. This will cause FBGEMM_GPU installation to fail!"
return 1
fi

else
if which nvidia-smi; then
# If nvidia-smi is installed on a machine without GPUs, this will return error
(print_exec nvidia-smi) || true
fi
fi
}

print_system_info () {
echo "################################################################################"
echo "# Print System Info"
@@ -264,17 +289,6 @@ print_system_info () {
print_exec uname -a
print_exec cat /proc/version
print_exec cat /etc/os-release

echo "################################################################################"
echo "[INFO] Check GPU info ..."
install_system_packages lshw
print_exec sudo lshw -C display

if which nvidia-smi; then
echo "################################################################################"
echo "[INFO] Check NVIDIA GPU info ..."
print_exec nvidia-smi
fi
}

print_ec2_info () {
@@ -335,7 +349,7 @@ setup_miniconda () {
print_exec . ~/.bashrc

echo "[SETUP] Updating Miniconda base packages ..."
print_exec conda update -n base -c defaults -y conda
(exec_with_retries conda update -n base -c defaults -y conda) || return 1

# Print Conda info
print_exec conda info
@@ -369,12 +383,12 @@ create_conda_environment () {
(exec_with_retries conda create -y --name "${env_name}" python="${python_version}") || return 1

echo "[SETUP] Upgrading PIP to latest ..."
print_exec conda run -n "${env_name}" pip install --upgrade pip
(exec_with_retries conda run -n "${env_name}" pip install --upgrade pip) || return 1

# The pyOpenSSL and cryptography packages versions need to line up for PyPI publishing to work
# https://stackoverflow.com/questions/74981558/error-updating-python3-pip-attributeerror-module-lib-has-no-attribute-openss
echo "[SETUP] Upgrading pyOpenSSL ..."
print_exec conda run -n "${env_name}" python -m pip install "pyOpenSSL>22.1.0"
(exec_with_retries conda run -n "${env_name}" python -m pip install "pyOpenSSL>22.1.0") || return 1

# This test fails with load errors if the pyOpenSSL and cryptography package versions don't align
echo "[SETUP] Testing pyOpenSSL import ..."
@@ -886,7 +900,7 @@ prepare_fbgemm_gpu_build () {
git submodule update --init --recursive

echo "[BUILD] Installing other build dependencies ..."
print_exec conda run -n "${env_name}" python -m pip install -r requirements.txt
(exec_with_retries conda run -n "${env_name}" python -m pip install -r requirements.txt) || return 1

(test_python_import "${env_name}" numpy) || return 1
(test_python_import "${env_name}" skbuild) || return 1
@@ -1095,7 +1109,7 @@ install_fbgemm_gpu_package () {
print_exec sha1sum "${package_name}"

echo "[INSTALL] Installing FBGEMM-GPU wheel: ${package_name} ..."
conda run -n "${env_name}" python -m pip install "${package_name}"
(exec_with_retries conda run -n "${env_name}" python -m pip install "${package_name}") || return 1

echo "[INSTALL] Checking imports ..."
(test_python_import "${env_name}" fbgemm_gpu) || return 1
@@ -1217,4 +1231,5 @@ publish_to_pypi () {
"${package_name}"

echo "[PUBLISH] Successfully published package(s) to PyPI: ${package_name}"
echo "[PUBLISH] NOTE: The publish command is a successful no-op if the wheel version already existed in PyPI; please double check!"
}
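Most install steps in this file are now wrapped as `(exec_with_retries ...) || return 1`, so transient conda/pip failures are retried and a final failure aborts the calling function. The body of `exec_with_retries` is not part of this diff; the following Python sketch only illustrates the general retry-then-propagate pattern, with a hypothetical retry count and delay:

```python
import subprocess
import sys
import time
from typing import Sequence


def exec_with_retries(cmd: Sequence[str], max_retries: int = 3, delay_s: float = 5.0) -> int:
    """Run cmd, retrying on a non-zero exit code; return the last exit code.

    max_retries and delay_s are hypothetical defaults, not values taken
    from setup_env.bash.
    """
    returncode = 1
    for attempt in range(max_retries + 1):
        if attempt > 0:
            print(f"[RETRY] Attempt {attempt}/{max_retries} in {delay_s}s ...")
            time.sleep(delay_s)
        print("+ " + " ".join(cmd))  # echo the command, like print_exec
        returncode = subprocess.run(list(cmd)).returncode
        if returncode == 0:
            break
    return returncode


if __name__ == "__main__":
    # Counterpart of `(exec_with_retries ...) || return 1`
    rc = exec_with_retries([sys.executable, "--version"])
    if rc != 0:
        print("[ERROR] Command failed after all retries")
```

Returning the final exit code instead of raising keeps the sketch close to the shell convention, where the caller decides whether to `return 1`.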
6 changes: 6 additions & 0 deletions .github/workflows/fbgemm_gpu_ci.yml
@@ -38,6 +38,9 @@ jobs:
- name: Display System Info
run: . $PRELUDE; print_system_info

- name: Display GPU Info
run: . $PRELUDE; print_gpu_info

- name: Free Disk Space
run: . $PRELUDE; free_disk_space

@@ -150,6 +153,9 @@ jobs:
- name: Display System Info
run: . $PRELUDE; print_system_info

- name: Display GPU Info
run: . $PRELUDE; print_gpu_info

- name: Setup Miniconda
run: |
. $PRELUDE; setup_miniconda $HOME/miniconda
18 changes: 12 additions & 6 deletions .github/workflows/fbgemm_nightly_build.yml
@@ -46,7 +46,7 @@ jobs:
matrix:
os: [ linux.12xlarge ]
python-version: [ "3.8", "3.9", "3.10" ]
cuda-version: [ "11.7.1" ]
cuda-version: [ "11.7.1", "11.8.0" ]

steps:
- name: Checkout the Repository
@@ -57,6 +57,9 @@ jobs:
- name: Display System Info
run: . $PRELUDE; print_system_info

- name: Display GPU Info
run: . $PRELUDE; print_gpu_info

- name: Setup Miniconda
run: |
. $PRELUDE; setup_miniconda $HOME/miniconda
@@ -103,12 +106,15 @@ jobs:
env:
PRELUDE: .github/scripts/setup_env.bash
BUILD_ENV: build_binary
ENFORCE_NVIDIA_GPU: 1
strategy:
fail-fast: false
matrix:
os: [ linux.g5.4xlarge.nvidia.gpu ]
python-version: [ "3.8", "3.9", "3.10" ]
cuda-version: [ "11.7.1" ]
cuda-version: [ "11.7.1", "11.8.0" ]
# Specify exactly ONE CUDA version for artifact publish
cuda-version-publish: [ "11.7.1" ]
needs: build_artifact

steps:
@@ -118,10 +124,10 @@ jobs:
submodules: true

- name: Display System Info
run: . $PRELUDE; print_system_info
run: . $PRELUDE; print_system_info; print_ec2_info

- name: Display EC2 Info
run: . $PRELUDE; print_ec2_info
- name: Display GPU Info
run: . $PRELUDE; print_gpu_info

- name: Setup Miniconda
run: |
@@ -157,7 +163,7 @@ jobs:
run: . $PRELUDE; cd fbgemm_gpu/test; run_fbgemm_gpu_tests $BUILD_ENV

- name: Push FBGEMM_GPU Nightly Binary to PYPI
if: ${{ github.event_name != 'pull_request' && github.event_name != 'push' }}
if: ${{ github.event_name != 'pull_request' && github.event_name != 'push' && matrix.cuda-version == matrix.cuda-version-publish }}
env:
PYPI_TOKEN: ${{ secrets.PYPI_TOKEN }}
run: . $PRELUDE; publish_to_pypi $BUILD_ENV fbgemm_gpu_nightly-*.whl "$PYPI_TOKEN"
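The GPU test job now fans out over three Python versions and two CUDA versions, while the PyPI push step is gated on `matrix.cuda-version == matrix.cuda-version-publish`. A small Python model of that expansion and gate, using the matrix values from this file (the single-valued `os` dimension is omitted); `expand_matrix` and `should_publish` are illustrative names only:

```python
from itertools import product

# Matrix values copied from fbgemm_nightly_build.yml in this commit
PYTHON_VERSIONS = ["3.8", "3.9", "3.10"]
CUDA_VERSIONS = ["11.7.1", "11.8.0"]
CUDA_VERSION_PUBLISH = "11.7.1"  # "exactly ONE CUDA version for artifact publish"


def expand_matrix():
    """Yield one job per combination, like a GitHub Actions matrix cross-product."""
    for py, cuda in product(PYTHON_VERSIONS, CUDA_VERSIONS):
        yield {
            "python-version": py,
            "cuda-version": cuda,
            "cuda-version-publish": CUDA_VERSION_PUBLISH,
        }


def should_publish(job, event_name):
    """Model of the `if:` condition on the PyPI push step."""
    return (
        event_name != "pull_request"
        and event_name != "push"
        and job["cuda-version"] == job["cuda-version-publish"]
    )


jobs = list(expand_matrix())
publishing = [j for j in jobs if should_publish(j, "schedule")]
print(len(jobs), "test jobs,", len(publishing), "publish")  # → 6 test jobs, 3 publish
```

Only the CUDA 11.7.1 jobs upload. As far as the `setup.py` hunk shows, the CUDA version is not encoded in the package name or version, so both variants would presumably produce wheels with identical filenames, and only one of them could be uploaded.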
9 changes: 6 additions & 3 deletions .github/workflows/fbgemm_nightly_build_cpu.yml
@@ -56,6 +56,9 @@ jobs:
- name: Display System Info
run: . $PRELUDE; print_system_info

- name: Display GPU Info
run: . $PRELUDE; print_gpu_info

- name: Setup Miniconda
run: |
. $PRELUDE; setup_miniconda $HOME/miniconda
@@ -110,10 +113,10 @@ jobs:
submodules: true

- name: Display System Info
run: . $PRELUDE; print_system_info
run: . $PRELUDE; print_system_info; print_ec2_info

- name: Display EC2 Info
run: . $PRELUDE; print_ec2_info
- name: Display GPU Info
run: . $PRELUDE; print_gpu_info

- name: Setup Miniconda
run: |
18 changes: 12 additions & 6 deletions .github/workflows/fbgemm_release_build.yml
@@ -38,7 +38,7 @@ jobs:
matrix:
os: [ linux.12xlarge ]
python-version: [ "3.8", "3.9", "3.10" ]
cuda-version: [ "11.7.1" ]
cuda-version: [ "11.7.1", "11.8.0" ]

steps:
- name: Checkout the Repository
@@ -49,6 +49,9 @@ jobs:
- name: Display System Info
run: . $PRELUDE; print_system_info

- name: Display GPU Info
run: . $PRELUDE; print_gpu_info

- name: Setup Miniconda
run: |
. $PRELUDE; setup_miniconda $HOME/miniconda
@@ -95,12 +98,15 @@ jobs:
env:
PRELUDE: .github/scripts/setup_env.bash
BUILD_ENV: build_binary
ENFORCE_NVIDIA_GPU: 1
strategy:
fail-fast: false
matrix:
os: [ linux.g5.4xlarge.nvidia.gpu ]
python-version: [ "3.8", "3.9", "3.10" ]
cuda-version: [ "11.7.1" ]
cuda-version: [ "11.7.1", "11.8.0" ]
# Specify exactly ONE CUDA version for artifact publish
cuda-version-publish: [ "11.7.1" ]
needs: build_artifact
steps:
- name: Checkout the Repository
@@ -109,10 +115,10 @@ jobs:
submodules: true

- name: Display System Info
run: . $PRELUDE; print_system_info
run: . $PRELUDE; print_system_info; print_ec2_info

- name: Display EC2 Info
run: . $PRELUDE; print_ec2_info
- name: Display GPU Info
run: . $PRELUDE; print_gpu_info

- name: Setup Miniconda
run: |
@@ -148,7 +154,7 @@ jobs:
run: . $PRELUDE; cd fbgemm_gpu/test; run_fbgemm_gpu_tests $BUILD_ENV

- name: Push FBGEMM_GPU Binary to PYPI
if: ${{ github.event_name != 'pull_request' && github.event_name != 'push' }}
if: ${{ github.event_name != 'pull_request' && github.event_name != 'push' && matrix.cuda-version == matrix.cuda-version-publish }}
env:
PYPI_TOKEN: ${{ secrets.PYPI_TOKEN }}
run: . $PRELUDE; publish_to_pypi $BUILD_ENV fbgemm_gpu-*.whl "$PYPI_TOKEN"
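The release workflow publishes `fbgemm_gpu-*.whl`, while the nightly workflow publishes `fbgemm_gpu_nightly-*.whl`. Both globs are unquoted in the `run:` step and are expanded by the shell, and the release pattern cannot pick up a nightly wheel: the character right after `fbgemm_gpu` is `-` in one filename and `_` in the other. A quick check with Python's `fnmatch`, which implements Unix shell-style wildcards; the wheel filenames below are hypothetical examples of the PEP 427 naming pattern:

```python
from fnmatch import fnmatch

# Hypothetical wheel filenames following the PEP 427 pattern
# {distribution}-{version}-{python tag}-{abi tag}-{platform tag}.whl
WHEELS = [
    "fbgemm_gpu-0.4.0-cp310-cp310-manylinux2014_x86_64.whl",
    "fbgemm_gpu_nightly-2023.3.8-cp310-cp310-manylinux2014_x86_64.whl",
]


def select(pattern):
    """Return the wheels that a shell-style glob would match."""
    return [w for w in WHEELS if fnmatch(w, pattern)]


print(select("fbgemm_gpu-*.whl"))          # release workflow pattern
print(select("fbgemm_gpu_nightly-*.whl"))  # nightly workflow pattern
```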
9 changes: 6 additions & 3 deletions .github/workflows/fbgemm_release_build_cpu.yml
@@ -48,6 +48,9 @@ jobs:
- name: Display System Info
run: . $PRELUDE; print_system_info

- name: Display GPU Info
run: . $PRELUDE; print_gpu_info

- name: Setup Miniconda
run: |
. $PRELUDE; setup_miniconda $HOME/miniconda
@@ -102,10 +105,10 @@ jobs:
submodules: true

- name: Display System Info
run: . $PRELUDE; print_system_info
run: . $PRELUDE; print_system_info; print_ec2_info

- name: Display EC2 Info
run: . $PRELUDE; print_ec2_info
- name: Display GPU Info
run: . $PRELUDE; print_gpu_info

- name: Setup Miniconda
run: |
3 changes: 2 additions & 1 deletion fbgemm_gpu/CMakeLists.txt
@@ -328,7 +328,8 @@ if(NOT FBGEMM_CPU_ONLY)
src/merge_pooled_embeddings_gpu.cpp
src/topology_utils.cpp)
else()
message(STATUS "Could not find NVML_LIB_PATH; will NOT include certain sources into the build!")
message(STATUS
"Could not find NVML_LIB_PATH; skipping certain sources into the build")
endif()
endif()

1 change: 1 addition & 0 deletions fbgemm_gpu/requirements.txt
@@ -4,3 +4,4 @@ jinja2
ninja
numpy
scikit-build
setuptools_git_versioning
52 changes: 29 additions & 23 deletions fbgemm_gpu/setup.py
@@ -7,29 +7,41 @@
import argparse
import os
import random
import re
import subprocess
import sys

from datetime import date
from typing import List, Optional

import setuptools_git_versioning as gitversion
import torch
from skbuild import setup


def get_version():
# get version string from version.py
# TODO: ideally the version.py should be generated when setup is run
version_file = os.path.join(os.path.dirname(__file__), "version.py")
version_regex = r"__version__ = ['\"]([^'\"]*)['\"]"
with open(version_file, "r") as f:
version = re.search(version_regex, f.read(), re.M).group(1)
return version
def generate_package_version(package_name: str):
print("[SETUP.PY] Generating the package version ...")

if "nightly" in package_name:
# Use date stamp for nightly versions
print("[SETUP.PY] Package is for NIGHTLY; using timestamp for the versioning")
today = date.today()
version = f"{today.year}.{today.month}.{today.day}"

elif "test" in package_name:
# Use date stamp for nightly versions
print("[SETUP.PY] Package is for TEST: using random number for the versioning")
version = (f"0.0.{random.randint(0, 1000)}",)

else:
# Use git tag / branch / commit info to generate a PEP-440-compliant version string
print("[SETUP.PY] Package is for RELEASE: using git info for the versioning")
print(
f"[SETUP.PY] TAG: {gitversion.get_tag()}, BRANCH: {gitversion.get_branch()}, SHA: {gitversion.get_sha()}"
)
version = gitversion.version_from_git()

def get_nightly_version():
today = date.today()
return f"{today.year}.{today.month}.{today.day}"
print(f"[SETUP.PY] Setting the package version: {version}")
return version


def get_cxx11_abi():
@@ -170,23 +182,15 @@ def main(argv: List[str]) -> None:
if args.nvml_lib_path:
cmake_args.append(f"-DNVML_LIB_PATH={args.nvml_lib_path}")

name = args.package_name
print("name: ", name)
is_nightly = "nightly" in name
is_test = "test" in name

version = get_nightly_version() if is_nightly else get_version()
if is_test:
version = (f"0.0.{random.randint(0, 1000)}",)
print(f"-- {name} building version: {version}")
package_version = generate_package_version(args.package_name)

# Repair command line args for setup.
sys.argv = [sys.argv[0]] + unknown

setup(
# Metadata
name=name,
version=version,
name=args.package_name,
version=package_version,
author="FBGEMM Team",
author_email="[email protected]",
long_description=long_description,
@@ -210,6 +214,8 @@ def main(argv: List[str]) -> None:
"License :: OSI Approved :: BSD License",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Topic :: Scientific/Engineering :: Artificial Intelligence",
],
)
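`generate_package_version()` picks a versioning scheme from the package name: a date stamp for nightly builds, a random `0.0.N` for test builds, and `setuptools_git_versioning.version_from_git()` for releases. The following is a minimal restatement with the git lookup injected as a callable, so the branching can be exercised without a git checkout; it is a sketch, not the code in `setup.py`. One difference is deliberate: in the diff, the TEST branch assigns `(f"0.0.{...}",)`, and the trailing comma makes that a one-element tuple rather than the string that `setup(version=...)` expects, so the sketch returns the plain string. The release version `"0.4.0"` in the tests is hypothetical.

```python
import random
from datetime import date
from typing import Callable, Optional


def package_version(
    package_name: str,
    git_version: Callable[[], str],
    today: Optional[date] = None,
) -> str:
    """Choose a version string the same way generate_package_version() branches.

    git_version stands in for setuptools_git_versioning.version_from_git().
    """
    if "nightly" in package_name:
        # Date stamp for nightly builds, e.g. 2023.3.8 (no zero padding)
        d = today or date.today()
        return f"{d.year}.{d.month}.{d.day}"
    if "test" in package_name:
        # Random patch number for test builds, returned as a str, not a 1-tuple
        return f"0.0.{random.randint(0, 1000)}"
    # Release builds: PEP 440 version derived from git tags
    return git_version()


print(package_version("fbgemm_gpu_nightly", lambda: "unused", date(2023, 3, 8)))  # → 2023.3.8
```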