[T150228288] Update the FBGEMM and FBGEMM_GPU READMEs

- Clean up the FBGEMM_GPU READMEs to consolidate all FBGEMM_GPU build instructions into `docs/BuildInstructions.md`
- Fix the build badges for FBGEMM and FBGEMM_GPU
- Add Slack contact information to the READMEs
pytorch · Apr 11, 2023 · 3fafdb2
1 parent d853fe4
commit 3fafdb2
Showing 3 changed files with 89 additions and 193 deletions.
diff --git a/README.md b/README.md
@@ -1,7 +1,6 @@
 # FBGEMM
 
-[![FBGEMMCI](https://github.com/pytorch/FBGEMM/actions/workflows/fbgemmci.yml/badge.svg)](https://github.com/pytorch/FBGEMM/actions/workflows/fbgemmci.yml)
-[![Nightly Build](https://github.com/pytorch/FBGEMM/actions/workflows/fbgemm_nightly_build.yml/badge.svg)](https://github.com/pytorch/FBGEMM/actions/workflows/fbgemm_nightly_build.yml)
+[![FBGEMM CI](https://github.com/pytorch/FBGEMM/actions/workflows/fbgemm_ci.yml/badge.svg)](https://github.com/pytorch/FBGEMM/actions/workflows/fbgemm_ci.yml)
 
 FBGEMM (Facebook GEneral Matrix Multiplication) is a low-precision,
 high-performance matrix-matrix multiplications and convolution library for
@@ -113,7 +112,12 @@ recommend citing our
 ```
 
 ## Join the FBGEMM community
-See the [`CONTRIBUTING`](CONTRIBUTING.md) file for how to help out.
+For questions or feature requests, please file a ticket over on
+[GitHub Issues](https://github.com/pytorch/FBGEMM/issues) or reach out to us on
+the `#fbgemm` channel in [PyTorch Slack](https://bit.ly/ptslack).
+
+For contributions, please see the [`CONTRIBUTING`](CONTRIBUTING.md) file for
+ways to help out.
 
 ## License
 FBGEMM is BSD licensed, as found in the [`LICENSE`](LICENSE) file.

diff --git a/fbgemm_gpu/README.md b/fbgemm_gpu/README.md
@@ -1,39 +1,29 @@
 # FBGEMM_GPU
 
-[![FBGEMMCI](https://github.com/pytorch/FBGEMM/actions/workflows/fbgemmci.yml/badge.svg)](https://github.com/pytorch/FBGEMM/actions/workflows/fbgemmci.yml)
-[![Nightly Build](https://github.com/pytorch/FBGEMM/actions/workflows/fbgemm_nightly_build.yml/badge.svg)](https://github.com/pytorch/FBGEMM/actions/workflows/fbgemm_nightly_build.yml)
-[![Nightly Build CPU](https://github.com/pytorch/FBGEMM/actions/workflows/fbgemm_nightly_build_cpu.yml/badge.svg)](https://github.com/pytorch/FBGEMM/actions/workflows/fbgemm_nightly_build_cpu.yml)
+[![FBGEMM_GPU CI](https://github.com/pytorch/FBGEMM/actions/workflows/fbgemm_gpu_ci.yml/badge.svg)](https://github.com/pytorch/FBGEMM/actions/workflows/fbgemm_gpu_ci.yml)
+[![FBGEMM_GPU-CPU Nightly Build](https://github.com/pytorch/FBGEMM/actions/workflows/fbgemm_gpu_cpu_nightly.yml/badge.svg)](https://github.com/pytorch/FBGEMM/actions/workflows/fbgemm_gpu_cpu_nightly.yml)
+[![FBGEMM_GPU-CUDA Nightly Build](https://github.com/pytorch/FBGEMM/actions/workflows/fbgemm_gpu_cuda_nightly.yml/badge.svg)](https://github.com/pytorch/FBGEMM/actions/workflows/fbgemm_gpu_cuda_nightly.yml)
 
-FBGEMM_GPU (FBGEMM GPU kernel library) is a collection of
-high-performance CUDA GPU operator library for GPU training and inference.
+FBGEMM_GPU (FBGEMM GPU Kernels Library) is a collection of high-performance PyTorch
+GPU operator libraries for training and inference.  The library provides efficient
+table batched embedding bag, data layout transformation, and quantization support.
 
-The library provides efficient table batched embedding bag,
-data layout transformation, and quantization supports.
-
-Currently tested with CUDA 11.3, 11.5, 11.6, and 11.7 in CI. In all cases, we test with PyTorch packages which are built with CUDA 11.7.
+FBGEMM_GPU is currently tested with CUDA 11.7.1 and 11.8 in CI, and with PyTorch
+packages that are built against those CUDA versions.
 
 Only Intel/AMD CPUs with AVX2 extensions are currently supported.
 
-General build and install instructions are as follows:
 
-Build dependencies: `scikit-build`, `cmake`, `ninja`, `jinja2`, `torch`, `cudatoolkit`,
-and for testing: `hypothesis`.
+## Build Instructions
 
-```
-conda install scikit-build jinja2 ninja cmake hypothesis
-```
+This section is intended for FBGEMM_GPU developers.  The full build instructions
+for the CUDA, ROCm, and CPU-only variants of FBGEMM_GPU can be found
+[here](docs/BuildInstructions.md).
 
-**If you're planning to build from source** and **don't** have `nvml.h` in your system, you can install it via the command
-below.
-```
-conda install -c conda-forge cudatoolkit-dev
-```
 
-Certain operations require this library to be present. Be sure to provide the path to `libnvidia-ml.so` to
-`--nvml_lib_path` if installing from source (e.g. `python setup.py install --nvml_lib_path path_to_libnvidia-ml.so`).
+## Installation
 
-
-## PIP install
+### Install through PIP
 
 Currently only built with sm70/80 (V100/A100 GPU) wheel supports:
 
@@ -53,194 +43,72 @@ pip install fbgemm-gpu-nightly
 # Nightly CPU-only
 pip install --pre torch -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
 pip install fbgemm-gpu-nightly-cpu
-
 ```
 
-## Build from source
+### Running FBGEMM_GPU
 
-Additional dependencies: currently cuDNN is required to be installed.
-Please [download][4] and follow instructions [here][5] to install cuDNN.
+The tests (in the `test` folder) and benchmarks (in the `bench` folder) are
+good examples of how to use FBGEMM_GPU.  To run the tests or benchmarks after
+building FBGEMM_GPU (if tests or benchmarks are built), use the following commands:
 
 ```
-# Requires PyTorch 1.13 or later
-conda install pytorch cuda -c pytorch-nightly -c "nvidia/label/cuda-11.7.1"
-git clone --recursive https://github.com/pytorch/FBGEMM.git
-cd FBGEMM/fbgemm_gpu
-# if you are updating an existing checkout
-git submodule sync
-git submodule update --init --recursive
-
-# Specify CUDA version to use
-# (may not be needed with only a single version installed)
-export CUDA_BIN_PATH=/usr/local/cuda-11.3/
-export CUDACXX=/usr/local/cuda-11.3/bin/nvcc
-
-# Specify cuDNN library and header paths.  We tested CUDA 11.6 and 11.7 with
-# cuDNN version 8.5.0.96
-export CUDNN_LIBRARY=${HOME}/cudnn-linux-x86_64-8.5.0.96_cuda11-archive/lib
-export CUDNN_INCLUDE_DIR=${HOME}/cudnn-linux-x86_64-8.5.0.96_cuda11-archive/include
-
-# in fbgemm_gpu folder
-# build for the CUDA architecture supported by current system (or all architectures if no CUDA device present)
-python setup.py install
-# or build it for specific CUDA architectures (see PyTorch documentation for usage of TORCH_CUDA_ARCH_LIST)
-python setup.py install -DTORCH_CUDA_ARCH_LIST="7.0;8.0"
-```
-
-
-## Usage Example:
-```bash
-cd bench
-python split_table_batched_embeddings_benchmark.py uvm
+# run the tests and benchmarks of table batched embedding bag op,
+# data layout transform op, quantized ops, etc.
+cd test
+python split_table_batched_embeddings_test.py
+python quantize_ops_test.py
+python sparse_ops_test.py
+python split_embedding_inference_converter_test.py
+cd ../bench
+python split_table_batched_embeddings_benchmark.py
 ```
-## Build on ROCm
-
-FBGEMM_GPU supports running on AMD (ROCm) devices. A Docker container is recommended for setting up the ROCm environment. The installation on bare metal is also available. ROCm5.3 is used as an example of the installation below.
 
-##### Build in a Docker container
-Pull Docker container and run
-```
-docker pull rocm/pytorch:rocm5.4_ubuntu20.04_py3.8_pytorch_staging_base
-sudo docker run -it --network=host --shm-size 16G --device=/dev/kfd --device=/dev/dri \
-                --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
-                --ipc=host --env PYTORCH_ROCM_ARCH="gfx906;gfx908;gfx90a" -u 0 \
-                rocm/pytorch:rocm5.4_ubuntu20.04_py3.8_pytorch_staging_base
-```
-In the container
+To run the tests and benchmarks on a GPU-capable device in CPU-only mode, set `CUDA_VISIBLE_DEVICES=-1`:
 ```
-pip3 install --pre torch torchvision --extra-index-url https://download.pytorch.org/whl/nightly/rocm5.3/
-cd ~
-git clone https://github.com/pytorch/FBGEMM.git
-cd FBGEMM/fbgemm_gpu
-# if you are updating an existing checkout
-git submodule sync
-git submodule update --init --recursive
-pip install -r requirements.txt
-pip install update hypothesis
-
-# in fbgemm_gpu folder
-# build for the current ROCm architecture
-gpu_arch="$(/opt/rocm/bin/rocminfo | grep -o -m 1 'gfx.*')"
-export PYTORCH_ROCM_ARCH=$gpu_arch
-python setup.py install develop
-# or build for specific ROCm architectures
-export PYTORCH_ROCM_ARCH="gfx906;gfx908"
-python setup.py install develop
-# otherwise the build will be for the default architectures gfx906;gfx908;gfx90a
+CUDA_VISIBLE_DEVICES=-1 python split_table_batched_embeddings_test.py
 ```
 
-##### Build on bare metal
-Please refer to the installation instructions of ROCm5.3 [here][6]. Take the installation on Ubuntu20.04 as an example
-```
-sudo apt-get update
-wget https://repo.radeon.com/amdgpu-install/5.3/ubuntu/focal/amdgpu-install_5.3.50300-1_all.deb
-sudo apt-get install ./amdgpu-install_5.3.50300-1_all.deb
-sudo amdgpu-install --usecase=hiplibsdk,rocm --no-dkms
-```
-MIOpen is required and needs to be installed separately.
-```
-sudo apt-get install miopen-hip miopen-hip-dev
-```
-The remaining steps are the same as the "in the container" section.
+### Run the tests on ROCm
 
-##### Run the tests on ROCm
 Please add `FBGEMM_TEST_WITH_ROCM=1` flag when running tests on ROCm.
 ```
 cd test
 FBGEMM_TEST_WITH_ROCM=1 python split_table_batched_embeddings_test.py
 ```
 
-## Issues
+### Benchmark Example
 
-Building is CMAKE based and keeps state across install runs.
-Specifying the CUDA architectures in the command line once is enough.
-However on failed builds (missing dependencies ..) this can cause problems
-and using
 ```bash
-python setup.py clean
-```
-to remove stale cached state can be helpful.
-
-## Examples
-
-The tests (in test folder) and benchmarks (in bench folder) are some great
-examples of using FBGEMM_GPU.
-
-## Build Notes
-FBGEMM_GPU uses a scikit-build CMAKE-based build flow.
-
-### Dependencies
-FBGEMM_GPU requires nvcc and a Nvidia GPU with
-compute capability of 3.5+.
-
-+ ###### CUB
-
-CUB is now included with CUDA 11.1+ - the section below will still be needed for lower CUDA versions (once they are tested).
-
-For the [CUB][1] build time dependency, if you are using conda, you can continue with
-```
-conda install -c bottler nvidiacub
-```
-Otherwise download the CUB library from https://github.com/NVIDIA/cub/releases and unpack it to a folder of your choice. Define the environment variable CUB_DIR before building and point it to the directory that contains CMakeLists.txt for CUB. For example on Linux/Mac,
-
-```
-curl -LO https://github.com/NVIDIA/cub/archive/1.10.0.tar.gz
-tar xzf 1.10.0.tar.gz
-export CUB_DIR=$PWD/cub-1.10.0
-```
-
-+ ###### PyTorch, Jinja2, scikit-build
-[PyTorch][2], [Jinja2][3] and scikit-build are **required** to build and run the table
-batched embedding bag operator. One thing to note is that the implementation
-of this op relies on the version of PyTorch 1.9 or later.
-
-```
-conda install scikit-build jinja2 ninja cmake
+cd bench
+python split_table_batched_embeddings_benchmark.py uvm
 ```
 
-## Running FBGEMM_GPU
 
-To run the tests or benchmarks after building FBGEMM_GPU (if tests or benchmarks
-are built), use the following command:
-```
-# run the tests and benchmarks of table batched embedding bag op,
-# data layout transform op, quantized ops, etc.
-cd test
-python split_table_batched_embeddings_test.py
-python quantize_ops_test.py
-python sparse_ops_test.py
-python split_embedding_inference_converter_test.py
-cd ../bench
-python split_table_batched_embeddings_benchmark.py
-```
+## Documentation
 
-To run the tests and benchmarks on a GPU-capable device in CPU-only mode use CUDA_VISIBLE_DEVICES=-1
-```
-CUDA_VISIBLE_DEVICES=-1 python split_table_batched_embeddings_test.py
-```
+### How FBGEMM_GPU works
 
-## How FBGEMM_GPU works
 For a high-level overview, design philosophy and brief descriptions of various
 parts of FBGEMM_GPU please see our Wiki (work in progress).
 
-## Full documentation
 We have extensively used comments in our source files. The best and up-to-date
 documentation is available in the source files.
 
-# Building API Documentation
+### Building the API Documentation
 
 See [docs/README.md](docs/README.md).
 
-## Join the FBGEMM community
-See the [`CONTRIBUTING`](../CONTRIBUTING.md) file for how to help out.
+
+## Join the FBGEMM_GPU Community
+
+For questions or feature requests, please file a ticket over on
+[GitHub Issues](https://github.com/pytorch/FBGEMM/issues) or reach out to us on
+the `#fbgemm` channel in [PyTorch Slack](https://bit.ly/ptslack).
+
+For contributions, please see the [`CONTRIBUTING`](../CONTRIBUTING.md) file for
+ways to help out.
+
 
 ## License
-FBGEMM is BSD licensed, as found in the [`LICENSE`](../LICENSE) file.
-
-[0]:https://pytorch.org/tutorials/advanced/torch_script_custom_ops.html
-[1]:https://github.com/NVIDIA/cub
-[2]:https://github.com/pytorch/pytorch
-[3]:https://jinja.palletsprojects.com/en/2.11.x/
-[4]:https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#download
-[5]:https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#installlinux-tar
-[6]:https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.3/page/How_to_Install_ROCm.html#_How_to_Install
+
+FBGEMM_GPU is BSD licensed, as found in the [`LICENSE`](../LICENSE) file.
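
Review note on the test instructions in this README: the two switches it documents (`CUDA_VISIBLE_DEVICES=-1` for CPU-only mode and `FBGEMM_TEST_WITH_ROCM=1` for ROCm) are plain environment variables, so they can also be set from a small wrapper. The sketch below is only an illustration; `build_test_env` and `run_test` are hypothetical helper names, not part of FBGEMM_GPU, and the default `cwd="test"` assumes the wrapper is started from the `fbgemm_gpu/` directory.

```python
import os
import subprocess
import sys


def build_test_env(base_env, cpu_only=False, rocm=False):
    """Return a copy of base_env with the README's test switches applied."""
    env = dict(base_env)
    if cpu_only:
        # Hide all CUDA devices so the tests run in CPU-only mode.
        env["CUDA_VISIBLE_DEVICES"] = "-1"
    if rocm:
        # Variable named in the README for running the tests on ROCm.
        env["FBGEMM_TEST_WITH_ROCM"] = "1"
    return env


def run_test(script, cpu_only=False, rocm=False, cwd="test"):
    """Run one test script in a child process (hypothetical helper).

    Assumes cwd exists relative to the current directory and that
    FBGEMM_GPU has already been built and installed.
    """
    env = build_test_env(os.environ, cpu_only=cpu_only, rocm=rocm)
    result = subprocess.run([sys.executable, script], cwd=cwd, env=env)
    return result.returncode
```

For example, `run_test("split_table_batched_embeddings_test.py", cpu_only=True)` would be roughly equivalent to the `CUDA_VISIBLE_DEVICES=-1 python ...` command shown in the README. Copying the environment rather than mutating `os.environ` keeps the switches scoped to the child process.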
diff --git a/fbgemm_gpu/docs/BuildInstructions.md b/fbgemm_gpu/docs/BuildInstructions.md
@@ -105,10 +105,10 @@ conda install -n "${env_name}" -y \
 ## Set Up for CUDA Build
 
 The CUDA build of FBGEMM_GPU requires `nvcc` that supports compute capability
-3.5+.  Setting the machine up for CUDA builds of FBGEMM_GPU can be done either
-through pre-built Docker images or through Conda installation on bare metal.
-Note that neither a GPU nor the NVIDIA drivers need to be present for builds,
-since they are only used at runtime.
+**`3.5+`**.  Setting the machine up for CUDA builds of FBGEMM_GPU can be done
+either through pre-built Docker images or through Conda installation on bare
+metal.  Note that neither a GPU nor the NVIDIA drivers need to be present for
+builds, since they are only used at runtime.
 
 ### Docker Image
 
@@ -182,8 +182,9 @@ wget -q https://github.com/NVIDIA/cub/archive/1.10.0.tar.gz
 
 ## Set Up for ROCm Build
 
-Setting the machine up for ROCm builds of FBGEMM_GPU can be done either through
-pre-built Docker images or through bare metal.
+FBGEMM_GPU supports running on AMD (ROCm) devices.  Setting the machine up for
+ROCm builds of FBGEMM_GPU can be done either through pre-built Docker images or
+through bare metal.
 
 ### Docker Image
 
@@ -356,6 +357,10 @@ package_name=fbgemm_gpu
 # If no CUDA device is present either, all CUDA architectures will be targeted
 cuda_arch_list="7.0;8.0"
 
+# Unset TORCH_CUDA_ARCH_LIST if it exists, because it takes precedence over
+# -DTORCH_CUDA_ARCH_LIST during the invocation of setup.py
+unset TORCH_CUDA_ARCH_LIST
+
 # Build the wheel artifact only
 python setup.py bdist_wheel \
     --package_name="${package_name}" \
@@ -415,11 +420,30 @@ python setup.py bdist_wheel \
 python setup.py install --cpu_only
 ```
 
-### Post-Build Checks
+### Post-Build Checks (For Developers)
+
+After the build completes, it is useful to run some checks that verify that the
+build is actually correct.
+
+#### Undefined Symbols Check
+
+Because FBGEMM_GPU contains a lot of template functions and their instantiations,
+it is important to make sure that there are no undefined template instantiations:
+
+```sh
+# !! Run in fbgemm_gpu/ directory inside the Conda environment !!
+
+# Locate the built .SO file
+fbgemm_gpu_lib_path=$(find . -name fbgemm_gpu_py.so)
+
+# Check that the undefined symbols don't include fbgemm_gpu-defined functions
+nm -gDCu "${fbgemm_gpu_lib_path}"
+```
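
Review note: the `nm -gDCu` listing above still has to be read by eye. One possible way to automate the check is to filter the listing for undefined entries that mention the library's own names. The sketch below assumes that such symbols contain the substring `fbgemm_gpu` after demangling, which the docs do not state; `undefined_fbgemm_symbols` is a hypothetical helper, and weak undefined entries (type `w` or `v`) are deliberately ignored.

```python
def undefined_fbgemm_symbols(nm_output, needle="fbgemm_gpu"):
    """Return undefined symbols from `nm -gDCu` output that contain needle.

    Undefined entries carry no address, so after stripping leading
    whitespace each line is expected to look like "<type> <symbol name>".
    Only strong undefined symbols (type "U") are reported.
    """
    hits = []
    for line in nm_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # blank or unexpected line
        sym_type, name = parts
        if sym_type == "U" and needle in name:
            hits.append(name)
    return hits
```

The text passed in could come from `subprocess.run(["nm", "-gDCu", path], capture_output=True, text=True).stdout`; an empty result would suggest, under the naming assumption above, that no FBGEMM_GPU-defined function was left uninstantiated.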
+
+#### GLIBC Version Compatibility Check
 
-After the build completes, it is useful to check the built library and verify
-the version numbers of GLIBCXX referenced as well as the availability of certain
-function symbols:
+It is also useful to verify the version numbers of GLIBCXX referenced as well
+as the availability of certain function symbols:
 
 ```sh
 # !! Run in fbgemm_gpu/ directory inside the Conda environment !!