FBGEMM_GPU v0.4.0
Release Notes
Software Requirements
FBGEMM_GPU v0.4.0 has been tested and is known to work on the following setups:
- PyTorch: v2.0
- CUDA: v11.7, 11.8
- Python: v3.8, 3.9, 3.10 (3.11 not supported yet)
We recommend installing and running FBGEMM_GPU in an isolated environment, such as a Conda environment and/or a Docker container.
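The tested setups listed above can be encoded as a small pre-install check. This is a minimal sketch, not part of FBGEMM_GPU: the helper names (`is_tested_python`, `is_tested_pytorch`, `is_tested_cuda`) are hypothetical, and the version sets are copied from the list in these notes.

```python
import sys

# Versions listed as tested for FBGEMM_GPU v0.4.0 in these release notes.
TESTED_PYTHON = {(3, 8), (3, 9), (3, 10)}
TESTED_CUDA = {"11.7", "11.8"}
TESTED_PYTORCH = (2, 0)


def is_tested_python(version_info=None):
    """Return True if the (major, minor) Python version is in the tested set."""
    info = sys.version_info if version_info is None else version_info
    return (info[0], info[1]) in TESTED_PYTHON


def is_tested_pytorch(version_string):
    """Return True for a 2.0.x version string such as '2.0.1+cu117'."""
    core = version_string.split("+")[0]  # drop a local suffix like '+cu117'
    major, minor = core.split(".")[:2]
    return (int(major), int(minor)) == TESTED_PYTORCH


def is_tested_cuda(version_string):
    """Return True if a CUDA version string such as '11.8' is in the tested set."""
    return version_string in TESTED_CUDA
```

With PyTorch installed, the inputs would typically come from `torch.__version__` and `torch.version.cuda` (the latter is `None` on CPU-only builds, which this sketch reports as untested for CUDA).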
Availability
FBGEMM_GPU may be fetched directly from PyPI:
# FBGEMM_GPU (CUDA variant)
pip install fbgemm-gpu==0.4.0
# FBGEMM_GPU (CPU variant)
pip install fbgemm-gpu-cpu==0.4.0
Changes
Table batched embedding (TBE) operators
- [New] SSD for inference TBE (#1473, #1479, #1485, #1517, #1533, #1535)
- [New] Inplace TBE update (#1480, #1482, #1492, #1529)
- [New] BF16 support for inference TBE (#1498, #1503)
- [New] BF16 support for TBE on CPU (#1540, #1583)
- [Improvement] Faster backward pass for training TBE (#1563)
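For readers new to TBE, the following pure-Python sketch illustrates the idea of a batched, sum-pooled lookup across several embedding tables in one call. It is a conceptual reference only, not the FBGEMM_GPU operator: the function name `batched_embedding_sum` is hypothetical, and the table-major bag ordering in `offsets` is an assumption made for this illustration.

```python
def batched_embedding_sum(tables, indices, offsets, batch_size):
    """Sum-pool embedding rows for every (table, sample) bag.

    tables:  list of T tables, each a list of rows (lists of floats).
    indices: flat list of row ids for all bags, concatenated.
    offsets: T * batch_size + 1 boundaries into `indices`; bag (t, b)
             spans indices[offsets[t * B + b] : offsets[t * B + b + 1]]
             (table-major ordering, assumed for this sketch).
    Returns batch_size rows, each the concatenation of the pooled
    vectors from table 0 through table T - 1.
    """
    num_tables = len(tables)
    assert len(offsets) == num_tables * batch_size + 1
    output = [[] for _ in range(batch_size)]
    for t, table in enumerate(tables):
        dim = len(table[0])
        for b in range(batch_size):
            bag = t * batch_size + b
            pooled = [0.0] * dim
            for row_id in indices[offsets[bag]:offsets[bag + 1]]:
                for d in range(dim):
                    pooled[d] += table[row_id][d]
            output[b].extend(pooled)
    return output


# Two tables (dims 2 and 1), batch size 2; the second bag of table_a is empty.
table_a = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
table_b = [[10.0], [20.0]]
pooled = batched_embedding_sum(
    [table_a, table_b],
    indices=[0, 2, 1, 0, 1],
    offsets=[0, 2, 2, 3, 5],
    batch_size=2,
)
```

Batching the tables this way lets a single kernel launch serve all lookups, which is the motivation behind the TBE operators.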
UVM cache improvements
- [New] Delta in-place update (#1436)
- [New] UVM caching stats report (#1623, #1462, #1433, #1570)
- [Improvement] [lfu|lru]_cache_insert_byte_kernel vectorization (#1475)
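As background for the LRU policy and the kind of counters a caching stats report might expose, here is a toy software model. It is a sketch under stated assumptions: the class `LruCacheStats` and its counters are hypothetical, and it does not reflect the actual UVM cache kernels or the fields FBGEMM_GPU reports.

```python
from collections import OrderedDict


class LruCacheStats:
    """Toy LRU cache that counts hits, misses, and evictions."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = OrderedDict()  # keys ordered from least to most recent
        self.hits = 0
        self.misses = 0
        self.evictions = 0

    def access(self, key):
        """Touch `key`; insert on a miss, evicting the oldest key if full."""
        if key in self.slots:
            self.hits += 1
            self.slots.move_to_end(key)  # mark as most recently used
            return True
        self.misses += 1
        if len(self.slots) >= self.capacity:
            self.slots.popitem(last=False)  # drop least recently used key
            self.evictions += 1
        self.slots[key] = None
        return False

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = LruCacheStats(capacity=2)
for row_id in [1, 2, 1, 3, 2, 1]:
    cache.access(row_id)
```

For this access trace the model records one hit, five misses, and three evictions; an LFU variant would instead evict the key with the lowest access count.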
Jagged Tensor Operators
- [New] Backends (Meta and Autograd) (#1461, #1466, #1467, #1469, #1468, #1477, #1556)
- [New] BF16 support (#1472, #1560)
- [New] FP32 + BF16 hybrid support for jagged_dense_dense_elementwise_add_jagged (#1487)
- [New] Jagged tensors with no inner dense dimension support (#1267)
- [New] New jagged tensor operators (#1557, #1577, #1578, #1579, #1594, #1595)
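A jagged tensor stores variable-length rows as a flat `values` buffer plus an `offsets` array of row boundaries. The sketch below converts such a pair to a padded dense layout for the case with no inner dense dimension (one scalar per element). It is a pure-Python reference of the idea, not the FBGEMM_GPU operator; the function name and its parameters are chosen here for illustration.

```python
def jagged_to_padded_dense(values, offsets, max_length, padding_value=0.0):
    """Expand a (values, offsets) jagged pair into fixed-width rows.

    Row i holds values[offsets[i]:offsets[i + 1]], truncated to
    max_length and right-padded with padding_value.
    """
    dense = []
    for row in range(len(offsets) - 1):
        segment = list(values[offsets[row]:offsets[row + 1]])[:max_length]
        padding = [padding_value] * (max_length - len(segment))
        dense.append(segment + padding)
    return dense


# Three rows of lengths 2, 0, and 4, stored back to back.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
offsets = [0, 2, 2, 6]
dense = jagged_to_padded_dense(values, offsets, max_length=3)
```

Here the empty middle row becomes all padding and the last row is truncated to three elements.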
Index Select Operators
- [New] group_index_select (#1421, #1592)
- [New] index_select for selecting KeyedJaggedTensor dim 1 (previously only dim 0 was supported) (#1429)
- [New] jagged_index_select for CPU (#1586)
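Selecting rows from a jagged tensor differs from a dense index select because each selected row can have a different length. The following is a minimal pure-Python sketch of that idea; `select_jagged_rows` is a hypothetical helper for illustration, and its signature is not claimed to match the FBGEMM_GPU operators listed above.

```python
def select_jagged_rows(values, lengths, indices):
    """Gather variable-length rows from a jagged (values, lengths) pair.

    Returns the concatenated values of the selected rows together with
    their lengths, in the order given by `indices` (repeats allowed).
    """
    # Convert per-row lengths to boundaries into the flat values buffer.
    offsets = [0]
    for length in lengths:
        offsets.append(offsets[-1] + length)

    out_values, out_lengths = [], []
    for i in indices:
        out_values.extend(values[offsets[i]:offsets[i + 1]])
        out_lengths.append(lengths[i])
    return out_values, out_lengths


# Rows: [10, 11], [20], [30, 31, 32]; select row 2, then row 0, then row 2 again.
selected_values, selected_lengths = select_jagged_rows(
    values=[10, 11, 20, 30, 31, 32],
    lengths=[2, 1, 3],
    indices=[2, 0, 2],
)
```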
Low-precision operators
- [New] FP8 rowwise quantized communication (#1423)
Misc
- Support 2D inputs for asynchronous_complete_cumsum (#1573)
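A "complete" cumulative sum prepends a zero, so an input of n lengths yields n + 1 offsets. The sketch below shows that behavior in pure Python, assuming the 2D case applies the same operation independently to each row; the helper names are hypothetical and this is a reference for the semantics only, not the asynchronous GPU operator.

```python
from itertools import accumulate


def complete_cumsum_1d(lengths):
    """Return [0, l0, l0 + l1, ...]: n lengths become n + 1 offsets."""
    return [0] + list(accumulate(lengths))


def complete_cumsum_2d(rows):
    """Apply the complete cumulative sum to each row independently
    (assumed semantics for 2D inputs in this sketch)."""
    return [complete_cumsum_1d(row) for row in rows]


offsets_1d = complete_cumsum_1d([2, 0, 4])
offsets_2d = complete_cumsum_2d([[1, 2, 3], [4, 0, 1]])
```

This is the usual way to turn per-bag lengths into the offsets arrays consumed by embedding and jagged tensor operators.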
Benchmarks / Tests
- [New] nbit_device_with_spec for table batched embedding inference benchmark (#1455, #1465)
- [New] Variable bag sizes for TBE benchmark (#1450)
- [Improvement] Parallel bottom_unique_k_per_row for faster Zipf data generation (for FBGEMM benchmarks) (#1447)
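To illustrate why a per-row unique filter is useful for Zipf data generation, the sketch below oversamples skewed indices and then keeps the first k distinct values in each row. It is an illustration under assumptions: `zipf_rows` draws from a finite Zipf-like distribution with the standard library, and `first_k_unique_per_row` is one plausible reading of what bottom_unique_k_per_row computes, not a confirmed description of the FBGEMM implementation.

```python
import random


def zipf_rows(num_rows, row_length, vocab_size, alpha=1.2, seed=0):
    """Sample rows of indices with probability proportional to 1 / rank**alpha."""
    rng = random.Random(seed)
    population = range(vocab_size)
    weights = [1.0 / (rank + 1) ** alpha for rank in range(vocab_size)]
    return [
        rng.choices(population, weights=weights, k=row_length)
        for _ in range(num_rows)
    ]


def first_k_unique_per_row(rows, k):
    """Keep the first k distinct values of each row, in order of appearance.

    Assumed semantics for illustration; a row with fewer than k distinct
    values yields a shorter result.
    """
    result = []
    for row in rows:
        seen = set()
        kept = []
        for value in row:
            if len(kept) == k:
                break
            if value not in seen:
                seen.add(value)
                kept.append(value)
        result.append(kept)
    return result


# Oversample 50 indices per row, then keep 5 distinct ones per row.
sampled = zipf_rows(num_rows=4, row_length=50, vocab_size=100)
unique_rows = first_k_unique_per_row(sampled, k=5)
```

Because each row is processed independently, this filter parallelizes naturally across rows, which is consistent with the improvement described above.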