torch custom_op support: norm #552

Merged 1 commit on Oct 24, 2024

Conversation

abcdabcd987 (Member) commented:

Add torch custom_op (i.e., torch.library) support to norm.py so that torch.compile recognizes the kernels. It should be a no-op for PyTorch < 2.4.

Testing is done via torch.compile -- we expect the custom_op annotations to isolate our kernels from torch.compile's graph tracing. To avoid changing the tests themselves, I introduced some magic that replaces the kernels with a torch.compile-ed version. For example, to run with or without torch.compile:

```bash
# With torch.compile
FLASHINFER_TEST_TORCH_COMPILE=1 pytest -svx tests/test_norm.py

# Without torch.compile
pytest -svx tests/test_norm.py
```
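For reference, the "no-op below PyTorch 2.4" registration pattern can be sketched as follows. This is a hedged illustration, not the actual FlashInfer code: the decorator name `register_custom_op`, the op namespace `flashinfer_demo`, and the `scale` kernel are all made up here.

```python
from __future__ import annotations  # lets the torch.Tensor annotation parse even if torch is absent

try:
    import torch
    # torch.library.custom_op only exists on PyTorch >= 2.4
    _HAS_CUSTOM_OP = hasattr(torch.library, "custom_op")
except ImportError:
    _HAS_CUSTOM_OP = False


def register_custom_op(name: str, mutates_args=()):
    """Wrap a function as a torch custom op on PyTorch >= 2.4; otherwise a no-op."""
    def decorator(fn):
        if not _HAS_CUSTOM_OP:
            return fn  # no-op fallback: the plain Python function is used directly
        return torch.library.custom_op(name, fn, mutates_args=mutates_args)
    return decorator


@register_custom_op("flashinfer_demo::scale", mutates_args=())
def scale(x: torch.Tensor, factor: float) -> torch.Tensor:
    # Stand-in for a real kernel such as rmsnorm.
    return x * factor
```

On older PyTorch, the decorated function is just the original Python function; on 2.4+, torch.compile sees an opaque custom op instead of tracing into the kernel.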

If this PR looks good, I'll add it to more kernels.

@yzh119 (Collaborator) left a comment:


LGTM, thank you @abcdabcd987 !

@yzh119 merged commit f6e0010 into flashinfer-ai:main on Oct 24, 2024
yzh119 pushed a commit that referenced this pull request Oct 25, 2024
Follow-up of #552. This PR adds torch.library annotations to all
FlashInfer kernels so that torch.compile can recognize them. Most
changes are mechanical.

I manually ran subsets of the pytest test cases while making these
changes, but since there are too many of them, and some did not pass
even before the change, I cannot guarantee everything works. To run
tests with torch.compile, set the `FLASHINFER_TEST_TORCH_COMPILE=1`
environment variable:

```bash
# With torch.compile
FLASHINFER_TEST_TORCH_COMPILE=1 pytest -svx tests/test_norm.py

# Without torch.compile
pytest -svx tests/test_norm.py
```

Notable changes:
* For the prefill and decode pybind interfaces: they used to return
`Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]` depending on
`return_lse`, which causes trouble for `torch.compile`. I changed the
pybind interface to accept a `maybe_lse: Optional[torch.Tensor]`
argument and always return a single tensor. The allocation of the lse
tensor moves to the Python side. The Python API does not change.
* `chain_speculative_sampling` pybind: Move the allocation of `accepted`
and `emitted` from C++ to Python. This is because `torch.compile`
doesn't like returning input tensor as output tensor. The Python API
does not change.
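The `maybe_lse` change can be illustrated with a small sketch. This is hypothetical, not FlashInfer code: plain Python lists stand in for tensors, and `_run_kernel` stands in for the pybind'ed C++ kernel.

```python
from typing import List, Optional

def _run_kernel(q: List[float], maybe_lse: Optional[List[float]]) -> List[float]:
    """Stand-in for the pybind'ed kernel: one static signature, one return value."""
    out = [x * 2 for x in q]  # placeholder for the real attention computation
    if maybe_lse is not None:
        for i, x in enumerate(q):
            maybe_lse[i] = float(x)  # the kernel writes lse into the caller's buffer
    return out  # always exactly one return value: torch.compile-friendly

def run(q: List[float], return_lse: bool = False):
    # The Python side allocates the lse buffer, so the binding's signature
    # and return type are static regardless of return_lse.
    lse = [0.0] * len(q) if return_lse else None
    out = _run_kernel(q, lse)
    return (out, lse) if return_lse else out
```

The key point is that the conditional `Union` return now lives entirely in Python, while the compiled binding keeps a single static shape that torch.compile can reason about.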

Piggyback changes:
* `BatchPrefillWithRaggedKVCacheWrapper.plan`: Bugfix qo_indptr not on
CPU
* `merge_state`: Fix typo in docs
* Change `run_return_lse(...)` to `run(..., return_lse=True)` because
torch.compile does not recognize `functools.partial`.
* In tests, change `flashinfer.xxx()` to `flashinfer.<module>.xxx()` so
that the monkeypatch works.
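The last point is worth spelling out: a name re-exported at package top level (as in `from .norm import rmsnorm` inside `__init__.py`) is a separate binding, so monkeypatching the submodule attribute does not affect it. A minimal illustration with synthetic modules (not real FlashInfer code):

```python
import types

# Build a fake package with a submodule and a top-level re-export.
norm = types.ModuleType("pkg.norm")
norm.rmsnorm = lambda x: "original"

pkg = types.ModuleType("pkg")
pkg.norm = norm
pkg.rmsnorm = norm.rmsnorm  # like `from .norm import rmsnorm` in pkg/__init__.py

# The monkeypatch replaces the attribute on the submodule only;
# the top-level re-export still points at the original function.
norm.rmsnorm = lambda x: "patched"
```

A call through `pkg.norm.rmsnorm(...)` sees the patch, while `pkg.rmsnorm(...)` does not, which is why the tests were switched to the module-qualified form.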

Unsupported for torch.compile:
* `flashinfer.quantization.segment_packbits`: Because it's data
dependent.

Untouched:
* `sparse.py`: Tests didn't pass beforehand, so I skipped it. Also, it
does not seem to need custom_op annotations, as it has no CUDA
kernels.

Failed test cases:
* batch_decode with non-contiguous kv:
`test_batch_decode_with_paged_kv_cache[False-kv_dtype0-q_dtype0-True-0.0-NONE-NHD-128-4-4-1-54-12]`
@abcdabcd987 mentioned this pull request Oct 30, 2024
yzh119 pushed a commit that referenced this pull request Oct 31, 2024
Here's the reason why docs fail to build after #552: As specified in
`conf.py`, Sphinx mocks `torch`. The mock makes the following predicate
behave badly: `TorchVersion(torch_version) < TorchVersion("2.4")`.

The fix is to explicitly pass in an env var indicating docs building.

Also changes the way `prefill.py` imports the compiled `_kernels`
module, for consistency with other files.