
feat: Add BF16 tensor support via dlpack #371

Merged · 5 commits · Jul 30, 2024
Conversation

@rmccorm4 (Contributor) commented Jul 27, 2024

What does the PR do?

Adds BF16 tensor support via DLPack. tensor.as_numpy() will not be supported for TYPE_BF16 tensors at this time due to lack of native support for BF16 in numpy.

These BF16 tensors can be converted, with zero copies, to and from DLPack-compatible frameworks such as PyTorch and TensorFlow using their respective to_dlpack and from_dlpack utilities.
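As an illustration of the zero-copy round trip (a sketch assuming PyTorch is installed; this is not code from the PR itself):

```python
import torch
from torch.utils.dlpack import from_dlpack, to_dlpack

# Create a BF16 tensor in PyTorch
t = torch.ones(2, 3, dtype=torch.bfloat16)

# Export to a DLPack capsule and re-import it; no data is copied
t2 = from_dlpack(to_dlpack(t))

assert t2.dtype == torch.bfloat16
assert t2.data_ptr() == t.data_ptr()  # both tensors share the same memory
```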

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type box here and add the label to the GitHub PR.

  • build
  • ci
  • docs
  • feat
  • fix
  • perf
  • refactor
  • revert
  • style
  • test

Related PRs:

Test plan:

See testing PR.

Caveats:

as_numpy() is not supported on BF16 tensors due to numpy's lack of native BF16 support; DLPack must be used instead.
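One hedged workaround sketch (not part of this PR; the helper name bf16_bytes_to_f32 is ours, not a Triton API): because a bfloat16 value is exactly the top 16 bits of a float32, raw BF16 bytes can be widened to float32 with plain numpy if needed:

```python
import numpy as np

def bf16_bytes_to_f32(buf: bytes) -> np.ndarray:
    # Hypothetical helper: read little-endian 16-bit words and shift them
    # into the high half of a 32-bit word; the low 16 mantissa bits are
    # zero-filled, so each value is exactly representable in float32.
    u16 = np.frombuffer(buf, dtype="<u2").astype(np.uint32)
    return (u16 << 16).view(np.float32)

# 0x3F80 is bfloat16 for 1.0 and 0x3FC0 is bfloat16 for 1.5
assert bf16_bytes_to_f32(b"\x80\x3f\xc0\x3f").tolist() == [1.0, 1.5]
```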

Background

LLMs are commonly trained in BF16, and when deployed for serving they often include a pre/post-processing model written in Python. The lack of BF16 support in Python models is therefore a blocker, or requires significant workarounds, when writing an ensemble/BLS LLM pipeline.

Adding BF16 support to the python backend will simplify this workflow.

Example

Example model.py using BF16 tensors via dlpack and torch:

from torch.utils.dlpack import to_dlpack, from_dlpack
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    @classmethod
    def auto_complete_config(cls, cfg):
        inputs = [{
            'name': 'INPUT0',
            'data_type': 'TYPE_BF16',
            'dims': [-1, -1]
        }]
        outputs = [{
            'name': 'OUTPUT0',
            'data_type': 'TYPE_BF16',
            'dims': [-1, -1]
        }]

        for i in inputs:
            cfg.add_input(i)
        for o in outputs:
            cfg.add_output(o)
        cfg.set_max_batch_size(64)
        return cfg

    def infer(self, request):
        output_tensors = []
        for input_tensor in request.inputs():
            # Numpy representation is not supported for BF16
            # NOTE: Could raise an exception or an error instead
            assert input_tensor.as_numpy() is None
            # Convert PB tensor to torch tensor with dlpack
            torch_tensor = from_dlpack(input_tensor.to_dlpack())
            # Manipulate torch tensor
            torch_tensor *= 2
            # Convert torch tensor back to PB tensor with dlpack
            output_tensor = pb_utils.Tensor.from_dlpack(
                input_tensor.name().replace("INPUT", "OUTPUT"),
                to_dlpack(torch_tensor)
            )
            output_tensors.append(output_tensor)
        return pb_utils.InferenceResponse(output_tensors=output_tensors)

    def execute(self, requests):
        responses = []
        for request in requests:
            responses.append(self.infer(request))

        return responses

@rmccorm4 rmccorm4 marked this pull request as draft July 27, 2024 01:06
@rmccorm4 rmccorm4 changed the title Add proof of concept BF16 support via dlpack feat: Add proof of concept BF16 support via dlpack Jul 27, 2024
@rmccorm4 rmccorm4 changed the title feat: Add proof of concept BF16 support via dlpack feat: Add BF16 tensor support via dlpack Jul 30, 2024
@rmccorm4 rmccorm4 marked this pull request as ready for review July 30, 2024 00:36
@Tabrizian (Member) commented:


Nice feature and PR description 🚀

@rmccorm4 rmccorm4 merged commit 2b12abe into main Jul 30, 2024
3 checks passed
@rmccorm4 rmccorm4 deleted the rmccormick-pb-bf16 branch July 30, 2024 21:31