For a simple example of using PyTorch in a Python Backend model, see the
[AddSubNet PyTorch example](#addsubnet-in-pytorch).

### PyTorch Platform \[Experimental\]

**NOTE**: *This feature is subject to change and removal, and should not
be used in production.*

Starting from release 23.08, we are adding experimental support for loading and
serving PyTorch models directly via the Python backend. The model can be
provided within the Triton server model repository, and a
[pre-built Python model](src/resources/platform_handlers/pytorch/model.py) will
be used to load and serve it.

#### Model Layout

The model repository should look like:

```
model_repository/
`-- model_directory
    |-- 1
    |   |-- model.py
    |   `-- model.pt
    `-- config.pbtxt
```

The `model.py` file contains the class definition of the PyTorch model. The
class should extend
[`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module).
A `model.pt` file may optionally be provided, containing the saved
[`state_dict`](https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-for-inference)
of the model. To serve TorchScript models, a TorchScript `model.pt` can be
provided in place of the `model.py` file.
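
As an illustration, a minimal `model.py` for this layout might look like the
sketch below; the class name and input/output signature are assumptions in the
style of the AddSubNet example, not a fixed API:

```
# model.py -- illustrative sketch; class name and I/O shapes are assumptions.
import torch


class AddSubNet(torch.nn.Module):
    """Return the element-wise sum and difference of two input tensors."""

    def forward(self, input0, input1):
        return input0 + input1, input0 - input1
```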

By default, Triton uses the
[PyTorch backend](https://github.com/triton-inference-server/pytorch_backend)
to load and serve TorchScript models. To serve a model from the Python backend
instead, the
[model configuration](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md)
must explicitly provide the following settings:

```
backend: "python"
platform: "pytorch"
```
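
As a point of reference, a complete `config.pbtxt` for the two-input,
two-output sketch above might look as follows; the tensor names, data types,
and dims are illustrative assumptions:

```
name: "model_directory"
backend: "python"
platform: "pytorch"

input [
  {
    name: "INPUT0"
    data_type: TYPE_FP32
    dims: [ 4 ]
  },
  {
    name: "INPUT1"
    data_type: TYPE_FP32
    dims: [ 4 ]
  }
]
output [
  {
    name: "OUTPUT0"
    data_type: TYPE_FP32
    dims: [ 4 ]
  },
  {
    name: "OUTPUT1"
    data_type: TYPE_FP32
    dims: [ 4 ]
  }
]
```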

#### PyTorch Installation

This feature takes advantage of the
[`torch.compile`](https://pytorch.org/docs/stable/generated/torch.compile.html#torch-compile)
optimization, so make sure the
[PyTorch 2.0+ pip package](https://pypi.org/project/torch/2.0.1/) is available
in the same Python environment:

```
pip install torch==2.0.1
```
Alternatively, a
[Python Execution Environment](#using-custom-python-execution-environments)
with the PyTorch dependency may be used.
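
If you use a custom execution environment, it is referenced from
`config.pbtxt` via the `EXECUTION_ENV_PATH` parameter; a sketch, assuming the
packed environment is named `pytorch_env.tar.gz` and placed in the model
directory:

```
parameters: {
  key: "EXECUTION_ENV_PATH",
  value: {string_value: "$$TRITON_MODEL_DIRECTORY/pytorch_env.tar.gz"}
}
```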

#### Customization

The following PyTorch settings may be customized by setting parameters in
`config.pbtxt`:

[`torch.set_num_threads(int)`](https://pytorch.org/docs/stable/generated/torch.set_num_threads.html#torch.set_num_threads)
- Key: NUM_THREADS
- Value: The number of threads used for intraop parallelism on CPU.

[`torch.set_num_interop_threads(int)`](https://pytorch.org/docs/stable/generated/torch.set_num_interop_threads.html#torch.set_num_interop_threads)
- Key: NUM_INTEROP_THREADS
- Value: The number of threads used for interop parallelism (e.g. in JIT
interpreter) on CPU.

[`torch.compile()` parameters](https://pytorch.org/docs/stable/generated/torch.compile.html#torch-compile)
- Key: TORCH_COMPILE_OPTIONAL_PARAMETERS
- Value: Any of the following parameters, encoded as a JSON object.
  - fullgraph (*bool*): If True, require the entire model to be captured into
    a single graph (i.e. disallow graph breaks).
  - dynamic (*bool*): Use dynamic shape tracing.
  - backend (*str*): The backend to be used.
  - mode (*str*): Can be either "default", "reduce-overhead" or "max-autotune".
  - options (*dict*): A dictionary of options to pass to the backend.
  - disable (*bool*): Turn `torch.compile()` into a no-op for testing.

For example:
```
parameters: {
  key: "NUM_THREADS"
  value: { string_value: "4" }
}
parameters: {
  key: "TORCH_COMPILE_OPTIONAL_PARAMETERS"
  value: { string_value: "{\"disable\": true}" }
}
```
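
To make the mapping concrete, here is a rough Python sketch (not the actual
handler code, which lives in
[src/resources/platform_handlers/pytorch/model.py](src/resources/platform_handlers/pytorch/model.py))
of how these string-valued parameters might translate into PyTorch calls:

```
import json

import torch

# Parameter values arrive as strings in config.pbtxt; they are assumed to be
# converted before being handed to PyTorch.
params = {
    "NUM_THREADS": "4",
    "TORCH_COMPILE_OPTIONAL_PARAMETERS": '{"disable": true}',
}

torch.set_num_threads(int(params["NUM_THREADS"]))

compile_kwargs = json.loads(params["TORCH_COMPILE_OPTIONAL_PARAMETERS"])

class TinyNet(torch.nn.Module):  # placeholder module for illustration
    def forward(self, x):
        return x * 2

model = torch.compile(TinyNet(), **compile_kwargs)
print(model(torch.ones(3)))  # tensor([2., 2., 2.])
```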

#### Example

You can find the complete example instructions in
[examples/pytorch_platform_handler](examples/pytorch_platform_handler/README.md).

#### Limitations

The following are a few known limitations of this feature:
- Python functions optimizable by `torch.compile` may not be served directly in
  the `model.py` file; they need to be enclosed in a class extending
  [`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module)
  (see the sketch after this list).
- Model weights cannot be shared across multiple instances on the same GPU
  device.
- When using `KIND_MODEL` as the model instance kind, the default device of the
  model's first parameter is used.
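
Regarding the first limitation, a plain function can be made servable by
wrapping it in a module; a hypothetical sketch:

```
import torch

def scale_and_shift(x):
    return x * 2.0 + 1.0

# Wrapping the function in a torch.nn.Module lets the platform handler serve
# it; the names here are illustrative.
class ScaleShiftNet(torch.nn.Module):
    def forward(self, x):
        return scale_and_shift(x)
```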

### PyTorch Determinism

When running PyTorch code, you may notice slight differences in output values