From 31bece9aae84b454c9bbb80c2010fe8eedcc4403 Mon Sep 17 00:00:00 2001
From: David Yastremsky
Date: Wed, 11 Oct 2023 18:31:15 -0700
Subject: [PATCH] Remove PyTorch platform handler documentation

---
 README.md | 110 ------------------------------------------------------
 1 file changed, 110 deletions(-)

diff --git a/README.md b/README.md
index 4cb9a960..514d4214 100644
--- a/README.md
+++ b/README.md
@@ -1451,116 +1451,6 @@ this workflow.
 For a simple example of using PyTorch in a Python Backend model, see the
 [AddSubNet PyTorch example](#addsubnet-in-pytorch).
 
-### PyTorch Platform \[Experimental\]
-
-**NOTE**: *This feature is subject to change and removal, and should not
-be used in production.*
-
-Starting from 23.08, we are adding experimental support for loading and
-serving PyTorch models directly via the Python backend. The model can be
-provided within the Triton server model repository, and a
-[pre-built Python model](src/resources/platform_handlers/pytorch/model.py) will
-be used to load and serve the PyTorch model.
-
-#### Model Layout
-
-The model repository should look like:
-
-```
-model_repository/
-`-- model_directory
-    |-- 1
-    |   |-- model.py
-    |   `-- model.pt
-    `-- config.pbtxt
-```
-
-The `model.py` contains the class definition of the PyTorch model. The class
-should extend
-[`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module).
-An optional `model.pt` may be provided, which contains the saved
-[`state_dict`](https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-for-inference)
-of the model. For serving TorchScript models, a `model.pt` TorchScript file can
-be provided in place of the `model.py` file.
-
-By default, Triton will use the
-[PyTorch backend](https://github.com/triton-inference-server/pytorch_backend) to
-load and serve TorchScript models. To serve from the Python backend instead, the
-[model configuration](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md)
-should explicitly provide the following settings:
-
-```
-backend: "python"
-platform: "pytorch"
-```
-
-#### PyTorch Installation
-
-This feature takes advantage of the
-[`torch.compile`](https://pytorch.org/docs/stable/generated/torch.compile.html#torch-compile)
-optimization; make sure the
-[PyTorch 2.0+ pip package](https://pypi.org/project/torch/2.0.1/) is available
-in the same Python environment.
-
-```
-pip install torch==2.0.1
-```
-Alternatively, a
-[Python Execution Environment](#using-custom-python-execution-environments)
-with the PyTorch dependency may be used.
-
-#### Customization
-
-The following PyTorch settings may be customized by setting parameters in the
-`config.pbtxt`.
-
-[`torch.set_num_threads(int)`](https://pytorch.org/docs/stable/generated/torch.set_num_threads.html#torch.set_num_threads)
-- Key: NUM_THREADS
-- Value: The number of threads used for intraop parallelism on CPU.
-
-[`torch.set_num_interop_threads(int)`](https://pytorch.org/docs/stable/generated/torch.set_num_interop_threads.html#torch.set_num_interop_threads)
-- Key: NUM_INTEROP_THREADS
-- Value: The number of threads used for interop parallelism (e.g., in the JIT
-interpreter) on CPU.
-
-[`torch.compile()` parameters](https://pytorch.org/docs/stable/generated/torch.compile.html#torch-compile)
-- Key: TORCH_COMPILE_OPTIONAL_PARAMETERS
-- Value: Any of the following parameter(s), encoded as a JSON object.
-  - fullgraph (*bool*): If True, require the whole model to compile as a single graph.
-  - dynamic (*bool*): Use dynamic shape tracing.
-  - backend (*str*): The backend to be used.
-  - mode (*str*): Can be either "default", "reduce-overhead", or "max-autotune".
-  - options (*dict*): A dictionary of options to pass to the backend.
-  - disable (*bool*): Turn `torch.compile()` into a no-op for testing.
-
-For example:
-```
-parameters: {
-  key: "NUM_THREADS"
-  value: { string_value: "4" }
-}
-parameters: {
-  key: "TORCH_COMPILE_OPTIONAL_PARAMETERS"
-  value: { string_value: "{\"disable\": true}" }
-}
-```
-
-#### Example
-
-You can find the complete example instructions in
-[examples/pytorch_platform_handler](examples/pytorch_platform_handler/README.md).
-
-#### Limitations
-
-The following are a few known limitations of this feature:
-- Python functions optimizable by `torch.compile` may not be served directly in
-the `model.py` file; they need to be enclosed in a class extending
-[`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module).
-- Model weights cannot be shared across multiple instances on the same GPU
-device.
-- When using `KIND_MODEL` as the model instance kind, the default device of the
-first parameter of the model is used.
-
 ### PyTorch Determinism
 
 When running PyTorch code, you may notice slight differences in output values
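
For reference while reviewing this removal: the deleted section describes a `model.py` containing a `torch.nn.Module` subclass, but never shows one. A minimal sketch is below, assuming the model layout from the removed text; the `AddSubNet` name and add/sub behavior mirror the AddSubNet example linked in the surviving README, and everything else is illustrative, not part of the removed documentation.

```python
# model.py -- minimal sketch of a model served by the PyTorch platform handler.
# The pre-built handler expects a torch.nn.Module subclass defined in this
# file; bare functions cannot be served directly (see the Limitations list
# in the removed section).
import torch


class AddSubNet(torch.nn.Module):
    """Returns the element-wise sum and difference of two input tensors."""

    def __init__(self):
        super().__init__()

    def forward(self, input0, input1):
        return input0 + input1, input0 - input1
```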
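Similarly, the optional `model.pt` described in the removed text holds a saved `state_dict`, not code. One way it might be produced offline is sketched here; the `LinearNet` module and the output path are hypothetical, chosen only so the checkpoint contains actual weights.

```python
# export_weights.py -- hypothetical helper for producing the optional model.pt.
import torch


class LinearNet(torch.nn.Module):
    """A module with learnable weights, so the saved state_dict is non-empty."""

    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)


if __name__ == "__main__":
    # Save parameters only; the handler rebuilds the module from model.py
    # and then loads these weights into it.
    torch.save(LinearNet().state_dict(),
               "model_repository/model_directory/1/model.pt")
```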