From 31bece9aae84b454c9bbb80c2010fe8eedcc4403 Mon Sep 17 00:00:00 2001
From: David Yastremsky
Date: Wed, 11 Oct 2023 18:31:15 -0700
Subject: [PATCH] Remove PyTorch platform handler documentation

---
 README.md | 110 ------------------------------------------------------
 1 file changed, 110 deletions(-)

diff --git a/README.md b/README.md
index 4cb9a960..514d4214 100644
--- a/README.md
+++ b/README.md
@@ -1451,116 +1451,6 @@ this workflow.
 For a simple example of using PyTorch in a Python Backend model, see the
 [AddSubNet PyTorch example](#addsubnet-in-pytorch).
 
-### PyTorch Platform \[Experimental\]
-
-**NOTE**: *This feature is subject to change and removal, and should not
-be used in production.*
-
-Starting from 23.08, we are adding experimental support for loading and
-serving PyTorch models directly via the Python backend. The model can be
-provided within the Triton server model repository, and a
-[pre-built Python model](src/resources/platform_handlers/pytorch/model.py) will
-be used to load and serve the PyTorch model.
-
-#### Model Layout
-
-The model repository should look like:
-
-```
-model_repository/
-`-- model_directory
-    |-- 1
-    |   |-- model.py
-    |   `-- model.pt
-    `-- config.pbtxt
-```
-
-The `model.py` contains the class definition of the PyTorch model. The class
-should extend
-[`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module).
-An optional `model.pt` may be provided, which contains the saved
-[`state_dict`](https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-for-inference)
-of the model. For serving TorchScript models, a `model.pt` TorchScript file can
-be provided in place of the `model.py` file.
-
-By default, Triton will use the
-[PyTorch backend](https://github.com/triton-inference-server/pytorch_backend) to
-load and serve TorchScript models. To serve from the Python backend instead, the
-[model configuration](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md)
-should explicitly provide the following settings:
-
-```
-backend: "python"
-platform: "pytorch"
-```
-
-#### PyTorch Installation
-
-This feature takes advantage of the
-[`torch.compile`](https://pytorch.org/docs/stable/generated/torch.compile.html#torch-compile)
-optimization; make sure the
-[PyTorch 2.0+ pip package](https://pypi.org/project/torch/2.0.1/) is available
-in the same Python environment.
-
-```
-pip install torch==2.0.1
-```
-Alternatively, a
-[Python Execution Environment](#using-custom-python-execution-environments)
-with the PyTorch dependency may be used.
-
-#### Customization
-
-The following PyTorch settings may be customized by setting parameters in the
-`config.pbtxt`.
-
-[`torch.set_num_threads(int)`](https://pytorch.org/docs/stable/generated/torch.set_num_threads.html#torch.set_num_threads)
-- Key: NUM_THREADS
-- Value: The number of threads used for intraop parallelism on CPU.
-
-[`torch.set_num_interop_threads(int)`](https://pytorch.org/docs/stable/generated/torch.set_num_interop_threads.html#torch.set_num_interop_threads)
-- Key: NUM_INTEROP_THREADS
-- Value: The number of threads used for interop parallelism (e.g., in the JIT
-interpreter) on CPU.
-
-[`torch.compile()` parameters](https://pytorch.org/docs/stable/generated/torch.compile.html#torch-compile)
-- Key: TORCH_COMPILE_OPTIONAL_PARAMETERS
-- Value: Any of the following parameter(s), encoded as a JSON object.
-  - fullgraph (*bool*): If True, require the whole model to compile as a single graph.
-  - dynamic (*bool*): Use dynamic shape tracing.
-  - backend (*str*): The backend to be used.
-  - mode (*str*): Can be either "default", "reduce-overhead", or "max-autotune".
-  - options (*dict*): A dictionary of options to pass to the backend.
-  - disable (*bool*): Turn `torch.compile()` into a no-op for testing.
-
-For example:
-```
-parameters: {
-  key: "NUM_THREADS"
-  value: { string_value: "4" }
-}
-parameters: {
-  key: "TORCH_COMPILE_OPTIONAL_PARAMETERS"
-  value: { string_value: "{\"disable\": true}" }
-}
-```
-
-#### Example
-
-You can find the complete example instructions in
-[examples/pytorch_platform_handler](examples/pytorch_platform_handler/README.md).
-
-#### Limitations
-
-The following are a few known limitations of this feature:
-- Python functions optimizable by `torch.compile` may not be served directly in
-the `model.py` file; they need to be enclosed in a class extending
-[`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module).
-- Model weights cannot be shared across multiple instances on the same GPU
-device.
-- When using `KIND_MODEL` as the model instance kind, the default device of the
-first parameter of the model is used.
-
 ### PyTorch Determinism
 
 When running PyTorch code, you may notice slight differences in output values
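
For reference while reviewing this removal: the deleted section describes a `model.py` containing a `torch.nn.Module` subclass, but never shows one. A minimal sketch is below, assuming the model layout from the removed text; the `AddSubNet` name and add/sub behavior mirror the AddSubNet example linked in the surviving README, and everything else is illustrative, not part of the removed documentation.

```python
# model.py -- minimal sketch of a model served by the PyTorch platform handler.
# The pre-built handler expects a torch.nn.Module subclass defined in this
# file; bare functions cannot be served directly (see the Limitations list
# in the removed section).
import torch


class AddSubNet(torch.nn.Module):
    """Returns the element-wise sum and difference of two input tensors."""

    def __init__(self):
        super().__init__()

    def forward(self, input0, input1):
        return input0 + input1, input0 - input1
```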
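Similarly, the optional `model.pt` described in the removed text holds a saved `state_dict`, not code. One way it might be produced offline is sketched here; the `LinearNet` module and the output path are hypothetical, chosen only so the checkpoint contains actual weights.

```python
# export_weights.py -- hypothetical helper for producing the optional model.pt.
import torch


class LinearNet(torch.nn.Module):
    """A module with learnable weights, so the saved state_dict is non-empty."""

    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)


if __name__ == "__main__":
    # Save parameters only; the handler rebuilds the module from model.py
    # and then loads these weights into it.
    torch.save(LinearNet().state_dict(),
               "model_repository/model_directory/1/model.pt")
```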