Add generate endpoint to Tutorial
fpetrini15 committed Nov 13, 2023
1 parent f4bfa7c commit 9f27e64
Showing 2 changed files with 9 additions and 4 deletions.
2 changes: 1 addition & 1 deletion Quick_Deploy/HuggingFaceTransformers/Dockerfile
@@ -23,5 +23,5 @@
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
FROM nvcr.io/nvidia/tritonserver:23.09-py3
FROM nvcr.io/nvidia/tritonserver:23.10-py3
RUN pip install transformers==4.34.0 protobuf==3.20.3 sentencepiece==0.1.99 accelerate==0.23.0 einops==0.6.1
11 changes: 8 additions & 3 deletions Quick_Deploy/HuggingFaceTransformers/README.md
@@ -93,7 +93,7 @@ I0922 23:28:40.395611 1 http_server.cc:187] Started Metrics Service at 0.0.0.0:8

Now we can query the server using curl, specifying the server address and input details:

```json
```bash
curl -X POST localhost:8000/v2/models/falcon7b/infer -d '{"inputs": [{"name":"text_input","datatype":"BYTES","shape":[1],"data":["I am going"]}]}'
```
In our testing, the server returned the following result (formatted for legibility):
@@ -135,7 +135,7 @@ Again, launch the server by invoking the `docker run` command from above and wai
that the server has launched successfully.

Query the server making sure to change the host address for each model:
```json
```bash
curl -X POST localhost:8000/v2/models/falcon7b/infer -d '{"inputs": [{"name":"text_input","datatype":"BYTES","shape":[1],"data":["How can you be"]}]}'
curl -X POST localhost:8000/v2/models/persimmon8b/infer -d '{"inputs": [{"name":"text_input","datatype":"BYTES","shape":[1],"data":["Where is the nearest"]}]}'
```
@@ -147,7 +147,12 @@ In our testing, these queries returned the following parsed results:
# persimmon8b
"Where is the nearest starbucks?"
```
Beginning in the 23.10 release, users can now interact with large language models (LLMs) hosted
by Triton in a simplified fashion by using Triton's generate endpoint:

```bash
curl -X POST localhost:8000/v2/models/falcon7b/generate -d '{"text_input":"How can you be"}'
```
## 'Day Zero' Support

The latest transformer models may not always be supported in the most recent, official
@@ -206,7 +211,7 @@ the Triton server using the `docker run` command from above.
Once Triton launches successfully, start a Triton SDK container by running the following in a separate window:

```bash
docker run -it --net=host nvcr.io/nvidia/tritonserver:23.09-py3-sdk bash
docker run -it --net=host nvcr.io/nvidia/tritonserver:23.10-py3-sdk bash
```
This container comes with all of Triton's deployment analyzers pre-installed, meaning
we can simply enter the following to get feedback on our model's inference performance:

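For comparison, the two request shapes that appear in this diff can be sketched side by side. The helper names below (`build_infer_payload`, `build_generate_payload`, `query_generate`) are hypothetical and not part of the tutorial; the host, port, and field names are taken from the README commands in the diff. The sketch assumes the prompt contains no characters that need JSON escaping, and `query_generate` additionally assumes a Triton server from the 23.10 release or later is already running.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of the tutorial.
# Assumes the prompt has no characters that need JSON escaping
# (double quotes, backslashes, newlines).

# Request body for the standard infer endpoint, as used in the README.
build_infer_payload() {
  printf '{"inputs": [{"name":"text_input","datatype":"BYTES","shape":[1],"data":["%s"]}]}' "$1"
}

# Request body for the generate endpoint introduced with the 23.10 release.
build_generate_payload() {
  printf '{"text_input":"%s"}' "$1"
}

# POST a prompt to a model's generate endpoint.
# Usage: query_generate <model> <prompt>
# Defined but not called here, since it needs a running server on localhost:8000.
query_generate() {
  curl -X POST "localhost:8000/v2/models/$1/generate" -d "$(build_generate_payload "$2")"
}

# Print both bodies for the same prompt so they can be compared.
build_infer_payload "How can you be"; echo
build_generate_payload "How can you be"; echo
```

With the server up, `query_generate falcon7b "How can you be"` would send the same request as the `curl` command added to the README in this commit.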