Add containers/tgi/tpu/0.1.4/Dockerfile #57
Conversation
It's based on Optimum TPU TGI image.
containers/tgi/README.md
Outdated
To build a Text Generation Inference container for TPU, you can do so with the following command:

```bash
docker build -t us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-tpu.0.1.3 -f containers/tgi/tpu/0.1.3/Dockerfile .
```
Not sure we need this, since it's the same command with just a different path.
OK I can remove it.
Note that this may have been updated in #59, so it's most likely not needed; we can update the command directly within that PR, as well as some other references such as the table of available containers in the main README.md. So maybe we can discard the changes on the README.md files here and copy those over to #59 instead, to prevent conflicts when merging to main. WDYT?
These features are included by TGI at build time, so the whole Dockerfile is now a variation of optimum-tpu's TGI one.
Great, thanks for this PR @tengomucho 👏🏻
Could we add a bit more information on the changes w.r.t. the original Dockerfile as part of this PR's description?
Also, I have a probably very opinionated suggestion: is there any rationale behind installing Python as `python3` + `pip3` over just installing it as `python` + `pip`, or using `conda`, `mamba`, `miniconda`, or any other environment-manager solution?
Thanks a lot in advance! 🤗
containers/tgi/tpu/0.1.3/Dockerfile
Outdated
```dockerfile
# Fetch and extract the TGI sources (TGI_VERSION is mandatory)
FROM alpine AS tgi
ARG TGI_VERSION='v2.0.3'
RUN mkdir -p /tgi
ADD https://github.com/huggingface/text-generation-inference/archive/${TGI_VERSION}.tar.gz /tgi/sources.tar.gz
RUN tar -C /tgi -xf /tgi/sources.tar.gz --strip-components=1
```
Here we don't need to parametrize `TGI_VERSION` at the moment, since we have unique images per version, so I guess we're better off without it; also for alignment with the rest of the Dockerfiles, which pull from either TGI or TEI.
Suggested change:

```diff
-# Fetch and extract the TGI sources (TGI_VERSION is mandatory)
+# Fetch and extract the TGI sources
 FROM alpine AS tgi
-ARG TGI_VERSION='v2.0.3'
 RUN mkdir -p /tgi
-ADD https://github.com/huggingface/text-generation-inference/archive/${TGI_VERSION}.tar.gz /tgi/sources.tar.gz
+ADD https://github.com/huggingface/text-generation-inference/archive/refs/tags/v2.0.3.tar.gz /tgi/sources.tar.gz
 RUN tar -C /tgi -xf /tgi/sources.tar.gz --strip-components=1
```
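For context, the `ARG` being removed would normally be overridden at build time via Docker's `--build-arg` flag; a sketch of how that would look (the image tag here is illustrative, not one of the repository's published tags):

```shell
# Hypothetical: pin a different TGI release without editing the Dockerfile,
# only possible while TGI_VERSION remains a build ARG
docker build \
  --build-arg TGI_VERSION=v2.0.3 \
  -t huggingface-text-generation-inference-tpu:dev \
  -f containers/tgi/tpu/0.1.3/Dockerfile .
```

Since this repository builds one image per pinned version, hardcoding the tag simply removes a knob that is never exercised.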
I can make this change, but the Dockerfile will diverge quite a bit from what we have in https://github.com/huggingface/optimum-tpu/blob/main/text-generation-inference/docker/Dockerfile, where the version is an argument. I do not mind setting it, though.
It was more a comment in terms of alignment, since this build `ARG` won't be used when building these containers (at least within this repository); but I'm not sure what @philschmid's thoughts are around aligning as much as possible with the rest of the Dockerfiles here, and with the original TGI Dockerfile too.
containers/tgi/tpu/0.1.3/Dockerfile
Outdated
```dockerfile
# Build cargo components (adapted from TGI original Dockerfile)
# Note that the build image is aligned on the same Linux version as the base image (Debian bookworm / Ubuntu 22.04)
FROM lukemathwalker/cargo-chef:latest-rust-1.77-bookworm AS chef
```
AFAIK `text-generation-inference` v2.0.3 is using `rustc 1.78` instead of `rustc 1.77` as pinned here; is there any rationale for this? Thanks in advance!
Suggested change:

```diff
-FROM lukemathwalker/cargo-chef:latest-rust-1.77-bookworm AS chef
+FROM lukemathwalker/cargo-chef:latest-rust-1.78-bookworm AS chef
```
See the reference for the original TGI image for that version at https://github.com/huggingface/text-generation-inference/blob/v2.0.3/Dockerfile, as well as the matching `bookworm` tag at https://hub.docker.com/layers/lukemathwalker/cargo-chef/latest-rust-1.78-bookworm/images/sha256-e718cf4902f7c0009a00054f5faf89dc2ca4a6bef04022daf54cdc99ad97010b?context=explore
As mentioned in the previous comment, the rationale is that this is based on the Dockerfile from optimum-tpu. While I could change that, I think it's better to keep this aligned with the version from optimum-tpu, because that's what we test on Inference Endpoints.
Is this something we should keep in mind for upcoming releases of TGI with `optimum-tpu`, or are those `Dockerfile`s always going to differ based on TPU specifics? Unless this is a blocker or a clear improvement over the base TGI Dockerfile, I would try to keep those as aligned as possible, i.e. less diff between the GPU Dockerfile from `text-generation-inference` and the TPU Dockerfile from `optimum-tpu`.
I agree it would be better to keep the GPU and TPU Dockerfiles aligned, but I think it's even better to stick to tested versions. So what I propose is to stick to this version for now; later I can raise the Rust version in TGI/IE and then upgrade these settings here too.
containers/tgi/tpu/0.1.3/Dockerfile
Outdated
```dockerfile
COPY --from=tgi /tgi/benchmark benchmark
COPY --from=tgi /tgi/router router
COPY --from=tgi /tgi/launcher launcher
RUN cargo build --release --workspace --exclude benchmark --features google
```
Is the addition of the `--workspace` flag and `--exclude benchmark` related to the TPUs? If so, could you elaborate on the rationale to better understand it? Thanks in advance!
No, these are generic options used to build TGI. The `--exclude benchmark` is there to speed up the build by skipping that crate, and the `--workspace` flag, I think, is to control which cargo workspace members get built. I inherited those from optimum-tpu's options.
It's a shame we need `--features google`: if the feature were dynamic (e.g. triggered on the command line), we could probably have much smaller Dockerfiles.
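As a semantic note (the member names below are assumptions about the TGI workspace layout, used only for illustration): `--workspace` asks cargo to build every member of the workspace, and `--exclude` then removes named members from that set, so the combination builds "everything except":

```shell
# In a workspace with members such as router, launcher, and benchmark,
# this builds all of them except `benchmark`, enabling the optional
# `google` cargo feature for the crates that declare it
cargo build --release --workspace --exclude benchmark --features google
```

The alternative would be listing the desired members explicitly with repeated `-p` flags, but the exclude form stays correct if new members are added to the workspace later.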
Not sure about excluding the `text-generation-benchmark` here, since we eventually use it to benchmark the deployed containers, especially when deployed via GKE, so I would keep the `benchmark` IMO. WDYT @philschmid?
Oh, OK, I did not know about that. I can add the benchmark back; I think there should be no problem with that.
Hey @alvarobartt, thanks for the feedback!
I added some details; I hope they explain things a bit.
I think the base image is based on Ubuntu 22.04, where the default way of invoking Python is still `python3` + `pip3`. Since we work in a container, I do not think any other virtual environment is necessary, unless there is a feature I am missing?
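For completeness, if we ever did want plain `python`/`pip` to resolve on this base, Ubuntu 22.04 ships a small compatibility package for exactly that; a hypothetical, untested Dockerfile fragment:

```dockerfile
# Hypothetical: expose a `python` alias on Ubuntu 22.04, where the distro
# packages only provide python3 by default; `python-is-python3` installs
# the /usr/bin/python -> python3 symlink
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip python-is-python3 \
    && rm -rf /var/lib/apt/lists/*
```

That said, sticking to the distro's `python3` + `pip3` avoids any divergence from the optimum-tpu Dockerfile.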
Thank you for your review! 🤗
- Remove redundant TGI version arg.
- Update exposed port and hub cache dir.
Hi @tengomucho, I've added some follow-up comments to the unresolved threads, but so far so good! 🤗 I'll try this in GKE today to double-check everything's working as expected, or did you already do so? Let me know which tests you could run, and I'll run some more to ensure that the container is working as expected. Any LLM or group of LLMs we should try? Also, what hardware do you recommend for single- and multi-TPU inference, respectively? Thanks in advance!
containers/tgi/tpu/0.1.3/Dockerfile
For now TPU TGI only supports single-host configurations, i.e. up to
Fair, thanks! I meant a single host with multiple TPU accelerators, i.e.
containers/tgi/tpu/0.1.3/Dockerfile
containers/tgi/tpu/0.1.4/Dockerfile
Added a container with TGI for TPU, based on work from Optimum TPU, plus support for GCS to align with the other containers.
The Docker image is mostly based on optimum-tpu's TGI Dockerfile, the rationale being to keep the difference as small as possible and thus reduce the probability of introducing an unsupported feature.
The whole Dockerfile has been copied and pasted for now, because TGI needs to be rebuilt to have the Google features turned on.