
Add containers/tgi/tpu/0.1.4/Dockerfile #57

Merged · 7 commits · Jul 30, 2024

Conversation

tengomucho
Collaborator

@tengomucho tengomucho commented Jul 12, 2024

Added a container with TGI for TPU, based on work from Optimum TPU, plus GCS support to align with the other containers.

The Docker image is mostly based on optimum-tpu's TGI Dockerfile; the rationale is to keep the difference as small as possible, to reduce the probability of introducing an unsupported feature.
The whole Dockerfile has been copied as-is for now, because TGI needs to be rebuilt with the google features turned on.

Comment on lines 21 to 27
To build a Text Generation Inference container for TPU, you can do it with this command:

```bash
docker build -t us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-tpu.0.1.3 -f containers/tgi/tpu/0.1.3/Dockerfile .
```

Contributor


Not sure if we need that, since it's the same command with just a different path.

Collaborator Author


OK, I can remove it.

Member


Note that this may have been updated in #59, so it's most likely not needed; we can update the command directly within that PR, as well as some other references such as the main README.md table with the available containers. So maybe we can discard the changes on the README.md files and copy those over to #59 instead, to prevent conflicts when merging to main. WDYT?

containers/tgi/tpu/0.1.3/Dockerfile (outdated, resolved)
These features are included by TGI at build time, so the whole
Dockerfile is now a variation of optimum-tpu's TGI one.
@tengomucho tengomucho requested a review from philschmid July 12, 2024 14:39
@alvarobartt alvarobartt linked an issue Jul 15, 2024 that may be closed by this pull request
Member

@alvarobartt alvarobartt left a comment


Great thanks for this PR @tengomucho 👏🏻

Could we add a bit more information on the changes w.r.t. the original Dockerfile as part of this PR's description?

Also, I have a probably very opinionated suggestion: is there any rationale for installing Python as python3 + pip3 over just installing it as python + pip, or for using conda, mamba, miniconda, or any other environment-manager solution?

Thanks a lot in advance! 🤗

Comment on lines 1 to 6
```Dockerfile
# Fetch and extract the TGI sources (TGI_VERSION is mandatory)
FROM alpine AS tgi
ARG TGI_VERSION='v2.0.3'
RUN mkdir -p /tgi
ADD https://github.com/huggingface/text-generation-inference/archive/${TGI_VERSION}.tar.gz /tgi/sources.tar.gz
RUN tar -C /tgi -xf /tgi/sources.tar.gz --strip-components=1
```
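The fetch stage above follows a common pattern: ADD downloads the GitHub release tarball, and tar with --strip-components=1 drops the archive's single top-level directory so the sources land directly in the target directory. A minimal local sketch of that tar behavior (hypothetical paths, nothing TGI-specific):

```shell
# Hypothetical layout mimicking a release tarball: all contents live
# under a single top-level directory named after the release.
rm -rf /tmp/tgi-demo
mkdir -p /tmp/tgi-demo/src/text-generation-inference-2.0.3
echo "hello" > /tmp/tgi-demo/src/text-generation-inference-2.0.3/README.md
tar -C /tmp/tgi-demo/src -czf /tmp/tgi-demo/sources.tar.gz text-generation-inference-2.0.3

# --strip-components=1 removes that top-level directory on extraction,
# so files appear directly under the destination, as in the RUN line above.
mkdir -p /tmp/tgi-demo/tgi
tar -C /tmp/tgi-demo/tgi -xf /tmp/tgi-demo/sources.tar.gz --strip-components=1
cat /tmp/tgi-demo/tgi/README.md   # prints "hello"
```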
Member

@alvarobartt alvarobartt Jul 15, 2024


Here we don't need to "parametrize" TGI_VERSION at the moment, since we have unique images per version, so I guess we're better off without this; it also aligns with the rest of the Dockerfile images pulling from either TGI or TEI.

Suggested change

```diff
-# Fetch and extract the TGI sources (TGI_VERSION is mandatory)
-FROM alpine AS tgi
-ARG TGI_VERSION='v2.0.3'
-RUN mkdir -p /tgi
-ADD https://github.com/huggingface/text-generation-inference/archive/${TGI_VERSION}.tar.gz /tgi/sources.tar.gz
-RUN tar -C /tgi -xf /tgi/sources.tar.gz --strip-components=1
+# Fetch and extract the TGI sources
+FROM alpine AS tgi
+RUN mkdir -p /tgi
+ADD https://github.com/huggingface/text-generation-inference/archive/refs/tags/v2.0.3.tar.gz /tgi/sources.tar.gz
+RUN tar -C /tgi -xf /tgi/sources.tar.gz --strip-components=1
```

Collaborator Author


I can make this change, but the Dockerfile will diverge a lot from what we have in https://github.com/huggingface/optimum-tpu/blob/main/text-generation-inference/docker/Dockerfile, where the version is an argument. I don't mind setting it, though.

Member


It was more a comment in terms of alignment, since this build ARG won't be used when building these containers (at least within this repository). But I'm not sure what @philschmid's thoughts are on aligning as much as possible with the rest of the Dockerfiles here, and with the original TGI Dockerfile too.


```Dockerfile
# Build cargo components (adapted from TGI original Dockerfile)
# Note that the build image is aligned on the same Linux version as the base image (Debian bookworm / Ubuntu 22.04)
FROM lukemathwalker/cargo-chef:latest-rust-1.77-bookworm AS chef
```
Member


AFAIK text-generation-inference v2.0.3 uses rustc 1.78 instead of the rustc 1.77 pinned here; is there any rationale for this? Thanks in advance!

Suggested change

```diff
-FROM lukemathwalker/cargo-chef:latest-rust-1.77-bookworm AS chef
+FROM lukemathwalker/cargo-chef:latest-rust-1.78-bookworm AS chef
```

See the reference for the original TGI image for that version, https://github.com/huggingface/text-generation-inference/blob/v2.0.3/Dockerfile, as well as the matching bookworm tag: https://hub.docker.com/layers/lukemathwalker/cargo-chef/latest-rust-1.78-bookworm/images/sha256-e718cf4902f7c0009a00054f5faf89dc2ca4a6bef04022daf54cdc99ad97010b?context=explore

Collaborator Author


As mentioned in a previous comment, the rationale is that this is based on the Dockerfile from optimum-tpu. While I could change it, I think it's better to keep this aligned with the optimum-tpu version, because that is what we test on Inference Endpoints.

Member


Is this something we should keep in mind for upcoming releases of TGI with optimum-tpu, or are those Dockerfiles always going to differ based on TPU specifics? Unless this is a blocker or a clear improvement over the base TGI Dockerfile, I would try to keep them as aligned as possible, i.e. less diff between the GPU Dockerfile from text-generation-inference and the TPU Dockerfile from optimum-tpu.

Collaborator Author


I agree it would be better to keep the GPU and TPU Dockerfiles aligned, but I think it's even better to stick to tested versions. So what I propose is to stick to this version for now; later I can raise the Rust version in TGI/IE and then upgrade these settings here too.

```Dockerfile
COPY --from=tgi /tgi/benchmark benchmark
COPY --from=tgi /tgi/router router
COPY --from=tgi /tgi/launcher launcher
RUN cargo build --release --workspace --exclude benchmark --features google
```
Member


Is the addition of the --workspace flag and --exclude benchmark related to the TPUs? If so, could you elaborate on the rationale to better understand it? Thanks in advance!

Collaborator Author


No, these are generic options used to build TGI. The --exclude benchmark is there to speed up the build by skipping that crate, and the --workspace, I think, is to control which cargo workspace members get built. I inherited those from optimum-tpu's options.
It's a shame we need --features google: if the feature were dynamic (e.g. triggered on the command line), we could probably have much smaller Dockerfiles.
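For context, a sketch of the two invocations being compared (command lines only, not runnable outside a TGI checkout; flag semantics as discussed above):

```bash
# Build used here: every cargo workspace member except the benchmark
# crate, with the `google` feature compiled in statically at build time.
cargo build --release --workspace --exclude benchmark --features google

# Keeping the benchmark crate would simply drop the exclusion:
cargo build --release --workspace --features google
```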

Member


Not sure about excluding text-generation-benchmark here, since we eventually use it to benchmark the deployed containers, especially when deployed via GKE, so I would keep the benchmark IMO. WDYT @philschmid?

Collaborator Author


Oh, OK, I did not know about that. I can add the benchmark back; I think there should be no problem with that.

containers/tgi/tpu/0.1.3/Dockerfile (outdated, resolved)
@tengomucho
Collaborator Author

Hey @alvarobartt thanks for the feedback!

Great thanks for this PR @tengomucho 👏🏻

Could we add a bit more information on the changes w.r.t. the original Dockerfile as part of this PR's description?

I added some details, I hope it explains things a bit.

Also, I have a probably very opinionated suggestion: is there any rationale for installing Python as python3 + pip3 over just installing it as python + pip, or for using conda, mamba, miniconda, or any other environment-manager solution?

I think the base image is Ubuntu 22.04, where the default way of invoking Python is still python3 + pip3. Since we are working inside a container, I don't think any other virtual environment is necessary, unless there is a feature I am missing?
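A quick sketch of the point above (my assumption about the base image's Python naming convention; the `python-is-python3` package name is the Ubuntu shim that provides the bare `python` alias):

```shell
# On Ubuntu 22.04 the interpreter is installed as `python3` and pip as
# `pip3`; a bare `python` exists only if the `python-is-python3` shim
# package is installed on top.
python3 -c 'import sys; print("python%d" % sys.version_info.major)'   # prints "python3"
```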

Thanks a lot in advance! 🤗

Thank you for your review! 🤗

- Remove redundant TGI version arg.
- Update exposed port and hub cache dir.
@tengomucho tengomucho requested a review from alvarobartt July 22, 2024 14:11
@alvarobartt
Member

alvarobartt commented Jul 25, 2024

Hi @tengomucho, I've added some follow-up comments to the unresolved threads, but so far so good! 🤗

I'll try this in GKE today to double-check everything's working as expected, or did you already do so? Let me know which tests you ran, and I'll run some more to ensure that the container is working as expected. Any LLM or group of LLMs we should try? Also, what hardware do you recommend for single and multi-TPU inference, respectively? Thanks in advance!

@alvarobartt alvarobartt changed the title Tpu tgi container Add containers/tgi/tpu/0.1.3/Dockerfile Jul 25, 2024
@tengomucho
Collaborator Author

Hi @tengomucho, I've added some follow-up comments to the unresolved threads, but so far so good! 🤗

I'll try this in GKE today to double-check everything's working as expected, or did you already do so? Let me know which tests you ran, and I'll run some more to ensure that the container is working as expected. Any LLM or group of LLMs we should try? Also, what hardware do you recommend for single and multi-TPU inference, respectively? Thanks in advance!

For now TPU TGI only supports single-host configurations, i.e. up to v5e-litepod8; multi-host support will come later. The models you can run are the ones mentioned in the internal Slack: https://huggingface.slack.com/archives/C03E4DQ9LAJ/p1719396122379289

@alvarobartt
Member

For now TPU TGI only supports single-host configurations, i.e. up to v5e-litepod8; multi-host support will come later. The models you can run are the ones mentioned in the internal Slack

Fair, thanks! I meant single-host with multiple TPU accelerators, i.e. v5litepod-4. I'll run those in GKE then and report back if I encounter any issues; otherwise I'll just merge this PR. Thanks 🤗

@alvarobartt alvarobartt changed the title Add containers/tgi/tpu/0.1.3/Dockerfile Add containers/tgi/tpu/0.1.4/Dockerfile Jul 26, 2024
@alvarobartt alvarobartt merged commit 8b5d742 into main Jul 30, 2024
@alvarobartt alvarobartt deleted the tpu-tgi-container branch July 30, 2024 08:01
Successfully merging this pull request may close these issues.

[TPU][TGI] Inference Container
3 participants