-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bump to vllm0.6.2 and add explicit chat template #3964
Conversation
/rerun-all |
20322fd
to
0299c07
Compare
With newer versions of transformers, it does not support default chat templates for models: huggingface/transformers#31733, thus I explicitly give a chat template in tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you update the PR title and description to also include any features that you'd like to bring in this release?
@terrytangyuan thanks a lot for your reply. I've updated the title and description! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look good to me, however, my only concern is backward compatibility.
For users migrating to the new KServe version which brings it, will user be required to do anything else or?
@spolti I thought the added parameter has a default value |
We should have 0.14.0 stable release cut while we thoroughly test these new change separately. |
@johnugeorge Hi, thanks for your comment! Do you mean that I should expect some more PRs on |
e21db9e
to
64266a5
Compare
…atting Signed-off-by: yxia216 <[email protected]>
Signed-off-by: yxia216 <[email protected]>
Signed-off-by: yxia216 <[email protected]>
Signed-off-by: yxia216 <[email protected]>
Signed-off-by: yxia216 <[email protected]>
Signed-off-by: yxia216 <[email protected]>
Signed-off-by: yxia216 <[email protected]>
Signed-off-by: yxia216 <[email protected]>
Signed-off-by: yxia216 <[email protected]>
Signed-off-by: yxia216 <[email protected]>
Signed-off-by: yxia216 <[email protected]>
Signed-off-by: yxia216 <[email protected]>
Signed-off-by: yxia216 <[email protected]>
Signed-off-by: yxia216 <[email protected]>
Signed-off-by: yxia216 <[email protected]>
Signed-off-by: yxia216 <[email protected]>
Signed-off-by: yxia216 <[email protected]>
Signed-off-by: yxia216 <[email protected]>
Signed-off-by: yxia216 <[email protected]>
Signed-off-by: yxia216 <[email protected]>
Signed-off-by: yxia216 <[email protected]>
Signed-off-by: yxia216 <[email protected]>
5d433f5
to
60598c9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
Thanks @hustxiayang ! great work on this |
* explicitly give a chat template Signed-off-by: yxia216 <[email protected]> * fix dummy model issue, fix python version smaller than 3.10, and formatting Signed-off-by: yxia216 <[email protected]> * fix vLLMModel Signed-off-by: yxia216 <[email protected]> * change the interface of CreateChatCompletionRequest Signed-off-by: yxia216 <[email protected]> * update dummy model's para Signed-off-by: yxia216 <[email protected]> * consitent with OpenAIGPTTokenizer and OpenAIGPTModel Signed-off-by: yxia216 <[email protected]> * give a chat template if there is no Signed-off-by: yxia216 <[email protected]> * update the response and update the readme Signed-off-by: yxia216 <[email protected]> * update the chat_template Signed-off-by: yxia216 <[email protected]> * update data Signed-off-by: yxia216 <[email protected]> * add test of chat temmplate for tokenizer Signed-off-by: yxia216 <[email protected]> * jinja2 template format Signed-off-by: yxia216 <[email protected]> * use a simpler chat template --------- Signed-off-by: yxia216 <[email protected]> Signed-off-by: Snehomoy <[email protected]>
* add tags to rest server timing logs to differentiate cpu and wall time (kserve#3954) Signed-off-by: Gregory Keith <[email protected]> * Implement Huggingface model download in storage initializer (kserve#3584) * initial commit for hugging face model download and load Signed-off-by: Andrews Arokiam <[email protected]> * bug fix on storage initializer Signed-off-by: Andrews Arokiam <[email protected]> * added hf_token and unittests Signed-off-by: Andrews Arokiam <[email protected]> * separate hf-storage-initializer image to reduce image size Signed-off-by: Andrews Arokiam <[email protected]> * review comment changes Signed-off-by: Andrews Arokiam <[email protected]> * snapshot download Signed-off-by: Andrews Arokiam <[email protected]> * use existing image for storage initializer Signed-off-by: Andrews Arokiam <[email protected]> * resolved merge conflicts Signed-off-by: Andrews Arokiam <[email protected]> * added hf storage uri validation Signed-off-by: Andrews Arokiam <[email protected]> * resolved merge conflicts Signed-off-by: Andrews Arokiam <[email protected]> --------- Signed-off-by: Andrews Arokiam <[email protected]> * Update OWNERS file (kserve#3966) Signed-off-by: Dan Sun <[email protected]> * Cluster local model controller (kserve#3860) * Consolidate into one commit Signed-off-by: Jin Dong <[email protected]> * Fix configmap format Signed-off-by: Jin Dong <[email protected]> * Fix configmap Signed-off-by: Jin Dong <[email protected]> * Log configmap read error Signed-off-by: Jin Dong <[email protected]> * fix naming Signed-off-by: Dan Sun <[email protected]> * Update comments Signed-off-by: Jin Dong <[email protected]> * Add enabled flag to configmap and avoid cluster resource check in isvc defaulter Signed-off-by: Jin Dong <[email protected]> * move client into the local model block Signed-off-by: Dan Sun <[email protected]> * Fix lint Signed-off-by: Jin Dong <[email protected]> --------- Signed-off-by: Jin Dong <[email protected]> Signed-off-by: Dan Sun <[email protected]> Co-authored-by: Dan Sun <[email protected]> * Prepare for 0.14.0-rc1release and automate sync process (kserve#3970) * Sync helm chart with kustomize Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Update manifest generation script to sync helm charts Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Make kserve-addressable-resolver role optional Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Prepare for 0.14.0-rc1 release Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Update release process Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Comment out crd sync script in make Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Fix helm template syntax Signed-off-by: Sivanantham Chinnaiyan <[email protected]> --------- Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * add a new API for multi-node/multi-gpu (kserve#3871) * add a new API for multi-node/multi-gpu Signed-off-by: jooho lee <[email protected]> * fix gitaction Signed-off-by: jooho lee <[email protected]> * fix merging conflict Signed-off-by: jooho lee <[email protected]> * fix gitaction fail Signed-off-by: jooho lee <[email protected]> * regenerate codegen/manifests Signed-off-by: jooho lee <[email protected]> * Apply suggestions from code review Co-authored-by: Dan Sun <[email protected]> Signed-off-by: Jooho Lee <[email protected]> * remove unnecessary comment Signed-off-by: jooho lee <[email protected]> * change the type of workerSpec in isvc to PodSpec Signed-off-by: jooho lee <[email protected]> * update controller-gen version Signed-off-by: jooho lee <[email protected]> * remove replicas from workerSpec Signed-off-by: jooho lee <[email protected]> * fix conflict merging Signed-off-by: jooho lee <[email protected]> * added size(replicas) for workerSpec again Signed-off-by: jooho lee <[email protected]> * add WorkerSpec to inferenceService Signed-off-by: jooho lee <[email protected]> * fix go linter Signed-off-by: jooho lee <[email protected]> --------- Signed-off-by: jooho lee <[email protected]> Signed-off-by: Jooho Lee <[email protected]> Signed-off-by: Jooho Lee <[email protected]> Co-authored-by: Dan Sun <[email protected]> * Fix update-openapigen.sh that can be executed from kserve dir (kserve#3924) * fix openapigen.sh that can be executed from kserve dir Signed-off-by: jooho lee <[email protected]> * regenerate codegen/manifests Signed-off-by: jooho lee <[email protected]> * Update go.sum Signed-off-by: Dan Sun <[email protected]> --------- Signed-off-by: jooho lee <[email protected]> Signed-off-by: Dan Sun <[email protected]> Co-authored-by: Dan Sun <[email protected]> * Add python 3.12 support and remove python 3.8 support (kserve#3645) * Support python 3.12 Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Update dependencies Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Update deps to support 3.12 Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Remove python 3.8 support Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Remove skip for infer client test Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Fix port forward Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Fix sklearn pandas dep Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * skip pydantic v1 test for py 3.12 Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Add setuptools dep for pmml Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Fix lgb Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Include setuptools for paddle Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Include setuptools for huggingface Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Rebase Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Rebase Signed-off-by: Sivanantham Chinnaiyan <[email protected]> --------- Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Fix openssl vulnerability CWE-1395 (kserve#3975) Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Fix Kubernetes Doc Links (kserve#3670) * Bump version to 0.13.0-rc0 (kserve#3665) Signed-off-by: Curtis Maddalozzo <[email protected]> Signed-off-by: jordanyono <[email protected]> * fixing docs Signed-off-by: jordanyono <[email protected]> * fix spelling mistake Signed-off-by: jordanyono <[email protected]> --------- Signed-off-by: Curtis Maddalozzo <[email protected]> Signed-off-by: jordanyono <[email protected]> Co-authored-by: Curtis Maddalozzo <[email protected]> * Fix kserve local testing env (kserve#3981) * Fix local testing Signed-off-by: Dan Sun <[email protected]> * Fix codegen Signed-off-by: Dan Sun <[email protected]> --------- Signed-off-by: Dan Sun <[email protected]> * Fix streaming response not working properly with logger (kserve#3847) Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Add a flag for automount serviceaccount token (kserve#3979) * Add a flag for automount serviceaccount Signed-off-by: Jin Dong <[email protected]> * Set default to false Signed-off-by: Jin Dong <[email protected]> * Default to true Signed-off-by: Jin Dong <[email protected]> * Fix test error Signed-off-by: Jin Dong <[email protected]> * Update openapi generated.go Signed-off-by: Jin Dong <[email protected]> * Fix python lint Signed-off-by: Jin Dong <[email protected]> * Fix config loading Signed-off-by: Jin Dong <[email protected]> --------- Signed-off-by: Jin Dong <[email protected]> * Do not set security context on the storage initializer from user container (kserve#3985) * Do not set security context on the storage initializer from user container Signed-off-by: Jin Dong <[email protected]> * Add securityContext to the default storage container in the helm chart Signed-off-by: Jin Dong <[email protected]> --------- Signed-off-by: Jin Dong <[email protected]> * Modelcar race condition mitigation with an init container (kserve#3932) This adds the model container as an init-container to mitigate a race condition that would happen if the model container is not present on the cluster-node. The race condition happens if the cluster is able to fetch and start the runtime container before the modelcar is fetched. This would lead to the runtime to terminate with error. By configuring the model container as an init-container the runtime won't start until the modelcar is fetched. Although there is still the risk of a race condition when the cluster schedules the runtime container first, the pod should stabilize after a few restarts of the runtime container and should either prevent a CrashLoopBackOff event on the pod, or the crash event would finish quickly. This improves compatibility with the runtimes which can now stay agnostic to the modelcar implementation, until better techniques (like native sidecars, and oci volume mounts) become mature. Signed-off-by: Edgar Hernández <[email protected]> * Fix: Headers passing for v1/v2 endpoints (kserve#3669) * Initial commit for headers passing issue Signed-off-by: Andrews Arokiam <[email protected]> * modifying the e2e test for rebase conflict Signed-off-by: Andrews Arokiam <[email protected]> * bug fix on unittest Signed-off-by: Andrews Arokiam <[email protected]> * review changes Signed-off-by: Andrews Arokiam <[email protected]> * fix for test failure Signed-off-by: Andrews Arokiam <[email protected]> * bug fix on e2e test Signed-off-by: Andrews Arokiam <[email protected]> * overridding the entrypoint of custom model images Signed-off-by: Andrews Arokiam <[email protected]> * custom response header Signed-off-by: Andrews Arokiam <[email protected]> * fix for unittest failure Signed-off-by: Andrews Arokiam <[email protected]> * added custom response headers in post process Signed-off-by: Andrews Arokiam <[email protected]> * added predict time latency in example response header Signed-off-by: Andrews Arokiam <[email protected]> * fix OOM --------- Signed-off-by: Andrews Arokiam <[email protected]> Co-authored-by: Dan Sun <[email protected]> * Torchserve security update (kserve#3774) * security update Signed-off-by: udai <[email protected]> * adding sign off Signed-off-by: udai <[email protected]> --------- Signed-off-by: udai <[email protected]> * Pin ubuntu 22.04 for minikube setup action (kserve#3994) Signed-off-by: Jin Dong <[email protected]> * KServe 0.14 Release (kserve#3988) * temp commit Signed-off-by: Jin Dong <[email protected]> * python-release.sh Signed-off-by: Jin Dong <[email protected]> --------- Signed-off-by: Jin Dong <[email protected]> * bump to vllm0.6.2 add explicit chat template (kserve#3964) * explicitly give a chat template Signed-off-by: yxia216 <[email protected]> * fix dummy model issue, fix python version smaller than 3.10, and formatting Signed-off-by: yxia216 <[email protected]> * fix vLLMModel Signed-off-by: yxia216 <[email protected]> * change the interface of CreateChatCompletionRequest Signed-off-by: yxia216 <[email protected]> * update dummy model's para Signed-off-by: yxia216 <[email protected]> * consitent with OpenAIGPTTokenizer and OpenAIGPTModel Signed-off-by: yxia216 <[email protected]> * give a chat template if there is no Signed-off-by: yxia216 <[email protected]> * update the response and update the readme Signed-off-by: yxia216 <[email protected]> * update the chat_template Signed-off-by: yxia216 <[email protected]> * update data Signed-off-by: yxia216 <[email protected]> * add test of chat temmplate for tokenizer Signed-off-by: yxia216 <[email protected]> * jinja2 template format Signed-off-by: yxia216 <[email protected]> * use a simpler chat template --------- Signed-off-by: yxia216 <[email protected]> * bump to vllm0.6.3 (kserve#4001) Signed-off-by: yxia216 <[email protected]> * Feature: Add hf transfer (kserve#4000) * Add hf transfer Signed-off-by: tjandy98 <[email protected]> * Add hf transfer env Signed-off-by: tjandy98 <[email protected]> --------- Signed-off-by: tjandy98 <[email protected]> * Fix snyk scan null error (kserve#3974) Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Update quick install script (kserve#4005) Signed-off-by: Johnu George <[email protected]> * Local Model Node CR (kserve#3978) * init CR Signed-off-by: Gavin Li <[email protected]> * make generate Signed-off-by: Gavin Li <[email protected]> * make manifests Signed-off-by: Gavin Li <[email protected]> * black format Signed-off-by: Gavin Li <[email protected]> * fix generated python code Signed-off-by: Gavin Li <[email protected]> * feedback Signed-off-by: Gavin Li <[email protected]> * more feedback Signed-off-by: Gavin Li <[email protected]> * black format Signed-off-by: Gavin Li <[email protected]> * make manifests Signed-off-by: Gavin Li <[email protected]> --------- Signed-off-by: Gavin Li <[email protected]> * Reduce E2Es dependency on CI environment (2) (kserve#4008) Reduce E2Es dependency on CI environment Some code of the E2Es assume the environment is GitHub, because it is referring to GitHub-specific variables. This PR focuses on the `kserve/custom-model-grpc` container image, so that no Python code of the E2Es using this image is referencing the `github_sha` variable. Also, a small improvement on the `get_isvc_endpoint` utility function is done to use the schema in the endpoint specified in the status of the InferenceService, rather than hard-coding to plain-text HTTP. This adds compatibility for CI environments where KServe ConfigMap has been configured with `urlScheme: https` for the Ingress. Signed-off-by: Edgar Hernández <[email protected]> * Allow GCS to download single file (kserve#4015) allow gcs to download single file fixes kserve#4013 Signed-off-by: Spolti <[email protected]> * bump to vllm0.6.3.post1 (kserve#4023) Signed-off-by: yxia216 <[email protected]> * Set default for SamplingParams.max_tokens in OpenAI requests if unset (kserve#4020) * Set default for SamplingParams.max_tokens in OpenAI requests if unset Signed-off-by: Kevin Mingtarja <[email protected]> * Fix lint Signed-off-by: Kevin Mingtarja <[email protected]> * Fix formatting Signed-off-by: Kevin Mingtarja <[email protected]> --------- Signed-off-by: Kevin Mingtarja <[email protected]> * Add tools functionality to vLLM (kserve#4033) * Add tools to chat template Signed-off-by: Arjun Bhalla <[email protected]> Linting Signed-off-by: Arjun Bhalla <[email protected]> add test Signed-off-by: Arjun Bhalla <[email protected]> Fix linting manually Signed-off-by: Arjun Bhalla <[email protected]> * Fix linting Signed-off-by: Arjun Bhalla <[email protected]> --------- Signed-off-by: Arjun Bhalla <[email protected]> Signed-off-by: Arjun Bhalla <[email protected]> Co-authored-by: Arjun Bhalla <[email protected]> * Use apt-get upgrade for CVE fixes Signed-off-by: Dan Sun <[email protected]> * For vllm users, our parser should be able to support both - and _ (kserve#3933) Signed-off-by: yxia216 <[email protected]> * Add tools unpacking for vLLM (kserve#4035) * Add tools to chat template Signed-off-by: Arjun Bhalla <[email protected]> Linting Signed-off-by: Arjun Bhalla <[email protected]> add test Signed-off-by: Arjun Bhalla <[email protected]> Fix linting manually Signed-off-by: Arjun Bhalla <[email protected]> * Fix linting Signed-off-by: Arjun Bhalla <[email protected]> * Add tools unpacking for vllm Signed-off-by: Arjun Bhalla <[email protected]> * Add sanity check test Signed-off-by: Arjun Bhalla <[email protected]> --------- Signed-off-by: Arjun Bhalla <[email protected]> Signed-off-by: Arjun Bhalla <[email protected]> Co-authored-by: Arjun Bhalla <[email protected]> * Multi-Node Inference Implementation (kserve#3972) Signed-off-by: jooho lee <[email protected]> * Enhance InjectAgent to Handle Only HTTPGet, TCP Readiness Probes (kserve#4012) * Fix readiness probe logic and update test scenarios for HTTPGet, TCPSocket, and Exec handling Signed-off-by: Snehomoy <[email protected]> * Update: Refactor logic for readiness probe handling Signed-off-by: Snehomoy <[email protected]> * Apply gofmt formatting to agent_injector.go Signed-off-by: Snehomoy <[email protected]> * Added logger to replace fmt.Printf for better consistency and observability Signed-off-by: Snehomoy <[email protected]> * Formatted file using goimports with -local Signed-off-by: Snehomoy <[email protected]> --------- Signed-off-by: Snehomoy <[email protected]> * Feat: Fix memory issue by replacing io.ReadAll with io.Copy (kserve#4017) (kserve#4018) * Feat: Fix memory issue by replacing io.ReadAll with io.Copy (kserve#4017) Previously, io.ReadAll was causing out-of-memory problems when downloading large files from GCS. This change replaces io.ReadAll() with io.Copy() to stream data and prevent excessive memory usage. Signed-off-by: ops-jaeha <[email protected]> * Feat: Fix add newline at end of file to satisfy golang lint Signed-off-by: ops-jaeha <[email protected]> * Feat: Refact log Info for golang lint (kserve#4017) Signed-off-by: ops-jaeha <[email protected]> --------- Signed-off-by: ops-jaeha <[email protected]> * Update alibiexplainer example (kserve#4004) chore: Fix CVE-2024-26130 - NULL Pointer Dereference - Upgrade cryptography to version 42.0.4 or higher. Update Python version to match KServe 0.14.0 Update tensorflow, tensorflow-io-gcs-filesystem and dill libraries Signed-off-by: Spolti <[email protected]> * Fix huggingface build runs out of storage in CI (kserve#4044) Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Update snyk scan to include new images (kserve#4042) Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Introducing KServe Guru on Gurubase.io (kserve#4038) Signed-off-by: Kursat Aktas <[email protected]> * Fix Hugging Face server EncoderModel not returning probabilities (kserve#4024) * Fix huggingface srever not work with return_probabilities Signed-off-by: oplushappy <[email protected]> * Fix pytest huggingface server assertion error Signed-off-by: oplushappy <[email protected]> * Fix the lint error and Add approx for assertion Signed-off-by: oplushappy <[email protected]> * Parse string output to dictionary for accurate assertion Signed-off-by: oplushappy <[email protected]> * Fix linting error Signed-off-by: oplushappy <[email protected]> --------- Signed-off-by: oplushappy <[email protected]> * Add deeper readiness check for transformer (kserve#3348) * Add deeper readiness and liveness check for transformer Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Add unit tests Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * put the feature behind flag Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Update tests Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * resolve comments Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Make use of inference client Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Add e2e test Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Make inference client singleton and lazy initialize Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Raise 503 If server is not ready / live Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Add test for custom transformer with rest protocol Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Fix CI running out of space Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Increase memory limit Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Check for model ready Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Webhook debug Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Address reviews Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Check for retry count in grpc client Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Update python/kserve/kserve/model_server.py Co-authored-by: Dan Sun <[email protected]> Signed-off-by: Sivanantham <[email protected]> --------- Signed-off-by: Sivanantham Chinnaiyan <[email protected]> Signed-off-by: Sivanantham <[email protected]> Co-authored-by: Dan Sun <[email protected]> * Fix Starlette Denial of service (DoS) via multipart/form-data (kserve#4006) chore: Fix CVE-2024-47874 Signed-off-by: Spolti <[email protected]> * remove duplicated import "github.com/onsi/gomega" (kserve#4051) remove duplicated import Signed-off-by: carlory <[email protected]> * Fix localmodel controller name in snyk scan workflow (kserve#4054) Signed-off-by: Sivanantham Chinnaiyan <[email protected]> * Fix azure blob storage access key env not mounted (kserve#4064) * add storageaccesskey to azure env builder Signed-off-by: bentohset <[email protected]> * update integration and unit test for azure storage access key Signed-off-by: bentohset <[email protected]> * fix formatting Signed-off-by: bentohset <[email protected]> --------- Signed-off-by: bentohset <[email protected]> * Storage Initializer support single digit azure DNS zone ID (kserve#4070) * support single digit azure zone id Signed-off-by: bentohset <[email protected]> * add single digit azure dns zone id tests Signed-off-by: bentohset <[email protected]> * fix formatting Signed-off-by: bentohset <[email protected]> --------- Signed-off-by: bentohset <[email protected]> * support text embedding task in huggingfaceserver Signed-off-by: Kevin Mingtarja <[email protected]> * fix lint errors Signed-off-by: Kevin Mingtarja <[email protected]> * format code Signed-off-by: Kevin Mingtarja <[email protected]> * bring back enhancements after getting kserve up-to-date (#42) * improve dockerfile, makefile, readme * support custom classification labels, refactor postprocess * support text embedding task * improve support for token classification (named entity recognition) * use self.model_config.id2label by default (#45) * minor cleanup and fixes after rebase * use approx in test_input_padding * revert token_classification changes * fix test --------- Signed-off-by: Gregory Keith <[email protected]> Signed-off-by: Andrews Arokiam <[email protected]> Signed-off-by: Dan Sun <[email protected]> Signed-off-by: Jin Dong <[email protected]> Signed-off-by: Sivanantham Chinnaiyan <[email protected]> Signed-off-by: jooho lee <[email protected]> Signed-off-by: Jooho Lee <[email protected]> Signed-off-by: Jooho Lee <[email protected]> Signed-off-by: Curtis Maddalozzo <[email protected]> Signed-off-by: jordanyono <[email protected]> Signed-off-by: Edgar Hernández <[email protected]> Signed-off-by: udai <[email protected]> Signed-off-by: yxia216 <[email protected]> Signed-off-by: tjandy98 <[email protected]> Signed-off-by: Johnu George <[email protected]> Signed-off-by: Gavin Li <[email protected]> Signed-off-by: Spolti <[email protected]> Signed-off-by: Kevin Mingtarja <[email protected]> Signed-off-by: Arjun Bhalla <[email protected]> Signed-off-by: Arjun Bhalla <[email protected]> Signed-off-by: Snehomoy <[email protected]> Signed-off-by: ops-jaeha <[email protected]> Signed-off-by: Kursat Aktas <[email protected]> Signed-off-by: oplushappy <[email protected]> Signed-off-by: Sivanantham <[email protected]> Signed-off-by: carlory <[email protected]> Signed-off-by: bentohset <[email protected]> Signed-off-by: Kevin Mingtarja <[email protected]> Signed-off-by: Kevin Mingtarja <[email protected]> Co-authored-by: gfkeith <[email protected]> Co-authored-by: Andrews Arokiam <[email protected]> Co-authored-by: Dan Sun <[email protected]> Co-authored-by: Jin Dong <[email protected]> Co-authored-by: Sivanantham <[email protected]> Co-authored-by: Jooho Lee <[email protected]> Co-authored-by: jordanyono <[email protected]> Co-authored-by: Curtis Maddalozzo <[email protected]> Co-authored-by: Edgar Hernández <[email protected]> Co-authored-by: udaij12 <[email protected]> Co-authored-by: hustxiayang <[email protected]> Co-authored-by: tjandy98 <[email protected]> Co-authored-by: Johnu George <[email protected]> Co-authored-by: Gavin Li <[email protected]> Co-authored-by: Filippe Spolti <[email protected]> Co-authored-by: Arjun Bhalla <[email protected]> Co-authored-by: Arjun Bhalla <[email protected]> Co-authored-by: Snehomoy.M <[email protected]> Co-authored-by: 이재하 <[email protected]> Co-authored-by: Kursat Aktas <[email protected]> Co-authored-by: oplushappy <[email protected]> Co-authored-by: 杨朱 · Kiki <[email protected]> Co-authored-by: Benjamin Toh <[email protected]>
What this PR does / why we need it:
Bump to vLLM0.6.2 to support multimodal models.
vLLM0.6.2 requires newer version of transformers '>=4.45.0`, some related changes are:
It does not support default chat templates for models: 🚨 No more default chat templates huggingface/transformers#31733.
Thus, add an optional
chat template
member inCreateChatCompletionRequest
class, which is similar to https://github.com/vllm-project/vllm/blob/9aaf14c62e16a7c74b5192a44d01a78125dab2fc/vllm/entrypoints/openai/protocol.py#L239In tokenizer,
clean_up_tokenization_spaces
would be set as false by default, check details: [BUG] GPT-2 tokenizer is NOT invertible huggingface/transformers#31884. Thus, change the assertions in testtest_input_padding_with_pad_token_not_specified
Also added a simple unit test of
chat_template
for tokenizer.Type of changes