Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPA improvements #386

Merged
merged 15 commits into from
Sep 11, 2024
Merged

HPA improvements #386

merged 15 commits into from
Sep 11, 2024

Conversation

eero-t
Copy link
Contributor

@eero-t eero-t commented Sep 3, 2024

Continuation of / depends on: #327

Description

Helm charts need fixes for HPA to work (with less changes) out of the box:

  • Increase probe timings to avoid scaled up CPU pods never getting to Ready state
  • Fix current (completely wrong) resource limit/request settings for inferencing components
  • Reduce default TGI scaling values to avoid issues when values are not properly configured
  • Change ChatQnA Helm chart to work with Helm installed version of Prometheus (WIP)
  • Related Helm README updates & fixes

Additionally PR also adds:

  • Support for enabling HPA-scaling in ChatQnA chart only for relevant subcharts

(Best reviewed by checking individual commits.)

Issues

n/a

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Dependencies

n/a

Tests

Manual testing for the changes (before splitting them to separate commits & documenting them).

@eero-t eero-t force-pushed the hpa-improvements branch 2 times, most recently from 1989441 to 3b451ba Compare September 3, 2024 16:47
@eero-t
Copy link
Contributor Author

eero-t commented Sep 3, 2024

Force pushed pre-commit CI fixes (as there were no review comments yet).

@eero-t
Copy link
Contributor Author

eero-t commented Sep 3, 2024

If README.md HPA section is too long, it could be moved to its own HPA.md file.

@lianhao
Copy link
Collaborator

lianhao commented Sep 4, 2024

If README.md HPA section is too long, it could be moved to its own HPA.md file.

Yes, that would be a good idea.

@eero-t
Copy link
Contributor Author

eero-t commented Sep 4, 2024

As to the generated manifests: https://github.com/opea-project/GenAIInfra/tree/main/microservices-connector/config/HPA/

I think those should be thrown away rather than updated.

Using HPA with GMC when inferencing is done on CPU does not make much sense. GMC allows e.g. changing model at run-time, but when scaling the CPU services, it means that deployment resource requests etc should be updated accordingly (based on used model, data type used with it, TGI/TEI versions and underlying node resources), and I don't think GMC has any support for that.

(Using HPA with GMC for Gaudi accelerated services would be a bit different matter, but current HPA rules are just for CPU charts.)

@irisdingbj
Copy link
Collaborator

@eero-t To make HPA support in GMC is the target and @ichbinblau is working on this already. GMC allows to change the model dynamically but it will also support static pipiline.

@eero-t
Copy link
Contributor Author

eero-t commented Sep 4, 2024

Not having used Helm before, I had not realized that gaudi-values.yaml overrides just specified parts from values.yaml file (in top-level and sub-charts), instead of replacing the values.yaml content completely. I.e. enabling HPA in values.yaml means it being enabled also when gaudi-values.yaml (or nv-values.yaml) is used.

This got me thinking...

If settings needed for scaling (resource requests corresponding to specified model, probe timings etc) were in separate files, user could enable HPA without needing to modify values.yaml files in top-level + sub-charts.

Enabling HPA would then look something like this, for CPU:

helm install chatqna ... -f chatqna/hpa-values.yaml -f common/tgi/cpu-values.yaml -f common/tei/cpu-values.yaml

And for Gaudi:

helm install chatqna ... -f chatqna/hpa-values.yaml -f chatqna/gaudi-values.yaml

Thoughts?

@eero-t
Copy link
Contributor Author

eero-t commented Sep 4, 2024

Note: above does not fix the issue for (CPU) scaling, of scaled pods' resource requests needing to match their resource usage with the LLM model specified for them => overriding the model with a global Helm value, or by GMC, is not really a good idea as then there's a mismatch between those.

When Gaudi or GPU is used, that's less of a issue, as long as single device per Pod is OK. If larger models would require multiple devices, or smaller models would require (for better utilization / cost reasons) sharing GPUs between pods, then changing models would also be bad idea, if resource requests are not changed accordingly.

@lianhao
Copy link
Collaborator

lianhao commented Sep 5, 2024

Not having used Helm before, I had not realized that gaudi-values.yaml overrides just specified parts from values.yaml file (in top-level and sub-charts), instead of replacing the values.yaml content completely. I.e. enabling HPA in values.yaml means it being enabled also when gaudi-values.yaml (or nv-values.yaml) is used.

This got me thinking...

If settings needed for scaling (resource requests corresponding to specified model, probe timings etc) were in separate files, user could enable HPA without needing to modify values.yaml files in top-level + sub-charts.

Enabling HPA would then look something like this, for CPU:

helm install chatqna ... -f chatqna/hpa-values.yaml -f common/tgi/cpu-values.yaml -f common/tei/cpu-values.yaml

And for Gaudi:

helm install chatqna ... -f chatqna/hpa-values.yaml -f chatqna/gaudi-values.yaml

Thoughts?

In helm chart, values.yaml is alread used. Any addtional value files specified by -f <xxx.yaml> is just to add patch to the default values.yaml, not to replace it.

Since in component helm charts(those charts under common directory, HPA is disabled by default, i.e. in their values.yamlfiles HPA is set to false), there is no need to put HPA related settings in e2e helm chart's(i.e. chatqna) values.yaml file. What we need here is a separate hpa.yaml file in e2e helm chart, which enable the corresponding HPA for components and set default optimal configurations for the out-of-box e2e workload.

So helm install chatqna ... -f chatqna/hpa-values.yaml -f chatqna/gaudi-values.yaml seems a more propriate way to go, compared to helm install chatqna ... -f chatqna/hpa-values.yaml -f common/tgi/cpu-values.yaml -f common/tei/cpu-values.yaml. I don't think we should put separate hpa-values.yaml files in component helm chart, because it could be dependent on model and workload. So I would suggest put all the adjustable HPA related settings in hpa-values.yaml in top level e2e helm chart, in which we should also tell users to adjust its settings for their own model case if needed.

@eero-t
Copy link
Contributor Author

eero-t commented Sep 5, 2024

Since in component helm charts(those charts under common directory, HPA is disabled by default, i.e. in their values.yamlfiles HPA is set to false), there is no need to put HPA related settings in e2e helm chart's(i.e. chatqna) values.yaml file.

If user does NOT use -f hpa-values.yaml, (disabled) HPA setting is still needed in values.yaml as Helm errors on custom metrics template.

Btw. Theresa just mentioned that Helm refuses to install PrometheusAdapter custom metrics config (for ChatQnA) over configMap that has been installed by another Helm chart, the Prometheus one... (I hadn't noticed this because I use Helm to generate manifest, and then apply that after reviewing the results, I don't install things directly with Helm. Also my Prometheus install comes from kube-prometheus manifests, not from Helm chart for that.)

=> I'm not sure how to best handle that for 1.0 milestone. I could test doing all installs directly with Helm and find some way to workaround any issues, but that will take at least few days. Another option is to document what I've actually tested ie. use Helm just to produce manifests, and installing those...

I would suggest put all the adjustable HPA related settings in hpa-values.yaml in top level e2e helm chart, in which we should also tell users to adjust its settings for their own model case if needed.

Because suitable subcomponent resource requests & probe timings differ between Gaudi and CPU (besides depending on model), and are needed for scaling to work in general (not just for HPA), I think they should be in separate file from generic HPA setup. I.e. top-level chart would have cpu-values.yaml that contains values that are relevant when subcharts use CPU, regardless of whether HPA is used:

  • ChatQnA TGI/TEI on CPU: -f cpu-values.yaml
    • With HPA scaling: -f cpu-values.yaml -f hpa-values.yaml
  • ChatQnA TGI/TEI on Gaudi: -f gaudi-values.yaml
    • With HPA scaling: -f gaudi-values.yaml -f hpa-values.yaml

Model should also be set in top-level chart device values file, not values.yaml because model (and data type used with it) dictates the resource usage, how much RAM it needs, does it fit into single device, or need to be sharded over multiple ones etc.

@eero-t
Copy link
Contributor Author

eero-t commented Sep 5, 2024

FYI: I filed Prometheus-operator + Prometheus-adapter tickets on getting proper support for updating Prometheus-adapter, needed to properly fix custom metrics installation:

That would allow reasonable automation for custom metrics, so I'd appreciate thumb-ups in the tickets. :-)

@eero-t eero-t force-pushed the hpa-improvements branch 2 times, most recently from 656aee0 to cba8b18 Compare September 5, 2024 19:19
@eero-t
Copy link
Contributor Author

eero-t commented Sep 5, 2024

I split HPA & CPU values to their own values files, and fixed merge conflict with main (which required force pushing). If those look fine, I can drop the Fix: commits, and last commit reverting them.

PS. In some later PR, extra variables can be added for HPA rules and TGI command line options, so that they match underlying HW better.

(Gaudi TGI can start much faster, so faster HPA scaling values could be used for Gaudi. There are Gaudi specific options that can provide significantly faster throughput for it, and I really hope there being also TGI options that can reduce its excruciatingly slow startup with CPU warmup. Newer CPUs could also use bfloat16 instead of float32, which would improve TGI perf a bit in addition to halving its memory usage.)

@eero-t
Copy link
Contributor Author

eero-t commented Sep 5, 2024

Interestingly CI automatically started running tests for the new values files:

  • chatqna, xeon, cpu-values
  • chatqna, xeon, hpa-values

However, hpa-values requires Prometheus install first so that required API types are recognized. And on Xeon, while cpu-values works fine alone, hpa-values is supposed to be used only with cpu-values.

There was also unrelated (Gaudi guardrails) test failure due to:
FailedScheduling 4m40s (x2 over 10m) default-scheduler 0/2 nodes are available: 1 Insufficient habana.ai/gaudi

@lianhao
Copy link
Collaborator

lianhao commented Sep 6, 2024

Since in component helm charts(those charts under common directory, HPA is disabled by default, i.e. in their values.yamlfiles HPA is set to false), there is no need to put HPA related settings in e2e helm chart's(i.e. chatqna) values.yaml file.

If user does NOT use -f hpa-values.yaml, (disabled) HPA setting is still needed in values.yaml as Helm errors on custom metrics template.

Btw. Theresa just mentioned that Helm refuses to install PrometheusAdapter custom metrics config (for ChatQnA) over configMap that has been installed by another Helm chart, the Prometheus one... (I hadn't noticed this because I use Helm to generate manifest, and then apply that after reviewing the results, I don't install things directly with Helm. Also my Prometheus install comes from kube-prometheus manifests, not from Helm chart for that.)

=> I'm not sure how to best handle that for 1.0 milestone. I could test doing all installs directly with Helm and find some way to workaround any issues, but that will take at least few days. Another option is to document what I've actually tested ie. use Helm just to produce manifests, and installing those...

I would suggest put all the adjustable HPA related settings in hpa-values.yaml in top level e2e helm chart, in which we should also tell users to adjust its settings for their own model case if needed.

Because suitable subcomponent resource requests & probe timings differ between Gaudi and CPU (besides depending on model), and are needed for scaling to work in general (not just for HPA), I think they should be in separate file from generic HPA setup. I.e. top-level chart would have cpu-values.yaml that contains values that are relevant when subcharts use CPU, regardless of whether HPA is used:

  • ChatQnA TGI/TEI on CPU: -f cpu-values.yaml

    • With HPA scaling: -f cpu-values.yaml -f hpa-values.yaml
  • ChatQnA TGI/TEI on Gaudi: -f gaudi-values.yaml

    • With HPA scaling: -f gaudi-values.yaml -f hpa-values.yaml

Model should also be set in top-level chart device values file, not values.yaml because model (and data type used with it) dictates the resource usage, how much RAM it needs, does it fit into single device, or need to be sharded over multiple ones etc.

The top-leve values.yaml is meant for cpu cases without HPA. We want users to be able to install helm chart without applying addtional value files on any standard xeon servers if they want to have a try.

How about we do the followings:

# generic CPU cases without HPA:(this is equal to applying by -f values.yaml)
helm install chatqna chatqna --set global.HFTOKEN ...

# generic CPU cases with HPA:
helm install chatqna chatqna -f hpa-for-cpu-value.yaml --set global.HFTOKEN ...

# gaudi case without HPA
helm install chatqna chatqna -f gaudi-values.yaml --set global.HFTOKEN ...

# gaudi case with HPA
helm install chatqna chatqna -f guadi-values.yaml -f hpa-for-gaudi-values.yaml --set global.HFTOKEN...

As for the PrometheusAdapter custom metrics config, maybe we should have it documented here before prometheus-operator/kube-prometheus#2504 is resolved

@lianhao
Copy link
Collaborator

lianhao commented Sep 6, 2024

Interestingly CI automatically started running tests for the new values files:

  • chatqna, xeon, cpu-values
  • chatqna, xeon, hpa-values

However, hpa-values requires Prometheus install first so that required API types are recognized. And on Xeon, while cpu-values works fine alone, hpa-values is supposed to be used only with cpu-values.

There was also unrelated (Gaudi guardrails) test failure due to: FailedScheduling 4m40s (x2 over 10m) default-scheduler 0/2 nodes are available: 1 Insufficient habana.ai/gaudi

Please contact @daisy-ycguo about the CI to skip some value files in CI. We might need to come up a unified way of how to skip some cases, now we have nv-values.yaml to be skipped, and I expect more to come.

@chensuyue chensuyue added this to the v1.0 milestone Sep 6, 2024
@poussa
Copy link
Collaborator

poussa commented Sep 6, 2024

How about we do the followings:

# generic CPU cases without HPA:(this is equal to applying by -f values.yaml)
helm install chatqna chatqna --set global.HFTOKEN ...

# generic CPU cases with HPA:
helm install chatqna chatqna -f hpa-for-cpu-value.yaml --set global.HFTOKEN ...

# gaudi case without HPA
helm install chatqna chatqna -f gaudi-values.yaml --set global.HFTOKEN ...

# gaudi case with HPA
helm install chatqna chatqna -f guadi-values.yaml -f hpa-for-gaudi-values.yaml --set global.HFTOKEN...

As for the PrometheusAdapter custom metrics config, maybe we should have it documented here before prometheus-operator/kube-prometheus#2504 is resolved

This looks sane to me.

@eero-t
Copy link
Contributor Author

eero-t commented Sep 6, 2024

As CPU values being in separate top-level chart (instead of subcharts) was fine, I dropped the resource & probe timing updates to common/*/values.yaml files, the commit reverting them, and rebased again to main. I.e. there were no code or doc changes in the rebase.

@eero-t
Copy link
Contributor Author

eero-t commented Sep 6, 2024

The top-level values.yaml is meant for cpu cases without HPA.

I thought it was intended for Gaudi because it worked so badly on Xeon...

Without suitable probe timings and resources, out-of-the-box Xeon experience is rather awful, if user happens to run multiple different services, or does kubectl scale --replicas on the backends, regardless of whether HPA is used or not.

However, according to comments in TGI values.yaml file, OPEA does not want to specify resources by default, because somebody may want to try OPEA in a setup (minikube) that does not really have resources for it (I guess user could then switch to a smaller model, which would mean default resource values mismatching it).

Therefore I put CPU timings + resource values to a separate cpu-values.yaml file that.

Different CPU resource requests are needed for different models and different data types. I.e. if things should work out of the box, there need to be file per set of models, and data types specified for TGI & TEI, which in future could look something like this:

  • cpu-mistral7b-bgebase.yaml
  • cpu-some-other-models.yaml
  • etc

Note that because it's not possible to share Gaudi, case for that is simpler. One needs different resource spec only for models that do not fit into single Gaudi, but need to be sharded over several of them. If model needs to be split, only then CPU side resource usage can be relevant (due to CPU side DeepSpeed overhead).

We want users to be able to install helm chart without applying additional value files on any standard xeon servers if they want to have a try.

Extra file is needed for Gaudi and Nvidia, so it would be consistent to have one also for CPU.

# gaudi case with HPA
helm install chatqna chatqna -f guadi-values.yaml -f hpa-for-gaudi-values.yaml --set global.HFTOKEN...

Whether one uses Gaudis or CPUs is not a relevant distinction for HPA enabling and replica limits. If cluster has enough of Gaudis or CPUs, the same values can be used.

@eero-t
Copy link
Contributor Author

eero-t commented Sep 6, 2024

Issues raised by Theresa when testing this PR, which must be fixed before merging:

  • Helm-installed Prometheus does not find ServiceMonitors
    • => Add Chart values for Prometheus Helm install release name & namespace
  • Custom metrics config overwriting fails with Helm-installed Prometheus-adapter
    • Did not find a way to handle that reasonable with OPEA Helm charts, and Prometheus upstream does not seem keen on doing anything with the adapter (except replace it)
    • => decided it's best if Helm creates configMap which name does NOT match with Prometheus one, and README has instructions on how to copy that over Prometheus-adapter configMap (before user restarts adapter). That way it works same regardless of how Prometheus is installed

Other feedback from internal reviews:

  • Already merged HPA PR does not conform to Helm best practices (Add HPA support to ChatQnA #327)
  • Gaudi fine-tuning for HPA
    • => Pre-existing resource requests in gaudi-values.yaml and probe timings in values.yaml files are probably fine for Gaudi, but would be good to verify
  • Support multiple separate HPA-scaled ChatQnA instances in same cluster
    • => prefix custom metric names with ChatQnA instance name

Note: support for ChatQnA instances being in separate namespaces will definitely come only in some future PR. Fiddling with Prometheus RBAC rules is clearly out of scope for this one.

Copy link
Collaborator

@lianhao lianhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine to me but we need to find out a way to unifying the naming of those helm value yaml files to meet the following scenarios, and make CICD to recognize those

  • value files for different workload feature variants: e.g. gaurdrails-values.yaml
  • value files for different optional feature: e.g. hpa-values.yaml,
  • value files for different vendor specific hw: e.g. gaudi-values.yaml
  • the combinations of the above 3

But this is another story, not related to HPA

@lianhao
Copy link
Collaborator

lianhao commented Sep 10, 2024

I believe the 10 mins wait time for deployment ready(ROLLOUT_TIMEOUT_SECONDS) in helm CI test is too short for cpu-values.yaml configuration sometimes, because sometimes we have multiple CI tests running on the same node, especially for GMC

So that HPA scaling rules use custom metrics for correct set of TGI/TEI instances.

Signed-off-by: Eero Tamminen <[email protected]>
* Use custom metrics configMap name that does NOT match one installed
  by Prometheus, because Helm does not allow overwriting object
  created by another Helm chart (like using manifests would).

* Add Prometheus release name to serviceMonitors. Otherwise Helm
  installed Prometheus does not find serviceMonitors. Alternative
  would be using:
  prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false

Reported by Theresa.

Signed-off-by: Eero Tamminen <[email protected]>
@eero-t
Copy link
Contributor Author

eero-t commented Sep 10, 2024

  • Helm-installed Prometheus does not find ServiceMonitors
  • Custom metrics config overwriting fails with Helm-installed Prometheus-adapter
  • Already merged HPA PR does not conform to Helm best practices
  • Support multiple separate HPA-scaled ChatQnA instances in same cluster

Pushed fixes for these issues, meaning:

  • release label for monitors
  • Custom metric install has now multiple steps, not just adapter restart
  • Renamed template files to best practices
  • Custom metrics have not chart specific prefix

Theresa is working on GMC support for HPA, so it could come after 1.0, but not as part of this PR.

Thank you @eero-t @ichbinblau would you confirm we need HPA manifest or not fot GMC to support HPA?

Discussed with Theresa, and I will be doing minimal manifest update (later today). I.e. fix non-HPA TGI/TEI deployments, and reduce HPA/ dir content to fixed HPA + custom metric rules and serviceMonitors. Deployment resource limits will come later for GMC.

Handling manifest generation properly for deployments required changing replica and terminationTimeout settings to depend on something else than HPA. Termination timeout really depends on whether pod is accelerated, not whether HPA is used, so there's no new accelDevice value for that (empty meaning CPU i.e. non-accelerated / slow).

Timeout is question of Pod slowness i.e. is it accelerated, not HPA.

Replicas need to be set only when count is set to something else than
the default value (1).  This will work also for HPA, while making
GMC manifest generation easier.

Signed-off-by: Eero Tamminen <[email protected]>
How to install Prometheus with Helm and fix doc issues.

Having a manual step that replaces existing PrometheusAdapter config
with the generated one should work in all cases (it's a workaround
for Helm refusing to overwrite object created by another chart).

Signed-off-by: Eero Tamminen <[email protected]>
Instead of replacing deployments with HPA/ ones, apply fixes
directly to the normal deployment manifests. K8s default is
1 replica, so that can be dropped, which works also better
with HPA.

Because resource requests are model specific, and GMC is used to
change model, HPA/ manifests won't help with that, GMC needs to take
care of that (eventually).

Signed-off-by: Eero Tamminen <[email protected]>
And current Helm charts contents.

Signed-off-by: Eero Tamminen <[email protected]>
@eero-t
Copy link
Contributor Author

eero-t commented Sep 10, 2024

Merged pre-commit.ci changes to earlier commits, applied several fixes to the documentation and updated GMC manifests.

GMC manifest update was done manually, because current Helm charts do not support differentiating deployments enough based on whether it's an accelerated (= much faster) deployment. Somebody would need to change all deployment templates to apply proper timings based on whether CPU or accelerator (e.g. Gaudi) is used, similarly to how I did with termination timeouts.

However, that's a generic problem in the charts, so it belongs to some other PR, not into this v1.0 milestone one.

(While the problem is generic, its effect will be multiplied when number of pods is scaled, like happens with HPA.)

@irisdingbj is the PR now OK from your side?

Variable was left-over from "Fixes for Helm installed Prometheus version".

Signed-off-by: Eero Tamminen <[email protected]>
Signed-off-by: Eero Tamminen <[email protected]>
Copy link
Collaborator

@lianhao lianhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eero-t Please see my embedded comments. @yongfengdu any comment on this?

@@ -110,12 +110,14 @@ spec:
port: http
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The manifest files are under microservices-connector/config/manifests directory is generated by helm-charts/update_manifests.sh automatically. So if you want to change the default probe settings, please change it in corresponding component's helm chart, i.e. helm-chart/common/tei/values.yaml. Otherwise, these values will be easily overwritten.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lianhao Yes. Using cpu-values.yaml in ChatQnA papers over the (CPU) issues with current Helm charts, but a proper fix would be needed for generating the GMC manifests. However, that is not HPA specific problem, and requires more discussion[1][2] so IMHO it was clearly out of scope for this PR (although scaling makes these problems much more visible).

[1] What's IMHO needed to fix things for probe timings:

  • Common components' value files include different probe timing sections for CPU and for accelerators
  • Their deployment templates select one based on .Values.accelDevice value (empty for CPU)
  • All <device>-values.yaml files set appropriate <subchart>.accelDevice value (not just ChatQnA)
  • GMC device variant manifests are generated for all relevant components, not just TGI

(I don't think probe timings would need to be fine-tuned based on which model is used.)

[2] What's IMHO needed to fix resource requests:

  • Current sub-optimal component arguments are optimized, and resulting resource requirements are profiled, for all relevant models
    • For example on SPR, CPU TGI data type can be set to bfloat16, which halves its memory usage
    • Observability support + suitable Grafana dashboards will help with profiling
  • Instead of subcomponent model & corresponding resources being specified in top-level chart, helm install command uses suitable model+resource+args file from given component, like this:
    • -f common/tgi/gaudi/neural-chat-7b.yaml
    • -f common/teirerank/gaudi/bge-reranker-base.yaml
    • -f common/tei/cpu/bge-base.yaml
    • -f common/data-prep/gpu/values.yaml
    • (These would provide values with subchart prefix/heading so they can be used from top-level charts)
  • There could also be a global option for ignoring (CPU side) resource requests that can be used when things need to be re-profiled
  • GMC applies resource specs generated from these, when user changes model

If there are combinations of above which are common between different top-level charts, I would suggest Makefile merging relevant ones to common files (e.g. -f common/gaudi-model-defaults.yaml), to avoid duplicating things that may need to be updated whenever args, model, or image versions get updated.

Copy link
Collaborator

@lianhao lianhao Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eero-t Thanks for the suggestion. I think we need further discussion about this. I created the draft PR for us to further discuss. Please discuss at #431

@yongfengdu yongfengdu merged commit 8d86fff into opea-project:main Sep 11, 2024
20 checks passed
lianhao added a commit to lianhao/GenAIInfra that referenced this pull request Sep 25, 2024
Sync the mismatch between helm chart and GMC manifests introduced by PR opea-project#386

Signed-off-by: Lianhao Lu <[email protected]>
lianhao added a commit to lianhao/GenAIInfra that referenced this pull request Oct 8, 2024
Sync the mismatch between helm chart and GMC manifests introduced by PR opea-project#386

Signed-off-by: Lianhao Lu <[email protected]>
daisy-ycguo pushed a commit that referenced this pull request Oct 11, 2024
Sync the mismatch between helm chart and GMC manifests introduced by PR #386

Signed-off-by: Lianhao Lu <[email protected]>
@eero-t eero-t deleted the hpa-improvements branch October 18, 2024 15:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants