
has scaling from zero degraded in performance? #9104

Closed
Morriz opened this issue Aug 19, 2020 · 21 comments
Labels
area/autoscale, kind/question, lifecycle/stale

Comments


Morriz commented Aug 19, 2020

In 0.12, scaling from zero took around 8-10 secs. In 0.16 it's between 14-18 secs. Same stack, same resources.

Morriz added the kind/question label on Aug 19, 2020
@vagababov (Contributor)

No, it has been strictly consistent, especially in the ranges you're quoting.

@mattmoor (Member)

@Morriz one thing to potentially look at is whether the number of K8s services (possibly lots of Revisions?) in your namespace has grown. We noticed with "service links" that cold start deployment degrades as the number of services increases, and we create two services per Revision.

#8498

In recent versions of Knative, you can disable service links at the PodSpec level with enableServiceLinks: false, and if that resolves your issue, you might consider opting out of them globally, by changing enable-service-links: false in config-defaults. We plan to do this ourselves soon, but are in a deprecation period.
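
For reference, a minimal sketch of both approaches; the Service name hello and the sample image are placeholders, not something from this thread:

```shell
# Per-Service: disable service links in the Revision template's PodSpec
# ("hello" and the sample image are hypothetical placeholders).
cat <<'EOF' | kubectl apply -f -
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
spec:
  template:
    spec:
      enableServiceLinks: false
      containers:
        - image: gcr.io/knative-samples/helloworld-go
EOF

# Global: flip the default in the config-defaults ConfigMap in the
# knative-serving namespace. The value must be the quoted string "false",
# since ConfigMap data values are strings.
kubectl patch configmap config-defaults -n knative-serving \
  --type merge -p '{"data":{"enable-service-links":"false"}}'
```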

@mattmoor (Member)

If that doesn't resolve the issues you are seeing, would it be possible to produce a small repro case that we can poke at, including more information about your environment (K8s version, provider, ...)?

Thanks again for reaching out.

@mattmoor (Member)

/area autoscaling

@knative-prow-robot (Contributor)

@mattmoor: The label(s) area/autoscaling cannot be applied, because the repository doesn't have them

In response to this:

> /area autoscaling

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mattmoor (Member)

/area scaling

@knative-prow-robot (Contributor)

@mattmoor: The label(s) area/scaling cannot be applied, because the repository doesn't have them

In response to this:

> /area scaling

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


Morriz commented Aug 20, 2020

> @Morriz one thing to potentially look at is whether the number of K8s services (possibly lots of Revisions?) in your namespace has grown. We noticed with "service links" that cold start deployment degrades as the number of services increases, and we create two services per Revision.
>
> #8498
>
> In recent versions of Knative, you can disable service links at the PodSpec level with enableServiceLinks: false, and if that resolves your issue, you might consider opting out of them globally, by changing enable-service-links: false in config-defaults. We plan to do this ourselves soon, but are in a deprecation period.

Tnx. I will try that tomorrow and report back.


Morriz commented Aug 21, 2020

@mattmoor I tried enableServiceLinks: false but got:

could not be patched: admission webhook "validation.webhook.serving.knative.dev" denied the request: validation failed: must not set the field(s): spec.template.spec.enableServiceLinks

Setting the global enable-service-links: "false" (I had to quote the boolean) did nothing for the newly awakened pod. I still saw it get a ton of env vars pointing to all the service IPs.

@mattmoor (Member)

Yeah, you have to quote because yaml. You need to roll out a new Revision for it to take effect, and you should see it in the Revision's PodSpec.
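
Once the new Revision is out, a quick way to confirm the setting landed might be something like the following; the revision name and label selector here are assumptions for illustration, not taken from this thread:

```shell
# Check the Revision's PodSpec for the flag (revision name is a placeholder).
kubectl get revision hello-00002 -o jsonpath='{.spec.enableServiceLinks}'

# Or inspect the running pod directly; with service links off, the container
# should no longer be injected with the *_SERVICE_HOST / *_SERVICE_PORT env vars.
kubectl get pod -l serving.knative.dev/revision=hello-00002 \
  -o jsonpath='{.items[0].spec.enableServiceLinks}'
```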

@mattmoor (Member)

the must not set the field bit shouldn't happen on 0.16 🤔


Morriz commented Aug 22, 2020

> Yeah, you have to quote because yaml. You need to roll out a new Revision for it to take effect, and you should see it in the Revision's PodSpec.

I removed the ksvc beforehand. Applied the configuration. Then redeployed. Should create a fresh first revision I presume, no?


Morriz commented Aug 22, 2020

> the must not set the field bit shouldn't happen on 0.16 🤔

We are running gcr.io/knative-releases/knative.dev/serving-operator/cmd/manager@sha256:16e7c267645b77e0fb8f2adb7c2706288647d6ce8d25d585b2b91c36dbef81e5, which was installed from 0.16, but that SHA is nowhere to be found anymore. I think your SHA-based releases are confusing, btw.


Morriz commented Aug 22, 2020

Also, your docs are confusing. I can't find where to go for the operator from the GitHub READMEs (it says it's now read-only, without a deprecation warning linking to the new source); I could only find it from the knative.dev docs.


Morriz commented Aug 22, 2020

And on the latest https://knative.dev/docs/install/knative-with-operators/ page, the link to the operator YAML is dead (I already created an issue).

@mattmoor (Member)

Since the image is public in GCR, you can pop it in your browser and it shows tags. I see a tag with 0.14

cc @houshengbo @Cynocracy @evankanderson

I think this is using the split operator (serving-operator), and we've been moving to a unified operator.
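
If the gcloud CLI is handy, the tag/digest mapping for that public image mentioned above can also be listed from the command line; a sketch, assuming gcloud is installed and configured:

```shell
# Lists tags and their digests for the public operator image; the digest from
# the earlier comment can then be matched against a tag.
gcloud container images list-tags \
  gcr.io/knative-releases/knative.dev/serving-operator/cmd/manager
```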


Morriz commented Aug 22, 2020

aha, I see...I must be messing up memories...getting old


Morriz commented Aug 25, 2020

OK, we are now at 0.16 and service links are turned off... still the same slow 15 secs ;(

@vagababov (Contributor)

Can you instrument the image pull time? If anything, it should be much faster than in 0.12...
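
One way to eyeball the pull time without extra tooling; the pod name here is a placeholder, and the exact event wording depends on the kubelet version:

```shell
# The kubelet records image pull events on the pod; newer kubelets include the
# pull duration in the "Successfully pulled image ..." message.
kubectl describe pod hello-00002-deployment-abc123 | grep -i -A1 'pull'

# The same events can be queried directly:
kubectl get events --field-selector involvedObject.name=hello-00002-deployment-abc123
```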

@markusthoemmes (Contributor)

Do we have any handle/reproducer/test on this to be able to measure whether we've regressed?
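
A crude reproducer might be to let a Service scale to zero and then time the first request; the Service name and URL below are hypothetical placeholders:

```shell
# Find the Service URL, wait until its pods are gone, then time a cold request.
kubectl get ksvc hello
kubectl get pods -l serving.knative.dev/service=hello   # wait until none remain
time curl -s -o /dev/null http://hello.default.example.com
```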


github-actions bot commented Jan 4, 2021

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions bot added the lifecycle/stale label on Jan 4, 2021
github-actions bot closed this as completed on Feb 4, 2021