
K9s running very slowly when opening Secrets in namespace with lots of secrets #280

Closed · RothAndrew opened this issue on Aug 5, 2019 · 22 comments
Labels: bug (Something isn't working), question (Further information is requested)

@RothAndrew commented Aug 5, 2019

Describe the bug
K9s slows down to the point where it is unusable when opening Secrets in a namespace with lots of secrets.

I have a namespace with 163 secrets. Most of them are from Helm, tracking deployment versions. Opening that namespace and navigating to secrets slows K9s down so much that it is unusable. I have to terminate the terminal window and open a new one.

To Reproduce
Steps to reproduce the behavior:

  1. Open K9s
  2. Navigate to the namespace you want (in my case, I press 2)
  3. SHIFT+colon
  4. sec
  5. ENTER
    The list of secrets appears, but K9s is too slow to be useful anymore

Expected behavior
K9s doesn't slow down.

Screenshots
If applicable, add screenshots to help explain your problem.

A video would be more useful, but I would need to redact a significant amount. If I have time this evening, I'll see if I can reproduce it on my cluster at home with something fake.

Versions (please complete the following information):

  • OS: MacOS Mojave 10.14.6
  • K9s v0.7.12
  • K8s v1.12.7

Additional context
I'm on a corporate-managed laptop with antivirus and firewall junk, so if nobody is able to reproduce this, that may be it, but I hope not...

Seems like ~50 secrets is when K9s starts to get bogged down a little, and towards ~100 secrets it starts getting really slow.

@derailed added the bug label on Aug 5, 2019
@derailed (Owner) commented Aug 5, 2019

@RothAndrew Thank you for this report! Great find!! I'll try to repro on one of my clusters and see what might be happening here. I understand the sensitivity of this matter, so please don't share a screen capture or video; if I can't repro, we will figure out another way to share/discuss.

@derailed added the question label on Aug 6, 2019
@derailed (Owner) commented Aug 6, 2019

@RothAndrew So I took a quick look and spun up 200 secrets on a cluster, both remote and local, and K9s has NOT slowed down a bit ;( So sheer volume is not the issue here; there is something else going on. Are you saying the view shows the secrets, but then describing a secret or viewing a secret's YAML is slow? Or that loading secrets and then going to another view is slow? Or is loading the view in the first place slow? Also, how big are those secrets, ie how much data is in there? What kind of Helm charts are you using, ie public or private charts? I know Helm is heavy on cms, but I would not think there is so much secret data being stored... but I could be wrong.

@RothAndrew (Author)

Helm has the option to store the data as secrets instead of configmaps for security. We are using that option, so the secrets have pretty much the same data as those configmaps would have.

@RothAndrew (Author)

Loading the list of secrets happens pretty quickly, but then K9s slows down. Pressing arrow up/down, or pressing Shift+colon, typing another resource type, and hitting enter, takes like 30 seconds to catch up and actually work. The instant it does, though, and it switches to pods for example, K9s is instantly fast again.

@derailed (Owner) commented Aug 6, 2019

@RothAndrew Thank you for the reply and extra info! Hum, so I have tried secrets and up/down arrowing and switching on a couple of clusters with 200+ secrets, and k9s is still cool. Also tried on an Istio cluster that keeps lots of secrets (130+) and K9s perf is still good. Yikes! I'll dig some more and see if I can repro this. Also, would you mind changing the refresh rate to say 5s or 10s (k9s -r 5) and seeing if this changes anything? In the meantime, if you see anything else please lmk. Thank you Andrew!

@RothAndrew (Author)

I'll try changing the refresh rate; that's a good idea that I hadn't thought of.

Don't feel like you have to bend over backwards on my account. It's not like I'm gonna stop using your amazing tool :)

Tomorrow morning I'll make a video and email it to you.

@derailed (Owner) commented Aug 6, 2019

Ah! Thank you so much @RothAndrew for your kind words!! I am trying to build a tool that we can all use to our satisfaction. A tall order, maybe... But I know we all have stuff to do and don't want the tool to get in the way. So if we can figure out a resolution for you, I think that would be great, as others will benefit too. In your particular case, I am thinking it might be a latency issue talking to the K8s API server, and that perhaps K9s is staggering the refreshes. Bumping the rate high enough might shed some light... I hope??

@RothAndrew (Author)

I changed the refresh rate to 30, and it is smooth for 30 seconds, then freezes for 5 seconds, then is smooth again, so I think we are on to something.

@RothAndrew (Author)

Yep, it's definitely a server refresh timing thing. Sometimes the hang is 2 seconds, sometimes 5, sometimes ~7, but never under 2, which is why the default of 2 showed unrelenting slowness.
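
A quick way to confirm this kind of per-refresh hang outside K9s is to time the raw list call itself. Below is a minimal probe sketch (not K9s code), assuming a kubeconfig at the default path, 2019-era client-go signatures (no context argument), and a placeholder namespace name; hangs of 2-7s here would match the freezes described above.

```go
package main

import (
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load ~/.kube/config; "my-namespace" is a placeholder.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)

	start := time.Now()
	secrets, err := cs.CoreV1().Secrets("my-namespace").List(metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("fetched %d secrets in %s\n", len(secrets.Items), time.Since(start))
}
```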

@RothAndrew (Author) commented Aug 6, 2019

Is it possible to make the server refresh a non-blocking asynchronous action? This is in Golang, right? I don't know Golang, but in JavaScript I would have it return a promise so it doesn't block the UI thread.

Of course, this isn't a webpage, it's a terminal, so that could be completely irrelevant :)
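
In Go, the promise analogue is a goroutine feeding a channel. A minimal sketch of that idea, assuming (hypothetically) that the refresh was blocking the render loop; fetchRows and draw are illustrative stand-ins, not K9s functions:

```go
package main

import (
	"fmt"
	"time"
)

// refreshLoop runs the slow fetch off the UI goroutine and streams
// results over a channel, so painting never waits on the network.
func refreshLoop(fetchRows func() []string, draw func([]string), rate time.Duration) {
	updates := make(chan []string, 1)
	go func() {
		for {
			updates <- fetchRows() // slow API call happens here
			time.Sleep(rate)
		}
	}()
	for rows := range updates {
		draw(rows) // UI work only; stays responsive between updates
	}
}

func main() {
	slowFetch := func() []string {
		time.Sleep(2 * time.Second) // simulate a laggy API server
		return []string{"secret-a", "secret-b"}
	}
	// Loops forever; Ctrl-C to stop.
	refreshLoop(slowFetch, func(rows []string) { fmt.Println(rows) }, 2*time.Second)
}
```

In a tview-based UI like K9s, the draw step would typically be funneled through something like Application.QueueUpdateDraw so it runs on the render goroutine.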

@RothAndrew (Author)

Anyways, feel free to close this when you're ready, since we've determined what the problem is.

@fardin01 commented Aug 29, 2019

@derailed when I try to open a context/cluster with a large number of namespaces (~70), the speed is incredibly slow. How could I debug this? Is there some sort of cache I could clean?

@derailed (Owner) commented Aug 30, 2019

@fardin01 Thank you for reporting this! I am in the process of reworking the K9s core to improve speed, based on @RothAndrew's initial report. I think some resources are indeed more expensive than others to fetch from the server, ie secrets (Andrew's case). If you don't mind, add a few more details on your particular issue, ie the types of views you are using, how many total resources you are viewing, etc... Any little bit of detail will help me zero in on how to improve the experience.

In the immediate, you can specify a namespace via the -n K9s CLI arg and see if this helps a bit; you can then switch namespaces from within K9s. Please let me know if this helps some... Thank you!!

@fardin01

Thank you @derailed. The danger here is that k9s is so cool and useful (🎖) that my daily work has been a little affected by this slowness 😆

In ctx view I have a bunch of clusters but I'll take prod and test for this.
In total prod has 8 namespaces, 325 pods, 127 secrets, 844 ConfigMaps etc.
test has 74 namespaces, 1937 pods, 1178 secrets, 2451 ConfigMaps etc.

Speed in prod has been the same: it's smooth and easy. In test, however, it can take anywhere between 5 and 15 seconds to load the context (if it doesn't freeze altogether). Once I'm in the test cluster, responsiveness is generally slow.

Let me know if you need any more specific info or even logs. Thank you :)

@derailed (Owner) commented Sep 3, 2019

@fardin01 Thank you for sending this information! I can see this being an issue at the moment with K9s having this kind of load. Does launching K9s with a namespace help in your test cluster?

@maximzxc commented Sep 4, 2019

Also noticed huge lags in the configmaps section, even though there are only ~250 records there. Internet connection speed seems to be somehow related: with a fast connection it works faster.

@fardin01 commented Sep 4, 2019

@derailed yes. Doing k9s -n all shows 1890 pods from every namespace in the test cluster very smoothly.

@paivagustavo (Contributor) commented Oct 6, 2019

I think I have found the root cause of this!

When we make a request via client-go for structured data, we get the full object. In other words, when we do `rr, err := s.DialOrDie().CoreV1().Secrets(ns).List(opts)`, we fetch not only the names of all the secrets, but all the data stored in each secret as well.

That becomes noticeable with a few hundred objects, especially Helm objects, which can contain big base64/gzipped strings. It makes us fetch a huge amount of data from Kubernetes, and parsing it all out is very expensive.

If we fetch these (config maps and secrets) with the unstructured API, the same one we use to fetch CRDs, we would get only the tabular data + metadata.

In my cluster, this reduced the load time of 5700 config maps from 17-20 seconds down to 2-3 secs.

@derailed do you think this is a reasonable solution?

There is a very raw version of this on my branch if anyone is willing to test it out.
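
For context, one concrete way to get exactly "tabular data + metadata" is the API server's server-side Table rendering, the representation kubectl uses for printing. The sketch below is illustrative rather than the actual K9s change, and assumes 2019-era client-go signatures (no context arguments) plus a ready *rest.Config:

```go
package main

import (
	"encoding/json"
	"fmt"

	metav1beta1 "k8s.io/apimachinery/pkg/apis/meta/v1beta1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// listSecretTable asks the server to render secrets as a Table: rows of
// printed cells plus object metadata, without each secret's data payload.
func listSecretTable(cfg *rest.Config, ns string) (*metav1beta1.Table, error) {
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return nil, err
	}
	raw, err := cs.CoreV1().RESTClient().Get().
		Namespace(ns).
		Resource("secrets").
		// Ask for server-side printing instead of full v1.Secret objects.
		SetHeader("Accept", "application/json;as=Table;v=v1beta1;g=meta.k8s.io").
		DoRaw()
	if err != nil {
		return nil, err
	}
	var table metav1beta1.Table
	if err := json.Unmarshal(raw, &table); err != nil {
		return nil, err
	}
	fmt.Printf("%d columns, %d rows\n", len(table.ColumnDefinitions), len(table.Rows))
	return &table, nil
}
```

By default the server fills each row's object field with partial object metadata only, so the secrets' data payloads never cross the wire.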

@derailed (Owner) commented Oct 6, 2019

@paivagustavo Brilliant!!

You are correct here, though I can't quite explain it. I had looked at going generic with secrets when @RothAndrew posted this, but I did not see a significant gain then ;(

After your post, I looked at configmaps instead, and I do see the delta. You are so right! Looking at the configmap schema, there is indeed a binaryData field that would need to come down the wire and be unmarshalled in the typed case, but not when using the generic lister. Excellent catch!!

Tho I went back and looked at secrets again. There should not be a huge delta there, as the secret schema seemingly closely matches what the generic lister pulls down, so wire time + unmarshalling should be a wash... Yet timing these calls now yields a ~50% gain. I'll need to take a deeper pass there to see why.

I think we should be OK going generic with these 2 resources, since K9s displays the same data as kubectl (ie no custom columns). I definitely see the boost for cms, but I'm not sold on secrets; I may be missing something.
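
For reference, a sketch of the configmap schema mentioned above, abridged from the k8s.io/api/core/v1 sources of that era; binaryData is the extra payload that must come down the wire and be unmarshalled on the typed path:

```go
// ConfigMap holds configuration data for pods to consume (abridged).
type ConfigMap struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`

	// Data contains the configuration data.
	Data map[string]string `json:"data,omitempty" protobuf:"bytes,2,rep,name=data"`

	// BinaryData contains the binary data.
	BinaryData map[string][]byte `json:"binaryData,omitempty" protobuf:"bytes,3,rep,name=binaryData"`
}
```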

@paivagustavo (Contributor)

@derailed How are you looking at this schema? The v1.Secret struct from Kubernetes does contain a Data map that holds the secret data:

```go
// Secret holds secret data of a certain type. The total bytes of the values in
// the Data field must be less than MaxSecretSize bytes.
type Secret struct {
	metav1.TypeMeta `json:",inline"`

	metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`

	Data map[string][]byte `json:"data,omitempty" protobuf:"bytes,2,rep,name=data"`
	...
```

And it does return the data when I do a `k8s.NewSecret(c).List("")`, so in the end this is a win for both ConfigMaps and Secrets :)

Maybe I am looking in the wrong place, but I think this should help both cases.

I'll send a PR soon if you agree with me.

@paivagustavo (Contributor)

Hi @RothAndrew, @fardin01 and @maximzxc. The improvements I mentioned were merged on master.

Could you check whether the Secret and ConfigMap views are now faster, and if we can close this issue?

@maximzxc

It works like a charm now. Thank you @paivagustavo!
