Reduce ZFS related processes/tasks #118
Comments
Nothing ZFS-related is active in the system, i.e. there is no filesystem, only a single pool.
Yes, I agree the number of threads is crazy, but it is the same thread mix as used on Solaris. With new Linux kernels we may be able to fix this by leveraging the new cmwq support. Here's a link for the interested reader.
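For reference, here is a minimal sketch of what the cmwq (concurrency-managed workqueue) interface looks like; the names zio_wq and zio_work_fn are hypothetical placeholders rather than actual ZFS code, and this only illustrates the API, it is not a proposed patch:

```c
#include <linux/module.h>
#include <linux/workqueue.h>

static struct workqueue_struct *zio_wq;   /* hypothetical queue name */

static void zio_work_fn(struct work_struct *work)
{
	/* per-item processing would go here (e.g. one pipeline stage) */
}

static DECLARE_WORK(zio_work, zio_work_fn);

static int __init cmwq_demo_init(void)
{
	/*
	 * With cmwq the kernel maintains a shared pool of worker threads
	 * behind the queue, so the caller no longer owns dozens of idle
	 * dedicated kernel threads.
	 */
	zio_wq = alloc_workqueue("zio_wq", WQ_UNBOUND, 0);
	if (!zio_wq)
		return -ENOMEM;
	queue_work(zio_wq, &zio_work);
	return 0;
}

static void __exit cmwq_demo_exit(void)
{
	flush_workqueue(zio_wq);
	destroy_workqueue(zio_wq);
}

module_init(cmwq_demo_init);
module_exit(cmwq_demo_exit);
MODULE_LICENSE("GPL");
```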
Why does ZFS need this many threads anyway? How many CPUs does it expect to run on? And why are they even there when not a single FS is present in the system? I am not sure whether CMWQ made it into the kernel; we need to check that.
For the most part they are just used to maximize concurrency to the pool. There are other ways to go about this, of course, but they had some nice existing infrastructure for creating thread pools, and it simplified the code, so I suspect they just used them. I don't think this is something to get hung up on in the short term; all the threads are usually idle and lightweight. But I agree it's annoying and we should look at what can/should be done.
There is really only 8 kB or so allocated to each thread for the task_struct and the stack, so the actual resource usage is very little (the 150 threads shown here amount to only about 1 MB of RAM). I think it bothers people more just to see the threads.

I recall it may be possible to "hide" all of the duplicate threads if they are created as "lightweight" threads instead of full threads, similar to what multi-threaded programs like Java do, and they won't show up in "ps" output unless "-L" is used:

[root@twoshoes ~]# ps -efL | grep rsyslog

This shows that rsyslogd PID 1157 is really made up of 5 lightweight threads (LWP 1157..1161, 27796).

I've been thinking about doing this for the Lustre threads as well, since people complain about the number of threads at times, but it is actually a tempest in a teacup for the amount of memory used vs. the convenience of the programming model. That said, I haven't actually looked at how this might be done within the kernel.
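As a userspace illustration of the same idea (not ZFS code), a small pthread program shows how lightweight threads share one PID and only become visible with the -L flag; the file name lwp_demo.c is just a placeholder:

```c
/* lwp_demo.c: build with `cc -pthread lwp_demo.c -o lwp_demo` */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define NTHREADS 5

static void *worker(void *arg)
{
	(void)arg;
	sleep(60);		/* keep the LWP alive long enough to inspect */
	return NULL;
}

int main(void)
{
	pthread_t tids[NTHREADS];
	int i;

	for (i = 0; i < NTHREADS; i++)
		pthread_create(&tids[i], NULL, worker, NULL);

	/* `ps -ef` shows one lwp_demo process; `ps -efL` shows NTHREADS+1 LWPs */
	printf("pid %d: compare `ps -ef` and `ps -efL`\n", (int)getpid());

	for (i = 0; i < NTHREADS; i++)
		pthread_join(tids[i], NULL);
	return 0;
}
```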
Hi, I was looking into the code and found that the reason behind this is the zio taskqueues, defined by the const zio_taskq_info_t zio_taskqs[ZIO_TYPES][ZIO_TASKQ_TYPES] table in zfs/module/zfs/spa.c (a representative excerpt is sketched below). I believe these taskqueues were carried over to Linux as part of the ZFS port. I have one question though: are there any performance implications of using Solaris taskqueues in a Linux environment for the current ZFS implementation on Linux?
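For readers without the source handy, the table looks roughly like the following. This is only a representative excerpt; the exact dimensions and ZTI_* entries differ between ZFS releases, so treat it as illustrative rather than authoritative:

```c
/* module/zfs/spa.c (representative excerpt; entries vary by release) */
const zio_taskq_info_t zio_taskqs[ZIO_TYPES][ZIO_TASKQ_TYPES] = {
	/* ISSUE	ISSUE_HIGH	INTR		INTR_HIGH */
	{ ZTI_ONE,	ZTI_NULL,	ZTI_ONE,	ZTI_NULL },	/* NULL */
	{ ZTI_N(8),	ZTI_NULL,	ZTI_BATCH,	ZTI_NULL },	/* READ */
	{ ZTI_BATCH,	ZTI_N(5),	ZTI_N(8),	ZTI_N(5) },	/* WRITE */
	{ ZTI_P(12, 8),	ZTI_NULL,	ZTI_ONE,	ZTI_NULL },	/* FREE */
	{ ZTI_ONE,	ZTI_NULL,	ZTI_ONE,	ZTI_NULL },	/* CLAIM */
	{ ZTI_ONE,	ZTI_NULL,	ZTI_ONE,	ZTI_NULL },	/* IOCTL */
};
```

Each row is an I/O type and each column a taskq type; the ZTI_* macros encode how many threads (or how many taskqs of how many threads) get created for that slot, which is where the large thread counts come from.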
These values are the same as on Solaris; however, that doesn't mean they are known to be the best values for Linux. It would be interesting to do a performance study changing certain thread counts to see how it helps/hurts performance. More generally, there has been some performance concern about the SPL taskq implementation. However, at this point it's been pretty well tuned and works quite well for most workloads. I suspect there are still some scaling issues which could be improved, but in practice they don't start limiting performance until you're pushing data at about 3 GB/s. Still, we will want to fix that going forward.
Thanks Brian for the quick reply. Do you have any specific scenarios which can be tested for performance?
Not specifically; I would suggest you test a workload of interest to you and see if it performs well enough for your needs.
Should the number of ZFS workers/threads be related to the number of processors on the system? |
They are. For example, right now a writer thread (which does compression, etc.) is created for 3/4 (75%) of all CPUs, and the threads are named accordingly. I did a quick check of the kernel source code a while back and it doesn't look like there's an easy way to make kernel threads into a single bundled process. Which is a shame, because that would be a nice way to clean up the process list in the short term.
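A hedged sketch of how such a CPU-percentage-scaled taskq can be created with the SPL API is below; the thread name, priority, and allocation arguments are illustrative, so check spa.c in your tree for the real call:

```c
#include <sys/taskq.h>

/*
 * Illustrative only: with TASKQ_THREADS_CPU_PCT the thread-count argument
 * is interpreted as a percentage of online CPUs, so 75 yields one thread
 * per 3/4 of the CPUs rather than 75 threads.
 */
static taskq_t *
create_write_issue_taskq(void)
{
	return (taskq_create("z_wr_iss", 75, maxclsyspri,
	    50, INT_MAX, TASKQ_PREPOPULATE | TASKQ_THREADS_CPU_PCT));
}
```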
I've given this some thought and I think the best solution here is to dynamically create and destroy the taskq threads as needed. They account for the vast majority of ZFS threads on the system. Allowing them to drop to zero when the system is idle and increase to a specified maximum value when needed should work well. The logic for this in the taskq code doesn't need to be fancy. Since creating and destroying threads under Linux is cheap, I'd suggest creating a new thread on demand if there's room for it and there are pending items for the taskq. When a work item finishes it could destroy the thread if there are no more pending items to process.
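A minimal sketch of that create-on-demand / exit-when-idle logic is below. This is not the actual SPL implementation (see the commits referenced in the next comment for that); the dtq_* names and helper functions are hypothetical and their bodies are omitted:

```c
#include <linux/spinlock.h>

typedef struct dyn_taskq {
	spinlock_t	dtq_lock;
	int		dtq_nthreads;	/* worker threads currently alive */
	int		dtq_maxthreads;	/* configured upper bound */
	int		dtq_npending;	/* queued work items */
} dyn_taskq_t;

/* Hypothetical helpers; starting/stopping the kthread is omitted here. */
static void dtq_spawn_worker(dyn_taskq_t *tq);
static void dtq_retire_worker(dyn_taskq_t *tq);

/* Called when new work is queued: spawn a worker only if every existing
 * worker is busy and we are still below the configured maximum. */
static void
dtq_dispatch(dyn_taskq_t *tq)
{
	spin_lock(&tq->dtq_lock);
	tq->dtq_npending++;
	if (tq->dtq_nthreads < tq->dtq_maxthreads &&
	    tq->dtq_npending > tq->dtq_nthreads)
		dtq_spawn_worker(tq);
	spin_unlock(&tq->dtq_lock);
}

/* Called when a worker finishes an item: let it exit if nothing is
 * pending, so the thread count drops back toward zero when idle. */
static void
dtq_item_done(dyn_taskq_t *tq)
{
	spin_lock(&tq->dtq_lock);
	tq->dtq_npending--;
	if (tq->dtq_npending == 0)
		dtq_retire_worker(tq);
	spin_unlock(&tq->dtq_lock);
}
```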
Resolved by openzfs/spl@f7a973d and aa9af22. |