zfs-0.6.0rc8 (repo) mkdir can hang on ARM 32bit system #749
Thanks for filing the bug. It would be helpful to dump the stacks for the processes which are stuck. You can do this by running the following command as root. It's possible the work @ryao has been doing to improve the memory management and support swap devices will help with this. But it's hard to say without seeing the stacks. |
git pull latest, just in case but only spl had some kmem changes.
(I assume you only need the mkdir stack, and not the whole thing) |
The entire stack trace would be more helpful. |
My apologies; |
Nothing looks horribly wrong from the stack traces. Your mkdir is waiting on the syncing txg, which appears to be waiting on a disk I/O. There's nothing abnormal about that. However, you might be suffering from some of the other known 32-bit issues. Can you try the following?
|
I have tried option 1) by specifying the bootarg line, changing vmalloc=384M to vmalloc=500M. No difference.
No difference
No difference. Stack trace of zfs create; |
Ok, so I still had zfs-0.6.0rc8 on disk, which works. So I load all .ko's except for zfs.ko, and it still works.
So, there is something in github HEAD that was not in 0.6.0rc8, and it is in the zfs.ko module. I can use all kernel modules from HEAD, except zfs.ko from the 0.6.0rc8 tarball, and it still works. (And it is a little pleasing to know arm divmod, which is in spl, is not at all related.) Does that give us a lead? |
The post -rc8 change set is relatively small. You could bisect these changes to find the offending patch. However, if I were to make an educated guess I'd try reverting 302f753 first. This was the most disruptive change and while it's been seriously tested on x86_64 there may have been unexpected 32-bit side effects.
c421831 Define the needed ISA types for ARM
7101140 Revert "Disable direct reclaim on zvols"
3462fa2 Add mdadm and bc dependencies
b39d3b9 Linux 3.3 compat, iops->create()/mkdir()/mknod()
ce90208 Disable direct reclaim on zvols
518b487 Update ARC memory limits to account for SLUB internal fragmentation
302f753 Integrate ARC more tightly with Linux
afec56b Add zfs_mdcomp_disable module option
ad60af8 Illumos #2067: uninitialized variables in zfs(1M) may make snapshots und
95bcd51 Illumos #1946: incorrect formatting when listing output of multiple pool
187632d Illumos #952: separate intent logs should be obvious in 'zpool iostat' o
ebf8e3a Illumos #1909: disk sync write perf regression when slog is used post oi
409dc1a Use KM_PUSHPAGE in l2arc_write_buffers
cf81b00 ZFS list snapshot property alias
10b7549 ZFS snapshot alias
7d5cd71 Illumos #1346: zfs incremental receive may leave behind temporary clones
22cd4a4 Illumos #1475: zfs spill block hold can access invalid spill blkptr
5ffb9d1 Illumos #1951: leaking a vdev when removing an l2cache device
b129c65 OS-926: zfs panic in zfs_fill_zplprops_impl()
3adfc40 Illumos #1680: zfs vdev_file_io_start: validate vdev before using vdev_t
109491a Improve error message consistency
f4605f0 Document the zle compression algorithm
f0fd83b Export additional dsl symbols
1f0d8a5 Fixed a NULL pointer dereference bug in zfs_preumount
2ce9d0e Make Gentoo initscript use modinfo
847de12 Print human readable error message for ENOENT
fc41c64 Properly expose the mfu ghost list kstats
9fc6070 Remove hard-coded 80 column output |
First time with "git bisect", and I guess I have to reluctantly admit that to be pretty damn useful. :) Outcome of bisect is:
|
To reiterate what bisect already told us, I went to "master" and reverted only "302f753f1657c05a4287226eeda1f53ae431b8a7", and it does indeed work. Doing some heavy use, I did find yet another problem, but it is separate from this. Is it known? Should I create a new ticket?
So, I got 1189 files copied! |
OK, well that's progress. We'll need to dig in to exactly why those VM improvements cause trouble for 32-bit ARM systems. But frankly, until we move away from vmalloc() entirely, 32-bit systems are going to be a bit dodgy. As for the soft lockups, those are already known and usually not fatal. They are just advisory that something is taking longer than expected. With both dedup and compression enabled that's perhaps not too surprising. |
Actually, the soft-lockups are not temporary, they are stalled for good. I left it over the weekend and it does not recover. And lockups still happen with compression and dedup off. But as you say, 32bit has issues.. |
Bummer, well you should probably open a new issue for the soft lockup so we can at least track it. Please include the fact that it's a 32-bit system and the contents of dmesg which will have the full stacks associated with those soft lockup messages. |
Oh, and on an unrelated note I may set up an ARM and PPC VM to ensure I don't break those builds going forward. |
One of the troubles with the lockup, is that it is indeed 'locked up'. I can not run more commands, nor do the logs after a reboot contain any messages. Including serial console. |
Hey guys, not sure if I should add on here, or re-create a ticket. I was distracted for a bit. Trying zfs-0.6.0rc11 Compile is 100% without modifications, very nice. Compiled with debug (removed Werror though). Even though we know the patch revision that I can roll out to fix things, here is what happens with sources intact; Note the newer kernel. Linux cubox 3.4.6 #8 PREEMPT Sun Jul 22 14:18:53 MDT 2012 armv7l GNU/Linux
Note that first allocate error, not entirely sure what I am doing wrong already. Everything still works though.
Process never returns. What does ZFS_DEBUG do, I was expecting some more messages (but I suppose the tage errors are those). "echo t >/proc/sysrq-trigger" output here: http://lundman.net/ftp/kern3.log (Not sure why it looks bad at start, that is how it is on the machine /var/log/messages, unless the write corrupted) |
The echo 0 >/proc/sys/kernel/spl/debug/mask |
Ah heh, well, I had hoped to turn on debug, so that perhaps I would get some hints as to where the problem lies. Setting debug/mask to 0 just seems to shut it up. Anyway, I went with the old trusted 'litter everything with prints' style debugging, changing all dprintf to printk, as well as a few of my own:
for ever. No change in the number either. Guess it is stuck in a busy loop.
Because I like to shoot from the hip, I changed the return to "0", and we get:
I would have expected crashes, not that it pretends to work.
I assume it isn't just that easy (cos it never is), so you probably have a deeper understanding of the problem.
|
@lundman No, it really could be just that easy. Very little of the code is architecture specific so I don't expect any breakage except in regards to memory management. ZFS likes to proactively keep tabs on the available system resources and throttle its behavior accordingly. It seems those heuristics are just going horribly wrong on the more constrained 32-bit system. In this case it appears the logic which throttles the TXG formation is causing the issue. Handily, there's a kstat file to observe this behavior, can you post the contents of Looking at I'd also be interested in adding a printk to that if conditional when it fails to print those values. It appears you already did some of this printing the available memory (~300MB) should be plenty so I suspect |
Oh sorry, yes I forgot those printk's. I have rsynced the kernel, then built it all, and created swap and attached. This version of zfs is running very well (so far).
Outputs:
Ah I see, so all systems with less memory than 8646911284635238400 are considered small. You guys with your super computers... |
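To make the effect of that number concrete, here is a minimal sketch of a throttle check of the kind discussed above. It is a simplified model, not the actual arc.c code: the function name and signature are hypothetical, and the only behaviour taken from the thread is that the throttle returns EAGAIN when memory looks too small. With the limit reported above, the roughly 300MB of available memory mentioned earlier can never pass the check.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Simplified model of a memory throttle check (hypothetical name and
 * signature, not the real arc.c function): refuse the write with EAGAIN
 * while available memory is at or below the configured limit.
 */
static int
memory_throttle_model(uint64_t available_memory, uint64_t write_limit_max)
{
	if (available_memory <= write_limit_max)
		return (EAGAIN);	/* caller is expected to retry later */
	return (0);			/* enough memory, let the write proceed */
}
```

Calling `memory_throttle_model(300ULL << 20, 8646911284635238400ULL)` returns EAGAIN on every attempt, which would be consistent with a process that retries forever. With a limit of 83886080 (the low 32 bits of the reported number, 80 MiB) the same call returns 0.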
Actually, it can. There's no guarantee that Anyway, it's clear we need to make sure this is a 64-bit type. We'll need to promote it as you did to an 'unsigned long long' and update the module_param() chunk at the bottom... there may be other issues like this for ARM.... The extra stacks on the console are a known issue which was fixed in master. Grab the latest source and they'll go away, or you can ignore them for now since they are entirely harmless. It's just some debugging which in this case is related to dedup. Why don't you kick the tires a little bit more and see how it hangs together. You're blazing a new trail here. |
@lundman The issue is the following:
zfs_write_limit_max will be 32-bits on a 32-bit system and 64-bits on a 64-bit system. However, ARC will always access it as a 64-bit value, causing it to read an adjacent value in the upper 32-bits. This also means that we are clobbering the adjacent value during ARC initialization. This could explain the issues that people are having on 32-bit systems, which always seemed to revolve around this code. Lacking the time needed to setup a 32-bit system, the best that I could do was eyeball this and unfortunately, this is a subtle detail that is unnoticeable unless looking at multiple files. Other implementations appear to make this 64-bit, but the Linux kernel neither permits us to specify 64-bit module parameters on 32-bit systems nor would they make sense, so we should just replace uint64_t with unsigned long in the extern declaration. It would be a good idea to review all variables declared With that said, I have opened pull request #1034 to address this issue. @lundman, I am confident that this fix is correct, but it still needs to be tested. After you have tested it, I would appreciate it if you would also test the following: It was originally intended to improve memory requirements on 32-bit systems, although any benefit that it provided was completely negated by this issue. |
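The misread described above can be modelled in a few lines. This is only a sketch: the helper is hypothetical and builds the 64-bit value arithmetically, the way a little-endian 32-bit machine would see it, instead of performing the out-of-bounds read itself. The two input values are chosen because they reproduce the limit reported earlier in the thread; whether those were the actual memory contents on the CuBox is an assumption.

```c
#include <stdint.h>

/*
 * Model of a little-endian 32-bit system reading a 32-bit variable through
 * a 64-bit extern declaration: the variable's own bits become the low half
 * and whatever sits next to it in memory becomes the high half.
 * (Hypothetical helper for illustration; not code from ZFS.)
 */
static uint64_t
misread_as_u64(uint32_t real_value, uint32_t neighbour)
{
	return (((uint64_t)neighbour << 32) | real_value);
}
```

For example, `misread_as_u64(83886080u, 0x78000000u)` yields 8646911284635238400: an 80 MiB value combined with an unrelated neighbouring word turns into a limit no real system can exceed.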
@lundman Looking through my email, it appears that you deleted the message where you said that you found the extern issue. In the future, please don't do that. I spent some time on this after reading |
I didn't delete anything, but there sure is something weird going on. One comment looks like it is from me, but I didn't write it :) But yes, it is definitely allocating space for 32bit, but arc.c is accessing 64bit. I simply trusted that GCC would warn me about that, but it turns out it can only do that at link time, and we are making kernel modules, which rely on insmod to do it. Or such is my understanding. Let me catch up on the proposed patches and I will try them out. |
Your understanding is correct. This is a perfect example of why global variables can be dangerous in C. |
I did some grep'ing, externs are few, and no other uint64s around. Changing my fix with yours just to double check. |
As an additional comment, it might be wise to either remove the |
It seems to me that the ryao@a6d6269 patch is not entirely correct? Isn't the non-KERNEL code patch leaving an empty 'if' conditional, without a statement? Or is the if conditional immediately following supposed to be the statement? The indentation goes crazy in either case.
|
Thanks for catching that. ryao/zfs@2074c38 should correct it. |
Heh ok. So I apologise for the primitive testing, but as a quick test:
Then I deleted that, compiled with ryao@2074c38 and rebooted.
Patch most likely does not affect this output, but it also did not break anything. Both version output a large amount of:
So that code is triggered a bit more normal. Actually, I should probably revert back to returning an error again. |
How much RAM do you have? It seems like the present issue is that ZFS doesn't scale well to the amount of memory that your system has. Would you provide the contents of /proc/spl/kstat/zfs/arcstats? Also, is the mkdir issue still present? |
The advice that FreeBSD gives to people on i386 might be helpful: http://wiki.freebsd.org/ZFSTuningGuide#i386 The equivalent to vm.kmem_size on Linux is the vmalloc kernel commandline parameter. With that said, it would be useful to have the output of |
Most of my testing is on the CuBox, which has 1GB ram. Although, I think I boot with vmalloc=384M if that matters. Interestingly, I changed the throttle function back to return EAGAIN instead of my "0", and:
Which is quite a bit slower. But I suppose it has to retry after all. But more importantly, with the 32/64bit fix, all the ARM/32bit problems are gone. So mkdir is not having issues. I would consider this issue to be closed with the ryao@4b431ae fix. On the extern issue, I suppose compiling ZFS statically into the kernel would have shown that problem, but I have never tried myself. Can ZFS be compiled into the kernel? |
I guess I should also increase the vmalloc variable. My background is Sun so I am somewhat new to the linux specific tweaks. |
I am not sure if compiling ZFS into the kernel would have exposed the extern issue. However, you can compile ZFS into your kernel. You will need to compile the SPL into the kernel too. See: https://github.com/zfsonlinux/zfs/blob/master/README.markdown With that said, I think you might want to try tuning the default module parameters. Here are the values that the FreeBSD wiki suggested: zfs_arc_max=41943040 |
@lundman With regard to vmalloc, increasing it is probably a good idea. It might be worthwhile trying to set vmalloc. The kernel has a 4GB address space and you only have 1GB of RAM. Dedicating 1GB of the address space to virtual memory should minimize the effect of external fragmentation in that space with no side effects. |
Oh neat, so you can make static kernels. Thanks for the information. I just checked my uboot, and perhaps I don't boot with vmalloc after all, either way, playing with your numbers, I get:
Quite the difference. Possibly because not a single warning from the throttle is given. |
That is great news. It sounds like we might be able to make ARC pick better default values on systems with less memory. Would you post the contents of |
Ok, setting vmalloc=1G did not work so well, it kept shooting my processes when it ran out of memory. So, 800M test worked, result
A few seconds less. Loading ZFS with default values: Hmm, hold on, need to figure out how to attach files |
You can wrap the output with three tildes at the beginning and three tildes at the end to form a block quotation. As for what happened with vmalloc=1GB, that suggests that my understanding of that tunable was incorrect. I will need to look into that, although that will have to wait for when I have more time. |
Ah they were just a bit long, so I thought attachment was better. Turns out you can't attach anyway. Default:
and with insmod ./module/zfs/zfs.ko zfs_arc_max=41943040 zfs_vdev_cache_size=5242880
|
Commit c409e46 introduced a number of module parameters. Of these, ARC unconditionally expected zfs_write_limit_max to be 64-bit. Unfortunately, the largest size integer module parameter that Linux supports is unsigned long, which varies in size depending on the host system's native word size. The effect was that on 32-bit systems, ARC incorrectly performed 64-bit operations on a 32-bit value by reading the neighboring 32 bits as the upper 32 bits of the 64-bit value.
We correct that by changing the extern declaration to use the unsigned long type. This should make ARC correctly treat zfs_write_limit_max as a 32-bit value on 32-bit systems.
Closes openzfs#749
Reported-by: Jorgen Lundman <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
@ryao Thanks for digging in to this and getting to the real root cause, but I tweaked your patch slightly. This would have been caught as a conflicting type by gcc if the These zfs_write_limit_* tunings should really be private to @lundman Thanks for getting the fix tested so promptly. Hopefully, this gets us back to things being largely functional for 32-bit systems. I'm sure there's still lots of tuning to be done but we can work through that in other issues. |
@behlendorf This will make ZFS functional on ARM systems, but i386 systems require some additional work. I asked @amospalla in IRC to test this patch for me on his i386 system and found an additional problem that is conveniently wrapped by #if defined(__i386) ... #endif. I outlined what is happening there in issue #831. |
Commit c409e46 introduced a number of module parameters. This required several types to be changed to accommodate the required Linux module parameter macros. Unfortunately, arc.c contained its own extern definition of the zfs_write_limit_max variable and its type was not updated to be consistent with its dsl_pool.c counterpart. If the variable had been properly marked extern in a common header, then gcc would have generated a warning and this would not have slipped through.
The result of this was that the ARC unconditionally expected zfs_write_limit_max to be 64-bit. Unfortunately, the largest size integer module parameter that Linux supports is unsigned long, which varies in size depending on the host system's native word size. The effect was that on 32-bit systems, ARC incorrectly performed 64-bit operations on a 32-bit value by reading the neighboring 32 bits as the upper 32 bits of the 64-bit value.
We correct that by changing the extern declaration to use the unsigned long type and moving these extern definitions into the common arc.h header. This should make ARC correctly treat zfs_write_limit_max as a 32-bit value on 32-bit systems.
Reported-by: Jorgen Lundman <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#749
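The shape of that fix can be sketched in a single file. This is an illustration under assumptions, not the actual patch: the variable name and type come from the commit message, while the helper function and the single-file layout are hypothetical. In the real tree the declaration would live in the shared header and the definition in the file that owns the tunable.

```c
/* What the shared header would provide: one declaration seen by every user. */
extern unsigned long zfs_write_limit_max;

/*
 * The definition, in the file that owns the tunable. Because the declaration
 * above is visible here, a definition with a different type (for example a
 * 64-bit type on a 32-bit build) would be rejected as a conflicting type at
 * compile time instead of being silently misread at run time.
 */
unsigned long zfs_write_limit_max = 0;

/*
 * Hypothetical consumer: any widening to 64 bits is now an explicit,
 * well-defined conversion of the real value, not a read of adjacent memory.
 */
static unsigned long long
write_limit_as_u64(void)
{
	return ((unsigned long long)zfs_write_limit_max);
}
```

The design point is that the compiler can only compare a declaration and a definition it sees in the same translation unit, which is why a private extern in a second source file went unnoticed. On a 64-bit Linux build both types have the same width, so the mismatch only matters on 32-bit systems.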
Creating this ticket after conversations with behlendorf, to attempt to track down other issues.
Initial porting work was done with zfs-0.6.0rc8, on PREEMPTIVE kernels, but recompiled kernel, and using github repository versions now. Tempted to go back to old setup as I don't remember seeing the issue then. (I could install a whole Ubuntu image after all)
At the moment, IO can stall/hang (per process) on write operations (for example mkdir), but also on commands like "zfs create", which presumably also cause write requests. Interestingly, after reboot the "zfs create" filesystem does exist.
Symptoms are that the process executed never returns. But the OS itself keeps functioning.
For example:
Which does not return;
Strangely enough, if I strace "zfs create" it also dies in mkdir:
There are some compile time warnings, I will mention just in case they are relevant;
Recompiling ZFS with --enable-debug did not seem to make any more output that I can observe.