Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AOF rewrite should call fdatasync per second to avoid block the main thread's write(2) #1019

Closed
springside opened this issue Mar 21, 2013 · 21 comments

Comments

@springside
Copy link

On my 64G memory & 100MB/s HD test machine, when 10G AOF file is rewrited, the main thread's write(2) would be blocked from 7 second to 14 second , whole redis would be blocked.

"Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis." time and time again.

I change the system parameter to let the system flush to disk more often, avoid to flush too much data one time
sysctl vm.dirty_bytes=33554432(32M)

i think it is good to modify the AOF rewrite code, let it call fdatasync directly.

@antirez
Copy link
Contributor

antirez commented Mar 21, 2013

Thanks for your report, do you mean to call fdatasync in the child process that writes the AOF so that it avoids to accumulate too big buffers to flush on disk? We did something like that in replication during RDB transfer with good results actually so this probably makes sense for AOF as well.

@springside
Copy link
Author

yes, and when i change the parameter to 32M, no busy log and no blocking any more :)

100.00% <= 317 milliseconds.

if fdatasync only call per second, it will be 100M on my test machie, it still cause 2-4 seconds blocking.

@antirez
Copy link
Contributor

antirez commented Mar 21, 2013

That's cool, however did you checked if the time to produce the AOF file changed in a measurable way? Thanks.

@springside
Copy link
Author

I am running "redis-benchmark -r 100000008 -n 2500000 -d 256 -t SET" when AOF rewrite

Use 1 minutes to write 8.6G AOF file, when vm.dirty_bytes=33554432 (32M) vm.dirty_background_bytes=33554432(32M)

[3106] 21 Mar 17:38:01.059 * Background append only file rewriting started by pid 3292
[3106] 21 Mar 17:38:08.584 * Background AOF buffer size: 80 MB
[3106] 21 Mar 17:38:15.594 * Background AOF buffer size: 180 MB
[3106] 21 Mar 17:38:23.596 * Background AOF buffer size: 280 MB
[3106] 21 Mar 17:38:30.790 * Background AOF buffer size: 380 MB
[3106] 21 Mar 17:38:37.022 * Background AOF buffer size: 480 MB
[3106] 21 Mar 17:38:43.235 * Background AOF buffer size: 580 MB
[3292] 21 Mar 17:39:02.891 * SYNC append only file rewrite performed

After back to the default value of RedHat Enterprise 6, vm.dirty_ratio = 20, vm.dirty_background_ratio = 10. (6.4G in my 64G machine), use 1.5 minutes to write the same file.

[3106] 21 Mar 17:42:02.811 * Background append only file rewriting started by pid 3323
[3106] 21 Mar 17:42:10.569 * Background AOF buffer size: 80 MB
[3106] 21 Mar 17:42:15.616 * Background AOF buffer size: 180 MB
[3106] 21 Mar 17:42:20.640 * Background AOF buffer size: 280 MB
[3106] 21 Mar 17:42:25.695 * Background AOF buffer size: 380 MB
[3106] 21 Mar 17:42:29.110 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
[3106] 21 Mar 17:43:08.007 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
[3323] 21 Mar 17:43:26.497 * SYNC append only file rewrite performed

and redis-benchmark report:

100.00% <= 37420 milliseconds

@antirez
Copy link
Contributor

antirez commented Mar 21, 2013

Very interesting, thanks! I'll provide a branch with explicit fdatasync() call to see if the results are comparable to what you get modifying kernel parameters. Thanks for your help, it is appreciated.

@springside
Copy link
Author

Looking forward it, i will test it for you. Thanks for you extreme quickly reply.

antirez added a commit that referenced this issue Apr 3, 2013
This prevents the kernel from putting too much stuff in the output
buffers, doing too heavy I/O all at once. So the goal of this commit is
to split the disk pressure due to the AOF rewrite process into smaller
spikes.

Please see issue #1019 for more information.
@antirez
Copy link
Contributor

antirez commented Apr 3, 2013

Dear @springside, I just implemented what discussed here in the issue-1090 branch, please could you verify that the good effects now that we moved the idea inside Redis instead of modifying the kernel parameters are still here? :-)

The branch is a fork of unstable, but you'll be able to easily cherry pick the two commits implementing this into 2.6 if you need to, or I can make a 2.6 branch with the two commits for you if you want.

Thank you in advance!

@antirez
Copy link
Contributor

antirez commented Apr 3, 2013

Sorry I was wrong... cherry-pick is not possible so I'm creating a 2.6-issue-1090 branch right now.

antirez added a commit that referenced this issue Apr 3, 2013
This prevents the kernel from putting too much stuff in the output
buffers, doing too heavy I/O all at once. So the goal of this commit is
to split the disk pressure due to the AOF rewrite process into smaller
spikes.

Please see issue #1019 for more information.
@antirez
Copy link
Contributor

antirez commented Apr 3, 2013

done :-)

@springside
Copy link
Author

thank you, but i am enjoying my vacation , and the test machine can only connect at office network.
I will test it at April 7, thank you for your quick action again.

@antirez
Copy link
Contributor

antirez commented Apr 4, 2013

No rush! 7 April is perfect :-) Have a great vacation!

@springside
Copy link
Author

Hi @antirez, the problem is gone.

@springside
Copy link
Author

Hi @antirez , when would the code merge to 2.6/2,8 branch ?

@antirez
Copy link
Contributor

antirez commented Apr 24, 2013

Right now :-)

antirez added a commit that referenced this issue Apr 24, 2013
This prevents the kernel from putting too much stuff in the output
buffers, doing too heavy I/O all at once. So the goal of this commit is
to split the disk pressure due to the AOF rewrite process into smaller
spikes.

Please see issue #1019 for more information.
antirez added a commit that referenced this issue Apr 24, 2013
This prevents the kernel from putting too much stuff in the output
buffers, doing too heavy I/O all at once. So the goal of this commit is
to split the disk pressure due to the AOF rewrite process into smaller
spikes.

Please see issue #1019 for more information.
antirez added a commit that referenced this issue Apr 24, 2013
This prevents the kernel from putting too much stuff in the output
buffers, doing too heavy I/O all at once. So the goal of this commit is
to split the disk pressure due to the AOF rewrite process into smaller
spikes.

Please see issue #1019 for more information.
@antirez
Copy link
Contributor

antirez commented Apr 24, 2013

Merged everywhere, now adding support to enable/disable it in redis.conf. By default this will be enabled. Thank you for your help!

@antirez antirez closed this as completed Apr 24, 2013
@antirez
Copy link
Contributor

antirez commented Apr 24, 2013

Feature made configurable and committed, so now we are set for a release... but I've another pending thing in Redis Sentinel that I need to address before committing.

@yoav-steinberg
Copy link
Contributor

Looking at the fix it seems like when writing more than 32MB (for example a 100MB key) we'll only get a single fsync. Am I missing something?
I've actually dealt with this in the past and came up with this patch: https://github.com/GarantiaData/redis/commit/4fe3d6bd40decabf1ddd3bdb5b69b87521e6233d

@charsyam
Copy link
Contributor

charsyam commented May 2, 2013

@yoav-steinberg hi guy.
I have a question. when do you update "r->processed_bytes"?
I might think because of it, only it will call fsync, if (len > r->io.file.fsync_interval)

so you should add bytes_written to r->processed_bytes before returing rioWrite

@yoav-steinberg
Copy link
Contributor

@charsyam, processed_bytes is updated in the rio layer (not specific to files). This is another change in the GarantiaData fork. You can find it here: https://github.com/GarantiaData/redis/commit/3aab06bcc7460b46c96fbb8d3b704ab1227c6be4

@charsyam
Copy link
Contributor

charsyam commented May 2, 2013

@yoav-steinberg Hi, I might think I found another bug in your code.(but, it's trivial)

if bytes_written becomes larger than io.file.fsync_interval and if len remains,
after that, fsync will be called until loop is finished.

static size_t rioFileWrite(rio *r, const void *buf, size_t len) {
    size_t bytes_written = 0;
    while (len) {
        size_t bytes_to_write = (r->io.file.fsync_interval && r->io.file.fsync_interval < len) ? r->io.file.fsync_interval : len;
        if (fwrite((char*)buf + bytes_written,bytes_to_write,1,r->io.file.fp) != 1)
            return 0;
        bytes_written += bytes_to_write;
        len -= bytes_to_write;

        //There is a possibility to call fsync many times(not intented)
        //if bytes_written is larger than io.file.fsync_interval and if there are more data in current loop
        if (r->io.file.fsync_interval && r->processed_bytes/r->io.file.fsync_interval < (r->processed_bytes + bytes_written)/r->io.file.fsync_interval)
            fsync(fileno(r->io.file.fp));
    }
    return 1;
 }

@yoav-steinberg
Copy link
Contributor

@charsyam, keen eye and sorry for keeping posting these links :) anyway I fixed that bug shortly after, see: https://github.com/GarantiaData/redis/commit/785cf9297733e32110daed8ca8b16169ea4288b5

JackieXie168 pushed a commit to JackieXie168/redis that referenced this issue Aug 29, 2016
This prevents the kernel from putting too much stuff in the output
buffers, doing too heavy I/O all at once. So the goal of this commit is
to split the disk pressure due to the AOF rewrite process into smaller
spikes.

Please see issue redis#1019 for more information.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants