
Hard-coded limit of nested reply depth #794

Closed
jeffreylovitz opened this issue Apr 21, 2020 · 12 comments

@jeffreylovitz

Hello,

Hiredis has the following restriction:

hiredis/read.c

Lines 418 to 423 in 0501c62

/* Set error for nested multi bulks with depth > 7 */
if (r->ridx == 8) {
__redisReaderSetError(r,REDIS_ERR_PROTOCOL,
"No support for nested multi bulk replies with depth > 7");
return REDIS_ERR;
}

This is a limitation that RedisGraph can hit when processing complex but sane queries.

Could this limit be relaxed?
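
For illustration, here is a minimal standalone sketch (hand-built protocol string, not an actual RedisGraph reply) that trips the check by nesting a reply nine levels deep:

/* Sketch: reproduce the depth error with a hand-built RESP reply.
 * Nine levels of "*1\r\n" wrap a single integer, exceeding depth 7. */
#include <stdio.h>
#include <string.h>
#include <hiredis/hiredis.h>

int main(void) {
    redisReader *r = redisReaderCreate();
    char proto[64] = "";
    void *reply = NULL;

    for (int i = 0; i < 9; i++) strcat(proto, "*1\r\n");  /* nested arrays */
    strcat(proto, ":1\r\n");                              /* innermost integer */

    redisReaderFeed(r, proto, strlen(proto));
    if (redisReaderGetReply(r, &reply) == REDIS_ERR)
        printf("error: %s\n", r->errstr);  /* "No support for nested multi bulk replies..." */

    redisReaderFree(r);
    return 0;
}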

@maguec

maguec commented Apr 21, 2020

👍

@michael-grunder michael-grunder added this to the 1.0.0 milestone Apr 22, 2020
@michael-grunder michael-grunder self-assigned this Apr 22, 2020
@michael-grunder
Collaborator

I think it's reasonable to remove this limit, and certainly before we release v1.0.0.

@jeffreylovitz
Author

@michael-grunder That's fantastic, thanks!

michael-grunder added a commit to michael-grunder/hiredis that referenced this issue Apr 25, 2020
This commit removes the nested multi-bulk depth limitation of 7.
We do this by switching to pointer to pointer indirection and
growing the stack in chunks when needed.

Initial benchmarking indicates that the worse cache locality is
offset by the reduction in overall redisReader size, resulting in
nearly identical parsing performance to the existing master branch.

See: redis#794, redis#421
@michael-grunder
Collaborator

michael-grunder commented Apr 25, 2020

I took a crack at removing the depth limitation over on my fork of hiredis.

Link to the branch in question

The change switches from a hard-coded stack to a dynamic one that grows as needed. I would have expected this to degrade performance (due to the worse cache locality), but it appears to be offset by the smaller redisReader size, as the benchmarks are nearly identical.
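
Conceptually the growth path looks something like this (a rough sketch with made-up names, not the actual diff): the reader holds an array of task pointers and reallocates it a chunk at a time, only when a reply nests deeper than what is already allocated.

/* Rough sketch of chunked stack growth (hypothetical names, not the
 * actual hiredis patch). Tasks are reached through pointer-to-pointer
 * indirection so the array can be realloc'd without moving the reader. */
#include <stdlib.h>

#define TASK_CHUNK 9   /* grow the stack nine slots at a time */

typedef struct exampleTask {
    int type;
    long long elements;
    struct exampleTask *parent;
} exampleTask;

typedef struct exampleReader {
    exampleTask **task;   /* dynamically grown stack of task pointers */
    int tasks;            /* number of allocated slots */
    int ridx;             /* current nesting depth */
} exampleReader;

/* Ensure there is room for one more nesting level; returns -1 on OOM. */
static int readerGrow(exampleReader *r) {
    if (r->ridx + 1 < r->tasks)
        return 0;                               /* still room, nothing to do */

    int newtasks = r->tasks + TASK_CHUNK;
    exampleTask **tmp = realloc(r->task, newtasks * sizeof(*tmp));
    if (tmp == NULL)
        return -1;
    r->task = tmp;

    for (int i = r->tasks; i < newtasks; i++)   /* allocate the new slots */
        if ((r->task[i] = calloc(1, sizeof(exampleTask))) == NULL)
            return -1;

    r->tasks = newtasks;
    return 0;
}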

I'd love for other people to give it a try though. If it seems reasonably stable I'll open a PR here and we can more formally discuss the change.

Edit: Looks like this does incur a ~12% penalty when parsing things like pure integer replies (sample benchmark here). I'm going to rework the logic to lazily allocate our stack such that we can avoid the allocations unless we need them.

Cheers,
Mike

@jeffreylovitz
Author

I'd be happy to do some testing! Would you prefer to share scripts like raw-bench-unlimited-depth for comparability, or should I test as I see fit?

@michael-grunder
Collaborator

I just put together a quick and dirty repository to directly test the changes to hiredis.

I cleaned it up a bit and uploaded it here

The only place where I'm seeing a noticeable performance degradation is when I literally do something like this:

size_t benchReader(const char *protofile) {
    redisReader *rdr = redisReaderCreate();
    size_t count = 0;
    char buffer[32768];
    size_t read;
    FILE *fp = openOrAbort(protofile);   /* repo helper: open the file or abort */

    /* Feed the raw protocol file through the reader in 32k chunks and
     * count every complete reply it produces. */
    while ((read = fread(buffer, 1, sizeof(buffer), fp)) != 0) {
        redisReaderFeed(rdr, buffer, read);
        count += consumeReplies(rdr);
    }

    redisReaderFree(rdr);
    fclose(fp);

    return count;
}
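
For reference, consumeReplies() just drains whatever complete replies are buffered in the reader and counts them, roughly like this (simplified sketch; the actual helper lives in the reader-bench repo):

/* Simplified sketch of the consumeReplies() helper: pop and free every
 * complete reply currently buffered in the reader, returning how many
 * were consumed. */
static size_t consumeReplies(redisReader *rdr) {
    void *reply = NULL;
    size_t count = 0;

    while (redisReaderGetReply(rdr, &reply) == REDIS_OK && reply != NULL) {
        freeReplyObject(reply);
        reply = NULL;
        count++;
    }

    return count;
}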

In the above code I am manually creating a redisReader and feeding it raw protocol data. When all of the data are integer replies I see the slowdown. More complex protocol data performs as well in either branch.

You can use the PHP script to generate your own raw protocol data:

$ php resp-protogen/proto.php --count 1000000 --type multibulk --bulkleaf rand >| /tmp/mbk.1m.proto

I think one can argue that the raw reader benchmark is a microbenchmark, and that the slowdown is acceptable given that it fades into noise when I use hiredis as a reply parser (and especially when I'm actually communicating with Redis).

Anyway, let me know if you have any questions.

@jeffreylovitz
Author

I ran the raw benchmark scripts and have placed their outputs here:
https://gist.github.com/jeffreylovitz/760359528751c5a42b2c7f9f5710fe7d

The ./bench-* scripts are only reporting 1 reply per run, as this condition never triggers:
https://github.com/michael-grunder/reader-bench/blob/master/bench.c#L37

@michael-grunder
Collaborator

michael-grunder commented Apr 27, 2020

The ./bench-* scripts are only reporting 1 reply per run, as this condition never triggers:
https://github.com/michael-grunder/reader-bench/blob/master/bench.c#L37

My apologies. This process won't really work now because it expects the protocol data to be separated by a delimiter character, which I removed after realizing that I was also benchmarking my protoFile code in addition to hiredis.

I added the option back in case you'd like to use the non-raw benchmarking programs too.

Example usage:

# Note the --delim '^' argument to proto.php
php resp-protogen/proto.php --count 100000 --type multibulk --bulkleaf rand --delim '^' >| /tmp/mbk.100k.delim.proto
./bench-master /tmp/mbk.100k.delim.proto 
./bench-unlimited-depth /tmp/mbk.100k.delim.proto

This specific test is where the new code is slower for me:

Raw protocol parsing only integer replies

Thanks for running the benchmarks on another system and please let me know if you encounter any bugs/crashes/etc.

@jeffreylovitz
Author

Oh, that's fine! I was just trying for completeness.

I performed a few raw benchmark tests that will hopefully focus on your integer reply case:
https://gist.github.com/jeffreylovitz/760359528751c5a42b2c7f9f5710fe7d

For me, the difference between master and unlimited-depth is effectively nonexistent.

@michael-grunder
Collaborator

That is interesting!

Perhaps it comes down to AMD CPUs being less clever for this specific workload (I'm running a Ryzen 1800X).

Given these numbers I don't think it's worth doing something more exotic (and error prone). I'll go ahead and open the PR so more people can try the change out and provide input.

Thanks again for running the tests on your machine!

@jeffreylovitz
Author

Happy to help! Thanks for getting to this so quickly!

michael-grunder added a commit that referenced this issue May 4, 2020
* Remove nested depth limitation.

This commit removes the nested multi-bulk depth limitation of 7.
We do this by switching to pointer to pointer indirection and
growing the stack in chunks when needed.

See: #794, #421
@michael-grunder
Collaborator

Closing via #797
