
Hard-coded limit of nested reply depth #794

Closed
jeffreylovitz opened this issue Apr 21, 2020 · 12 comments

@jeffreylovitz

Hello,

Hiredis has the following restriction:

hiredis/read.c

Lines 418 to 423 in 0501c62

/* Set error for nested multi bulks with depth > 7 */
if (r->ridx == 8) {
__redisReaderSetError(r,REDIS_ERR_PROTOCOL,
"No support for nested multi bulk replies with depth > 7");
return REDIS_ERR;
}

This is a limitation that RedisGraph can hit when processing complex but sane queries.

Could this limit be relaxed?
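
For illustration, here is a minimal standalone sketch (hand-built protocol string, not an actual RedisGraph reply) that trips the check by nesting a reply nine levels deep:

/* Sketch: reproduce the depth error with a hand-built RESP reply.
 * Nine levels of "*1\r\n" wrap a single integer, exceeding depth 7. */
#include <stdio.h>
#include <string.h>
#include <hiredis/hiredis.h>

int main(void) {
    redisReader *r = redisReaderCreate();
    char proto[64] = "";
    void *reply = NULL;

    for (int i = 0; i < 9; i++) strcat(proto, "*1\r\n");  /* nested arrays */
    strcat(proto, ":1\r\n");                              /* innermost integer */

    redisReaderFeed(r, proto, strlen(proto));
    if (redisReaderGetReply(r, &reply) == REDIS_ERR)
        printf("error: %s\n", r->errstr);  /* "No support for nested multi bulk replies..." */

    redisReaderFree(r);
    return 0;
}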

@maguec

maguec commented Apr 21, 2020

👍

@michael-grunder michael-grunder added this to the 1.0.0 milestone Apr 22, 2020
@michael-grunder michael-grunder self-assigned this Apr 22, 2020
@michael-grunder
Collaborator

I think it's reasonable to remove this limit, and certainly before we release v1.0.0.

@jeffreylovitz
Author

@michael-grunder That's fantastic, thanks!

michael-grunder added a commit to michael-grunder/hiredis that referenced this issue Apr 25, 2020
This commit removes the nested multi-bulk depth limitation of 7.
We do this by switching to pointer to pointer indirection and
growing the stack in chunks when needed.

Initial benchmarking indicates that the worse cache locality is
offset by the reduction in overall redisReader size, resulting in
nearly identical parsing performance to the existing master branch.

See: redis#794, redis#421
@michael-grunder
Collaborator

michael-grunder commented Apr 25, 2020

I took a crack at removing the depth limitation over on my fork of hiredis.

Link to the branch in question

The change switches from a hard-coded stack to a dynamic one that grows as needed. I would have expected this to degrade performance (due to the worse cache locality), but it appears to be offset by the smaller redisReader size, as the benchmarks are nearly identical.
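
Conceptually the growth path looks something like this (a rough sketch with made-up names, not the actual diff): the reader holds an array of task pointers and reallocates it a chunk at a time, only when a reply nests deeper than what is already allocated.

/* Rough sketch of chunked stack growth (hypothetical names, not the
 * actual hiredis patch). Tasks are reached through pointer-to-pointer
 * indirection so the array can be realloc'd without moving the reader. */
#include <stdlib.h>

#define TASK_CHUNK 9   /* grow the stack nine slots at a time */

typedef struct exampleTask {
    int type;
    long long elements;
    struct exampleTask *parent;
} exampleTask;

typedef struct exampleReader {
    exampleTask **task;   /* dynamically grown stack of task pointers */
    int tasks;            /* number of allocated slots */
    int ridx;             /* current nesting depth */
} exampleReader;

/* Ensure there is room for one more nesting level; returns -1 on OOM. */
static int readerGrow(exampleReader *r) {
    if (r->ridx + 1 < r->tasks)
        return 0;                               /* still room, nothing to do */

    int newtasks = r->tasks + TASK_CHUNK;
    exampleTask **tmp = realloc(r->task, newtasks * sizeof(*tmp));
    if (tmp == NULL)
        return -1;
    r->task = tmp;

    for (int i = r->tasks; i < newtasks; i++)   /* allocate the new slots */
        if ((r->task[i] = calloc(1, sizeof(exampleTask))) == NULL)
            return -1;

    r->tasks = newtasks;
    return 0;
}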

I'd love for other people to give it a try though. If it seems reasonably stable I'll open a PR here and we can more formally discuss the change.

Edit: Looks like this does incur a ~12% penalty when parsing things like pure integer replies (sample benchmark here). I'm going to rework the logic to lazily allocate our stack such that we can avoid the allocations unless we need them.

Cheers,
Mike

@jeffreylovitz
Author

I'd be happy to do some testing! Would you prefer to share scripts like raw-bench-unlimited-depth for comparability, or should I test as I see fit?

@michael-grunder
Collaborator

I just put together a quick and dirty repository to directly test the changes to hiredis.

I cleaned it up a bit and uploaded it here

The only place where I'm seeing a noticeable performance degradation is when I literally do something like this:

size_t benchReader(const char *protofile) {
    redisReader *rdr = redisReaderCreate();
    size_t count = 0;
    char buffer[32768];
    size_t read;
    FILE *fp = openOrAbort(protofile);   /* repo helper: open the file or abort */

    /* Feed the raw protocol file through the reader in 32k chunks and
     * count every complete reply it produces. */
    while ((read = fread(buffer, 1, sizeof(buffer), fp)) != 0) {
        redisReaderFeed(rdr, buffer, read);
        count += consumeReplies(rdr);
    }

    redisReaderFree(rdr);
    fclose(fp);

    return count;
}
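
For reference, consumeReplies() just drains whatever complete replies are buffered in the reader and counts them, roughly like this (simplified sketch; the actual helper lives in the reader-bench repo):

/* Simplified sketch of the consumeReplies() helper: pop and free every
 * complete reply currently buffered in the reader, returning how many
 * were consumed. */
static size_t consumeReplies(redisReader *rdr) {
    void *reply = NULL;
    size_t count = 0;

    while (redisReaderGetReply(rdr, &reply) == REDIS_OK && reply != NULL) {
        freeReplyObject(reply);
        reply = NULL;
        count++;
    }

    return count;
}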

In the above code I am manually creating a redisReader and feeding it raw protocol data. When all of the data are integer replies I see the slowdown. More complex protocol data performs as well in either branch.

You can use the PHP script to generate your own raw protocol data:

$ php resp-protogen/proto.php --count 1000000 --type multibulk --bulkleaf rand >| /tmp/mbk.1m.proto

I think one can argue that the raw reader benchmark is a microbenchmark, and that the slowdown is acceptable given that it fades into noise when I use hiredis as a reply parser (and especially when I'm actually communicating with Redis).

Anyway, let me know if you have any questions.

@jeffreylovitz
Author

I ran the raw benchmark scripts and have placed their outputs here:
https://gist.github.com/jeffreylovitz/760359528751c5a42b2c7f9f5710fe7d

The ./bench-* scripts are only reporting 1 reply per run, as this condition never triggers:
https://github.com/michael-grunder/reader-bench/blob/master/bench.c#L37

@michael-grunder
Collaborator

michael-grunder commented Apr 27, 2020

The ./bench-* scripts are only reporting 1 reply per run, as this condition never triggers:
https://github.com/michael-grunder/reader-bench/blob/master/bench.c#L37

My apologies. This process won't really work now because it expects the protocol data to be separated by a delimiter character, which I removed after realizing that I was also benchmarking my protoFile code in addition to hiredis.

I added the option back in case you'd like to use the non-raw benchmarking programs too.

Example usage:

# Note the --delim '^' argument to proto.php
php resp-protogen/proto.php --count 100000 --type multibulk --bulkleaf rand --delim '^' >| /tmp/mbk.100k.delim.proto
./bench-master /tmp/mbk.100k.delim.proto 
./bench-unlimited-depth /tmp/mbk.100k.delim.proto

This specific test is where the new code is slower for me:

Raw protocol parsing only integer replies

Thanks for running the benchmarks on another system and please let me know if you encounter any bugs/crashes/etc.

@jeffreylovitz
Author

Oh, that's fine! I was just trying for completeness.

I performed a few raw benchmark tests that will hopefully focus on your integer reply case:
https://gist.github.com/jeffreylovitz/760359528751c5a42b2c7f9f5710fe7d

For me, the difference between master and unlimited-depth is effectively nonexistent.

@michael-grunder
Collaborator

That is interesting!

Perhaps it comes down to AMD CPUs being less clever for this specific workload (I'm running a Ryzen 1800X).

Given these numbers I don't think it's worth doing something more exotic (and error prone). I'll go ahead and open the PR so more people can try the change out and provide input.

Thanks again for running the tests on your machine!

@jeffreylovitz
Author

Happy to help! Thanks for getting to this so quickly!

michael-grunder added a commit that referenced this issue May 4, 2020
* Remove nested depth limitation.

This commit removes the nested multi-bulk depth limitation of 7.
We do this by switching to pointer to pointer indirection and
growing the stack in chunks when needed.

See: #794, #421
@michael-grunder
Collaborator

Closing via #797
