Add new replicate module API to bypass command validation #1357

sungming2 · 2024-11-26T07:53:07Z

Issue #1175

Problem

Performance degrades because of the sanity check (lookupCommandByCString) in the VM_Replicate function, which converts the const char* command name into an sds string and frees it again for every command dictionary lookup. This check is mainly useful for debugging and is unnecessary for trusted modules. The user is looking for a way to bypass it for performance gains.
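
As a rough illustration of the pattern described above, the following sketch performs a heap copy, a table lookup, and a free on every call. It is not the Valkey implementation: `toy_table`, `toy_lookup`, and `lookup_by_cstring_sketch` are hypothetical names, and a plain `malloc` copy stands in for the sds conversion.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for the command dictionary (hypothetical contents). */
static const char *toy_table[] = {"SET", "DEL", "INCR"};

/* Return 1 if key is present in the toy table, 0 otherwise. */
static int toy_lookup(const char *key) {
    size_t n = sizeof(toy_table) / sizeof(toy_table[0]);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(toy_table[i], key) == 0) return 1;
    }
    return 0;
}

/* Mimics the costly shape: copy the C string to the heap, look it up,
 * then free the copy. The allocation happens on every single call. */
int lookup_by_cstring_sketch(const char *name) {
    assert(name != NULL);
    size_t len = strlen(name);
    char *copy = malloc(len + 1);
    if (copy == NULL) return 0;
    memcpy(copy, name, len + 1);
    int found = toy_lookup(copy);
    free(copy);
    return found;
}
```

The allocation and free are pure overhead when the caller already knows the command name is valid, which is the case this PR targets.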

Solution

Add a module API that takes a flag (preferably an enum, e.g., SKIP_VALIDATION) so that trusted modules can bypass validation for better performance while keeping the flexibility to handle failure scenarios on replicas.
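
A minimal sketch of the idea, assuming a toy command table in place of the server's command dictionary. The names `repl_flag`, `command_exists`, and `replicate_with_flag_sketch` are hypothetical and do not appear in the patch; only the skip-validation behaviour is taken from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag type mirroring the shape proposed in this PR. */
typedef enum {
    REPL_FLAG_DEFAULT = 0,    /* validate the command name first */
    REPL_FLAG_SKIP_VALIDATION /* caller vouches for the command name */
} repl_flag;

/* Toy stand-in for the command dictionary. */
static const char *known_commands[] = {"SET", "DEL", "INCR"};

/* Return 1 if name is in the toy table, 0 otherwise. */
static int command_exists(const char *name) {
    size_t n = sizeof(known_commands) / sizeof(known_commands[0]);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(known_commands[i], name) == 0) return 1;
    }
    return 0;
}

/* Return 0 if the command would be queued for replication, -1 if rejected. */
int replicate_with_flag_sketch(repl_flag flag, const char *cmdname) {
    assert(cmdname != NULL);
    if (flag != REPL_FLAG_SKIP_VALIDATION && !command_exists(cmdname)) return -1;
    /* A real implementation would build the argument vector and propagate it here. */
    return 0;
}
```

With the skip flag set, an unknown command name is accepted locally, so any failure would only surface on the replica side.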

Test

Unit test

Follow-up work

Valkey documentation update required: https://valkey.io/topics/modules-api-ref/#section-commands-replication-api

sungming2 · 2024-11-26T07:59:46Z

src/valkeymodule.h

+typedef enum {
+    VALKEYMODULE_FLAG_DEFAULT = 0,     /* Default behavior */
+    VALKEYMODULE_FLAG_SKIP_VALIDATION, /* Skip validation */
+} ValkeyModuleFlag;


Can I get some suggestions on whether this naming is too general?

Signed-off-by: Seungmin Lee <[email protected]>

codecov · 2024-11-26T08:18:12Z

Codecov Report

Attention: Patch coverage is 3.84615% with 25 lines in your changes missing coverage. Please review.

Project coverage is 70.77%. Comparing base (469d41f) to head (24e6573).
Report is 1 commits behind head on unstable.

Files with missing lines	Patch %	Lines
src/module.c	3.84%	25 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #1357      +/-   ##
============================================
+ Coverage     70.62%   70.77%   +0.14%     
============================================
  Files           117      117              
  Lines         63313    63324      +11     
============================================
+ Hits          44716    44817     +101     
+ Misses        18597    18507      -90

Files with missing lines	Coverage Δ
src/module.c	`9.64% <3.84%> (-0.01%)`	⬇️

... and 9 files with indirect coverage changes

sungming2 · 2024-11-26T20:36:54Z

Opened a PR for the documentation of this: valkey-io/valkey-doc#192

hwware · 2024-11-27T16:03:10Z

src/valkeymodule.h

+typedef enum {
+    VALKEYMODULE_FLAG_DEFAULT = 0,     /* Default behavior */
+    VALKEYMODULE_FLAG_SKIP_VALIDATION, /* Skip validation */
+} ValkeyModuleFlag;


Is there a benefit to making the flag an enum, such as adding more flags later? Could we just use an int skip (0 for non-skip, 1 for skip)?

hwware · 2024-11-27T16:04:37Z

src/valkeymodule.h

@@ -1092,6 +1097,8 @@ VALKEYMODULE_API int (*ValkeyModule_StringToStreamID)(const ValkeyModuleString *
 VALKEYMODULE_API void (*ValkeyModule_AutoMemory)(ValkeyModuleCtx *ctx) VALKEYMODULE_ATTR;
 VALKEYMODULE_API int (*ValkeyModule_Replicate)(ValkeyModuleCtx *ctx, const char *cmdname, const char *fmt, ...)
    VALKEYMODULE_ATTR;
+VALKEYMODULE_API int (*ValkeyModule_ReplicateWithFlag)(ValkeyModuleCtx *ctx, ValkeyModuleFlag flag, const char *cmdname, const char *fmt, ...)


If ValkeyModuleFlag can be an int, we can pass an int value here.
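
To make the enum-versus-int trade-off concrete, here is a hedged sketch of one way an enum could leave room for later options: bit flags combined into an int. This layout is an assumption for illustration only (the enum in the patch uses sequential values), and `ex_flags`, `EX_FLAG_FUTURE_OPTION`, and `should_validate` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical bit-flag layout; only the skip-validation idea comes from the PR. */
typedef enum {
    EX_FLAG_NONE = 0,
    EX_FLAG_SKIP_VALIDATION = 1 << 0,
    EX_FLAG_FUTURE_OPTION = 1 << 1 /* placeholder for a later addition */
} ex_flags;

/* Return 1 when validation should run for the given combination of flags. */
int should_validate(int flags) {
    assert(flags >= 0);
    return (flags & EX_FLAG_SKIP_VALIDATION) == 0;
}
```

A plain int skip parameter covers the single on/off case equally well; the flag form only pays off if further options are actually expected.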

hwware · 2024-11-27T16:05:42Z

src/module.c

+/* Helper function for VM_Replicate and VM_ReplicateWithFlag to replicate the specified command
+ * and arguments to replicas and AOF, as effect of execution of the calling command implementation.
+ * Skip command validation if the ValkeyModuleFlag is set to VALKEYMODULE_FLAG_SKIP_VALIDATION. */
+int moduleReplicate(ValkeyModuleCtx *ctx, ValkeyModuleFlag flag, const char *cmdname, const char *fmt, va_list ap) {


Please move this function to just above VM_Replicate; see zsetInitLexRange for reference, which is also a helper function.
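
For readers unfamiliar with the helper layout under discussion, this sketch shows the general C pattern of one `va_list` helper shared by two variadic wrappers, with the helper defined directly above them. The names `format_command`, `wrapper_default`, and `wrapper_with_flag` are hypothetical, and formatting into a buffer stands in for the real replication work.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Shared helper: takes a va_list so several variadic wrappers can reuse it.
 * Returns 0 on success, -1 if the output would not fit in buf. */
static int format_command(char *buf, size_t len, int skip_validation, const char *fmt, va_list ap) {
    assert(buf != NULL && fmt != NULL);
    int off = snprintf(buf, len, "%s", skip_validation ? "[skip] " : "[check] ");
    if (off < 0 || (size_t)off >= len) return -1;
    int n = vsnprintf(buf + off, len - (size_t)off, fmt, ap);
    return (n < 0 || (size_t)n >= len - (size_t)off) ? -1 : 0;
}

/* Wrapper with the original signature shape: always validates. */
int wrapper_default(char *buf, size_t len, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int rc = format_command(buf, len, 0, fmt, ap);
    va_end(ap);
    return rc;
}

/* Wrapper with an extra flag argument: forwards the caller's choice. */
int wrapper_with_flag(char *buf, size_t len, int skip_validation, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int rc = format_command(buf, len, skip_validation, fmt, ap);
    va_end(ap);
    return rc;
}
```

Each wrapper owns its own `va_start`/`va_end` pair and the helper never calls either, which is what allows both entry points to share one body.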

hwware · 2024-11-27T16:12:22Z

This PR stems from a performance issue with the current VM_Replicate, so could you please share results showing how much this PR improves performance?

sungming2 · 2024-11-28T01:09:19Z

I don't see a notable performance improvement across multiple tests (1. benchmark, 2. CPU utilization, 3. CPU profiling with perf), so I wonder whether it is worth introducing this new API. @hwware @hpatro

1. Benchmark

With validation

====== eval return redis.call("propagate-test.validation") 0 ======                                                     
  100000 requests completed in 0.71 seconds
  50 parallel clients
  74 bytes payload
  keep alive: 1
  host configuration "save": 3600 1 300 100 60 10000
  host configuration "appendonly": no
  multi-thread: no

Latency by percentile distribution:
0.000% <= 0.111 milliseconds (cumulative count 1)
50.000% <= 0.295 milliseconds (cumulative count 54440)
75.000% <= 0.335 milliseconds (cumulative count 75261)
87.500% <= 0.383 milliseconds (cumulative count 88850)
93.750% <= 0.407 milliseconds (cumulative count 94702)
96.875% <= 0.423 milliseconds (cumulative count 96981)
98.438% <= 0.479 milliseconds (cumulative count 98634)
99.219% <= 0.503 milliseconds (cumulative count 99282)
99.609% <= 0.599 milliseconds (cumulative count 99611)
99.805% <= 0.791 milliseconds (cumulative count 99811)
99.902% <= 0.999 milliseconds (cumulative count 99903)
99.951% <= 1.367 milliseconds (cumulative count 99952)
99.976% <= 1.615 milliseconds (cumulative count 99976)
99.988% <= 1.679 milliseconds (cumulative count 99989)
99.994% <= 1.687 milliseconds (cumulative count 99995)
99.997% <= 1.799 milliseconds (cumulative count 99997)
99.998% <= 1.831 milliseconds (cumulative count 99999)
99.999% <= 1.895 milliseconds (cumulative count 100000)
100.000% <= 1.895 milliseconds (cumulative count 100000)

Cumulative distribution of latencies:
0.000% <= 0.103 milliseconds (cumulative count 0)
3.697% <= 0.207 milliseconds (cumulative count 2697)
60.307% <= 0.303 milliseconds (cumulative count 60307)
94.702% <= 0.407 milliseconds (cumulative count 94702)
99.282% <= 0.503 milliseconds (cumulative count 99282)
99.620% <= 0.607 milliseconds (cumulative count 99620)
99.725% <= 0.703 milliseconds (cumulative count 99725)
99.826% <= 0.807 milliseconds (cumulative count 99826)
99.878% <= 0.903 milliseconds (cumulative count 99878)
99.904% <= 1.007 milliseconds (cumulative count 99904)
99.919% <= 1.103 milliseconds (cumulative count 99919)
99.928% <= 1.207 milliseconds (cumulative count 99928)
99.939% <= 1.303 milliseconds (cumulative count 99939)
99.962% <= 1.407 milliseconds (cumulative count 99962)
99.968% <= 1.503 milliseconds (cumulative count 99968)
99.975% <= 1.607 milliseconds (cumulative count 99975)
99.995% <= 1.703 milliseconds (cumulative count 99995)
99.998% <= 1.807 milliseconds (cumulative count 99998)
100.000% <= 1.903 milliseconds (cumulative count 100000)

Summary:
  throughput summary: 141043.72 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        0.305     0.104     0.295     0.415     0.495     1.895

Bypass validation

====== eval return redis.call("propagate-test.novalidation") 0 ======                                                     
  100000 requests completed in 0.69 seconds
  50 parallel clients
  76 bytes payload
  keep alive: 1
  host configuration "save": 3600 1 300 100 60 10000
  host configuration "appendonly": no
  multi-thread: no

Latency by percentile distribution:
0.000% <= 0.119 milliseconds (cumulative count 1)
50.000% <= 0.279 milliseconds (cumulative count 52272)
75.000% <= 0.327 milliseconds (cumulative count 76134)
87.500% <= 0.367 milliseconds (cumulative count 87689)
93.750% <= 0.399 milliseconds (cumulative count 95285)
96.875% <= 0.423 milliseconds (cumulative count 97098)
98.438% <= 0.471 milliseconds (cumulative count 98588)
99.219% <= 0.543 milliseconds (cumulative count 99244)
99.609% <= 0.759 milliseconds (cumulative count 99619)
99.805% <= 0.927 milliseconds (cumulative count 99810)
99.902% <= 1.087 milliseconds (cumulative count 99903)
99.951% <= 1.271 milliseconds (cumulative count 99952)
99.976% <= 1.527 milliseconds (cumulative count 99977)
99.988% <= 1.575 milliseconds (cumulative count 99989)
99.994% <= 1.623 milliseconds (cumulative count 99994)
99.997% <= 1.663 milliseconds (cumulative count 99997)
99.998% <= 1.687 milliseconds (cumulative count 99999)
99.999% <= 1.719 milliseconds (cumulative count 100000)
100.000% <= 1.719 milliseconds (cumulative count 100000)

Cumulative distribution of latencies:
0.000% <= 0.103 milliseconds (cumulative count 0)
4.562% <= 0.207 milliseconds (cumulative count 4562)
67.223% <= 0.303 milliseconds (cumulative count 67223)
96.219% <= 0.407 milliseconds (cumulative count 96219)
99.021% <= 0.503 milliseconds (cumulative count 99021)
99.438% <= 0.607 milliseconds (cumulative count 99438)
99.562% <= 0.703 milliseconds (cumulative count 99562)
99.678% <= 0.807 milliseconds (cumulative count 99678)
99.787% <= 0.903 milliseconds (cumulative count 99787)
99.858% <= 1.007 milliseconds (cumulative count 99858)
99.910% <= 1.103 milliseconds (cumulative count 99910)
99.944% <= 1.207 milliseconds (cumulative count 99944)
99.954% <= 1.303 milliseconds (cumulative count 99954)
99.956% <= 1.407 milliseconds (cumulative count 99956)
99.973% <= 1.503 milliseconds (cumulative count 99973)
99.993% <= 1.607 milliseconds (cumulative count 99993)
99.999% <= 1.703 milliseconds (cumulative count 99999)
100.000% <= 1.807 milliseconds (cumulative count 100000)

Summary:
  throughput summary: 144300.14 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        0.294     0.112     0.279     0.399     0.503     1.719
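
The two summaries above can be compared with a small relative-change calculation. The helper below is a sketch (`percent_change` is a hypothetical name); the inputs are the throughput and average-latency figures reported in the two runs.

```c
#include <assert.h>

/* Relative change from baseline to candidate, in percent. */
double percent_change(double baseline, double candidate) {
    assert(baseline != 0.0);
    return (candidate - baseline) / baseline * 100.0;
}
```

Applied to the numbers above, `percent_change(141043.72, 144300.14)` gives roughly +2.3% throughput and `percent_change(0.305, 0.294)` gives roughly -3.6% average latency, while p99 is slightly higher in the bypass run (0.495 ms versus 0.503 ms). That is in line with the reading that the gain is not notable.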

2. CPU utilization

Measured while continuously calling VM_Replicate/VM_ReplicateWithFlag with the benchmark tool (benchmark -n -l)

With validation

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                    
  33349 sungming  20   0  144672  25056   5888 R  54.0   0.2  12:51.11 valkey-server

Bypass validation

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                    
  33349 sungming  20   0  144672  25064   5888 S  52.8   0.2  13:29.67 valkey-server

3. Performance profiling with `perf`

with validation

-    8.10%     0.10%  valkey-server    valkey-server       [.] VM_Replicate
   - 8.00% VM_Replicate
      - 7.65% moduleReplicate
         - 3.08% lookupCommandBySdsLogic
            - 1.21% dictFetchValue
                 0.88% dictFind
            - 1.10% sdssplitlen.constprop.1
                 0.56% _sdsnewlen
         - 1.91% moduleCreateArgvFromUserFormat
              1.12% createEmbeddedStringObject
           0.85% _sdsnewlen
           0.60% alsoPropagate

bypass validation

-    3.84%     0.14%  valkey-server    valkey-server       [.] VM_ReplicateWithFlag
   - 3.70% VM_ReplicateWithFlag
      - 3.47% moduleReplicate
         - 2.04% moduleCreateArgvFromUserFormat
              1.14% createEmbeddedStringObject
              0.64% valkey_realloc
           0.59% alsoPropagate

sungming2 force-pushed the module-skip-validation branch from 18763c9 to c54ba74 Compare November 26, 2024 07:54

Add new replicate module API to bypass command validation

24e6573

Signed-off-by: Seungmin Lee <[email protected]>

sungming2 force-pushed the module-skip-validation branch from c54ba74 to 24e6573 Compare November 26, 2024 08:02

hpatro requested review from hpatro and hwware November 26, 2024 17:18

hpatro added the needs-doc-pr This change needs to update a documentation page. Remove label once doc PR is open. label Nov 26, 2024

sungming2 mentioned this pull request Nov 26, 2024

Add ValkeyModule_ReplicateWithFlag description valkey-io/valkey-doc#192

Open

hwware reviewed Nov 27, 2024

hwware mentioned this pull request Nov 28, 2024

lookupCommandByCString in VM_Replicate #1175

Open
