Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements for TLS with I/O threads #1271

Merged
merged 1 commit into from
Nov 18, 2024

Conversation

uriyage
Copy link
Contributor

@uriyage uriyage commented Nov 7, 2024

Main thread profiling revealed significant overhead in TLS operations, even with read/write offloaded to I/O threads:

Perf results:

10.82% 8.82% valkey-server libssl.so.3 [.] SSL_pending # Called by main thread after I/O completion

10.16% 5.06% valkey-server libcrypto.so.3 [.] ERR_clear_error # Called for every event regardless of thread handling

This commit further optimizes TLS operations by moving more work from the main thread to I/O threads:

Improve TLS offloading to I/O threads with two main changes:

  1. Move ERR_clear_error() calls closer to SSL operations

    • Currently, error queue is cleared for every TLS event
    • Now only clear before actual SSL function calls
    • This prevents unnecessary clearing in main thread when operations
      are handled by I/O threads
  2. Optimize SSL_pending() checks

    • Add TLS_CONN_FLAG_HAS_PENDING flag to track pending data
    • Move pending check to follow read operations immediately
    • I/O thread sets flag when pending data exists
    • Main thread uses flag to update pending list

Performance improvements:
Testing setup based on https://valkey.io/blog/unlock-one-million-rps-part2/

Before:

  • SET: 896,047 ops/sec
  • GET: 875,794 ops/sec

After:

  • SET: 985,784 ops/sec (+10% improvement)
  • GET: 1,066,171 ops/sec (+22% improvement)

Copy link

codecov bot commented Nov 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.68%. Comparing base (1c18c80) to head (e65c2a0).
Report is 21 commits behind head on unstable.

Additional details and impacted files
@@            Coverage Diff            @@
##           unstable    #1271   +/-   ##
=========================================
  Coverage     70.68%   70.68%           
=========================================
  Files           114      115    +1     
  Lines         63150    63158    +8     
=========================================
+ Hits          44637    44643    +6     
- Misses        18513    18515    +2     
Files with missing lines Coverage Δ
src/tls.c 100.00% <ø> (ø)

... and 23 files with indirect coverage changes

Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, but I'd like @madolson to take a look too.

Impressing performance result, and this is with more lines deleted than added!

Does the change have any performance impact without I/O threads enabled?

@zuiderkwast zuiderkwast requested a review from madolson November 8, 2024 00:26
@madolson
Copy link
Member

madolson commented Nov 8, 2024

Will check tomorrow, but @uriyage please fix the DCO when you get a chance.

@ranshid
Copy link
Member

ranshid commented Nov 10, 2024

Looks good to me, but I'd like @madolson to take a look too.

Impressing performance result, and this is with more lines deleted than added!

Does the change have any performance impact without I/O threads enabled?

I would be surprised if there was any performance improvement when io-threads are not used. This will actually cause SSL_pending and ERR_clear_error to be called multiple times in some scenarios where we are reading multiple times on the read handler. BUT I think those cases are rare and AFAIK are mostly limited to cases like the connSyncReadLine.

src/tls.c Outdated
ERR_clear_error();
int ret = SSL_read(conn->ssl, ptr, size);
ret = updateStateAfterSSLIO(conn, ret, 0);
int ret = connTLSRead(conn_, ptr, size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will cause us to potentially keep updating the event in updateStateAfterSSLIO. Same for the connTLSSyncReadLine. Maybe you want to create an internal function which will also get the update_event indication?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I reverted this change. Instead, we will call updatePendingFlags explicitly.

Copy link
Member

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than Ran's comment, this looks good to me. Nice perf results!

@madolson madolson added performance release-notes This issue should get a line item in the release notes pending-missing-dco For PRs that are blocked because they are missing a dco labels Nov 12, 2024
@uriyage
Copy link
Contributor Author

uriyage commented Nov 16, 2024

Looks good to me, but I'd like @madolson to take a look too.

Impressing performance result, and this is with more lines deleted than added!

Does the change have any performance impact without I/O threads enabled?

As expected, there is no performance difference without I/O threads as the behavior remains the same.

I would be surprised if there was any performance improvement when io-threads are not used. This will actually cause SSL_pending and ERR_clear_error to be called multiple times in some scenarios where we are reading multiple times on the read handler. BUT I think those cases are rare and AFAIK are mostly limited to cases like the connSyncReadLine.

The only change is indeed for connSyncReadLine, but as we use it to call the read syscall one byte at a time, I believe performance is not our concern in this case.

@uriyage uriyage force-pushed the tls_io_threads_improvments branch from 62adf80 to e65c2a0 Compare November 16, 2024 17:25
@zuiderkwast zuiderkwast removed the pending-missing-dco For PRs that are blocked because they are missing a dco label Nov 16, 2024
@madolson madolson merged commit 94113fd into valkey-io:unstable Nov 18, 2024
48 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance release-notes This issue should get a line item in the release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants