Fix regression in do_write(s) causing significant performance issues when using large (>10meg) writes #706
Conversation
Force-pushed from 2dee91d to d114c5e (compare)
Could you share the benchmark? I'm not sure how this change can be a massive performance improvement, since this is not supposed to happen.
Force-pushed from d114c5e to 0140ba8 (compare)
huh, what's wild is that it was right in my test harness, but I managed to lose it somewhere in committing - I lost it while addressing the exception issue with a bad copy/paste, and then it didn't hit when I was testing locally... clearly, I need to be more careful with tests! (I've fixed this in the PR now)
…On Sat, Jan 13, 2024 at 2:25 AM Kazuki Yamaguchi wrote:

> In lib/openssl/buffering.rb (#706 (comment)):
>
> ```diff
> @@ -345,13 +345,18 @@ def do_write(s)
>        @wbuffer << s
>        @wbuffer.force_encoding(Encoding::BINARY)
>        @sync ||= false
> -      if @sync or @wbuffer.size > BLOCK_SIZE
> -        until @wbuffer.empty?
> -          begin
> -            nwrote = syswrite(@wbuffer)
> -          rescue Errno::EAGAIN
> -            retry
> +      buffer_size = @wbuffer.size
> +      if @sync or buffer_size > BLOCK_SIZE
> +        nwrote = 0
> +        begin
> +          while nwrote < buffer_size do
> +            begin
> +              nwrote += syswrite(@wbuffer)
> ```
>
> ⬇️ Suggested change:
> ```diff
> -              nwrote += syswrite(@wbuffer)
> +              nwrote += syswrite(@wbuffer[nwrote, buffer_size - nwrote])
> ```
>
> @wbuffer is no longer updated after each syswrite so this change is required.
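To make the reviewer's point concrete, here is a standalone sketch (not the library code itself; `ChunkyIO` and `drain` are made-up stand-ins for illustration) of why the slice is required once `@wbuffer` is no longer truncated after each `syswrite`: a write may accept fewer bytes than offered, so the loop must resend only the unwritten tail.

```ruby
# Hypothetical IO that, like a real socket, accepts only part of each
# write (here, at most 4 bytes per call).
class ChunkyIO
  attr_reader :written

  def initialize
    @written = +""
  end

  def syswrite(s)
    chunk = s.byteslice(0, 4)  # pretend the kernel buffer fits only 4 bytes
    @written << chunk
    chunk.bytesize             # syswrite returns the number of bytes written
  end
end

# Drain `buffer` into `io`, advancing an offset instead of mutating the buffer.
def drain(buffer, io)
  buffer_size = buffer.bytesize
  nwrote = 0
  while nwrote < buffer_size
    begin
      # Without the slice, every call would resend the whole buffer from byte 0.
      nwrote += io.syswrite(buffer[nwrote, buffer_size - nwrote])
    rescue Errno::EAGAIN
      retry
    end
  end
  nwrote
end

io = ChunkyIO.new
drain("hello world!", io)  # => 12
io.written                 # => "hello world!"
```

Without the slice, `ChunkyIO` would keep re-receiving the first 4 bytes and `drain` would report 12 bytes written while the data on the wire was garbage.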
So my original test was serving a large file via chef-zero, but I managed to make a small reproduction case, since it makes sense that someone other than us needs to be able to reproduce it. This matches the behaviour of that exactly, and is fast in ruby 2.7, but very very slow in ruby 3.0:
```
require 'webrick'
require 'webrick/https'

class FileReader < WEBrick::HTTPServlet::AbstractServlet
  def do_GET(request, response)
    response.content_type = "binary/octet-stream"
    response.body = IO.read(File.expand_path("~/webroot/test.tgz"))
  end
end

root = File.expand_path '~/webroot'
# WEBrick generates a self-signed certificate from this distinguished name;
# it expects an array of [field, value] pairs.
cert_name = [%w[CN localhost]]
server = WEBrick::HTTPServer.new :Port => 8443, :DocumentRoot => root,
                                 :SSLEnable => true, :SSLCertName => cert_name
server.mount "/", FileReader
trap 'INT' do server.shutdown end
server.start
```
With this patch, it's as fast in ruby 3.0 as well. While I realise that using IO.read is not the most efficient way to do things in ruby, it still shouldn't regress to 1% of its former performance between releases.
EDIT: the file I'm using for this is an 80 MB tar.gz file - I test by ensuring I can curl it and pipe that into tar, so that I know it's returning valid data.
…On Sat, Jan 13, 2024 at 2:32 AM Kazuki Yamaguchi wrote:

> Could you share the benchmark? I'm not sure how this change can be a massive performance improvement, since this is not supposed to happen.

(The reason was that removing the head of the buffer did a full copy on every single write block, which turned out to be 16k for us, so every 16k it copied the ~77 MB of data that sat in front of it.)
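The quadratic behaviour described here is easy to demonstrate outside of OpenSSL entirely. A minimal sketch (sizes scaled down from the 77 MB report so it runs quickly) comparing head removal against offset tracking:

```ruby
require 'benchmark'

data  = "x" * (8 * 1024 * 1024)  # 8 MB buffer (the report used ~77 MB)
block = 16 * 1024                # 16 KB write blocks, as in the report

# Old approach: delete the head of the buffer after every block.
# Each deletion copies the entire remaining tail, so the total work
# is quadratic in the buffer size.
head_removal = Benchmark.realtime do
  buf = data.dup
  buf.slice!(0, block) until buf.empty?
end

# New approach: leave the buffer intact and track an offset, slicing
# out only the next block to send. Total work is linear.
offset_based = Benchmark.realtime do
  buf = data.dup
  nwrote = 0
  nwrote += buf[nwrote, block].bytesize while nwrote < buf.bytesize
end

puts format("head removal: %.3fs  offset tracking: %.4fs",
            head_removal, offset_based)
```

On CRuby the head-removal loop is dramatically slower, and the gap widens quadratically with the buffer size, which matches the 100x regression reported above.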
Force-pushed from d3b3d14 to 5b008f8 (compare)
Apart from the main topic, the code block in the comment above is not displayed properly on the page. The code block syntax may not work when replying via email.
Apologies, it does not. The reproduction script above is what I was going for with the test case code.
Similar issue: https://blog.mattstuchlik.com/2024/01/31/sneaky-one-liner.html cc @s7nfo
Very cool, this looks exactly like what I ran into. I suggest running `ltrace -e memcpy` against a reproducer like I did in the blog post; the issue becomes very apparent.
Ruby Strings for IO are surprisingly hard to get correct / "efficient" and there are a bunch of land mines we need to avoid... |
I reproduced it with the webrick example using an older Ruby version. This is a regression introduced in commit acc8079, which went into ruby/openssl v2.1.3 and Ruby 2.6. #484 also reported this, but I didn't realize it was this bad. It occurs in Ruby <= 3.1, too. I didn't merge this right away because it costs one extra string object allocation per write, but we should fix it since it affects Ruby 3.1, which has more than a year until EOL.
Agreed...
I think this should be applied to the stable branches too. @jaymzjulian could you rebase this on top of the maint-3.0 branch (the oldest branch where we fix non-security bugs) and fix the commit message formatting?
lib/openssl/buffering.rb (outdated)

```
begin
  while nwrote < buffer_size do
    begin
      nwrote += syswrite(@wbuffer[nwrote..buffer_size])
```
⬇️ Suggested change:

```diff
-      nwrote += syswrite(@wbuffer[nwrote..buffer_size])
+      nwrote += syswrite(@wbuffer[nwrote, buffer_size - nwrote])
```

This can save one Range object allocation in the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh hey, this is totally better - updated!
Force-pushed from 5b008f8 to 0768730 (compare)
This causes significant performance issues when using large (>10 MB) writes. Fix by adjusting the buffer write function to clear the buffer once, rather than piece by piece, avoiding a case where a large write (in our case, around 70 MB) consumes 100% of CPU. This takes a webrick GET request via SSL from around 200 KB/sec at 100% of a core to line speed on gigabit ethernet at 6% CPU utilization.
Force-pushed from 0768730 to d4389b4 (compare)
This is now rebased to target maint-3.0, with better formatting in the commit message. Thanks!
@rhenium are you good to merge this?
(Sorry for the long delay) Yes, it looks good to me. Thank you!
Adjust the buffer write function to clear the buffer once, rather than piece by piece. This avoids a case where a large write (in our case, around 70 MB) will consume 100% of CPU. This takes a webrick GET request via SSL from around 200 KB/sec at 100% of a core to line speed on gigabit ethernet at 6% CPU utilization.
(The reason was that removing the head of the buffer did a full copy on every single write block, which turned out to be 16k for us, so every 16k it copied the 77 MB of data in front of it.)
Passes tests, and has been tested on our systems with chef-zero.