perf: Lift the thread limit to enable full concurrency #2145

Merged
merged 1 commit into from
Apr 2, 2020

MOZGIII
Contributor

@MOZGIII MOZGIII commented Mar 25, 2020

After switching to tokio-compat, the thread limit should not be required.

Closes #391.
Closes #1696.

We're not merging this until we do #1696.

@Hoverbear added the domain: networking and type: performance labels Mar 25, 2020
@Hoverbear requested a review from LucioFranco March 25, 2020 17:27
After switching to tokio-compat, the thread limit should not
be required.

Signed-off-by: MOZGIII <[email protected]>
@MOZGIII force-pushed the thread-limit-lift branch from 08ab248 to 6305861 March 26, 2020 13:58
@MOZGIII
Contributor Author

MOZGIII commented Mar 26, 2020

/test -t tcp_to_tcp_performance -c big-vms

@github-actions

Test harness invocation requested by #2145 (comment) is complete!

Something went wrong, see log for more details.

You can check the execution log to learn more!

@binarylogic
Contributor

/test -t tcp_to_tcp_performance -c big-vms

@MOZGIII
Contributor Author

MOZGIII commented Mar 28, 2020

@binarylogic it's unfortunately broken again. ☹️ See vectordotdev/vector-test-harness#45
I think we desperately need a CI flow for the test harness itself.

@github-actions

Test harness invocation requested by #2145 (comment) is complete!

Something went wrong, see log for more details.

You can check the execution log to learn more!

@MOZGIII
Contributor Author

MOZGIII commented Apr 1, 2020

/test -t tcp_to_tcp_performance -c big_vms

@binarylogic
Contributor

🥁

@github-actions

github-actions bot commented Apr 1, 2020

Test harness invocation requested by #2145 (comment) is complete!

Something went wrong, see log for more details.

You can check the execution log to learn more!

@github-actions

github-actions bot commented Apr 1, 2020

Test harness invocation requested by #2145 (comment) is complete!


                                   __   __  __
                                   \ \ / / / /
                                    \ V / / /
                                     \_/  \/

                                   V E C T O R

--------------------------------------------------------------------------------
Test Comparison
Test name: tcp_to_tcp_performance
Test configuration: big_vms
Subject: vector
Versions: 
--------------------------------------------------------------------------------
Metric         
---------------
Test count     
Duration (avg) 
Duration (max) 
CPU sys (max)  
CPU usr (max)  
Load 1m (avg)  
Mem used (max) 
Disk read (avg)
Disk read (sum)
Disk writ (sum)
Net recv (avg) 
Net recv (sum) 
Net send (sum) 
TCP estab (avg)
TCP syn (avg)  
TCP close (avg)
--------------------------------------------------------------------------------
W = winner

You can check the execution log to learn more!

@binarylogic
Contributor

It worked! I’ll fix the output so the results are displayed.

@MOZGIII
Contributor Author

MOZGIII commented Apr 2, 2020

Great! This may be related: vectordotdev/vector-test-harness#47

@binarylogic
Contributor

/test -t tcp_to_tcp_performance -c 16-cores

@github-actions

github-actions bot commented Apr 2, 2020

Test harness invocation requested by #2145 (comment) is complete!

Something went wrong, see log for more details.

You can check the execution log to learn more!

@binarylogic
Contributor

/test -t tcp_to_tcp_performance -c 16_cores

@binarylogic
Contributor

binarylogic commented Apr 2, 2020

Here are the new results:

➜ aws-vault exec vector -- ./bin/compare -s vector -t tcp_to_tcp_performance -c 16_cores -v 0.6.0 -v 0.7.0 -v 0.8.2 -v nightly/2020-03-30 -v dev-thread-limit-lift-1-6305861 -r

                                   __   __  __
                                   \ \ / / / /
                                    \ V / / /
                                     \_/  \/

                                   V E C T O R

--------------------------------------------------------------------------------
Test Comparison
Test name: tcp_to_tcp_performance
Test configuration: 16_cores
Subject: vector
Versions: 0.6.0 0.7.0 0.8.2 nightly/2020-03-30 dev-thread-limit-lift-1-6305861
--------------------------------------------------------------------------------
Metric          | 0.6.0              | 0.7.0              | 0.8.2             | dev-thread-limit-lift-1-630... | nightly/2020-03-30
----------------|--------------------|--------------------|-------------------|--------------------------------|-------------------
Test count      | 1                  | 1                  | 1                 | 1                              | 1                 
Duration (avg)  | 63s                | 63s                | 64s               | 61s                            | 63s               
Duration (max)  | 63s                | 63s                | 64s               | 61s                            | 63s               
CPU sys (max)   | 0.8 W              | 0.9 (+17%)         | 2.5 (+227%)       | 1.1 (+49%)                     | 0.9 (+16%)        
CPU usr (max)   | 24.7 (+13%)        | 24.7 (+13%)        | 24.6 (+13%)       | 23.3 (+7%)                     | 21.7 W            
Load 1m (avg)   | 1.9 (+5%)          | 1.9 (+5%)          | 1.9 (+5%)         | 1.8 W                          | 1.9 (+2%)         
Mem used (max)  | 358.8 MiB (+1%)    | 353.7 MiB W        | 355.8 MiB (+0%)   | 381.5 MiB (+7%)                | 364.3 MiB (+2%)   
Disk read (avg) | 562.5 kib/s (-21%) | 558.5 kib/s (-21%) | 657.3 kib/s (-7%) | 453.4 kib/s (-36%)             | 712.8 kib/s W     
Disk read (sum) | 34.6 MiB (+28%)    | 34.4 MiB (+27%)    | 41.1 MiB (+52%)   | 27 MiB W                       | 43.9 MiB (+62%)   
Disk writ (sum) | 4.5 MiB (+43%)     | 3.1 MiB W          | 17.2 MiB (+454%)  | 27.4 MiB (+782%)               | 20.5 MiB (+562%)  
Net recv (avg)  | 97 MiB/s W         | 94.7 MiB/s (-2%)   | 76.5 MiB/s (-21%) | 61 MiB/s (-37%)                | 59 MiB/s (-39%)   
Net recv (sum)  | 6 gib W            | 5.8 gib (-2%)      | 4.8 gib (-19%)    | 3.6 gib (-39%)                 | 3.6 gib (-39%)    
Net send (sum)  | 6 gib              | 5.8 gib            | 4.8 gib           | 3.6 gib                        | 3.6 gib           
TCP estab (avg) | 429                | 427                | 428               | 424                            | 451               
TCP syn (avg)   | 0                  | 0                  | 0                 | 0                              | 0                 
TCP close (avg) | 0                  | 0                  | 0                 | 0                              | 0                 
--------------------------------------------------------------------------------
W = winner

I have a suspicion that this is not actually raising the thread limit, since nothing really changed. It would be nice to investigate CPU core usage while this test is actually running.

@binarylogic
Contributor

I updated the above results with more versions. In 0.2.0 (yes, a pretty old version), removing the thread cap resulted in a sharp decline in throughput. I would expect to see the same if tokio-compat didn't fix it. The only way to know is to see if we can reproduce the decline on the 0.8.X branch before we introduced tokio-compat.

@binarylogic
Contributor

Looks like this change fixed it 😄

➜ aws-vault exec vector -- ./bin/compare -s vector -t tcp_to_tcp_performance -c 16_cores -v nightly/2020-03-30 -v dev-thread-limit-lift-1-6305861 -v dev-v0-8-thread-limit-test-1-7963032 

                                   __   __  __
                                   \ \ / / / /
                                    \ V / / /
                                     \_/  \/

                                   V E C T O R

--------------------------------------------------------------------------------
Test Comparison
Test name: tcp_to_tcp_performance
Test configuration: 16_cores
Subject: vector
Versions: nightly/2020-03-30 dev-thread-limit-lift-1-6305861 dev-v0-8-thread-limit-test-1-7963032
--------------------------------------------------------------------------------
Metric          | dev-thread-limit-lift-1-630... | dev-v0-8-thread-limit-test-... | nightly/2020-03-30
----------------|--------------------------------|--------------------------------|-------------------
Test count      | 1                              | 1                              | 1                 
Duration (avg)  | 61s                            | 61s                            | 63s               
Duration (max)  | 61s                            | 61s                            | 63s               
CPU sys (max)   | 1.1 (+28%)                     | 11.6 (+1221%)                  | 0.9 W             
CPU usr (max)   | 23.3 (+7%)                     | 50.2 (+131%)                   | 21.7 W            
Load 1m (avg)   | 1.8 W                          | 4.7 (+159%)                    | 1.9 (+2%)         
Mem used (max)  | 381.5 MiB (+4%)                | 375 MiB (+2%)                  | 364.3 MiB W       
Disk read (avg) | 453.4 kib/s (-36%)             | 423 kib/s (-40%)               | 712.8 kib/s W     
Disk read (sum) | 27 MiB (+7%)                   | 25.2 MiB W                     | 43.9 MiB (+74%)   
Disk writ (sum) | 27.4 MiB (+33%)                | 29.3 MiB (+42%)                | 20.5 MiB W        
Net recv (avg)  | 61 MiB/s W                     | 8.5 MiB/s (-85%)               | 59 MiB/s (-3%)    
Net recv (sum)  | 3.6 gib W                      | 521.2 MiB (-85%)               | 3.6 gib (0%)      
Net send (sum)  | 3.6 gib                        | 519 MiB                        | 3.6 gib           
TCP estab (avg) | 424                            | 423                            | 451               
TCP syn (avg)   | 0                              | 0                              | 0                 
TCP close (avg) | 0                              | 0                              | 0                 
--------------------------------------------------------------------------------
W = winner

You can see my comparison branch here. Unless I'm missing something, this appears to be resolved. I'd still like to verify that this branch is actually using all CPU cores. Or I may be misunderstanding how Vector utilizes cores with the new runtime.

@MOZGIII
Contributor Author

MOZGIII commented Apr 2, 2020

🎉 Finally! Getting results from the test harness is so time-consuming, but we're getting there.

@MOZGIII marked this pull request as ready for review April 2, 2020 19:48
@MOZGIII requested a review from lukesteensen as a code owner April 2, 2020 19:48
@LucioFranco
Contributor

@binarylogic I doubt it will use all cores, because we can only partition the work across cores per connection: each connection runs as a single task on a single core. The issue we were most likely seeing before was that the work-stealing scheduler caused a lot of contention when trying to steal tasks, because of the number of idle workers. In other words, if we have more workers than tasks being executed, the idle workers put a lot of contention on the global task queue, which can drastically slow it down. This aligns well with what we chatted with @jonhoo about a while back :) and why he originally wrote tokio-io-pool.

So overall this matches what we expected. I think we have also done more work to spawn more tasks to handle the load better. So I wouldn't be surprised if we weren't using all the CPU but were in fact just saturating our current workload with useful work, rather than work that only creates contention on a single item.
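To make the contention argument concrete, here is a rough sketch using only the standard library. It is not Tokio's scheduler and not Vector code: it models the "global task queue" as a single `Mutex<VecDeque>` that every worker polls, and it counts how often a worker takes the lock only to find nothing to do. The names `run_with_workers`, `RunStats`, and `empty_polls` are made up for illustration, and the busy loop stands in for real per-connection work.

```rust
use std::collections::VecDeque;
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Outcome of one run of the toy model (hypothetical type).
pub struct RunStats {
    /// Number of tasks that were actually executed.
    pub executed: usize,
    /// Number of times a worker locked the queue and found it empty.
    pub empty_polls: usize,
}

/// Runs `tasks` units of work on `workers` threads that all share one queue.
pub fn run_with_workers(workers: usize, tasks: usize) -> RunStats {
    // Single shared queue: a stand-in for the global task queue.
    let queue = Arc::new(Mutex::new((0..tasks).collect::<VecDeque<usize>>()));
    // Tasks that have not finished yet; workers exit when this reaches zero.
    let remaining = Arc::new(AtomicUsize::new(tasks));
    let executed = Arc::new(AtomicUsize::new(0));
    let empty_polls = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let queue = Arc::clone(&queue);
            let remaining = Arc::clone(&remaining);
            let executed = Arc::clone(&executed);
            let empty_polls = Arc::clone(&empty_polls);
            thread::spawn(move || {
                while remaining.load(Ordering::Acquire) > 0 {
                    // The lock guard is dropped at the end of this statement,
                    // so the work below runs without holding the lock.
                    let task = queue.lock().unwrap().pop_front();
                    match task {
                        Some(id) => {
                            // Simulated work for one task.
                            let mut acc = 0u64;
                            for i in 0..50_000u64 {
                                acc = acc.wrapping_add(i ^ (id as u64));
                            }
                            black_box(acc);
                            executed.fetch_add(1, Ordering::Relaxed);
                            remaining.fetch_sub(1, Ordering::Release);
                        }
                        None => {
                            // Idle worker: it took the shared lock for nothing.
                            empty_polls.fetch_add(1, Ordering::Relaxed);
                            thread::yield_now();
                        }
                    }
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    RunStats {
        executed: executed.load(Ordering::Relaxed),
        empty_polls: empty_polls.load(Ordering::Relaxed),
    }
}
```

Calling `run_with_workers(16, 2)` and `run_with_workers(2, 2)` executes the same two tasks in both cases. The `empty_polls` count varies from run to run, but with far more workers than tasks it tends to be much larger, which is the effect described above.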

@MOZGIII
Contributor Author

MOZGIII commented Apr 2, 2020

Note that 0.8 doesn't have the patch that lifts the thread limit, so we effectively run at 4 threads.

@MOZGIII
Contributor Author

MOZGIII commented Apr 2, 2020

Note that 0.8 doesn't have the patch that lifts the thread limit, so we effectively run at 4 threads.

That said, the comparison still shows that this PR can be merged and that a high thread count is no longer a problem.
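For anyone reading this later, a minimal sketch of what "lifting the thread limit" amounts to. It assumes the old limit acted as a simple upper bound on the worker count; the helpers `worker_threads` and `detected_cores` and the `Option<usize>` cap are hypothetical and are not taken from Vector's source. The value 4 is the effective thread count mentioned above.

```rust
use std::thread;

/// Hypothetical helper: pick the number of runtime worker threads.
/// `cap = Some(n)` models the old thread limit, `cap = None` models this PR.
pub fn worker_threads(cores: usize, cap: Option<usize>) -> usize {
    // Always run at least one worker, even if detection reports zero.
    let cores = cores.max(1);
    match cap {
        // With a cap, never exceed it, but never drop below one worker.
        Some(limit) => cores.min(limit.max(1)),
        // Without a cap, use every available core.
        None => cores,
    }
}

/// Number of cores reported by the standard library, falling back to 1.
pub fn detected_cores() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}
```

On the 16_cores configuration, `worker_threads(16, Some(4))` returns 4 and `worker_threads(16, None)` returns 16. In practice the input would come from something like `detected_cores()`.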

@binarylogic
Contributor

Let's merge it then 🚀

@binarylogic changed the title from "perf: Lift the thread limit" to "perf: Lift the thread limit to enable full concurrency" Apr 2, 2020
@binarylogic merged commit cb0da84 into master Apr 2, 2020
@binarylogic deleted the thread-limit-lift branch April 2, 2020 21:04
binarylogic pushed a commit that referenced this pull request Apr 5, 2020
After switching to tokio-compat, the thread limit should not
be required.

Signed-off-by: MOZGIII <[email protected]>
@binarylogic added the type: enhancement and domain: performance labels and removed the type: performance label Aug 6, 2020
Labels
domain: networking (Anything related to Vector's networking)
domain: performance (Anything related to Vector's performance)
type: enhancement (A value-adding code change that enhances its existing functionality.)
Development

Successfully merging this pull request may close these issues.

Verify that tokio 0.2 fixes concurrency bottleneck
Resolve concurrency issues
5 participants