
Improve ASP.NET CPU utilization by adjusting minWorkerThreads #325

Merged · 12 commits · Jun 7, 2013
Conversation

MalcolmEvershed
Contributor

Increase the number of threads that the .NET thread pool grows by when a burst of new requests comes in. This helps increase CPU utilization and should be good for a 5-10% performance boost.

This change only increases minWorkerThreads from 1 to 8 to be conservative in case there's any concern over excessive context switching from too many threads.
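In ASP.NET this setting lives in the `<processModel>` element; a sketch of the relevant config fragment (the value 8 comes from the description above, and `autoConfig` is assumed to be disabled so the explicit value takes effect):

```xml
<!-- machine.config / Web.config, under <configuration>.
     minWorkerThreads is a per-CPU value. -->
<system.web>
  <processModel autoConfig="false" minWorkerThreads="8" />
</system.web>
```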

On Ubuntu Server, the hard limit is 4096, which is less than the
soft limit of 8192 that we're trying to use. Thus, to really use
a soft limit of 8192, we need to also boost the hard limit to
8192.
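The commit message above corresponds to entries along these lines in `/etc/security/limits.conf` (a sketch; it assumes the limit in question is the open-file limit, `nofile`):

```
# Raise the hard limit so the desired 8192 soft limit can actually apply.
*  soft  nofile  8192
*  hard  nofile  8192
```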
Tests expect to be able to start their own desired http server on
Apache's configured port.
… net.core.somaxconn=5000

When running tests that accessed MySQL, I was getting errors like:

BENCHMARKING Query ... [MySQL] 2013/05/23 00:57:04 packets.go:31: EOF
2013/05/23 00:57:04 Error scanning world row: driver: bad connection
exit status 1

Debugging this showed that the web server box had more connections to MySQL
than expected (i.e. netstat on the MySQL box showed 100 TCP connections, but
the web server box showed 1200). So there was some sort of TCP problem, perhaps
lost ACKs as described in http://www.evanjones.ca/tcp-stuck-connection-mystery.html .

Looking at the MySQL documentation for back_log (http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html#sysvar_back_log),
it suggests that the OS setting needs to be greater than or equal to back_log. Thus,
we need to set net.core.somaxconn as big as our back_log value.

I tried this and the connection problems went away.
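Concretely, the fix described above is a one-line sysctl change (the value 5000 is taken from the commit title; it must be greater than or equal to MySQL's back_log):

```
# /etc/sysctl.conf: the listen-queue cap must be >= MySQL's back_log
net.core.somaxconn = 5000
```

Applied live with `sudo sysctl -w net.core.somaxconn=5000`, or `sudo sysctl -p` after editing the file.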
Sometimes the ActivePerl HTTP server returns a 404 for a URL that should work,
so retry it until it works, up to a maximum number of retries, with a short
delay in between.
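The retry-with-delay logic described above can be sketched generically; the function name and defaults here are illustrative, not the benchmark suite's actual code:

```python
import time

def retry(check, max_retries=10, delay=0.5):
    """Call check() until it returns truthy; give up after max_retries
    attempts, sleeping `delay` seconds between attempts."""
    for _ in range(max_retries):
        if check():
            return True
        time.sleep(delay)
    return False

# e.g. keep probing the server until a URL stops returning 404:
# retry(lambda: urlopen(url).status == 200, max_retries=10, delay=1.0)
```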
Match the overall policy of the other framework benchmarks.

On my setup, this improves the requests/sec of a test like 'json' by 3-5%.
This removes:

X-AspNetMvc-Version: 4.0
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET

I didn't see any performance improvement, but this is probably a good idea
in case the extra bytes cause another network packet to be needed,
compared to other frameworks.
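For reference, the usual way to strip these three headers is a mix of config and one line of startup code; this is a sketch, and the exact mechanism varies by IIS version:

```xml
<!-- Web.config sketch. X-AspNetMvc-Version is removed in code instead:
     MvcHandler.DisableMvcResponseHeader = true;  (in Application_Start) -->
<system.web>
  <httpRuntime enableVersionHeader="false" />  <!-- drops X-AspNet-Version -->
</system.web>
<system.webServer>
  <httpProtocol>
    <customHeaders>
      <remove name="X-Powered-By" />           <!-- drops X-Powered-By: ASP.NET -->
    </customHeaders>
  </httpProtocol>
</system.webServer>
```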
We don't need to access URLs like this because we use MVC and URL routing.

On my setup, this improves the performance of the 'json' test by 13%.
MinWorkerThreads controls how many threads the .NET thread pool creates
when a new burst of requests comes in. Increase this to properly size the
thread pool early on and improve CPU utilization, increasing throughput.

I saw gains of 5-10% at least.
@kppullin
Contributor

kppullin commented Jun 7, 2013

Thanks for the detailed response! Your approach to finding the '8' value sounds solid.

Also a great idea on the IHttpHandler approach. We could go even further and do self-hosting, bypassing IIS altogether, though that's more of a "platform" test than a "framework" test in my eyes.

Finding the optimal values for min/max is as much art as science, and depends heavily on the load & latency profile. Since these benchmark DB queries are so simple and fast, fewer threads likely makes more sense as there isn't a lot of blocking on IO calls. (On the other hand, at my workplace we're setting these higher since we spend a lot more time blocking on longer-running "real world" DB queries and would otherwise have a mostly idle CPU with the default ASP.NET settings.)

Interesting to hear that the MaxThreads value is so high. Are you running with .NET 4.5 by chance (I may have read that this limit was removed in 4.5)? All my thread testing so far has targeted 4.0, and it does use the documented max of 20 threads per CPU.

Sorry about the erroneous blog link - that's another issue I'm dealing with at work, and I forgot that these tests do make outbound HTTP calls.

However, there are some other perf counters that would show 'queued' incoming requests. Again, there is a default cap on queued items (i.e. those waiting for a thread), and when it is exceeded, HTTP 503s are returned. This could possibly explain the 5XX errors from test run #5. I saw similar errors when running the ServiceStack and Nancy tests but haven't yet dug into the specific failure(s).

Lastly regarding the mysql & postgresql drivers, that does sound like unnecessary overhead. However that may just be a micro-optimization with little benefit. If anyone ever gets the chance I think we should throw a proper profiler at this and confirm where the issues are (unless you've done this already ; ).

@MalcolmEvershed
Contributor Author

Yeah, I was running with .NET 4.5. This suggests that the max number of threads "depends on several factors".

Yeah, it's good to watch the various queues to avoid 503s from HTTP.SYS or ASP.NET. So far I haven't seen any, I think because the defaults are so high. I'm actually pretty surprised to hear that you hit 503s with even lighter weight frameworks. I've only seen 503s with IIS 6 on unrelated projects.

I have profiled the tests with the Windows Performance Toolkit (that's how I decided what to try tweaking) and, if I'm interpreting the data properly, the majority of (non-IIS/ASP.NET) time spent in the 'db' test is in the MySql provider. So I don't think this is just a micro-optimization; it's a reality backed by real data. I tried commenting out the Ping and `use database` code in the MySql driver (just as a non-usable experiment) and I got about a 30% performance increase (from about 1500 requests/sec to over 2000). So I think this is all for real, but I welcome corrections.

Thanks for the discussion. Let me know if you have further questions or if you'd like me to review any new pull requests.

P.S. I think I now know how to have the best performing framework/test: have the lowest amount of overhead per request (in other words, no high-overhead routing code like ASP.NET MVC) and then have a really good database driver (like I suspect the MySql Java provider is, which elevates Java-related frameworks in the rankings). I am undecided on whether it is also important to have non-blocking I/O with a small number of threads, because of the high performance of Gemini: if Gemini doesn't use non-blocking I/O, then its high performance (compared to something like Go, which has goroutines and non-blocking I/O) suggests that non-blocking I/O isn't necessary for high throughput.

@bhauer
Contributor

bhauer commented Jun 7, 2013

@MalcolmEvershed @kppullin Thanks so much for all of your detailed work on this. We really appreciate your efforts in making the ASP.NET tests as "production-class" as possible. Not being well versed in ASP.NET, I feel I don't have much to offer the conversation except that appreciation.

Malcolm, you're right that the tests we have to-date put a lot of emphasis on fundamentals such as request routing, JSON serialization, database drivers, connection pooling, and ORM. The Fortunes test was designed to incorporate server-side collections, sorting, and templates as well. In time, we hope to continue to broaden the tests to cover more functionality areas provided by frameworks. Once you have time for a breather, I'd also welcome any ideas you have for future tests.

It's true that having high-performance routing and database drivers is crucial for these tests. Given the time to do so, I'd like to include Microsoft SQL Server tests since I would expect ASP.NET's SQL Server driver is higher-performance than the MySQL driver. (Although I will add that I've been surprised at how many similar assumptions I've made that this project has disproved.)

I can answer the question you posed about Gemini: Gemini is not using a non-blocking MySQL driver; it is using the standard Java MySQL driver with a pool of 256 connections paired with a custom micro ORM. The application server used (Resin) is using one worker thread per request.

@kppullin
Contributor

kppullin commented Jun 7, 2013

@MalcolmEvershed I should have been a bit more clear - I only saw the 500-level results in the benchmark output, which categorizes them as 5XX. They may have been 503s, but could also have been any other 500-level error : /

The performance increase from removing the PING & use DB is amazing! Thanks for the WPT link (I'm more familiar with one of the other popular commercial profiling tools).

I wonder if there'd be any performance gain from importing a native Java MySQL or PostgreSQL driver via IKVM. Downside is it'd be a challenge to use with any higher level ORM because these don't implement the ADO.NET interfaces.

@bhauer I'd love to see some SQL Server tests as well, including a hybrid OS approach where the "app server" frameworks are running on Ubuntu/Linux and call into SQL Server DB (on Windows of course).

@MalcolmEvershed
Contributor Author

@bhauer, just to clarify, I didn't mean to criticize the fact that the current tests have placed an emphasis on request routing, JSON serialization, database drivers, etc. I think the stuff you're testing makes sense. Brief comments on the tests:

  1. When I look at the ASP.NET profiler results for the JSON test, very little time is spent actually doing JSON serialization (in other words, finding a JSON serializer 10x faster would probably not change the benchmark results much). The test seems to be primarily testing how fast you can move an almost-no-op request through the entire framework. Thus, from my point of view, there is no big reason to add a plaintext test alongside the current JSON test, because I suspect the performance would be about the same. My advice would be to have a plaintext test to benchmark how fast the framework can get a no-op request through its pipeline, and then increase how much data the JSON test needs to serialize so the JSON benchmark does more JSON work (in other words, really benchmarks the JSON serializer).
  2. I like the Fortunes test, it seems pretty realistic.
  3. The Gemini Update test seems like it might do a batch update, whereas the other test implementations just do a loop. This feels a bit unfair, since I think real production code would implement a helper method to batch this up. Maybe this means that the other test implementations just need to do this.
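The batching point in item 3 can be illustrated with a self-contained sketch; Python's sqlite3 module stands in for the actual MySQL driver here, and the table/column names follow the benchmark's World table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE World (id INTEGER PRIMARY KEY, randomNumber INTEGER)")
conn.executemany("INSERT INTO World (id, randomNumber) VALUES (?, ?)",
                 [(i, 0) for i in range(1, 11)])

# Looped version: one statement (and, over a network, one round trip) per row.
for value, row_id in [(11, 1), (22, 2)]:
    conn.execute("UPDATE World SET randomNumber = ? WHERE id = ?", (value, row_id))

# Batched version: hand the driver all the updates at once, which is what the
# Gemini implementation appears to do.
updates = [(33, 3), (44, 4), (55, 5)]
conn.executemany("UPDATE World SET randomNumber = ? WHERE id = ?", updates)
conn.commit()
```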

I spent a little time prototyping SQL Server tests, but I just don't have the bandwidth to finish them up (what a mess: no sshd, I had to substitute WCAT for wrk, etc.). Do you want me to send you a pull request with my incomplete prototype and you or someone else can fix them up?

Here are the results: on my setup, for ASP.NET hitting MySql, I can get about 1500 requests/sec for the db test with concurrency=256 (this is with the variety of improvements in my pull requests). Changing the client box to Windows Server 2012 with SQL Server 2012 SP1 Standard Evaluation, I can get about 2200 requests/sec. I checked with a packet sniffer, and the SQL Server provider has none of the overhead that the MySql or PostgreSQL providers have. Though frankly, I expected more than 2200 requests/sec, but I didn't do any tuning. My guess is that if the page were an IHttpHandler (to avoid the ASP.NET MVC URL routing overhead), this number could probably crack 3000 requests/sec.

Thanks for the answers about Gemini. What this really tells me is that the results of this benchmark project so far are more a reflection of how much tuning has been done than the architecture of these frameworks. Now that you've explained that Gemini doesn't have magic like goroutines or non-blocking I/O or magic thread concurrency management (like .NET 4), I really think that the other frameworks' numbers can be improved quite a bit by doing some profiling and further tuning. Thus, to really make these benchmarks more realistic, we really need experts in the other stacks to whip out their profilers and get to work. :)

(I suppose maybe Resin has some magic that I'm not familiar with, but I don't know.)

@bhauer
Contributor

bhauer commented Jun 7, 2013

@kppullin Hybrid tests would indeed be interesting. I've worked on an application or two that were deployed on similar configurations, so it's not unheard of to pair Linux and Windows.

@MalcolmEvershed Thanks for the additional feedback. Some more thoughts:

First, thanks for taking the time to run the tests through the profiler. It really helps to have profiler data available rather than conjecture. In other cases, I've seen concerns about oddly trivial things such as the random number generator come up. In those cases, I have struggled with how to respond politely but productively. Of course, the random number generator was not in fact a significant contributor, but when it was being discussed, there was no profiling data available.

In most cases, the JSON serializer is not a major contributor to execution profile. However, especially at the higher tiers of performance, we've seen serializer selection make a difference. For ASP.NET, as you point out, it's not the low-hanging fruit.

I personally agree that the plaintext test is potentially of limited interest, at least from the perspective of full-stack users. However, we've been asked to include a plaintext test several times by the maintainers of platforms as a means to illustrate the phenomenally fast request-processing rate they've achieved. From their point of view, the inclusion of a JSON serializer they had no part in developing is an unfair burden. :)

I like the idea of a JSON test with a larger workload. That is test 13 from issue #133 (additional test types). Note that the order of the future tests in that issue is not significant. It might be reasonable to make that the next addition. Related: sooner or later, I'll rework the results site so that a single test is viewed at a time, selected from a menu, rather than scrolling through dozens of charts. The page is fairly overwhelming already with 5 test types visible.

You're right that the Updates test may be unfair if the implementation for other frameworks is not ideal. That's not what we intended. The specifications say that use of batch updates is acceptable but not required. If batch updates are available on other platforms, I would expect to see them used. I just reviewed the Servlet code and you're right: that should be using a batch update and it's not. I'll loop @pfalls1 in here. I personally can tweak the Servlet test case quickly.

For the work-in-progress SQL Server tests, if your work is in your Github profile, that's good enough as is. We can link an issue over to that fork. If you find the time to finish that up, we'd love the PR. But we too won't likely find the bandwidth to take it to completion any time soon. If you cannot, I hope someone else with a good background with .NET can complete the work you've started. It's reassuring that SQL Server is speedier than MySQL. (I'm not a Windows developer, but I use a Windows workstation and I consider SQL Server my favorite RDBMS although I don't get to work with it much.)

Finally, you hit on a key thing at the end of your comment. When we started this project, we did not know how the community would react, but we've been very impressed and humbled by the contributions of framework maintainers and fans. I absolutely love seeing improvements come as a side-effect. When the Go 1.1 release saw its database driver re-worked via a bug that traced back here, we were thrilled. The Play guys have been working hard on performance since Round 1. If each framework gets a tiny bit faster as a result of fans tweaking things here, I'll be very happy. Bottom line, like I said before, I really appreciate experts like yourself doing exactly what you've done: profile and tune. With some luck, maybe something that can help other ASP.NET sites will be uncovered.

@MalcolmEvershed
Contributor Author

Here's my work in progress for the SQL Server tests. The files are self-documenting with instructions and TODOs. What's really needed is wrk for Windows and a replacement for the way the benchmarker uses ssh to kick off wrk. Since wrk is based on redis, I suppose that parts of redis for Windows could be used to port wrk to Windows if necessary. I tried go-wrk, but it seemed to be CPU bound for some reason.

Good discussion. I look forward to the various improvements you described. Cool stuff!

@pdonald
Contributor

pdonald commented Jun 7, 2013

I think you are overcomplicating things @MalcolmEvershed. Amazon EC2 already has a Windows Server 2012 R2 image with Microsoft SQL Server 2012 installed on it, so there's no need to download and install several GBs. There's also no need to port client software to Windows: use the Ubuntu client box for running the tests and simply point the ASP.NET SQL Server tests at the Windows instance with SQL Server on it.

Then it's just configuring SQL Server to accept remote connections, configuring firewall, running DB scripts and that's pretty much it, no?

@MalcolmEvershed
Contributor Author

@pdonald, I think you're right. I don't use EC2 and I don't have enough machines at home to do as you suggest, but what you're suggesting makes more sense. Please feel free to implement it that way and then I can delete my branch. Thanks.

@pfalls1 merged commit 938a92b into TechEmpower:master on Jun 7, 2013
@bhauer
Contributor

bhauer commented Jun 7, 2013

@MalcolmEvershed Just a quick follow-up on the batch updates: We've modified the Servlet test to use a batch update and observed no change in its performance. So it is interesting to note that at least for Servlet, using raw JDBC on Java, running a bundle of small updates, the use of batch operations is not significant.
