
Restore ability to use null cache #194

Merged · 12 commits · Dec 20, 2021

Conversation

kroymann
Contributor

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that my change matches the existing coding patterns and practices as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

Per the discussion in #193, this PR restores the ability to effectively disable the caching logic through the use of a "NullCache". The most important part of this change is restoring the ability to stream the response directly from the processed image output, which is how it worked before v1.0.3. In v1.0.3, the WriterWorkers pattern was introduced, and a side effect of that change was that the response was always streamed from the cache, which broke the ability to use a NullCache.

To address this, I removed the ReadWorkers/WriterWorkers logic and replaced it with a more standard reader/writer locking pattern, using a synchronization library ported from my company's codebase. This allows the processed image stream to remain available beyond the section of code protected by the writer lock, and thus be available for use in generating the response.
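To make the idea concrete, a "null" cache is simply an IImageCache that always misses on reads and discards writes. The sketch below approximates the ImageSharp.Web v1.x cache abstraction; treat the exact signatures as assumptions rather than the shipped API:

```csharp
using System.IO;
using System.Threading.Tasks;
using SixLabors.ImageSharp.Web.Caching;
using SixLabors.ImageSharp.Web.Resolvers;

// Sketch only: a cache that never stores anything, forcing the middleware
// to stream the response directly from the processed image output.
public sealed class NullCache : IImageCache
{
    // Always a cache miss: a null resolver signals "no cached entry".
    public Task<IImageCacheResolver> GetAsync(string key)
        => Task.FromResult<IImageCacheResolver>(null);

    // Discard the write; the processed image is served, never persisted.
    public Task SetAsync(string key, Stream stream, ImageCacheMetadata metadata)
        => Task.CompletedTask;
}
```

Such a cache would then be registered through the usual DI extensions (e.g. something along the lines of services.AddImageSharp().SetCache&lt;NullCache&gt;(), assuming the standard builder API).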

A key sticking point here (obviously) is benchmarking this change to see whether performance improved or degraded.
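The reader/writer flow described above can be sketched as follows. AsyncKeyReaderWriterLock is one of the new types in this PR, but the member names and helper methods here are illustrative assumptions, not the actual middleware code:

```csharp
using System.Threading.Tasks;

public class CachedImageService
{
    // Assumed API shape: the per-key reader/writer lock added by this PR.
    private readonly SixLabors.ImageSharp.Web.Synchronization.AsyncKeyReaderWriterLock<string> cacheLock = new();

    public async Task<byte[]> GetOrProcessAsync(string key)
    {
        // Many requests for the same key may read concurrently.
        using (await this.cacheLock.ReaderLockAsync(key))
        {
            byte[] cached = this.TryReadFromCache(key); // hypothetical helper
            if (cached != null)
            {
                return cached;
            }
        }

        byte[] processed;

        // Cache miss: only one request per key may process and write.
        using (await this.cacheLock.WriterLockAsync(key))
        {
            processed = this.ProcessImage(key);  // hypothetical helper
            this.WriteToCache(key, processed);   // a no-op for a NullCache
        }

        // The processed output outlives the writer lock, so the response can
        // be generated from it directly instead of re-reading the cache.
        return processed;
    }

    private byte[] TryReadFromCache(string key) => null;    // stub
    private byte[] ProcessImage(string key) => new byte[0]; // stub
    private void WriteToCache(string key, byte[] data) { }  // stub
}
```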

@CLAassistant

CLAassistant commented Dec 10, 2021

CLA assistant check
All committers have signed the CLA.

@codecov

codecov bot commented Dec 10, 2021

Codecov Report

Merging #194 (0a9a0fb) into master (5cb589c) will decrease coverage by 0.20%.
The diff coverage is 89.86%.


@@            Coverage Diff             @@
##           master     #194      +/-   ##
==========================================
- Coverage   84.80%   84.60%   -0.21%     
==========================================
  Files          50       55       +5     
  Lines        1448     1539      +91     
  Branches      199      228      +29     
==========================================
+ Hits         1228     1302      +74     
- Misses        165      181      +16     
- Partials       55       56       +1     
Flag Coverage Δ
unittests 84.60% <89.86%> (-0.21%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
.../Synchronization/RefCountedConcurrentDictionary.cs 76.00% <76.00%> (ø)
src/ImageSharp.Web/Synchronization/AsyncKeyLock.cs 83.33% <83.33%> (ø)
.../ImageSharp.Web/Middleware/ImageSharpMiddleware.cs 84.13% <89.36%> (-2.63%) ⬇️
src/ImageSharp.Web/Synchronization/AsyncLock.cs 95.23% <95.23%> (ø)
...Sharp.Web/Synchronization/AsyncReaderWriterLock.cs 98.48% <98.48%> (ø)
...DependencyInjection/ServiceCollectionExtensions.cs 100.00% <100.00%> (ø)
...rp.Web/Synchronization/AsyncKeyReaderWriterLock.cs 100.00% <100.00%> (ø)
...p.Web/Middleware/ConcurrentDictionaryExtensions.cs 0.00% <0.00%> (-50.00%) ⬇️
...ching/LruCache/ConcurrentTLruCache{TKey,TValue}.cs 43.75% <0.00%> (+1.78%) ⬆️

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5cb589c...0a9a0fb.

@JimBobSquarePants
Member

@kroymann I'm struggling to find the time to benchmark this so if you have the opportunity please do.

@deanmarcussen
Collaborator

I had a quick look this morning.

The results are both very promising and suspiciously close.

It either indicates that the LRU cache is blocking any meaningful test of this piece of functionality (I don't think so, but I need to have a longer look at the code to say for sure), or that both locking arrangements have no meaningful impact on performance and we're at the limit of what we can serve (i.e. limited by disk speed, or the framework itself).

I'm leaning towards the second, as the most meaningful change I could make to impact RPS was to set the samples project logging to "Warning" instead of "Debug" (basically dropping all the expensive logging).

Need to find some time to look at the code changes, and do another run with the samples project updated to .NET 6.

@JimBobSquarePants
Member

Thanks @deanmarcussen

I think it might be an idea to try both this and master with a much reduced timespan for the LRU cache. It's sitting at 5 minutes just now, which will be interfering with results. It would be amazing if you could document the setup you use to run the benchmark, as I had a go at reading the Crank docs again and felt a bit overwhelmed.

A .NET 6 run would definitely be a good idea.
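For documenting the setup: a typical crank run boils down to installing the controller and pointing it at a config file. The config, scenario, and profile names below are placeholders for illustration, not the actual files used for these runs:

```shell
# Install the crank controller as a global dotnet tool (one-time setup).
dotnet tool install -g Microsoft.Crank.Controller --version "0.2.0-*"

# Run a scenario against a benchmark agent; the config/scenario/profile
# names here are placeholders, not the real ImageSharp.Web benchmark files.
crank --config imagesharp.benchmarks.yml \
      --scenario resize-load \
      --profile aspnet-perf-lin \
      --application.framework net6.0
```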

this.RefCount = refCount;
}

public bool Equals(
Contributor

I am getting error CS8767: Nullability of reference types in type of parameter 'other' of 'bool RefCountedValue.Equals(RefCountedValue other)' doesn't match implicitly implemented member 'bool IEquatable<RefCountedValue>.Equals(RefCountedValue other)' (possibly because of nullability attributes) when running the benchmarks

Contributor Author

Weird. This compiles cleanly in both netcoreapp2.1 and 3.1 for me, and I'm able to run the benchmarks in both TFMs as well…? @sebastienros Is there anything special about how you're trying to run the benchmarks?

Contributor

To be more precise: I'm using crank to run a web load benchmark, independent from BDN. It works fine on master though. Full output:

Command:
dotnet publish ImageSharp.Web.Sample.csproj -c Release -o /tmp/benchmarks-agent/benchmarks-server-1/2ns2okc1.ov0/ImageSharp.Web/samples/ImageSharp.Web.Sample/published /p:MicrosoftNETCoreAppPackageVersion=3.1.21 /p:MicrosoftAspNetCoreAppPackageVersion=3.1.21 /p:MicrosoftNETCoreApp31PackageVersion=3.1.21 /p:MicrosoftNETPlatformLibrary=Microsoft.NETCore.App /p:RestoreNoCache=true --framework netcoreapp3.1 --self-contained -r linux-x64 
Microsoft (R) Build Engine version 16.7.2+b60ddb6f4 for .NET
Copyright (C) Microsoft Corporation. All rights reserved.

  Determining projects to restore...
  Restored /tmp/benchmarks-agent/benchmarks-server-1/2ns2okc1.ov0/ImageSharp.Web/src/ImageSharp.Web/ImageSharp.Web.csproj (in 297 ms).
  Restored /tmp/benchmarks-agent/benchmarks-server-1/2ns2okc1.ov0/ImageSharp.Web/samples/ImageSharp.Web.Sample/ImageSharp.Web.Sample.csproj (in 297 ms).
Synchronization/RefCountedConcurrentDictionary.cs(229,25): error CS8767: Nullability of reference types in type of parameter 'other' of 'bool RefCountedValue.Equals(RefCountedValue other)' doesn't match implicitly implemented member 'bool IEquatable<RefCountedValue>.Equals(RefCountedValue other)' (possibly because of nullability attributes). [/tmp/benchmarks-agent/benchmarks-server-1/2ns2okc1.ov0/ImageSharp.Web/src/ImageSharp.Web/ImageSharp.Web.csproj]
Exit code: 1

Contributor Author

I just pushed a small change to this code that adjusts the nullability attributes when compiling with net5.0 or higher. This preemptively gets ahead of any update to this codebase to target net6.0, and maybe it will address whatever issue you hit running the benchmarks?
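The fix is essentially conditional nullability on the Equals parameter, along these lines (a simplified sketch of the pattern, not the exact diff — the real RefCountedValue also carries the value itself):

```csharp
#nullable enable

using System;

internal sealed class RefCountedValue : IEquatable<RefCountedValue>
{
    public RefCountedValue(int refCount) => this.RefCount = refCount;

    public int RefCount { get; }

#if NET5_0_OR_GREATER
    // On net5.0+ the BCL annotates IEquatable<T>.Equals(T? other), so the
    // implementing parameter must be nullable too, or CS8767 is reported.
    public bool Equals(RefCountedValue? other)
#else
    public bool Equals(RefCountedValue other)
#endif
        => other != null && this.RefCount == other.RefCount;
}
```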

Contributor

With your changes it builds on net6.0. At least it's unblocking me.

/// <summary>
/// Simple immutable tuple that combines a <typeparamref name="TValue"/> instance with a ref count integer.
/// </summary>
private class RefCountedValue : IEquatable<RefCountedValue>
Contributor

Can you use struct records here instead?

Contributor Author

I benchmarked this using both class and struct for this type and determined that using a class executes more quickly and allocates less memory. I believe this happens because ConcurrentDictionary can use an optimized code path that leverages atomic writes when TValue is a class, but has to fall back on a less efficient path that allocates when TValue is a struct.

|                          Method |      Mean |    Error |   StdDev |  Gen 0 | Allocated |
|-------------------------------- |----------:|---------:|---------:|-------:|----------:|
|       Class_GetAndReleaseNewKey | 129.92 ns | 0.554 ns | 0.518 ns | 0.0095 |      80 B |
|      Struct_GetAndReleaseNewKey | 147.59 ns | 0.687 ns | 0.573 ns | 0.0067 |      56 B |
|  Class_GetAndReleaseExistingKey | 142.32 ns | 1.159 ns | 1.027 ns | 0.0076 |      64 B |
| Struct_GetAndReleaseExistingKey | 177.69 ns | 0.813 ns | 0.721 ns | 0.0134 |     112 B |
|            Class_GetExistingKey |  69.75 ns | 0.301 ns | 0.267 ns | 0.0038 |      32 B |
|           Struct_GetExistingKey |  89.81 ns | 0.682 ns | 0.638 ns | 0.0067 |      56 B |

I have not yet experimented with using record types (in part because this codebase is still targeting netcoreapp2.1 and 3.1).
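The ConcurrentDictionary detail mentioned above is that reference-typed values can be published with a single atomic reference write, and TryUpdate can compare-and-swap one immutable instance for another. A hedged sketch of the increment pattern this enables (type and method names here are illustrative, not the PR's actual code):

```csharp
using System.Collections.Concurrent;

// Immutable reference type: the dictionary can swap instances atomically.
internal sealed class RefCounted<T>
{
    public RefCounted(T value, int refCount)
    {
        this.Value = value;
        this.RefCount = refCount;
    }

    public T Value { get; }

    public int RefCount { get; }
}

internal static class RefCountingExample
{
    // Lock-free ref-count increment: retry until our TryUpdate wins the race.
    public static bool TryAddRef<TKey, T>(ConcurrentDictionary<TKey, RefCounted<T>> map, TKey key)
    {
        while (map.TryGetValue(key, out RefCounted<T> current))
        {
            // TryUpdate only swaps if the stored value still equals
            // 'current', i.e. no other thread raced us in between.
            if (map.TryUpdate(key, new RefCounted<T>(current.Value, current.RefCount + 1), current))
            {
                return true;
            }
        }

        return false; // key was never present (or was removed)
    }
}
```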

Member

Let's leave changing target frameworks to V2. I don't want your hard work delayed by our upstream work.

Contributor

If not a struct record, tuples would still be good while keeping the code simpler. But I don't know about the TFM requirements either, so you'll decide.

@sebastienros
Contributor

With 10 concurrent clients, a 15s warmup, and a 15s measurement, on a resized URL but without any cache headers (etag, ...), so all requests return the file.

| application                             | master      | enable_null_cache |          |
| --------------------------------------- | ----------- | ----------------- | -------- |
| CPU Usage (%)                           |          84 |                86 |   +2.38% |
| Cores usage (%)                         |       1,011 |             1,027 |   +1.58% |
| Working Set (MB)                        |         212 |               215 |   +1.42% |
| Private Memory (MB)                     |         719 |               718 |   -0.14% |
| Build Time (ms)                         |      10,384 |             4,225 |  -59.31% |
| Start Time (ms)                         |         221 |               235 |   +6.33% |
| Published Size (KB)                     |      93,043 |            93,067 |   +0.03% |
| .NET Core SDK Version                   |     6.0.101 |           6.0.101 |          |
| Max CPU Usage (%)                       |          84 |                85 |   +1.19% |
| Max Working Set (MB)                    |         221 |               224 |   +1.36% |
| Max GC Heap Size (MB)                   |         133 |               131 |   -1.50% |
| Size of committed memory by the GC (MB) |         149 |               150 |   +0.67% |
| Max Number of Gen 0 GCs / sec           |        3.00 |              3.00 |    0.00% |
| Max Number of Gen 1 GCs / sec           |        1.00 |              1.00 |    0.00% |
| Max Number of Gen 2 GCs / sec           |        0.00 |              0.00 |          |
| Max Time in GC (%)                      |        0.00 |              0.00 |          |
| Max Gen 0 Size (B)                      |  26,989,312 |        18,437,472 |  -31.69% |
| Max Gen 1 Size (B)                      |  16,276,192 |        17,172,176 |   +5.50% |
| Max Gen 2 Size (B)                      |   1,767,440 |         1,761,712 |   -0.32% |
| Max LOH Size (B)                        |   2,577,912 |         2,577,912 |    0.00% |
| Max Allocation Rate (B/sec)             | 273,655,584 |       264,959,240 |   -3.18% |
| Max GC Heap Fragmentation               |          42 |                29 |  -31.43% |
| # of Assemblies Loaded                  |          97 |                97 |    0.00% |
| Max Exceptions (#/s)                    |           0 |                 0 |          |
| Max Lock Contention (#/s)               |          20 |                78 | +290.00% |
| Max ThreadPool Threads Count            |          23 |                22 |   -4.35% |
| Max ThreadPool Queue Length             |           0 |                 1 |      +∞% |
| Max ThreadPool Items (#/s)              |     150,078 |           145,980 |   -2.73% |
| Max Active Timers                       |           1 |                 1 |    0.00% |
| IL Jitted (B)                           |     302,807 |           312,448 |   +3.18% |
| Methods Jitted                          |       3,634 |             3,715 |   +2.23% |


| load                | master  | enable_null_cache |         |
| ------------------- | ------- | ----------------- | ------- |
| CPU Usage (%)       |      16 |                16 |   0.00% |
| Cores usage (%)     |     187 |               190 |  +1.60% |
| Working Set (MB)    |      41 |                41 |   0.00% |
| Private Memory (MB) |     110 |               110 |   0.00% |
| Start Time (ms)     |     112 |               110 |  -1.79% |
| First Request (ms)  |     388 |               327 | -15.72% |
| Requests            | 382,623 |           377,625 |  -1.31% |
| Bad responses       |       0 |                 0 |         |
| Mean latency (us)   |     388 |               393 |  +1.33% |
| Max latency (us)    |   9,234 |             7,155 | -22.51% |
| Requests/sec        |  25,511 |            25,176 |  -1.31% |
| Requests/sec (max)  |  28,741 |            29,711 |  +3.38% |

@JimBobSquarePants
Member

@sebastienros Thanks for the numbers. I can see a bit of give and take, but otherwise it's fairly even. For the additional feature, I think the change is worth it.

Member

@JimBobSquarePants left a comment

Let's get this merged in. Perf is comparable and we can always iterate further.

@JimBobSquarePants JimBobSquarePants merged commit fea0207 into SixLabors:master Dec 20, 2021
@kroymann kroymann deleted the enable_null_cache branch February 7, 2022 22:50