Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Abstract cache key creation to ICacheKey #206

Merged

Conversation

ronaldbarendse
Copy link
Contributor

@ronaldbarendse ronaldbarendse commented Jan 27, 2022

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following matches the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

This implements the feature discussed in #145 and is related to PR #147 that uses a setting to ignore the hostname.

To summarize: both IRequestParser and IImageProvider can return different source/processed images based on whatever they extract from the HttpContext, but the ImageSharpMiddleware always caches the image based on the lowercased request host, path and parsed commands.

This leads to creating cache keys that are either:

  1. Too unique, e.g. when you have multiple hostnames configured (even if it's just the www. prefix and domain root). This will generate and cache 2 images for example.com/imagesharp-logo.png and www.example.com/imagesharp-logo.png, which is a waste of processing (CPU/memory) and disk resources (there's also no cache trimming, so these won't get automatically cleaned up!).
  2. Not unique enough, e.g. when you use request headers, user authentication or even a query string (as only known commands are used in the cache key) to return a different source image.

The first commit in this PR simply moves the existing code to the default CacheKey implementation. The second one adds 4 alternative implementations that either use the absolute or relative URI and optionally lowercases that value. Maybe it's better to move these out to a test project or the documentation, but I wanted to share how easy implementing this new interface can be!

@codecov
Copy link

codecov bot commented Jan 27, 2022

Codecov Report

Merging #206 (eee0259) into master (41e0645) will increase coverage by 0%.
The diff coverage is 86%.

Impacted file tree graph

@@          Coverage Diff          @@
##           master   #206   +/-   ##
=====================================
  Coverage      86%    86%           
=====================================
  Files          62     68    +6     
  Lines        1742   1791   +49     
  Branches      257    262    +5     
=====================================
+ Hits         1500   1556   +56     
+ Misses        176    167    -9     
- Partials       66     68    +2     
Flag Coverage Δ
unittests 86% <86%> (+<1%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/ImageSharp.Web/Caching/CacheKeyHelper.cs 76% <76%> (ø)
src/ImageSharp.Web/Caching/LegacyV1CacheKey.cs 100% <100%> (ø)
src/ImageSharp.Web/Caching/UriAbsoluteCacheKey.cs 100% <100%> (ø)
...p.Web/Caching/UriAbsoluteLowerInvariantCacheKey.cs 100% <100%> (ø)
src/ImageSharp.Web/Caching/UriRelativeCacheKey.cs 100% <100%> (ø)
...p.Web/Caching/UriRelativeLowerInvariantCacheKey.cs 100% <100%> (ø)
...DependencyInjection/ImageSharpBuilderExtensions.cs 94% <100%> (+14%) ⬆️
...DependencyInjection/ServiceCollectionExtensions.cs 100% <100%> (ø)
.../ImageSharp.Web/Middleware/ImageSharpMiddleware.cs 85% <100%> (+<1%) ⬆️
...ching/LruCache/ConcurrentTLruCache{TKey,TValue}.cs 55% <0%> (+2%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 41e0645...eee0259. Read the comment docs.

/// <summary>
/// Creates a cache key based on the lowercased request host, path and commands.
/// </summary>
public class CacheKey : ICacheKey
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should only ship with the two implementations as described in #147 (comment) although I like your naming convention better.

  • UriAbsoluteCacheKey
  • UriRelativeCacheKey

I can't see a scenario where we would want a case sensitive cache key e.g UriAbsoluteCacheKey as that opens us up to potential DOS so we should always convert the string to lower invariant.

From a performance perspective we should use string.Create in conjunction with Span<char>.ToLowerInvarient() to ensure that string creation is snappy. Urihelper has optimized code you can use as a reference but we'd have to change a few things to do the required casing work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on the OS (or file system configuration: https://docs.microsoft.com/en-us/windows/wsl/case-sensitivity#modify-case-sensitivity), the path can be case sensitive, which means different casing can resolve to different files and also that wrong casing wouldn't find the original image in the first place... So the cache hash should only be 'normalized' when the file system is case insensitive. I'm not sure if this can be easily determined at runtime, so adding a setting for this would be sensible.

The parsed commands are already case insensitive (or at least the keys), because the default implementation uses an InvariantIgnoreCase comparer (and gets them from the query string, which isn't case sensitive according to the RFCs). Although it ultimately depends on the processor, command values should also be case insensitive (enum parsing already ignores casing). It would be good to ensure this value is always lowercased, so changes in the command casing won't cause different cache hashes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Case sensitivity of file requests is dealt with via individual provider implementations. They don’t use the key to return the source image, they use the input uri. If nothing is found, nothing is cached.

Rethinking about this then you're correct but the implementations should be non inheriting since we don't want a virtual method as a default implementation on a fast path. We're already paying a virtualization cost via interfaces.

So, as you say, to cover our backs, your implementations would be:

  • UriAbsoluteCacheKey
  • UriRelativeCacheKey
  • UriAbsoluteCaseInsensitiveCacheKey
  • UriRelativeCaseInsensitiveCacheKey

You're gonna want custom implementations of the UriHelper methods for handling the concatenation + casing to avoid duplicate string generation for all scenarios since the caching of commands should, as you've rightly identified, be case insensitive. A processor not using the case insensitive CommandCollection for command key matching would be considered a bug in that implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's exactly my point: the IImageProvider implementations correctly deal with the case sensitivity, but the cache key doesn't. So if you have multiple files that only differ in casing, it will only cache 1 generated image (with the metadata next to it), depending on which was requested first, the TLRU cache, file write times and other weird factors:

Different cased files

You can test this yourself by running the sample site in WSL and creating the copies using cp imagesharp-logo.png ImageSharp-logo.png.

Having a setting to disable lowercasing the cache key would at least make it easy to switch (if you're running into this edge case), but also ensure custom implementations of ICacheKey are exposed to this setting if it's passed into the Create method. And it requires less classes/implementations 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't add an option for this for two reasons.

  1. Each implementation would have to respect that boolean option (complexity).
  2. Each implementation would have to parse that boolean per request (performance).

Making the case sensitivity a detail of the registered implementation provides us with the most flexibility. Registering an ICacheKey implementation is no more difficult than setting an option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole point for adding the bool caseSensitive parameter is to make sure the implementations have to respect that setting (or at least are directly exposed to it).

The same can be said for the uint length parameter currently on ICacheHash.Create: all implementations need to ensure they generate a hash of the specified length. Not having a global setting might result in one implementation using a setting to be able to configure it and another always returning a fixed length hash (or just as worse: every implementation using their own setting).

Another very important thing about the CacheKeyCaseSensitive setting is that the need for enabling it depends on the OS/platform you're running you application on. You don't want to edit code/create separate builds to make your application work correctly for as case sensitive and insensitive file system 😖

Copy link
Contributor Author

@ronaldbarendse ronaldbarendse Jan 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the setting is really a no-go, I'd suggest to only provide the case insensitive relative (default) and absolute implementations.

I can imagine that the above conflicting file casing issues probably don't occur often enough (if at all) to justify all the extra complexity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can smash out the two other implementations for you if you like. They wouldn’t take me long

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd very much like that, as I'm not yet familiar how to best work with string.Create and Span<char> 😉

I did create some additional tests and benchmarks, so it should be easy for you to validate the implementations 🎉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the previous cache key creation code as baseline benchmark and the UriAbsoluteCacheKey implementation is still faster and allocates less (even though that also includes the request scheme, thus creating a longer cache key):

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
Intel Core i7-10750H CPU 2.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK=6.0.101
[Host] : .NET Core 3.1.22 (CoreCLR 4.700.21.56803, CoreFX 4.700.21.57101), X64 RyuJIT
DefaultJob : .NET Core 3.1.22 (CoreCLR 4.700.21.56803, CoreFX 4.700.21.57101), X64 RyuJIT
| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Allocated |
|-------------------- |---------:|--------:|--------:|------:|--------:|-------:|----------:|
| Baseline | 497.4 ns | 8.26 ns | 7.33 ns | 1.00 | 0.00 | 0.1106 | 696 B |
| UriRelativeCacheKey | 270.9 ns | 5.47 ns | 7.11 ns | 0.55 | 0.02 | 0.0587 | 368 B |
| UriAbsoluteCacheKey | 447.4 ns | 3.38 ns | 3.16 ns | 0.90 | 0.01 | 0.0939 | 592 B |

I'm sure you're able to make it even more performant by using string.Create/Span<char> directly, but the UriHelper methods are at least already outperforming the previous StringBuilder implementation 👍🏻

| UriAbsoluteCacheKey | 490.3 ns | 5.14 ns | 4.80 ns | 0.74 | 0.02 | 0.0763 | 320 B |
| UriAbsoluteCaseInsensitiveCacheKey | 475.4 ns | 4.18 ns | 3.71 ns | 0.71 | 0.02 | 0.0763 | 320 B |
*/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ronaldbarendse I added the new benchmarks here. Allocations are lower in the new version so would be interested to see what the numbers are on your more powerful computer.

For reference, here's your previous numbers.

Method Mean Error StdDev Ratio RatioSD Gen 0 Allocated
Baseline 497.4 ns 8.26 ns 7.33 ns 1.00 0.00 0.1106 696 B
UriRelativeCacheKey 270.9 ns 5.47 ns 7.11 ns 0.55 0.02 0.0587 368 B
UriAbsoluteCacheKey 447.4 ns 3.38 ns 3.16 ns 0.90 0.01 0.0939 592 B

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering the previous implementations always lowercased the key, the speed is comparable for the relative and a bit faster for the absolute URI, but it indeed allocates less (although that's also partially expected, because the new ones don't include the scheme) 🎉

Method Mean Error StdDev Ratio RatioSD Gen 0 Allocated
LegacyV1CacheKey 504.2 ns 3.22 ns 3.02 ns 1.00 0.00 0.1106 696 B
UriRelativeCacheKey 291.8 ns 5.83 ns 7.17 ns 0.58 0.02 0.0467 296 B
UriRelativeCaseInsensitiveCacheKey 277.5 ns 1.58 ns 1.32 ns 0.55 0.00 0.0467 296 B
UriAbsoluteCacheKey 371.6 ns 4.07 ns 3.81 ns 0.74 0.01 0.0505 320 B
UriAbsoluteCaseInsensitiveCacheKey 372.7 ns 2.91 ns 2.72 ns 0.74 0.01 0.0505 320 B

@JimBobSquarePants
Copy link
Member

@ronaldbarendse

I've written my optimized keygen code and added a legacy version to allow backwards compatibility (turns out the legacy was adding an extra slash) think this is good to go. Thanks for your help here, this is a really great addition to the library!

I'll let you have a final read through before merging as I want to make sure I haven't missed anything.

Copy link
Contributor Author

@ronaldbarendse ronaldbarendse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added some comments: the main thing to currently change, would be the default implementation (that should be the lowercased cache key).

Otherwise, the only remaining thing might be naming: I think we should rename UriRelativeCaseInsensitiveCacheKey to UriRelativeLowerInvariantCacheKey, as that's what it actually does (case insensitive would be more appropriate when comparing, right?).

src/ImageSharp.Web/Caching/CacheKeyHelper.cs Show resolved Hide resolved
| UriAbsoluteCacheKey | 490.3 ns | 5.14 ns | 4.80 ns | 0.74 | 0.02 | 0.0763 | 320 B |
| UriAbsoluteCaseInsensitiveCacheKey | 475.4 ns | 4.18 ns | 3.71 ns | 0.71 | 0.02 | 0.0763 | 320 B |
*/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering the previous implementations always lowercased the key, the speed is comparable for the relative and a bit faster for the absolute URI, but it indeed allocates less (although that's also partially expected, because the new ones don't include the scheme) 🎉

Method Mean Error StdDev Ratio RatioSD Gen 0 Allocated
LegacyV1CacheKey 504.2 ns 3.22 ns 3.02 ns 1.00 0.00 0.1106 696 B
UriRelativeCacheKey 291.8 ns 5.83 ns 7.17 ns 0.58 0.02 0.0467 296 B
UriRelativeCaseInsensitiveCacheKey 277.5 ns 1.58 ns 1.32 ns 0.55 0.00 0.0467 296 B
UriAbsoluteCacheKey 371.6 ns 4.07 ns 3.81 ns 0.74 0.01 0.0505 320 B
UriAbsoluteCaseInsensitiveCacheKey 372.7 ns 2.91 ns 2.72 ns 0.74 0.01 0.0505 320 B

src/ImageSharp.Web/Caching/UriAbsoluteCacheKey.cs Outdated Show resolved Hide resolved
@JimBobSquarePants
Copy link
Member

Otherwise, the only remaining thing might be naming: I think we should rename UriRelativeCaseInsensitiveCacheKey to UriRelativeLowerInvariantCacheKey, as that's what it actually does (case insensitive would be more appropriate when comparing, right?).

Naming is hard... I chose CaseInsensitive since it described the operational behavior of the key rather than the means that it uses. I'm still torn between the two but leaning towards your convention.

@JimBobSquarePants
Copy link
Member

I'll fix the build in a mo.

@JimBobSquarePants
Copy link
Member

OK.... This time we're done.

@JimBobSquarePants JimBobSquarePants merged commit fe0b4e9 into SixLabors:master Feb 1, 2022
@JimBobSquarePants JimBobSquarePants added this to the v2.0.0 milestone Feb 1, 2022
@ronaldbarendse ronaldbarendse deleted the feature/cachekey-abstraction branch February 1, 2022 12:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants