
Optimize reflection of F# types #9714

Merged 1 commit on Jul 23, 2020
Conversation


@kerams kerams commented Jul 18, 2020

While compiling expression trees to Funcs for faster execution is probably overkill for one-off reflection functions, I reckon this extra step is worth it for their precomputed counterparts, which tend to be used in performance-critical scenarios such as serialization.

If there's interest, I can similarly improve other PreCompute functions.
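For context, the kind of change being described can be sketched like this (illustrative only, not the PR's actual code; `Person` and `precomputeGetter` are made up for the example): a `PreCompute`-style function pays the expression-compilation cost once and returns a Func that reads the property without any further reflection.

```fsharp
open System
open System.Linq.Expressions

type Person = { Name: string; Age: int }

// Compile (once) an obj -> obj getter for a single property.
// Subsequent calls go through the compiled delegate, not reflection.
let precomputeGetter (propName: string) =
    let prop = typeof<Person>.GetProperty propName
    let param = Expression.Parameter (typeof<obj>, "o")
    let body =
        Expression.Convert (
            Expression.Property (Expression.Convert (param, typeof<Person>), prop),
            typeof<obj>)
    Expression.Lambda<Func<obj, obj>>(body, param).Compile ()

let getAge = precomputeGetter "Age"
// getAge.Invoke (box { Name = "Ada"; Age = 42 }) returns box 42
```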


dnfadmin commented Jul 18, 2020

CLA assistant check
All CLA requirements met.


abelbraaksma commented Jul 18, 2020

Are you sure the Compile method is available in .NET Core? I don't see it mentioned here: https://docs.microsoft.com/en-us/dotnet/api/system.data.objects.compiledquery.compile; at the bottom it only lists .NET Framework.

Oh wait, I think you're using this: https://docs.microsoft.com/en-us/dotnet/api/system.linq.expressions.expression-1.compile?view=netcore-3.1

@abelbraaksma

This will greatly improve repeated calls, but I wonder where the threshold is, because compiling is quite expensive. I mean, is it beneficial from 5 calls up, or 500 calls?

I think this is a great improvement, but we should probably know how big the performance improvement is, and where it starts to improve. Did you test with BDN?

Would it be possible to have the best of both, for instance by calling it the old way and lazily compiling in a different thread (no idea if this is even feasible)? Once compilation is done, subsequent calls would come from the compiled one.
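That swap-in idea could look something like the following sketch (a hypothetical helper for this thread's discussion; `slowRead` and `compileFast` are stand-ins supplied by the caller, not APIs from the PR):

```fsharp
open System.Threading.Tasks

// Answer calls with the slow reflection-based reader right away, kick off
// compilation in the background, and swap in the compiled reader once ready.
// (A real implementation would need a volatile/Interlocked publish.)
let makeAdaptiveReader (slowRead: obj -> obj[]) (compileFast: unit -> (obj -> obj[])) =
    let current = ref slowRead
    Task.Run (System.Action (fun () -> current.Value <- compileFast ())) |> ignore
    fun (o: obj) -> current.Value o
```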


kerams commented Jul 18, 2020

I haven't benchmarked it in isolation, but I've seen a nice improvement in Fable.Remoting thanks to this approach. Let me put something together.

Would it be possible to have the best of both, for instance by calling it the old way and lazily compiling in a different thread (no idea if this is even feasible)? Once compilation is done, subsequent calls would come from the compiled one.

Oh, it should be possible by introducing some kind of cache for field readers (and, unfortunately, separate caches for the rest of the PreCompute methods). However, it seems like a lot of trouble and I'm not convinced it's worth the effort. I'd expect the caller to use the function A LOT more than just 500 times.

Say the threshold where precomputing pays off with this change (haven't measured this) is 10,000 invocations, but you only need 1,000. You'll get a performance penalty now, but if that's a huge problem, you can always switch to GetRecordFields, which does the record field lookup every time. The time spent on PropertyInfo lookups via reflection (1,000 × record field count of them) will be negligible in my opinion (unless you for some reason need to make those 1,000 calls over and over again in a new process).

What I'm not sure about are the memory consumption implications. How much space does a compiled expression the size of the one in compilePropGetterFunc take? They don't ever get GCed either, right?

The obvious solution to all of these concerns is having new methods, but then you won't get a performance boost (or, indeed, penalty in very specific cases) just by updating.


kerams commented Jul 18, 2020

open System
open System.Linq.Expressions
open System.Reflection
open BenchmarkDotNet.Attributes
open BenchmarkDotNet.Running
open FSharp.Reflection

type Record = {
    A: int
    B: int
    C: string
    D: string
    E: unit }

let compileRecordReaderFunc (recordType: Type) =
    let param = Expression.Parameter (typeof<obj>, "param")
    let typedParam = Expression.Variable recordType

    let expr =
        Expression.Lambda<Func<obj, obj[]>> (
            Expression.Block (
                [ typedParam ],
                Expression.Assign (typedParam, Expression.Convert (param, recordType)),
                Expression.NewArrayInit (typeof<obj>, [
                    for prop in recordType.GetProperties (BindingFlags.Instance ||| BindingFlags.Public) ->
                        Expression.Convert (Expression.Property (typedParam, prop), typeof<obj>) :> Expression
                ])
            ),
            param)
    expr.Compile ()

let compileRecordReaderFuncWithBuffer (recordType: Type) =
    let param = Expression.Parameter (typeof<obj>, "param")
    let typedParam = Expression.Variable recordType
    let buffer = Expression.Parameter typeof<obj[]>
    let props = recordType.GetProperties (BindingFlags.Instance ||| BindingFlags.Public)

    let expr =
        Expression.Lambda<Func<obj, obj[], int>> (
            Expression.Block (
                [ typedParam ],
                [
                    Expression.Assign (typedParam, Expression.Convert (param, recordType)) :> Expression

                    for i, prop in Array.indexed props do
                        let arrayAtIndex = Expression.ArrayAccess (buffer, Expression.Constant (i, typeof<int>))
                        Expression.Assign (arrayAtIndex, Expression.Convert (Expression.Property (typedParam, prop), typeof<obj>)) :> Expression

                    Expression.Constant (props.Length, typeof<int>) :> Expression
                ]
            ),
            [ param; buffer ])
    expr.Compile ()

[<MemoryDiagnoser>]
type Test () =
    let before = FSharpValue.PreComputeRecordReader typeof<Record>
    let after = compileRecordReaderFunc typeof<Record>
    let afterWithBuffer = compileRecordReaderFuncWithBuffer typeof<Record>
    let buffer : obj[] = Array.zeroCreate 100

    let record = { A = 1; B = 2; C = "3"; D = "4"; E = () }

    [<Benchmark(Baseline = true)>]
    member _.Before () =
        for i in 1 .. 1000 do
            before record |> ignore

    [<Benchmark>]
    member _.After () =
        for i in 1 .. 1000 do
            after.Invoke record |> ignore

    [<Benchmark>]
    member _.AfterWithProvidedBuffer () =
        for i in 1 .. 1000 do
            afterWithBuffer.Invoke (record, buffer) |> ignore

    [<Benchmark>]
    member _.Direct () =
        for i in 1 .. 1000 do
            [| box record.A; box record.B; box record.C; box record.D; box record.E |] |> ignore

    [<Benchmark>]
    member _.ReaderCompilation () =
        compileRecordReaderFunc typeof<Record> |> ignore

    [<Benchmark>]
    member _.GetRecordFields () =
        for i in 1 .. 1000 do
            FSharpValue.GetRecordFields record |> ignore

BenchmarkRunner.Run<Test> () |> ignore
| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------|------|-------|--------|-------|---------|-------|-------|-------|-----------|
| Before | 546.78 us | 10.889 us | 14.537 us | 1.00 | 0.00 | 12.6953 | - | - | 109.38 KB |
| After | 21.19 us | 0.411 us | 0.404 us | 0.04 | 0.00 | 13.3667 | - | - | 109.38 KB |
| AfterWithProvidedBuffer | 17.37 us | 0.241 us | 0.214 us | 0.03 | 0.00 | 5.7373 | - | - | 46.88 KB |
| Direct | 19.27 us | 0.376 us | 0.501 us | 0.04 | 0.00 | 13.3667 | - | - | 109.38 KB |
| ReaderCompilation | 191.33 us | 2.377 us | 2.224 us | 0.35 | 0.01 | 0.7324 | 0.2441 | - | 7.39 KB |
| GetRecordFields | 20,648.06 us | 200.123 us | 187.195 us | 37.47 | 0.86 | 500.0000 | - | - | 4132.85 KB |

So if I'm interpreting this right, compiling a single reader function for the entire record (as opposed to a function for each field, as in the original commit) costs ~350 invocations of the present-day version of the function from PreComputeRecordReader, and each call of the compiled function is ~20 times faster than a single invocation of the latter. That puts the threshold somewhere around 370.
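As a sanity check on that estimate, break-even is roughly the one-off compile cost divided by the per-call saving, using the means from the table above (which report microseconds per 1,000 calls):

```fsharp
// Numbers from the benchmark table above.
let compileCost   = 191.33          // ReaderCompilation: one compile, in us
let beforePerCall = 546.78 / 1000.  // Before: current PreComputeRecordReader
let afterPerCall  = 21.19 / 1000.   // After: compiled Func
let breakEven = compileCost / (beforePerCall - afterPerCall)
// breakEven comes out at roughly 364 invocations,
// consistent with the ~370 estimate in the comment above
```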

@baronfel

Having this as an option would be great. When I tested FSharp.SystemTextJson last year as part of my OpenF# talk, the use of the reflection-based members erased almost all of the performance benefit of using System.Text.Json. Having an out-of-the-box way to get that same information more efficiently would make that library even more of a no-brainer than it already is.


abelbraaksma commented Jul 18, 2020

Thanks for the benchmark, the numbers help understand the impact.

If I understand the PR correctly, this pre-compiles, then caches access to members of records when PreComputeRecordReader is used, right? And you deliberately didn't do it for GetRecordFields. Since the former method already has the word compute in it, it kinda makes sense. But I agree that it would be even better to have this for more functions in reflect.fs.

I am, however, a little worried about the initial overhead. I don't know in what contexts this code is usually used (well, in reflection), or whether we can somehow ascertain that the threshold of 350+ calls is reached.

An alternative would be to add functions that have Compile in the name, so that users can choose. Something like PreCompileRecordReader or GetCompiledRecordFields.

Yet another way is perhaps how dynamic works in C#, which caches the invoked member for future access, though I'm not sure if it uses Compile(). I mean, I think in C# you get the MethodInfo and a delegate is created, which is much cheaper than compiling a LINQ expression tree. I didn't check whether such an approach is feasible here, though.


kerams commented Jul 18, 2020

If I understand the PR correctly, this pre-compiles, then caches access to members of records when PreComputeRecordReader is used, right?

The property accessors are embedded in the returned function closure. You can refer to it as caching, but if you call the precompute method again with the same type, everything is compiled anew and you get a different closure back.

you deliberately didn't do it for GetRecordFields

Yes, that's the "one-off" variant I talked about in the OP. There isn't any sort of caching involved and each call looks up PropertyInfo of each field of the record type. See GetRecordFields method in the benchmark.

and if we can ascertain somehow that the threshold of 350+ calls is reached

Unless I misunderstood something, the caller is the one that needs to know how often they're going to need to read record fields and it is their responsibility to choose the appropriate API/approach.

An alternative would be do add functions that have Compile in the name, so that users can choose. Something like PreCompileRecordReader, GetCompiledRecordFields.

Sure, but it would also be fantastic if users could automatically reap the benefits of this change by simply updating to a new version of .NET/FSharp.Core. As we have established though, this does introduce additional overhead, so I'll let the powers that be decide whether or not a new set of methods is required.

and a delegate is created, which is much cheaper than compiling a LINQ tree

I think Delegate.CreateDelegate returns a delegate for a specific instance if used with instance methods. It's also quite a bit faster than plain Invoke, but nowhere near as fast as the compiled Func, which technically isn't even reflection anymore. Compare the Direct and After benchmarks. The only overhead stems from the need to allocate an array to return the results and the boxing of value types.
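The delegate approach under discussion, for reference (a minimal sketch; `Rec` is a throwaway type for the example): binding a getter's MethodInfo with Delegate.CreateDelegate yields an open instance delegate whose first argument is the instance.

```fsharp
open System

type Rec = { X: int }

// Bind the get_X method to a strongly typed delegate. No expression trees,
// no IL emission; the delegate calls the getter directly.
let getX =
    let m = typeof<Rec>.GetProperty("X").GetGetMethod ()
    Delegate.CreateDelegate (typeof<Func<Rec, int>>, m) :?> Func<Rec, int>

// getX.Invoke { X = 7 } returns 7
```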


abelbraaksma commented Jul 18, 2020

Unless I misunderstood something,

No, I think we're on the same page. I understand the PR better now, thanks for the explanations!

Sure, but it would also be fantastic if users could automatically reap the benefits of this change by simply updating to a new version of .NET/FSharp.Core.

I totally agree.

The property accessors are embedded in the returned function closure. You can refer to it as caching

I see. Since compiled functions are never GC'ed, it may be better to introduce global caching for this, or users may leak memory (but that may not be trivial, concurrency stuff et al, and I don't know what the general idea is about global caches from the Core lib).
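A global cache along those lines might be sketched as follows (hypothetical; `compileReader` stands in for whatever compilation step the PR performs):

```fsharp
open System
open System.Collections.Concurrent

// One compiled reader per Type, shared process-wide, so repeated
// PreCompute* calls for the same type don't pile up compiled delegates.
let private readerCache = ConcurrentDictionary<Type, obj -> obj[]> ()

let getCachedReader (compileReader: Type -> (obj -> obj[])) (t: Type) =
    // Note: under contention GetOrAdd may run the factory more than once,
    // but only one compiled reader is ever stored and returned.
    readerCache.GetOrAdd (t, fun ty -> compileReader ty)
```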

I think Delegate.CreateDelegate returns a delegate for a specific instance if used with instance methods.

There's only one method table, regardless of instance, so I doubt that. You pass the instance as first argument to a delegate if it's an instance method delegate.

It's also quite a bit faster than plain Invoke, but nowhere near as fast the compiled Func

I thought so too, but in my own timings (different kinds of reflection, though) I saw only a few percent difference.

Anyway, compiling is certainly the fastest once it's compiled, but the overhead of compiling is huge compared to delegates. Which is why I raised the suggestion as an alternative.

But whatever route we take, it's a great improvement :).


kerams commented Jul 19, 2020

I've added another method to the benchmark. This could potentially be an extra overload where results are written into the provided buffer, doing away with an array allocation.


kerams commented Jul 20, 2020

This is pretty cool: https://github.com/dadhi/FastExpressionCompiler, but I am not sure it's desirable to depend on it in FSharp.Core.

@abelbraaksma

Yeah, they try to keep FSharp.Core independent of other assemblies.

@cartermp cartermp left a comment

I think this is generally a good improvement for the serialization scenario.

@dsyme what are your thoughts here?


Daniel-Svensson commented Jul 22, 2020

Nice work @kerams.
Have you considered just using delegate invocations as an alternative?
I did some benchmarking on different approaches to getting property values a while back and found it surprisingly fast.
It also has the upside of working really fast on platforms where compiled expressions are interpreted (such as when Reflection.Emit is missing).

I share my findings below:
Note:

  • Update: I did a quick attempt at an F# version running on netcoreapp3.1, and there the expression version seems to be faster than delegates even for hundreds of items, so it looks like a good solution for that runtime.
  • The benchmark is in C# and available here; the timings are for creating a Func<object,object> and calling it N times.
    In your scenario you will do a single expression compile per type, so it cannot be directly translated; the total overhead will be lower.
  • I only measured on .NET Framework; measurements on .NET Core will be different.
  • The posted measurements are from my laptop, so the last results might show some increase in measured error (even if the CPU was capped to 50%).

For my scenario, delegates won over pure reflection after as few as 10 calls.

BenchmarkDotNet=v0.11.5, OS=Windows 10.0.18363
Intel Core i5-8250U CPU 1.60GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
  [Host]    : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.8.4180.0
  RyuJitX64 : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.8.4180.0

Job=RyuJitX64  Jit=RyuJit  Platform=X64  
| Method | NumInvocations | Mean | Error | StdDev | Median | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------|----------------|------|-------|--------|--------|-------|---------|-------|-------|-------|-----------|
| Reflection | 10 | 3.374 us | 0.1361 us | 0.3927 us | 3.424 us | 1.00 | 0.00 | - | - | - | - |
| ExpressionCompile | 10 | 533.485 us | 16.3923 us | 47.8171 us | 526.475 us | 160.25 | 22.56 | 0.9766 | - | - | 5216 B |
| DelegateInvoke | 10 | 16.636 us | 0.6175 us | 1.7717 us | 16.812 us | 4.99 | 0.77 | 0.2441 | - | - | 787 B |
| Reflection | 50 | 18.658 us | 1.6585 us | 4.6507 us | 17.166 us | 1.00 | 0.00 | - | - | - | - |
| ExpressionCompile | 50 | 740.353 us | 24.1893 us | 69.7915 us | 750.745 us | 41.72 | 9.86 | 0.9766 | - | - | 5216 B |
| DelegateInvoke | 50 | 18.863 us | 1.1237 us | 3.1325 us | 18.746 us | 1.07 | 0.29 | 0.2441 | - | - | 787 B |
| Reflection | 100 | 90.505 us | 1.7948 us | 3.1903 us | 90.785 us | 1.00 | 0.00 | - | - | - | - |
| ExpressionCompile | 100 | 1,239.046 us | 17.2747 us | 16.1587 us | 1,239.779 us | 14.08 | 0.47 | - | - | - | 5200 B |
| DelegateInvoke | 100 | 54.064 us | 4.7938 us | 14.1345 us | 61.546 us | 0.71 | 0.06 | 0.2441 | - | - | 787 B |
| Reflection | 500 | 291.448 us | 25.6986 us | 75.7730 us | 272.045 us | 1.00 | 0.00 | - | - | - | - |
| ExpressionCompile | 500 | 877.857 us | 31.8290 us | 93.8484 us | 900.854 us | 3.22 | 0.91 | 0.9766 | - | - | 5216 B |
| DelegateInvoke | 500 | 49.281 us | 1.9660 us | 5.5772 us | 49.284 us | 0.18 | 0.05 | 0.2441 | - | - | 787 B |
| Reflection | 1000 | 426.901 us | 18.8465 us | 55.5693 us | 423.196 us | 1.00 | 0.00 | - | - | - | - |
| ExpressionCompile | 1000 | 867.530 us | 70.6930 us | 208.4398 us | 817.176 us | 2.06 | 0.54 | 0.9766 | - | - | 5216 B |
| DelegateInvoke | 1000 | 49.232 us | 2.1033 us | 6.1353 us | 50.323 us | 0.12 | 0.02 | 0.2441 | - | - | 787 B |
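On the interpreted-expressions point raised above: the behavior can be exercised explicitly on .NET Core, which exposes a Compile overload taking a preferInterpretation flag (a sketch; the overload is not available on .NET Framework, as far as I know):

```fsharp
open System
open System.Linq.Expressions

let square =
    let x = Expression.Parameter (typeof<int>, "x")
    let expr = Expression.Lambda<Func<int, int>> (Expression.Multiply (x, x), x)
    // Passing true asks for the interpreter even where IL emission is
    // available, which makes it easy to compare the two modes.
    expr.Compile true

// square.Invoke 6 returns 36
```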

@KevinRansom KevinRansom left a comment

I'm okay with this change as is. The performance benefit when cached is excellent, and in serialization scenarios this will be noticeable and significant. The user of this API will certainly want to cache the result to eliminate generating the funcs a bunch of times. One-time uses of PreComputeRecordReader are probably fairly rare ... the clue is in the name.

Thank you for preparing this, and the performance analysis.

@cartermp cartermp merged commit d82a0eb into dotnet:master Jul 23, 2020
@kerams kerams deleted the reflection branch July 23, 2020 04:35

kerams commented Jul 23, 2020

@KevinRansom, I'd be more than happy to try to implement this for these methods in a similar fashion (and refactor the record reader to use a single compiled func instead of one for every record field because I did not expect this to get merged so quickly :))):

PreComputeRecordConstructor(Type, FSharpOption)
PreComputeUnionConstructor(UnionCaseInfo, FSharpOption)
PreComputeUnionReader(UnionCaseInfo, FSharpOption)
PreComputeUnionTagReader(Type, FSharpOption)
PreComputeRecordFieldReader(PropertyInfo)
PreComputeTupleConstructor(Type)
PreComputeTupleReader(Type)

Additionally, do overloads taking a buffer (see AfterWithProvidedBuffer in the benchmark) sound like something that would be worth adding as well?

@Daniel-Svensson, if you're only going to read a property as few as 100 or 1,000 times, does it really matter which option you choose? Your benchmark shows that the difference between the slowest and fastest is sub-millisecond (not sure what happened in the ExpressionCompile row at 100 invocations), and I have a hard time coming up with a plausible scenario where that would matter at all. When I set out to create this PR, I had a specific use case in mind: serialization in web servers. The compilation overhead gets amortized into nothing and you get superb performance for the (long) lifetime of the process.

Your point about interpreted expression trees is interesting. Do those platforms throw on Compile() or do they return a delegate that does the interpretation on every invocation?

@KevinRansom

@kerams, please take a look if you would like. We would certainly consider PRs for those APIs.

nosami pushed a commit to xamarin/visualfsharp that referenced this pull request Feb 23, 2021