[C#-X.0 Proposal] Determinism #1763
Replies: 49 comments 5 replies
-
/CC: @gafter @MadsTorgersen |
Beta Was this translation helpful? Give feedback.
-
Community-submitted proposals should start as issues, not as PRs. Those issues can than gather community feedback. We use "proposal" PRs to track specifications for features that are already championed and approved (at least in principle). |
Beta Was this translation helpful? Give feedback.
-
To clarify: I recommend you place your actual proposal text in some issue (e.g. this one). |
Beta Was this translation helpful? Give feedback.
-
@gafter: Do please excuse me -- I thought it was more practical to edit it locally and push it to my own fork instead. |
Beta Was this translation helpful? Give feedback.
-
Original proposal:Purity / Determinism / Constant Functions / Constant ExpressionsThis proposal is dedicated to the often mentioned ideas of "function purity", "determinism", "pre-compilation" and "constant functions/expressions". This article also mentions features and wishes expressed (more or less briefly) the following proposals:
IntroductionC# has been and currently is moving into the exiting field of functional programming. It is not at all a new field (think about how old Haskell is!) - but it is a field with great potential and should therefore be explored and used in modern C# programming. One important function is the so-called **determinism ** of functions and expressions. One can gain extreme performance, parallelism improvements if rightly used and analyzed. Furthermore, deterministic compiler approaches can prevent many run-time exceptions due to improved code transformation and analysis. I will use the word determinism throughout most of this article - but you can think of it as pure functions, pre-compiled functions or pseudo-"constant" functions. What does 'Determinism' mean?In a broad mathematical and functional sense, determinism is the concept of function or expression declaration which always returns the same output for the same input. If any function is deterministic, then follows: A C# example for deterministic functions are, e.g.: public static long FuncA() => 420L;
public static long FuncB(int x)
{
long res = 0;
for (int c = 0; c < x; ++c)
res += c * 42;
return res;
} It is pretty obvious, that the function above always outputs the same result for the same input. But what about the following one? public static long FuncC(int x)
{
long input = long.Parse(System.Console.ReadLine());
return input + x * 42;
} As This example shows that for the purpose of this article, one shall define 'deterministic as follows':
Constant functions (functions composed only of constant values and expressions) are trivially deterministic. How can we differentiate between deterministic and non-deterministic functions?To determine whether a function is deterministic (aka. "pure"), one must (recursively) look at all affected functions and variables:
Why do we need this?Good point. Why do we need this indeed? 1.) Performance: caching or look-upImagine having having a huge dataset with statistical data. Determinism could be used in a lot of mathematical functions handling this dataset. Examples could be the calculation of metrics, such as the standard deviation, median, regression expressions, etc... Many performance aspects however come apparent on the second call of a deterministic function: 2.) Performance: parallelismAs deterministic functions are only composed of deterministic function-calls and expressions, their code is known to be side-effect free. They can therefore often be parallelized safely in order to gain performance, as they are known to have no side-effects on the remaining application state. If global variables are involved, some kind of functional dependency graph or transactional system has to be created by the compiler in order to insure the variable's determinism. You could imagine the parallelism of deterministic functions along the lines of parallelism of Having a set of deterministic functions with function // deterministic
public static const string F(string s)
{
return new string(s.ToUpperInvariant()
.Reverse()
.ToArray());
}
// non-deterministic
public static void G(object o) => Console.WriteLine(o?.ToString() ?? "<null>");
// non-deterministic
public static void Main()
{
string input = Console.ReadLine();
Thread.Sleep(1000); // or some other long-duration operation
G(F(input));
} If the calculation times are as follows:
one would expect the e.g. following classic timeline:
The method
Of course - this was an example with rather bad numbers, but you do get the idea... ;) 3.) Performance: pre-compilationImagine the following function using static System.Math;
public static float sigmoid(float x) => Exp(x) / (Exp(x) + 1); Assuming that the function This could also be applied to string-manipulation, e.g.: string s = $"Hello {Math.PI:N3}!".ToLowerInvariant(); would always result in the following value: "hello 3.142!" 4.) Elimination of (semantically) unused codeAs deterministic functions are per definition side-effect free, one can safely remove code snippets inside those functions which have no effect on the return value. This is possible, as the semantics of the "optimized" functions do not differ from the "original" one. An example: L01: public const float MyComplexFunction(float y)
L02: {
L03: float x = MyComplexFunction(Sin(y));
L04:
L05: return y * 7;
L06: } Traditionally, the execution of However, as we know that The compiler could now remove The compiled result would be: public const float MyComplexFunction(float y) => y * 7; Which would operate as semantically expected. 5.) Reduction of error sourcesThe usage of determinism could greatly reduce the amount of produced errors, as potential conflicts could be determined by the compiler (e.g. eventual overflow or division-by-zero). If the application of determinism in C# could be broadened to -- let's say -- discriminated unions and pattern matching, many patterns could be detected by the compiler which would e.g. never be matched etc. Determinism could also be applied to the upcoming C#8.0-feature of nullable reference types: Determinism could provide the compiler with improved methods of tracking null references. 6.) More compile-time constants!!Many expressions can be transformed to be compile-time constants, if all underlying (arithmetic) operators and function calls are insured to be deterministic. This enables the usage of compile-time constants in many more places. Please take a look at the following chapter (especially this section) for more information. OK, I get it. But what would it look like?There are multiple designs which this proposal could adopt:
For the purpose of this proposal, we settle on "Conservative determinism" with the keyword Examples:Deterministic functions/// calculate hyperbolic cotangent
public static const float Coth(float x) => (Exp(2 * x) + 1) / (Exp(2 * x) - 1) This would the function public static DateTime GetToday() => DateTime.Now;
public static const DateTime GetNextDay()
{
DateTime today = GetToday();
// ^^^^^^^^^^ ERROR
// A non-deterministic function cannot be called from a deterministic one.
return today.AddDays(1);
} If the function public static const DateTime GetToday() => DateTime.Now;
// ^^^ ERROR
// A non-determinisitic property cannot be accessed from a determinisitic function.
public static const DateTime GetNextDay()
{
DateTime today = GetToday();
return today.AddDays(1);
} It is needless to say that calls from a non-deterministic function to a deterministic one are perfectly valid. Deterministic typesA type can be marked as deterministic, if it is public const struct Point2Df
{
public const float X { get; }
public const float Y { get; }
....
}
Point2Df point = new Point2Df();
Point2Df* ptr = &point;
// ^ ^^^^^^ ERROR
// Cannot take address of deterministic datatyoe 'Point2Df'. For now, only classes and (managed) read-only structures should be valid candidates for deterministic datatypes. A deterministic type could look as follows: public const sealed class ComplexNumber
{
private float _re, _im;
// deterministic constructor
public const ComplexNumber()
: this(0, 0) { }
// deterministic constructor
public const ComplexNumber(float re)
: this(re, 0) { }
// deterministic constructor
public const ComplexNumber(float re, float im)
{
_re = re;
_im = im;
}
// deterministic function (this requires "Sqrt : float -> float" to be deterministic)
public const float GetMagnitude() => Sqrt(_re * _re + _im * _im);
} Deterministic operators and propertiesProperties and (custom) operators can only be used inside deterministic functions or expressions, if they are also marked as public const sealed class ComplexNumber
{
.....
// deterministic read-only property
public const float Imaginary => _im;
// deterministic read-write property
public const float Real
{
get => _re;
set => _re = value;
}
public static const ComplexNumber operator+(ComplexNumber c1, ComplexNumber c2) =>
new ComplexNumber(c1._re + c2._re, c1._im + _c2.im);
} The code above could be compiled into something like this: public const sealed class ComplexNumber
{
.....
public const float get_Imaginary() => this._im;
public const float get_Real() => this._re;
public const void set_Real(float $value) => this._re = $value;
public static const ComplexNumber op_Addition(ComplexNumber c1, ComplexNumber c2) =>
new ComplexNumber(c1._re + c2._re, c1._im + _c2.im);
} which are all deterministic functions. Deterministic flow controlDeterminism would also expand to flow control, e.g.: public static const float GetMagicNumber(float x) => x / 0f;
// Compiler detects that the function above always returns float.Infinity
public static const string GetMagicString(int i)
{
float res = GetMagicNumber(i + 42.0f);
if (float.IsInfinity(res))
return "Oh noes!";
else
{
return "Yes!";
}
}
// As all used constructs are deterministic, the compiler would replace any call of the
// function 'GetMagicString : int -> string' with the constant string "Oh noes!". Constant expressionsDeterminism naturally extends to expressions, meaning that the expression The value of an expression is known at compile-time if it is only composed constant literals and deterministic function calls with constant parameters. This would e.g. enable the following expression to be compile-time constant: public const int MY_CONSTANT = $"Hello, {Math.E}!"[7] - '\x8'; It would be composed as follows during compile-time: public const int MY_CONST = $"Hello, {Math.E}!"[7] - '\x8';
// equals: (int)string.Format("Hello, {0}", Math.E).get_Chars(7) - (int)'\x8'
// (int) "Hello, 2.71828182845905!" .get_Chars(7) - (int)'\x8'
// (int) '2' - (int)'\x8'
// int.op_Implicit('2') - int.op_Implicit('\x8')
// int.op_Subtract(50, 8)
// 42 The compiler would build an expression tree of the expression
The compiler would then issue the following IL for the expression above: .field public static literal int32 MY_CONSTANT = int32(42) Constant expressions could be used as default parameter value: public static float FuncF(float x, float y = Sin(2 * PI) / 14)
// ^^^^^^^^^^^^^^^^
// This will be evaluated at compile-time!!
{
return Tan(x) * y;
} Naturally, the usage of compile-time constants (in the deterministic sense) could be expanded to attributes: [Obsolete($"Perfectly valid code! {nameof(MyFunc)} is obsolete -- and this is compile-time constant! {Math.Pow(0.025, -1) + 2}")]
public void MyFunc() { ... } Which would evaluate to: [Obsolete("Perfectly valid code! MyFunc is obsolete -- and this is compile-time constant! 42")]
public void MyFunc() { ... } Deterministic Attributes (?)A possible idea would be to allow determinism to be allowed in attributes as well: public const delegate bool DeterministicFunc(float f);
public const sealed class CheckParameterAttribute
: Attribute
{
...
const public CheckParameterAttribute(DeterministicFunc<float, bool> predicate)
{
...
}
}
public const void funcE(
[CheckParameter(const f => f > -1 && f < 1)]
float f
)
{
...
} This would, however, require a language change (except if a deterministic function handle or What about accessing type/global variables?Well -- this is a more complex matter. Imagine having the following code: public const struct Matrix2D<const T>
{
private T[,] _values;
public const Matrix2D(int n, int m) =>
_values = new T[n, m];
public const void Transpose() { ... }
public const T[] Eigenvalues() { ... }
public const Vector<T> Eigenvectors() { ... }
public const (Matrix2D<T> Left, Matrix2D<T> Right) DecomposeLR() { ... }
public static const Matrix2D<T> GenerateTHEmatrix() { ... }
} Let's imagine the following access patterns:
If the code of our public static const void Main(string[] argv)
{
Matrix2D<float> l, r, m, n;
Vector<float> ev;
float[] e;
m = new Matrix2D<float>(5, 5);
.... // fill values into m
n = Matrix2D.GenerateTHEmatrix(); // some really time-intensive operation
e = m.Eigenvalues();
ev = m.Eigenvectors();
m.Transpose();
(l, r) = m.DecomposeLR();
} One cannot parallelize The compiler must create some kind of functional dependency graph (a bit like transactional database systems do), in order to parallelize as much code as possible: [Ignore the absolutely mad MSPaint skillz™] The following access conflict table should be used when checking whether two sequential accesses
If the dependency graph shows a cyclic dependency, all deterministic functions must be executed in sequence (traditional code execution order). TL;DRIn general, the compiler could follow the following guidelines for deterministic functions:
Issues / Drawbacks / QuestionsThis goes without saying, that such complex determinism analysis requires a lot of work. The following points are some issues, drawbacks and open questions which I thought of:
ConclusionIS THIS WORTH IT? The C#8.0 feature IMO, It would definitely require a breaking CLI change (which would incidentally be a good excuse to clean-up the CLI and its instruction set 😉). The huge workload involved in the implementation would therefore make it a candidate for a major language version (e.g. C#-10). |
Beta Was this translation helpful? Give feedback.
-
This pretty much guarantees that it's destined to go absolutely nowhere. Giant meta-proposals are not useful. They force all of the feedback through a single funnel making the following of any conversation completely impossible. If you wanted any serious attention to any of these individual proposals I'd suggest submitting them separately, at least for the proposals that you're not just duplicating. |
Beta Was this translation helpful? Give feedback.
-
Apologies in advance if some of my notes that follow get too nitpicky. The term "determinism" refers to quite a different concept in the Roslyn compiler, so I think this proposal should use a different name (like "pure functions"), to avoid confusion.
What does that mean? It seems to me you are conflating whether a function (e.g.
What are "expression chains" and "operation flows"? Isn't it enough to say that all statements need to be deterministic? (And that a statement is deterministic if all its child expressions and statements are deterministic.)
I don't understand what this means. What are a function's dependencies? Is this referring to field accesses?
What does it mean to "access memory" from C#? Does reading a local variable (which could be stored on the stack, which is part of memory) count? Does e.g. Also, don't most of these boil down to PInvoke calls anyway?
Memoization can indeed improve performance significantly. But I would like to see evidence that some form of automatic memoization at the JIT level would make sense. As far as I know, even Haskell does not use that.
Your example would require adding some synchronization. How would the compiler figure out that doing so is worth it? Also, it's improving latency at the cost of decreasing throughput (e.g. when there are many requests and you want to process as many requests as possible, you don't want the computation to be done in parallel). How does the compiler determine that this trade-off is acceptable to the user? In other words: automatic parallelization is a hard problem. Knowing what code can be safely executed in parallel is necessary for that, but not sufficient.
That would be a breaking change, since it currently produces
Do you have a more realistic example of where this would be actually useful?
Do I understand it correctly that this is about producing errors at compile time for code that always throws? If not, could you expand on that?
No, it would not.
How would the compiler do that? How would it figure out what fields does a function read or write? And if it can do that, wouldn't it also be able to figure out what functions are deterministic, so we wouldn't need the |
Beta Was this translation helpful? Give feedback.
-
Like @HaloFour I really think that you need to break this into multiple proposals and I'm not sure where you should start but it seems like touch multiple places, personally, I want to have constant functions and expressions. I'll throw it out here because it's a thought I had from the previous discussion we had instead of letting the compiler do all this work I'd really think it's better to have a "constexpr engine" where you can mark any expression/function with Good luck! ;) Rephrased. |
Beta Was this translation helpful? Give feedback.
-
Yes -- you are right, I should rename it
That was a bad choice of wording on my part ... I meant that the arguments can be deterministically computed (at invocation)
You are right, saying that all statements need to be deterministic is enough. I thought that I should elaborate this a bit, as devs sometimes tend to forget, that operators can also be functions (instead of simple CPU instructions).
Yes, but not only field access. I view a function's dependencies as a set of data sources, which a function relies upon. This could be access to a variable/field/... at a point
I meant "traditional" memory access via e.g. pointers, fixed buffers etc.. I mean, that determinism can only be assured, if no other code can modify the memory region utilised by the deterministic function.
No. Nor does array access (as long as the array is not pinned).
Do you mean the points in my list? Yes, they more or less do. Concerning your thoughts on
Hm -- interesting. I forgot that some cultures differentiate between
Yes - exactly.
I unfortunately forgot that during the typing of the proposal - I never really used culture-dependent string formatting (not even when designing language packages for applications).
I think that this should be possible by looking at the generated syntax tree and symbol table to resolve all assignments to/reads from fields.
Yes -- you are right. It would effectively render the |
Beta Was this translation helpful? Give feedback.
-
@eyalsk @HaloFour @eyalsk I really like your idea of Roslyn acting as a service in the background. |
Beta Was this translation helpful? Give feedback.
-
To some extent it's important to use a modifier, especially if you want determinism to act across binary boundaries. I see it as similar to the proposed |
Beta Was this translation helpful? Give feedback.
-
That's part of my question. As I understand it, this proposal would require knowing which fields are read or written by a |
Beta Was this translation helpful? Give feedback.
-
If that argument is a delegate, can your function be inferred to be deterministic if that delegate is? Like, |
Beta Was this translation helpful? Give feedback.
-
@orthoxerox : That was my thought exactly. |
Beta Was this translation helpful? Give feedback.
-
@Unknown6656: @eyalsk idea seems a lot better than spending time on ways to determine if a function is deterministic. Naive approach, create a flag for calculated consts, use that to compile 2 versions of the code one without and one with, use the one without to supply values to the one with. Doubles the work that needs to be done, but everything that can be calculated and used could be used, and the only restriction is that all of the direct function calls be static |
Beta Was this translation helpful? Give feedback.
-
If you want to completely optimize away any (every?) pure/deterministic function that has parameters known at compile time, the compiler is going to need to execute the code to generate the result. Given that the Roslyn compiler is extensively used within Visual Studio (for Intellisense, Unit Test discovery, Code Analyzers, other things), this raises a scary scenario: I clone a git repo of something, so I can review the code. Notes ... ... I didn't compile and run the code, it happened automatically in the background. Aggressive inlining of pure/deterministic functions might be worth it - but would clearly need to be approached in a very conservative, diligent and security conscious fashion. |
Beta Was this translation helpful? Give feedback.
-
@theunrepentantgeek Also I imagine it would be doable to only evaluate constant expressions when actually compiling, and/or to evaluate them in a sandbox. After all, if they are truly constant, they shouldn't require anything outside the sandbox. |
Beta Was this translation helpful? Give feedback.
-
i would never expect the actual functions to be optimized away. I would simply expect the invocations of those functions to be potentially optimized away. |
Beta Was this translation helpful? Give feedback.
-
If you're willing to wait till compilation time, then i go back to my original point: i don't see why this is part of the C# language. Just add a compile step that does this through IL optimization. |
Beta Was this translation helpful? Give feedback.
-
@CyrusNajmabadi |
Beta Was this translation helpful? Give feedback.
-
Arbitrary user code in the form of MSBuild scripts executes when you load a csproj. (I've done this.) It can execute any code that a standalone EXE can. The same is true of each NuGet package you reference when you do a package restore. |
Beta Was this translation helpful? Give feedback.
-
My advice is to be antifragile to malware. This means offline backups, only allowing your machine to access data that you don't mind sharing with the world, and a streamlined routine so that doing clean installations of Windows doesn't hurt. |
Beta Was this translation helpful? Give feedback.
-
@theunrepentantgeek Think about it, what could it execute? Obviously it would not touch any machine-specific code, nor any code concerning streams, I/O, shared memory, networking, user input or hardware. These functions/APIs are not pure in a deterministic sense. I do not see any problem as any "harmful" code would obviously require to be harmful outside of the program itself in order to harm the machine (or any in/outgoing communication). As these are per definition non-deterministic, the compiler would not even touch them. (I'm using the 'compiler' here as step, but it could technically be in the linker or an other pipeline step) @CyrusNajmabadi: Why should we not optimize away complete functions? Would it make a semantic difference on non-public functions (or fields for that matter)? |
Beta Was this translation helpful? Give feedback.
-
What is a 'complete function'?
That's a pretty big deal. Why would you break reflection?
That also seems like a pretty big deal. |
Beta Was this translation helpful? Give feedback.
-
The point I would like to make is: I do understand, that it is a huge deal, however I think that we should reflect upon about how we could reduce a .NET application's footprint and resource load. EDIT: Do pardon me, on second thought it is a bad idea to introduce such a huge breaking change. |
Beta Was this translation helpful? Give feedback.
-
@Unknown6656 I think that you are right, but this kind of optimization will be problematic, I mean not imposible without breaking-changes, but really dificult. I think that for now we should try to push the pure/determinism/constexp part of the proposal. PD: Of course its good to have in consideration future work related to this proposal. This probably will help upstreaming this proposal |
Beta Was this translation helpful? Give feedback.
-
What do you mean by 'public' semantics? reflection, for example, is a public API. Note: if you want something that pulls out unused methods, then just use a tree-shaking tool. Note that none of this is really related to the language per-se. It seems to be about optimizing the end compiled IL. THis all feels like the domain of IL optimizers. |
Beta Was this translation helpful? Give feedback.
-
Yes it is mostly on an IL level and not language level, however, I thought some language annotation might be needed in my original proposal.
Private methods not having any visible effect on the program's input and output
That is the reason why it is such a breaking change and why I "retract" my previous standpoint.
I agree that this will be difficult. I would like to help on that matter but I do not know where would be the best point to start... |
Beta Was this translation helpful? Give feedback.
-
I think that the first step is view the benefits and harms of IL Level implementation vs Language Level implementation. But probably @gafter can give us more information of what is required and his personal opinion on this. |
Beta Was this translation helpful? Give feedback.
-
I would comment that this proposal is huge. Can we start with just deterministic functions? That will allow a lot of optimizations, and improved constant declarations, without major (if any) changes to the CLR. |
Beta Was this translation helpful? Give feedback.
-
EDIT: Jump to this comment -> #1763 (comment) <- to see the full proposal
Purity / Determinism / Constant Functions / Constant Expressions
I have written a rather long proposal about a possible language feature concerning the determinism/purity of code and it's usage for a future possible language version.
This could be seen as an "umbrella proposal" for the following issues (mixed with some of my ideas):
constexpr
in C#?"constant_string.Length
"This is my first in-depth proposal and I am open (and hoping) for a healthy discussion about a subject which I myself find very interesting, potential-rich and very valuable for future C# iterations.
Due to the length of the proposal, I would advise any reader to first grab a piece of cake and a hot chocolate/coffee before starting to read 😉
Beta Was this translation helpful? Give feedback.
All reactions