[C#-X.0 Proposal] Determinism #1763

Unknown6656 · 2018-08-02T09:27:42Z

EDIT: Jump to this comment -> #1763 (comment) <- to see the full proposal

Purity / Determinism / Constant Functions / Constant Expressions

Proposed: https://github.com/Unknown6656/csharplang/blob/master/proposals/determinism.md
Prototype: None
Implementation: Not Started
Specification: Not Started

I have written a rather long proposal about a possible language feature concerning the determinism/purity of code and its usage in a possible future language version.

This could be seen as an "umbrella proposal" for the following issues (mixed with some of my ideas):

dotnet/csharplang#504: "Enable Compile-Time Folding of Deterministic Outputs"
dotnet/csharplang#776: "C# Pure Function Keyword to Mark No Side Effects Or External Dependencies"
dotnet/csharplang#1028: "Compile time expressions"
dotnet/csharplang#1413: "Compile-Time method inlining"
dotnet/roslyn#9627: "Compile Time Function Execution"
dotnet/roslyn#10506: "Static Expressions"
dotnet/roslyn#11259: "Constant string interpolation"
dotnet/roslyn#12238: "Compile time expressions"
dotnet/roslyn#14665: "Constant expression - Power operator?"
dotnet/roslyn#15079: "How would you imagine constexpr in C#?"
dotnet/coreclr#3633: "Optimizing constant_string.Length"

This is my first in-depth proposal and I am open (and hoping) for a healthy discussion about a subject which I myself find very interesting, potential-rich and very valuable for future C# iterations.

Due to the length of the proposal, I would advise any reader to first grab a piece of cake and a hot chocolate/coffee before starting to read 😉

Unknown6656 · 2018-08-02T09:29:27Z

Author

/CC: @gafter @MadsTorgersen
@eyalsk (IIRC you also led a lengthy discussion about exactly this topic and were quite passionate about it)

gafter · 2018-08-02T18:48:52Z

Community-submitted proposals should start as issues, not as PRs. Those issues can then gather community feedback. We use "proposal" PRs to track specifications for features that are already championed and approved (at least in principle).

gafter · 2018-08-02T18:49:52Z

To clarify: I recommend you place your actual proposal text in some issue (e.g. this one).

Unknown6656 · 2018-08-02T19:46:52Z

Author

@gafter: Do please excuse me -- I thought it was more practical to edit it locally and push it to my own fork instead.
I will copy the contents of the .md into this issue.

Unknown6656 · 2018-08-02T19:55:35Z

Author

Original proposal:

Purity / Determinism / Constant Functions / Constant Expressions

This proposal is dedicated to the often mentioned ideas of "function purity", "determinism", "pre-compilation" and "constant functions/expressions".

This article also mentions features and wishes expressed (more or less briefly) in the following proposals:

dotnet/csharplang#504: "Enable Compile-Time Folding of Deterministic Outputs"
dotnet/csharplang#776: "C# Pure Function Keyword to Mark No Side Effects Or External Dependencies"
dotnet/csharplang#1028: "Compile time expressions"
dotnet/csharplang#1413: "Compile-Time method inlining"
dotnet/roslyn#9627: "Compile Time Function Execution"
dotnet/roslyn#10506: "Static Expressions"
dotnet/roslyn#11259: "Constant string interpolation"
dotnet/roslyn#12238: "Compile time expressions"
dotnet/roslyn#14665: "Constant expression - Power operator?"
dotnet/roslyn#15079: "How would you imagine constexpr in C#?"
dotnet/coreclr#3633: "Optimizing constant_string.Length"

Introduction

C# has been and currently is moving into the exciting field of functional programming. It is not at all a new field (think about how old Haskell is!) - but it is a field with great potential and should therefore be explored and used in modern C# programming.

One important concept is the so-called **determinism** of functions and expressions. If used and analyzed correctly, it can yield large performance and parallelism improvements. Furthermore, deterministic compiler approaches can prevent many run-time exceptions due to improved code transformation and analysis.

I will use the word determinism throughout most of this article - but you can think of it as pure functions, pre-compiled functions or pseudo-"constant" functions.

What does 'Determinism' mean?

In a broad mathematical and functional sense, determinism is the property of a function or expression that always returns the same output for the same input.
Mathematically speaking, this should look as follows:

If a function f is deterministic, then it follows that:

∀ x, y: x = y ⟹ f(x) = f(y)

C# examples of deterministic functions are, e.g.:

public static long FuncA() => 420L;

public static long FuncB(int x)
{
    long res = 0;
    
    for (int c = 0; c < x; ++c)
        res += c * 42;
    
    return res;
}

It is pretty obvious that the functions above always output the same result for the same input. But what about the following one?

public static long FuncC(int x)
{
    long input = long.Parse(System.Console.ReadLine());
    
    return input + x * 42;
}

As FuncC uses Console.ReadLine : void -> string, the function FuncC can only be deterministic, if ReadLine is (which it of course isn't).

This example shows that, for the purpose of this article, one shall define 'deterministic' as follows:
A function is deterministic ⟺

All parameters are deterministic
All inner function calls are deterministic
Any expression, expression chain or operation 'flow' is deterministic
All dependencies are deterministic (in time)

Constant functions (functions composed only of constant values and expressions) are trivially deterministic.

How can we differentiate between deterministic and non-deterministic functions?

To determine whether a function is deterministic (aka. "pure"), one must (recursively) look at all affected functions and variables:

If a function solely accesses parameters, (constant) literals, local function variables and other deterministic functions, then it definitely is deterministic.
If a function accesses any of the following components - it is not deterministic:
- Memory (physical or virtual)
- Interrupt-based mechanics, such as devices (including the mouse) or timers
- Stream-based mechanics, such as Networking, I/O, etc.
- UI-based components, such as Windows, UI-message-queues (they often use interrupts)
- P/Invoke calls (or any calls to non-deterministic code)
The above components are also called "side-effects", as any code operating with them can have undesired or non-deterministic (read: non-repeatable) side-effects on the own application's state.
If a function accesses global (static or instance) variables, determinism cannot always be ensured. The requirements for this must be: No external or non-deterministic code can access the variables in question. All functions accessing the global variable must be deterministic.
For any other remaining cases, one must define in greater detail where the line has to be drawn between determinism and non-determinism.

Why do we need this?

Good point. Why do we need this indeed?
There are a couple of reasons out there:

1.) Performance: caching or look-up

Imagine having a huge dataset with statistical data. Determinism could be used in a lot of mathematical functions handling this dataset. Examples could be the calculation of metrics, such as the standard deviation, median, regression expressions, etc.

Many performance aspects, however, become apparent on the second call of a deterministic function:
As the JIT knows a function to be deterministic, it could simply look-up its result upon re-calculation. The result would be correct, as the output of any deterministic function does not change, as long as the input parameters remain unchanged.
The JIT could have some kind of (partial) look-up table or cache for deterministic functions in order to hugely improve an application's performance.
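
As a rough illustration of the look-up idea, here is a minimal hand-written memoization sketch in today's C#. The names `Memoizer`, `Memoize` and `MemoDemo` are hypothetical, and this is not what the JIT does or would do; it only shows why caching is safe as long as the wrapped function is deterministic.

```csharp
using System;
using System.Collections.Concurrent;

public static class Memoizer
{
    // Wraps a function so that the result for each distinct input is cached.
    // This is only correct if 'func' is deterministic (same input => same output).
    // Under concurrent access 'func' may run more than once for the same input,
    // which is harmless for a deterministic function.
    public static Func<TIn, TOut> Memoize<TIn, TOut>(Func<TIn, TOut> func)
    {
        var cache = new ConcurrentDictionary<TIn, TOut>();
        return input => cache.GetOrAdd(input, func);
    }
}

public static class MemoDemo
{
    // Counts real evaluations. The counter exists for demonstration only;
    // it is itself a side effect and would not be part of a pure function.
    public static int Calls;

    // FuncB from the example further up, plus the demonstration counter.
    public static long FuncB(int x)
    {
        Calls++;
        long res = 0;

        for (int c = 0; c < x; ++c)
            res += c * 42;

        return res;
    }
}
```

With `var f = Memoizer.Memoize<int, long>(MemoDemo.FuncB);`, calling `f(10)` twice evaluates `FuncB` only once; the second call is a dictionary look-up.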

2.) Performance: parallelism

As deterministic functions are only composed of deterministic function-calls and expressions, their code is known to be side-effect free. They can therefore often be parallelized safely in order to gain performance, as they are known to have no side-effects on the remaining application state.

If global variables are involved, some kind of functional dependency graph or transactional system has to be created by the compiler in order to ensure the variable's determinism.

You could imagine the parallelism of deterministic functions along the lines of parallelism of async functions:

Given a set of functions in which G waits for the result of the deterministic function F at some point, one can pre-compute F in the background (in parallel) and "await" its result before passing it along to G, e.g.:

// deterministic
public static const string F(string s)
{
    return new string(s.ToUpperInvariant()
                       .Reverse()
                       .ToArray());
}

// non-deterministic
public static void G(object o) => Console.WriteLine(o?.ToString() ?? "<null>");

// non-deterministic
public static void Main()
{
    string input = Console.ReadLine();
    
    Thread.Sleep(1000); // or some other long-duration operation
    
    G(F(input));
}

If the calculation times are as follows:

                 F:    5ms
  Console.ReadLine:   10ms [+ wait for user input]
 Console.WriteLine:   10ms
Thread.Sleep(1000): 1000ms

one would expect, e.g., the following classic timeline:

ms: 0                           5000   5010      6010 6015   6025
    |-----WAIT FOR USER INPUT-----|------|---------|----|------|
  START                           READLINE  SLEEP    F  WRITELINE

The method F, however, is deterministic, so it could already be called at 5010 ms instead of 6010 ms. This is semantically perfectly valid, as F is known to have no side-effect on the rest of the application (or the rest of the application on F).
The execution time of Main could therefore be effectively reduced by the calculation time of F (5 ms):

ms: 0                           5000   5010 5015     6010   6020        <-- overall time reduced
    |-----WAIT FOR USER INPUT-----|------|----|--------|------|
    |                             |      |--F-'        |      |         <-- F is executed
  START                           READLINE    SLEEP    WRITELINE              in parallel

Of course - this was an example with rather bad numbers, but you do get the idea... ;)
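
Written by hand in today's C#, the reordering from the second timeline could look roughly like the sketch below. `OverlapDemo` and `Run` are hypothetical names, and `Thread.Sleep` stands in for the long-running operation; the proposal would have the compiler derive this overlap automatically, whereas here it is spelled out with `Task.Run`.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class OverlapDemo
{
    // Deterministic: the result depends only on 's'.
    public static string F(string s) =>
        new string(s.ToUpperInvariant().Reverse().ToArray());

    // Starts F as soon as its input is known, performs the slow step,
    // then collects F's result before handing it on.
    public static string Run(string input, int sleepMs)
    {
        Task<string> pending = Task.Run(() => F(input)); // F runs in the background
        Thread.Sleep(sleepMs);                           // the long-duration operation
        return pending.Result;                           // blocks only if F has not finished yet
    }
}
```

This is only safe because F neither reads nor writes any state that the code between `Task.Run` and `pending.Result` touches.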

3.) Performance: pre-compilation

Imagine the following function sigmoid:

using static System.MathF;

public static float sigmoid(float x) => Exp(x) / (Exp(x) + 1);

Assuming that the function MathF.Exp : float -> float is deterministic, the function sigmoid obviously is as well. Upon compiling a program which contains the function sigmoid, the compiler could take a look at all call sites and replace the function calls whose arguments are known at compile-time with the pre-calculated result.

This could also be applied to string-manipulation, e.g.:

string s = $"Hello {Math.PI:N3}!".ToLowerInvariant();

would always result in the following value:

"hello 3.142!"

4.) Elimination of (semantically) unused code

As deterministic functions are by definition side-effect free, one can safely remove code snippets inside those functions which have no effect on the return value. This is possible, as the semantics of the "optimized" functions do not differ from the "original" one.

An example:

L01: public const float MyComplexFunction(float y)
L02: {
L03:     float x = MyComplexFunction(Sin(y));
L04:     
L05:     return y * 7;
L06: }

Traditionally, the execution of MyComplexFunction : float -> float would result in a StackOverflowException on line L03 due to a never-ending recursive call. This could not be optimized with the traditional C# compiler, as the function is not even tail-recursive.

However, as we know that MyComplexFunction is deterministic, it is known to have no side effects. Therefore, we can omit the line L03 entirely, as its result will never affect the return value of MyComplexFunction.

The compiler could now remove L03 and issue the hint Removed line L03, as it has no effects on the function's semantics., instead of only issuing a warning Unused variable 'x' in line L03. and leaving the line L03 untouched.

The compiled result would be:

public const float MyComplexFunction(float y) => y * 7;

Which would operate as semantically expected.

5.) Reduction of error sources

The usage of determinism could greatly reduce the amount of produced errors, as potential conflicts could be determined by the compiler (e.g. eventual overflow or division-by-zero). If the application of determinism in C# could be broadened to -- let's say -- discriminated unions and pattern matching, many patterns could be detected by the compiler which would e.g. never be matched etc.

Determinism could also be applied to the upcoming C#8.0-feature of nullable reference types: Determinism could provide the compiler with improved methods of tracking null references.

6.) More compile-time constants!!

Many expressions can be transformed to be compile-time constants, if all underlying (arithmetic) operators and function calls are ensured to be deterministic. This enables the usage of compile-time constants in many more places.

Please take a look at the following chapter (especially this section) for more information.

OK, I get it. But what would it look like?

There are multiple designs which this proposal could adopt:

"Conservative determinism":
The compiler should only employ determinism if the developer says to do so. This could be implemented as follows:
- A keyword like const or pure to mark deterministic functions.
  My favorite solution for now is the usage of the existing keyword const, so I will stick to that throughout this article.
- A compiler-intrinsic attribute, e.g. [Pure] or [Deterministic], which would have the same effect as the keyword(s) mentioned above.
"Aggressive determinism":
The compiler shall greedily assume any function as being deterministic and require the developer to mark non-deterministic functions as such.
This could be the proposal's "final" goal in the far-away future. However, such a behavior would be a breaking change at the current stage.

For the purpose of this proposal, we settle on "Conservative determinism" with the keyword const as determinism-marker for now.

Examples:

Deterministic functions

/// calculate hyperbolic cotangent
public static const float Coth(float x) => (Exp(2 * x) + 1) / (Exp(2 * x) - 1);

This would also require the function System.MathF.Exp : float -> float to be marked as const.
Any call from a deterministic function to a non-deterministic one would result in a compiler error:

public static DateTime GetToday() => DateTime.Now;

public static const DateTime GetNextDay()
{
    DateTime today = GetToday();
    //               ^^^^^^^^^^ ERROR
    // A non-deterministic function cannot be called from a deterministic one.
    
    return today.AddDays(1);
}

If the function GetToday : void -> DateTime were marked as const, the error would "shift" as follows:

public static const DateTime GetToday() => DateTime.Now;
//                                                  ^^^ ERROR
// A non-deterministic property cannot be accessed from a deterministic function.

public static const DateTime GetNextDay()
{
    DateTime today = GetToday();
    
    return today.AddDays(1);
}

It is needless to say that calls from a non-deterministic function to a deterministic one are perfectly valid.

Deterministic types

A type can be marked as deterministic, if it is sealed and all containing methods are deterministic. Furthermore, deterministic types cannot be modified via memory operations such as pointer, pinning or P/Invoke operations:

public const struct Point2Df
{
    public const float X { get; }
    public const float Y { get; }
    
    ....
}


Point2Df point = new Point2Df();
Point2Df* ptr = &point;
//      ^       ^^^^^^ ERROR
// Cannot take address of deterministic datatype 'Point2Df'.

For now, only classes and (managed) read-only structures should be valid candidates for deterministic datatypes.

A deterministic type could look as follows:

public const sealed class ComplexNumber
{
    private float _re, _im;
    
    
    // deterministic constructor
    public const ComplexNumber()
        : this(0, 0) { }
    
    // deterministic constructor
    public const ComplexNumber(float re)
        : this(re, 0) { }
    
    // deterministic constructor
    public const ComplexNumber(float re, float im)
    {
        _re = re;
        _im = im;
    }
    
    // deterministic function (this requires "Sqrt : float -> float" to be deterministic)
    public const float GetMagnitude() => Sqrt(_re * _re + _im * _im);
}

Deterministic operators and properties

Properties and (custom) operators can only be used inside deterministic functions or expressions, if they are also marked as const. The syntax would look along these lines:

public const sealed class ComplexNumber
{
    .....
    
    // deterministic read-only property
    public const float Imaginary => _im;
    
    // deterministic read-write property
    public const float Real
    {
        get => _re;
        set => _re = value;
    }
    
    public static const ComplexNumber operator+(ComplexNumber c1, ComplexNumber c2) =>
        new ComplexNumber(c1._re + c2._re, c1._im + c2._im);
}

The code above could be compiled into something like this:

public const sealed class ComplexNumber
{
    .....
    
    public const float get_Imaginary() => this._im;
    
    public const float get_Real() => this._re;
    
    public const void set_Real(float $value) => this._re = $value;
    
    public static const ComplexNumber op_Addition(ComplexNumber c1, ComplexNumber c2) =>
        new ComplexNumber(c1._re + c2._re, c1._im + c2._im);
}

which are all deterministic functions.

Deterministic flow control

Determinism would also expand to flow control, e.g.:

public static const float GetMagicNumber(float x) => x / 0f;
// Compiler detects that the function above returns an infinity for every non-zero,
// non-NaN x (0f / 0f and NaN / 0f both yield float.NaN)

public static const string GetMagicString(int i)
{
    float res = GetMagicNumber(i + 42.0f);
    
    if (float.IsInfinity(res))
        return "Oh noes!";
    else
    {
        return "Yes!";
    }
}
// As all used constructs are deterministic, the compiler could replace any call of the
// function 'GetMagicString : int -> string' whose argument is known to differ from -42
// with the constant string "Oh noes!".

Constant expressions

Determinism naturally extends to expressions, meaning that the expression 420f / 10f is expected to always yield the result 42f. One could therefore redefine compile-time constant expressions:

The value of an expression is known at compile-time if it is composed only of constant literals and deterministic function calls with constant parameters.

This would e.g. enable the following expression to be compile-time constant:

public const int MY_CONSTANT = $"Hello, {Math.E}!"[7] - '\x8';

It would be composed as follows during compile-time:

public const int MY_CONSTANT = $"Hello, {Math.E}!"[7] - '\x8';
//  equals:                    (int)string.Format("Hello, {0}!", Math.E).get_Chars(7) - (int)'\x8'
//                             (int)     "Hello, 2.71828182845905!"     .get_Chars(7) - (int)'\x8'
//                             (int)                        '2'                       - (int)'\x8'
//                                       int.op_Implicit('2')       -      int.op_Implicit('\x8')
//                                              int.op_Subtract(50, 8)
//                                                        42

The compiler would build an expression tree of the expression MY_CONSTANT : int and only evaluate its value if the following functions are deterministic or constant:

string.Format : string -> obj[] -> string
string.get_Chars : string -> int -> char
int.op_Implicit : char -> int
int.op_Subtract : int -> int -> int

The compiler would then issue the following IL for the expression above:

.field public static literal int32 MY_CONSTANT = int32(42)
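
The folding steps above can be checked at run time in today's C#. The sketch below is only a sanity check of the arithmetic, not the proposed compile-time mechanism; `FoldDemo` and `Evaluate` are hypothetical names, and passing the invariant culture explicitly is an assumption made here so that the formatting step does not depend on the machine's settings.

```csharp
using System;
using System.Globalization;

public static class FoldDemo
{
    // Evaluates the expression behind MY_CONSTANT step by step at run time.
    public static int Evaluate()
    {
        // Formatting step; the culture is fixed explicitly (assumption of this sketch).
        string text = string.Format(CultureInfo.InvariantCulture, "Hello, {0}!", Math.E);

        // Index 7 is the first digit of e, i.e. '2' (code point 50); '\x8' is code point 8.
        return text[7] - '\x8';
    }
}
```

`FoldDemo.Evaluate()` returns 42, matching the value in the IL above.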

Constant expressions could be used as default parameter value:

public static float FuncF(float x, float y = Sin(2 * PI) / 14)
//                                           ^^^^^^^^^^^^^^^^
//                                           This will be evaluated at compile-time!!
{
    return Tan(x) * y;
}

Naturally, the usage of compile-time constants (in the deterministic sense) could be expanded to attributes:

[Obsolete($"Perfectly valid code! {nameof(MyFunc)} is obsolete -- and this is compile-time constant! {Math.Pow(0.025, -1) + 2}")]
public void MyFunc() { ... }

Which would evaluate to:

[Obsolete("Perfectly valid code! MyFunc is obsolete -- and this is compile-time constant! 42")]
public void MyFunc() { ... }

Deterministic Attributes (?)

A possible idea would be to allow determinism in attributes as well:

public const delegate bool DeterministicFunc(float f);

public const sealed class CheckParameterAttribute
    : Attribute
{
    ...
    
    public const CheckParameterAttribute(DeterministicFunc predicate)
    {
      ...
    }
}


public const void funcE(
    [CheckParameter(const f => f > -1 && f < 1)]
    float f
)
{
  ...
}

This would, however, require a language change (except if a deterministic function handle or .text offset would be stored as a constant inside the attribute ....)

What about accessing type/global variables?

Well -- this is a more complex matter. Imagine having the following code:

public const struct Matrix2D<const T>
{
    private T[,] _values;
    
    
    public const Matrix2D(int n, int m) =>
        _values = new T[n, m];
    
    public const void Transpose() { ... }
    public const T[] Eigenvalues() { ... }
    public const Vector<T> Eigenvectors() { ... }
    public const (Matrix2D<T> Left, Matrix2D<T> Right) DecomposeLR() { ... }
    
    public static const Matrix2D<T> GenerateTHEmatrix() { ... }
}

Let's imagine the following access patterns:

method	variable	access
`.ctor : int -> int -> Matrix2D<T>`	`_values` (instance)	write
`Transpose : void -> void`	`_values` (instance)	read, write
`Eigenvalues : void -> T[]`	`_values` (instance)	read
`Eigenvectors : void -> Vector<T>`	`_values` (instance)	read
`DecomposeLR : void -> Matrix2D<T> * Matrix2D<T>`	`_values` (instance)	read
`GenerateTHEmatrix : void -> Matrix2D<T>`	<none>	<none>

If the code of our Main-function looks as follows:

public static const void Main(string[] argv)
{
    Matrix2D<float> l, r, m, n;
    Vector<float> ev;
    float[] e;
    
    m = new Matrix2D<float>(5, 5);
    .... // fill values into m
        
    n = Matrix2D<float>.GenerateTHEmatrix(); // some really time-intensive operation
        
    e = m.Eigenvalues();
    ev = m.Eigenvectors();
        
    m.Transpose();
    
    (l, r) = m.DecomposeLR();
}

One cannot trivially parallelize Transpose and DecomposeLR, as they have conflicting access patterns to the same type variable (in this case m._values : float[,]). However, the function calls of Eigenvalues and Eigenvectors are parallelizable, as both only require read access to the type variable m._values.

The compiler must create some kind of functional dependency graph (a bit like transactional database systems do), in order to parallelize as much code as possible:

[Ignore the absolutely mad MSPaint skillz™]
On the left-hand side one can see the serial (imperative) execution of the functions in the code example above. After building a deterministic execution tree, the compiler would use the dependency graph on the right-hand side in order to parallelize as much code as possible without causing semantic inconsistencies.

The following access conflict table should be used when checking whether two sequential accesses T1 and T2 can be parallelized when accessing the same variable x:

`T1 -> T2`	`T1` : read(x)	`T1` : write(x)
`T2` : read(x)	yes	no
`T2` : write(x)	no	no
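
The table boils down to a single rule, sketched below in C#. The `Access` enum and `ConflictTable` class are hypothetical names used only for illustration; a method that both reads and writes a variable (such as Transpose) is treated as a writer here.

```csharp
using System;

public enum Access { Read, Write }

public static class ConflictTable
{
    // Two sequential accesses to the same variable can be parallelized
    // only if both of them are reads; any write forces sequential execution.
    public static bool CanParallelize(Access t1, Access t2) =>
        t1 == Access.Read && t2 == Access.Read;
}
```

Applied to the Matrix2D example: Eigenvalues and Eigenvectors are both readers of _values, so `CanParallelize(Access.Read, Access.Read)` is true, while Transpose followed by DecomposeLR is `CanParallelize(Access.Write, Access.Read)`, which is false.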

If the dependency graph shows a cyclic dependency, all deterministic functions must be executed in sequence (traditional code execution order).

TL;DR

In general, the compiler could follow the following guidelines for deterministic functions:

no parameters, no type variable access ⟹ calculate function result at compile-time.
parameters, but all calls pass constant values ⟹ calculate function result at compile-time.
parameters, no type variable access ⟹ calls can be calculated asynchronously.
type variable access ⟹ compiler has to resort to a dependency graph etc. in order to ensure correctness.

Issues / Drawbacks / Questions

It goes without saying that such complex determinism analysis requires a lot of work. The following points are some issues, drawbacks and open questions which I thought of:

This requires a language change. The const- or [Deterministic]-marker could be issued as IL-Metadata, but I am not sure whether it is enough. IMO, const should be part of a deterministic function's signature -- however -- functions should not be overloadable by the const-marker only.
A language change is definitely required if determinism is to be allowed for Attribute constructors with non-constant (but deterministic) attributes.
Core framework changes. Many, many CoreFX functions could be transformed to be deterministic, e.g. most mathematical and LINQ functions.
How are deterministic APIs exposed? That's a very good question.
- Should they act like regular functions? That would prevent the compiler from pre-compiling some deterministic functions.
- Should precompiled functions be accessible as constants? Maybe - but that does not sound like .NET-consistent compiler behavior.
What happens with reflection-invoked deterministic functions? They should act like API-exposed ones; meaning that the solution to the API-question above would be the same for reflection-invocation.

Deterministic Generics? Interfaces? Definitely!
One could introduce some constraints as follows:

public interface IVector<const T>
{
    const T ScalarZero { get; }
    const T ScalarOne { get; }
    
    const IVector<T> VectorZero { get; }
    const IVector<T> VectorOne { get; }
    
    const IVector<T> VectorAdd(IVector<T> other);
    const IVector<T> ScalarMultiply(T scalar);
}

What about deterministic λs and delegates? Maybe like this:

const delegate bool DeterministicPredicate<const T>(T value);


public static const T[] LINQWhere<const T>(T[] input, DeterministicPredicate<T> predicate)
{
    List<T> result = new List<T>();
    
    foreach (T element in input ?? new T[0])
        if (predicate(element))
            result.Add(element);
    
    // return the collected matches
    return result.ToArray();
}

public static const void Main(string[] argv)
{
    int[] source = { 3, 1, 5, 4, 2, -88 };
    int[] result = LINQWhere(source, const i => (i % 2) != 0);
}

Conclusion

IS THIS WORTH IT?
Yes - I definitely think that the benefits are worth the effort and the drawbacks in the long term.

The C#8.0 feature nullable reference types already goes in the right direction. However, I think that a whole lot more potential and performance can be gained from this proposed language feature.

IMO, it would definitely require a breaking CLI change (which would incidentally be a good excuse to clean-up the CLI and its instruction set 😉).

The huge workload involved in the implementation would therefore make it a candidate for a major language version (e.g. C#-10).

HaloFour · 2018-08-02T20:52:17Z

IMO, it would definitely require a breaking CLI change (which would incidentally be a good excuse to clean-up the CLI and its instruction set 😉).

This pretty much guarantees that it's destined to go absolutely nowhere.

Giant meta-proposals are not useful. They force all of the feedback through a single funnel making the following of any conversation completely impossible. If you wanted any serious attention to any of these individual proposals I'd suggest submitting them separately, at least for the proposals that you're not just duplicating.

svick · 2018-08-02T21:35:56Z

Collaborator

Apologies in advance if some of my notes that follow get too nitpicky.

The term "determinism" refers to quite a different concept in the Roslyn compiler, so I think this proposal should use a different name (like "pure functions"), to avoid confusion.

All parameters are deterministic

What does that mean? It seems to me you are conflating whether a function (e.g. F(int i)) is deterministic and whether a function invocation (e.g. F(42)) is deterministic.

Any expression, expression chain or operation 'flow' is deterministic

What are "expression chains" and "operation flows"? Isn't it enough to say that all statements need to be deterministic? (And that a statement is deterministic if all its child expressions and statements are deterministic.)

All dependencies are deterministic (in time)

I don't understand what this means. What are a function's dependencies? Is this referring to field accesses?

If a function accesses any of the following components - it is not deterministic:

Memory (physical or virtual)

Interrupt-based mechanics, such as devices (including the mouse) or timers

Stream-based mechanics, such as Networking, I/O, etc.

UI-based components, such as Windows, UI-message-queues (they often use interrupts)

P/Invoke calls (or any calls to non-deterministic code)

What does it mean to "access memory" from C#? Does reading a local variable (which could be stored on the stack, which is part of memory) count? Does e.g. "foo"[1] count?

Also, don't most of these boil down to PInvoke calls anyway?

Performance: caching or look-up

Memoization can indeed improve performance significantly. But I would like to see evidence that some form of automatic memoization at the JIT level would make sense. As far as I know, even Haskell does not use that.

Performance: parallelism

Your example would require adding some synchronization. How would the compiler figure out that doing so is worth it? Also, it's improving latency at the cost of decreasing throughput (e.g. when there are many requests and you want to process as many requests as possible, you don't want the computation to be done in parallel). How does the compiler determine that this trade-off is acceptable to the user?

In other words: automatic parallelization is a hard problem. Knowing what code can be safely executed in parallel is necessary for that, but not sufficient.

This could also be applied to string-manipulation, e.g.: string s = $"Hello {Math.PI:N3}!".ToLowerInvariant(); would always result in the following value: "hello 3.142!"

That would be a breaking change, since it currently produces "hello 3,142!" on my machine.
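
To make the culture dependence concrete, here is a small sketch. The names `CultureDemo`, `Greeting` and `CommaDecimal` are made up for illustration, and the comma-based number format is built by hand as a stand-in for a machine whose current culture uses ',' as the decimal separator.

```csharp
using System;
using System.Globalization;

public static class CultureDemo
{
    // Same expression as in the proposal, but with the format provider made explicit.
    public static string Greeting(IFormatProvider provider) =>
        string.Format(provider, "Hello {0:N3}!", Math.PI).ToLowerInvariant();

    // A number format that uses ',' as the decimal separator,
    // as many European cultures do.
    public static NumberFormatInfo CommaDecimal()
    {
        var format = (NumberFormatInfo)CultureInfo.InvariantCulture.NumberFormat.Clone();
        format.NumberDecimalSeparator = ",";
        return format;
    }
}
```

`Greeting(CultureInfo.InvariantCulture)` yields "hello 3.142!", while `Greeting(CultureDemo.CommaDecimal())` yields "hello 3,142!". The interpolated string in the proposal uses the current culture implicitly, so its value is not fixed at compile time.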

Elimination of (semantically) unused code

Do you have a more realistic example of where this would be actually useful?

Reduction of error sources

Do I understand it correctly that this is about producing errors at compile time for code that always throws? If not, could you expand on that?

This would e.g. enable the following expression to be compile-time constant:

No, it would not. string.Format is not deterministic, because it depends on the current culture.

The compiler must create some kind of functional dependency graph

How would the compiler do that? How would it figure out which fields a function reads or writes? And if it can do that, wouldn't it also be able to figure out what functions are deterministic, so we wouldn't need the const modifier?

iam3yal · 2018-08-03T05:45:55Z

@Unknown6656

Like @HaloFour, I really think that you need to break this into multiple proposals. I'm not sure where you should start, but it seems to touch multiple places. Personally, I want to have constant functions and expressions.

I'll throw this out here because it's a thought I had from our previous discussion. Instead of letting the compiler do all this work, I think it would be better to have a "constexpr engine": you mark any expression/function with const (or another keyword, alternatively an attribute), and the compiler spawns Roslyn as a service (which it is) to compute the value and replace it at the call sites, or at the position of the expression, during compilation. It also means you can go a bit outside the area of strict purity and have things that touch the system just work. This approach doesn't even need language, CLI or CLR changes: as a prototype you can fork Roslyn, create your own ConstAttribute attribute and hook into the compiler to do all this work. I don't have a lot of details, but this looks doable, although I'm sure there are challenges waiting ahead. The question is really how much time people are willing to put into researching, designing and prototyping it, and whether the benefit is enough to justify it. You say it is, but I really don't know; I'd like to see benchmarks.
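
A very rough sketch of the "evaluate once, substitute the value" idea, using plain reflection as a stand-in for a real Roslyn hook. Everything here is hypothetical: `ConstAttribute`, `ConstEngine` and `Sample` are names invented for illustration, and an actual prototype would run inside the compiler rather than at run time.

```csharp
using System;
using System.Reflection;

// Marker for methods whose value could be computed ahead of time.
[AttributeUsage(AttributeTargets.Method)]
public sealed class ConstAttribute : Attribute { }

public static class ConstEngine
{
    // Evaluates a public, parameterless static method marked [Const] and returns
    // the value that a compiler could substitute at the call sites.
    public static object Evaluate(Type type, string methodName)
    {
        MethodInfo method = type.GetMethod(methodName, BindingFlags.Public | BindingFlags.Static);

        if (method == null || method.GetCustomAttribute<ConstAttribute>() == null)
            throw new InvalidOperationException("Method not found or not marked [Const].");

        return method.Invoke(null, null);
    }
}

public static class Sample
{
    [Const]
    public static long FuncA() => 420L;
}
```

`ConstEngine.Evaluate(typeof(Sample), "FuncA")` returns the boxed value 420; methods without the marker are rejected.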

Good luck! ;)

Rephrased.

Unknown6656 · 2018-08-03T08:50:04Z

Unknown6656
Aug 3, 2018
Author

@svick

The term "determinism" refers to quite a different concept in the Roslyn compiler, so I think this proposal should use a different name (like "pure functions"), to avoid confusion.

Yes -- you are right, I should rename it

All parameters are deterministic

That was a bad choice of wording on my part ... I meant that the arguments can be deterministically computed (at invocation)

What are "expression chains" and "operation flows"? Isn't it enough to say that all statements need to be deterministic? (And that a statement is deterministic if all its child expressions and statements are deterministic.)

You are right, saying that all statements need to be deterministic is enough. I thought I should elaborate on this a bit, as devs sometimes tend to forget that operators can also be functions (instead of simple CPU instructions).

I don't understand what this means. What are a function's dependencies? Is this referring to field accesses?

Yes, but not only field access. I view a function's dependencies as the set of data sources that the function relies upon. This could be access to a variable/field/... at a point X in time. It could also be a sort of execution "contract" stating that some other code is executed, e.g., before time point X. Temporal determinism would be a kind of assurance that, e.g., function A will always be executed before function B.

What does it mean to "access memory" from C#?

I meant "traditional" memory access via e.g. pointers, fixed buffers, etc. That is, determinism can only be assured if no other code can modify the memory region used by the deterministic function.

Does e.g. "foo"[1] count?

No. Nor does array access (as long as the array is not pinned).

Also, don't most of these boil down to PInvoke calls anyway?

Do you mean the points in my list? Yes, they more or less do.

Concerning your thoughts on Performance: parallelism: Hm ... I see your points. One may have to resort to compiler hints via attributes. Now that you mention it, the problem is AFAIK indeed NP-hard. That could make a large code base pretty difficult to compile.

That would be a breaking change, since it currently produces "hello 3,142!" on my machine

Hm -- interesting. I forgot that some cultures use "," rather than "." as the decimal separator.

Do I understand it correctly that this is about producing errors at compile time for code that always throws?

Yes - exactly.

No, it would not. string.Format is not deterministic, because it depends on the current culture.

I unfortunately forgot that while typing up the proposal. I have never really used culture-dependent string formatting (not even when designing language packages for applications).
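To make the culture dependence concrete, here is a minimal sketch. The class name, method names and the "{0:F3}" format string are illustrative assumptions, not taken from the proposal. It contrasts formatting with an explicitly supplied culture against formatting with the ambient one:

```csharp
using System;
using System.Globalization;

public static class CultureDemo
{
    // Formats with an explicitly supplied culture, so the result does not
    // depend on the CultureInfo.CurrentCulture of the machine running the code.
    public static string Greet(double value, CultureInfo culture) =>
        string.Format(culture, "hello {0:F3}!", value);

    // Formats with the ambient culture: the same call can yield different
    // strings on different machines, so its result cannot safely be
    // precomputed on the build machine.
    public static string GreetAmbient(double value) =>
        string.Format("hello {0:F3}!", value);
}
```

With CultureInfo.InvariantCulture, Greet(3.14159, ...) yields "hello 3.142!". A culture whose decimal separator is a comma would typically yield "hello 3,142!" instead, which is why only the culture-pinned overload would be a candidate for compile-time evaluation.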

How would the compiler do that? How would it figure out what fields does a function read or write?

I think that this should be possible by looking at the generated syntax tree and symbol table to resolve all assignments to/reads from fields.

And if it can do that, wouldn't it also be able to figure out what functions are deterministic, so we wouldn't need the const modifier?

Yes -- you are right. It would effectively render the const modifier obsolete (which would kind of be the final goal: the compiler detects determinism automatically)

Unknown6656 · 2018-08-03T08:55:17Z

Unknown6656
Aug 3, 2018
Author

@eyalsk @HaloFour
You are right -- I should maybe break up this proposal into multiple issues (as long as they have not already been proposed).

@eyalsk I really like your idea of Roslyn acting as a service in the background.
I would also be very interested in creating such a Roslyn fork and trying to develop a prototype. The problem, however, is that I currently do not have enough time for such a project (mainly due to university and work). If I find enough time, I will dig deeper into this!

jnm2 · 2018-08-03T12:13:37Z

jnm2
Aug 3, 2018
Collaborator

@svick

And if it can do that, wouldn't it also be able to figure out what functions are deterministic, so we wouldn't need the const modifier?

To some extent it's important to use a modifier, especially if you want determinism to act across binary boundaries. I see it as similar to the proposed readonly keyword they're working on for methods right now (there's a better example I'm not thinking of), or the principle that the return type should not be inferred because of spooky action at a distance.

svick · 2018-08-03T13:50:18Z

svick
Aug 3, 2018
Collaborator

@jnm2

To some extent it's important to use a modifier, especially if you want determinism to act across binary boundaries.

That's part of my question. As I understand it, this proposal would require knowing which fields are read or written by a const method, even across assembly boundaries. How does that work? Since the C# compiler inspecting IL is almost certainly not realistic, do you have some set of attributes that would expose what until now were implementation details? (Which I think would also cause that "spooky action at a distance".)

orthoxerox · 2018-08-03T14:14:11Z

orthoxerox
Aug 3, 2018

That was a bad choice of wording on my part ... I meant that the arguments can be deterministically computed (at invocation)

If that argument is a delegate, can your function be inferred to be deterministic if that delegate is? Like, Enumerable.Select(this source, func) is deterministic if func is?
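A minimal sketch of this "conditional" determinism, using hypothetical names (ConditionalDeterminism, Map, Square, AddCounter) that do not appear in the proposal. Map adds no hidden inputs of its own, so whether a call to it is deterministic depends entirely on the delegate passed in:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ConditionalDeterminism
{
    // Shared mutable state, used only by the impure delegate below.
    private static int _counter;

    // Deterministic: the result depends only on the argument.
    public static readonly Func<int, int> Square = x => x * x;

    // Not deterministic: each call reads and mutates shared state.
    public static readonly Func<int, int> AddCounter = x => x + _counter++;

    // Select itself introduces no hidden inputs, so Map is exactly as
    // deterministic as 'func' is. ToArray forces evaluation immediately.
    public static int[] Map(IEnumerable<int> source, Func<int, int> func) =>
        source.Select(func).ToArray();
}
```

Calling Map twice with Square on the same input gives the same array both times; calling it twice with AddCounter does not, even though Map's own body is unchanged.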

Unknown6656 · 2018-08-05T08:47:44Z

Unknown6656
Aug 5, 2018
Author

@orthoxerox : That was my thought exactly.

jrmoreno1 · 2018-08-12T14:34:22Z

jrmoreno1
Aug 12, 2018

@Unknown6656: @eyalsk's idea seems a lot better than spending time on ways to determine whether a function is deterministic. A naive approach: create a flag for calculated consts and use it to compile two versions of the code, one without and one with; then use the one without to supply values to the one with. This doubles the work that needs to be done, but everything that can be calculated could be used, and the only restriction is that all of the direct function calls be static.

theunrepentantgeek · 2019-02-23T22:26:15Z

theunrepentantgeek
Feb 23, 2019

If you want to completely optimize away any (every?) pure/deterministic function that has parameters known at compile time, the compiler is going to need to execute the code to generate the result.

Given that the Roslyn compiler is extensively used within Visual Studio (for Intellisense, Unit Test discovery, Code Analyzers, other things), this raises a scary scenario:

I clone a git repo of something, so I can review the code.
Unknown to me, the repo has been compromised by a bad actor.
I load the code into Visual Studio for review.
Visual Studio uses Roslyn to evaluate IntelliSense, the malicious code exploits a bug and gets to run on my system.
Bad code runs on my machine - it's not my machine anymore.

Notes ...

... I didn't compile and run the code, it happened automatically in the background.
... the very tools that I'd be using to review the code to see if it was safe are the exact tools that could be compromised to exploit my system.
... there are always bugs, no matter how diligent or disciplined the team involved. From a security perspective, assume breach and evaluate how much damage can be done.

Aggressive inlining of pure/deterministic functions might be worth it - but would clearly need to be approached in a very conservative, diligent and security conscious fashion.

YairHalberstadt · 2019-02-23T22:29:39Z

YairHalberstadt
Feb 23, 2019
Collaborator

@theunrepentantgeek
TBH if you download a repo with compromised code you are almost certainly stuffed anyway. They could easily hide something in the code which you won't notice, and which does something malicious when executed.

Also I imagine it would be doable to only evaluate constant expressions when actually compiling, and/or to evaluate them in a sandbox. After all, if they are truly constant, they shouldn't require anything outside the sandbox.

CyrusNajmabadi · 2019-02-23T22:36:45Z

CyrusNajmabadi
Feb 23, 2019
Collaborator

I have that opinion that a compiler should aggressively optimize anything that is deterministic, however this will be an issue when one wants to invoke a function which would otherwise be optimized away, e.g.:

I would never expect the actual functions to be optimized away. I would simply expect the invocations of those functions to be potentially optimized away.
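The distinction can be sketched as follows. All names here (MathUtil, CallSites, Folded) are hypothetical, and the "folded" form is written by hand to stand in for what an optimizer might emit; no current compiler is claimed to do this for user-defined methods:

```csharp
using System;
using System.Reflection;

public static class MathUtil
{
    // A deterministic function: same input, same output, no side effects.
    public static int Square(int x) => x * x;
}

public static class CallSites
{
    // What the programmer writes.
    public static int Written() => MathUtil.Square(4);

    // What an optimizer could emit instead: the invocation is replaced by
    // its value, while MathUtil.Square itself stays in the assembly.
    public static int Folded() => 16;

    // Reflection can still locate and call the method, because only the
    // invocation was folded, not the declaration.
    public static int ViaReflection()
    {
        MethodInfo square = typeof(MathUtil).GetMethod(nameof(MathUtil.Square));
        return (int)square.Invoke(null, new object[] { 4 });
    }
}
```

All three members return 16. Removing the declaration of Square as well would leave Written and Folded unaffected but would break ViaReflection.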

CyrusNajmabadi · 2019-02-23T22:37:30Z

CyrusNajmabadi
Feb 23, 2019
Collaborator

Also I imagine it would be doable to only evaluate constant expressions when actually compiling, and/or to evaluate them in a sandbox. After all, if they are truly constant, they shouldn't require anything outside the sandbox.

If you're willing to wait till compilation time, then I go back to my original point: I don't see why this is part of the C# language. Just add a compile step that does this through IL optimization.

YairHalberstadt · 2019-02-23T22:39:16Z

YairHalberstadt
Feb 23, 2019
Collaborator

@CyrusNajmabadi
I don't necessarily disagree. I'm just addressing @theunrepentantgeek's security concerns.

jnm2 · 2019-02-24T01:16:25Z

jnm2
Feb 24, 2019
Collaborator

Arbitrary user code in the form of MSBuild scripts executes when you load a csproj. (I've done this.) It can execute any code that a standalone EXE can. The same is true of each NuGet package you reference when you do a package restore.

jnm2 · 2019-02-24T01:20:00Z

jnm2
Feb 24, 2019
Collaborator

My advice is to be antifragile to malware. This means offline backups, only allowing your machine to access data that you don't mind sharing with the world, and a streamlined routine so that doing clean installations of Windows doesn't hurt.

Unknown6656 · 2019-02-24T07:58:54Z

Unknown6656
Feb 24, 2019
Author

@theunrepentantgeek
Aside from "being careful", I would agree that the compiler should execute the code in order to optimize it if (and only if) it has been marked as 'pure' (by the user or by the compiler itself).

Think about it, what could it execute? Obviously it would not touch any machine-specific code, nor any code concerning streams, I/O, shared memory, networking, user input or hardware. These functions/APIs are not pure in a deterministic sense.
The compiler would only execute code concerning constant manipulation (numeric or string-wise), stuff like known collection operations etc.

I do not see any problem, as any "harmful" code would have to act outside of the program itself (or on in/outgoing communication) in order to harm the machine. As such operations are by definition non-deterministic, the compiler would not even touch them.

(I'm using the 'compiler' here as the step, but it could technically be the linker or another pipeline step)

@CyrusNajmabadi: Why should we not optimize away complete functions? Would it make a semantic difference on non-public functions (or fields for that matter)?
Obviously this would break reflection but I do not see any issues aside from that..... (well except that it would be a breaking change from the specification's point of view.)

CyrusNajmabadi · 2019-02-24T09:57:58Z

CyrusNajmabadi
Feb 24, 2019
Collaborator

@CyrusNajmabadi: Why should we not optimize away complete functions?

What is a 'complete function'?

Obviously this would break reflection

That's a pretty big deal. Why would you break reflection?

well except that it would be a breaking change from the specification's point of view

That also seems like a pretty big deal.

Unknown6656 · 2019-02-25T09:27:36Z

Unknown6656
Feb 25, 2019
Author

The point I would like to make is:
If we want to optimize the program as much as possible why leave functions which could be eliminated and do not have any public semantics?
One could alternatively try to inline the calls and leave the original functions (more or less) unchanged in the assembly so as to not break reflection.... but should we?

I do understand that it is a huge deal; however, I think we should reflect on how we could reduce a .NET application's footprint and resource load.

EDIT: Do pardon me, on second thought it is a bad idea to introduce such a huge breaking change.
However, I still think that the .NET platform has much more performance potential that could be tapped.

deinok · 2019-02-25T17:19:41Z

deinok
Feb 25, 2019

@Unknown6656 I think that you are right, but this kind of optimization will be problematic: not impossible without breaking changes, but really difficult. I think that for now we should try to push the pure/determinism/constexpr part of the proposal.

PS: Of course, it's good to keep future work related to this proposal in mind. That will probably help with upstreaming this proposal.

CyrusNajmabadi · 2019-02-25T18:16:07Z

CyrusNajmabadi
Feb 25, 2019
Collaborator

If we want to optimize the program as much as possible why leave functions which could be eliminated and do not have any public semantics?

What do you mean by 'public' semantics? reflection, for example, is a public API.

Note: if you want something that pulls out unused methods, then just use a tree-shaking tool. None of this is really related to the language per se. It seems to be about optimizing the end compiled IL. This all feels like the domain of IL optimizers.

Unknown6656 · 2019-02-26T11:36:03Z

Unknown6656
Feb 26, 2019
Author

It seems to be about optimizing the end compiled IL. This all feels like the domain of IL optimizers.

Yes, it is mostly on the IL level and not the language level; however, in my original proposal I thought some language annotation might be needed.

What do you mean by 'public' semantics?

Private methods not having any visible effect on the program's input and output

reflection, for example, is a public API.

That is the reason why it is such a breaking change and why I "retract" my previous standpoint.

I think that you are right, but this kind of optimization will be problematic: not impossible without breaking changes, but really difficult. I think that for now we should try to push the pure/determinism/constexpr part of the proposal.

I agree that this will be difficult. I would like to help with that, but I do not know where the best place to start would be...

deinok · 2019-02-27T18:37:13Z

deinok
Feb 27, 2019

I think the first step is to weigh the benefits and drawbacks of an IL-level implementation vs. a language-level implementation. But @gafter can probably give us more information about what is required, as well as his personal opinion on this.

TahirAhmadov · 2020-12-01T16:33:01Z

TahirAhmadov
Dec 1, 2020

I would comment that this proposal is huge. Can we start with just deterministic functions? That will allow a lot of optimizations, and improved constant declarations, without major (if any) changes to the CLR.

5 replies

Unknown6656 Dec 3, 2020
Author

Yeah. That proposal is indeed huge.

Could you clarify what "just deterministic functions" would mean for you?
Would you like to see an attribute/... implemented, which marks a function as 'deterministic'? This could aid the JIT.
Or do you already want some deterministic optimizations by the compiler?

How would you like to see the definition of deterministic functions?
I defined them as

A function is deterministic, iff:
1. All parameters are deterministic
2. All inner function calls are deterministic
3. Any expression, expression chain or operation 'flow' is deterministic
4. All dependencies are deterministic (in time)

This would imply that deterministic functions do not access any shared variable (this includes this, instance variables, static variables, etc.), and that all function calls inside a deterministic function must be deterministic themselves.
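As a rough illustration of how this definition classifies functions, here is a sketch. DeterministicAttribute is a hypothetical marker invented for this example (no such attribute exists in the BCL), and the other names are likewise assumptions:

```csharp
using System;

// Hypothetical marker for illustration only; not part of the BCL.
[AttributeUsage(AttributeTargets.Method)]
public sealed class DeterministicAttribute : Attribute { }

public static class Examples
{
    // Shared state that a deterministic function must not touch.
    private static int _calls;

    // Satisfies all four rules: it uses only its parameters, calls only
    // itself, and consists of plain arithmetic and a conditional.
    [Deterministic]
    public static int Gcd(int a, int b) => b == 0 ? a : Gcd(b, a % b);

    // Violates rule 4: the result depends on when the call happens.
    public static long Stamp(int x) => x + DateTime.UtcNow.Ticks;

    // Violates the "no shared variable" implication: it reads and writes
    // a static field, so two identical calls return different values.
    public static int Next(int x) => x + ++_calls;
}
```

Gcd(12, 18) always returns 6, whereas two consecutive calls to Next(1) return different values. An analysis enforcing the definition would accept the marker on Gcd and reject it on the other two.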

Would such a compile-time analysis (and maybe enforcement) be an adequate "first step"?
The next steps would probably be JIT optimizations based on determinism. The last step would be compile-time optimizations.

TahirAhmadov Dec 3, 2020

Could you clarify what "just deterministic functions" would mean for you?
Would you like to see an attribute/... implemented, which marks a function as 'deterministic'? This could aid the JIT.
Or do you already want some deterministic optimizations by the compiler?

In short, all of the above. I'd say the const keyword should be added, which can be compiled down to a hidden attribute. The attribute allows the C# compiler to recognize existing compiled methods as const, and allows the optimizer to do caching/inlining/etc. as necessary. At compile time, const methods used in const field/variable declarations can be evaluated, removing the need for the runtime warm-up of initializing static readonly fields.
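The warm-up point can be sketched in today's C#. The names (Tables, Factorial, Factorial10) are hypothetical, and the proposed syntax appears only in a comment because it does not compile with any current compiler:

```csharp
using System;

public static class Tables
{
    // Deterministic helper; under the proposal it would carry the
    // 'const' modifier (hypothetical syntax, not valid C# today).
    public static long Factorial(int n) => n <= 1 ? 1 : n * Factorial(n - 1);

    // Today: evaluated at run time by the static initializer, the first
    // time the type is used.
    public static readonly long Factorial10 = Factorial(10);

    // Today's only true constant form: the value has to be worked out by
    // hand. Under the proposal this could presumably be written as
    //     public const long Factorial10 = Factorial(10);
    public const long Factorial10Literal = 3628800L;
}
```

Both fields hold 3628800, but only the const one is baked into the assembly at compile time; the readonly one costs a type-initializer run.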

How would you like to see the definition of deterministic functions?

Your definition is spot on, I think.

This would imply that deterministic functions do not access any shared variable (this includes this, instance variables, static variables, etc.), and that all function calls inside a deterministic function must be deterministic themselves.

Yes. In the future, if and when const types are added, this can be revisited.

Would such a compile-time analysis (and maybe enforcement) be an adequate "first step"?
The next steps would probably be JIT optimizations based on determinism. The last step would be compile-time optimizations.

I'd say, since the language syntax needs to be set in stone, we can further break the "deterministic functions" story into a few smaller sub-stories, which can be implemented either together or in sequence over multiple C#/.NET versions by various teams:

1. const keyword on methods; enforcement of determinism. (C#)
   a. If we are going to do the compile-time optimizations, such as not calling into an endless recursion, these should be done together with #1, otherwise they become breaking changes.
2. Allow calling const functions in const variable/field declarations (depends on 1). (C#) (I'd say do this together with #1, but it can be done separately.)
3. Improve optimizer(s) to recognize const functions - caching/inlining (depends on 1). (JIT)
4. Decorate appropriate methods in existing BCL with const (depends on 1). (.NET)

Unknown6656 Dec 3, 2020
Author

I concur with all of your points.
It'd be interesting to know what the LDT thinks about this, e.g. whether determinism has ever been discussed from a language design point of view during the LDT meetings.

IIRC, my original proposal mentions readonly classes/structs, as well as readonly method variables. While the former has been implemented in the meantime, the latter has not yet been incorporated into the C# language. I expect that the LDT would want to implement readonly method variables (#188) before turning their attention to const functions.

HaloFour Dec 3, 2020

Lots of conversations on the subject: #2379

Unknown6656 Dec 3, 2020
Author

And in dotnet/roslyn#15079 as well

Replies: 49 comments · 5 replies
