Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

null-forgiving operator *can have* effect at run time #17988

Closed
springy76 opened this issue Apr 21, 2020 · 32 comments · Fixed by #18124
Closed

null-forgiving operator *can have* effect at run time #17988

springy76 opened this issue Apr 21, 2020 · 32 comments · Fixed by #18124
Assignees
Labels
dotnet-csharp/svc product-question Product usage related questions [org][type][category]

Comments

@springy76
Copy link

Quote from the current doc:

The null-forgiving operator has no effect at run time. It only affects the compiler's static flow analysis by changing the null state of the expression. At run time, expression x! evaluates to the result of the underlying expression x.

The document currently focuses primarily on the usage of suppressing warnings of code flow analysis.

But when combining the null-forgiving operator with a leading null-conditional member-access operator the compiler will indeed emit different code and this will have effect at run time -- which comes unexpected.

Example:

void Main()
{
	var names = T1(); // `names == null` now
	var names2 = T2(); // throws
}

IEnumerable<string> T1()
{
	FileInfo? fi = null;
	return fi?.Directory?.EnumerateFileSystemInfos().Select(fsi => fsi.Name);
}

IEnumerable<string> T2()
{
	FileInfo? fi = null;
	return fi?.Directory?.EnumerateFileSystemInfos()!.Select(fsi => fsi.Name);
}

This happens because in T2() the code actually compiles to return (fi?.Directory?.EnumerateFileSystemInfos()).Select(fsi => fsi.Name); (notice the additional parentheses).

I'm sure this is by design and is covered by the sentence "expression x! evaluates to the result of the underlying expression x". But since any article or documentation I have read about the null-forgiving operator so far, focused only about getting rid of warnings, this (change in) behavior was very unexpected, at least to me.


Document Details

Do not edit this section. It is required for docs.microsoft.com ➟ GitHub issue linking.

@pkulikov
Copy link
Contributor

@springy76 thanks!

I've simplified your example a bit:

using System;
using System.Collections.Generic;
using System.Linq;
#nullable enable

class Program
{
    private static void Main()
    {
        try
        {
            T1(null);
        }
        catch (Exception)
        {
            Console.WriteLine("T1 throws");
        }
        try
        {
            T2(null);
        }
        catch (Exception)
        {
            Console.WriteLine("T2 throws");
        }
    }

    private static IEnumerable<int> T1(int[]? numbers)
    {
        return numbers?.ToArray().ToArray();
    }

    private static IEnumerable<int> T2(int[]? numbers)
    {
        return numbers?.ToArray()!.ToArray();
    }
}

The above program produces the following output:

T2 throws

That program in sharplab

Note that T1 is converted into the following code:

    private static IEnumerable<int> T1([System.Runtime.CompilerServices.Nullable(2)] int[] numbers)
    {
        if (numbers == null)
        {
            return null;
        }
        return Enumerable.ToArray(Enumerable.ToArray(numbers));
    }

while T2:

    private static IEnumerable<int> T2([System.Runtime.CompilerServices.Nullable(2)] int[] numbers)
    {
        return Enumerable.ToArray((numbers != null) ? Enumerable.ToArray(numbers) : null);
    }

@BillWagner is that expected behavior?

@IEvangelist IEvangelist self-assigned this Apr 21, 2020
@IEvangelist IEvangelist added product-question Product usage related questions [org][type][category] and removed ⌚ Not Triaged Not triaged labels Apr 21, 2020
@IEvangelist IEvangelist added this to the April 2020 milestone Apr 21, 2020
@IEvangelist
Copy link
Member

Let's ask @jcouv for some thoughts on this, thank you @pkulikov and @springy76.

@IEvangelist
Copy link
Member

IEvangelist commented Apr 21, 2020

Yeah, I see it as:

private static IEnumerable<int> T1(int[]? numbers)
{
    return numbers?.ToArray().ToArray();
}

private static IEnumerable<int> T2(int[]? numbers)
{
    return (numbers?.ToArray()).ToArray();
}

It is a really odd use case, it's literally like expressing that something could be null, and then something else chained off of that could be null and then saying that the last thing is definitely not null. It could be null, it could be null, it's not null.

@BillWagner
Copy link
Member

BillWagner commented Apr 21, 2020

I can't resist a good puzzle. I think the behavior is correct, and is covered in operator precedence and in the nullable reference type spec on the null forgiving operator, and finally in the C# 6.0 spec on the null conditional operator:

  • the null conditional operator is a unary operator, which has lower precedence than primary expressions.
  • the null forgiving operator is a primary expression.

I think that means the left operand of ! must be considered as non-null, so execute the ToArray() method regardless of the earlier left operand(s) of any ?. operator.

What's equally interesting is that the statement "The null forgiving operator has no runtime effect" is still strictly true (and was taken directly from the feature spec referenced above). It isn't very helpful, though. The code generated is based on the precedence of the operators, and the fact that they are left-associative.

I'm still not sure how best to fix this in the docs, but you've found a super interesting edge case @springy76 We'll keep noodling.

@pkulikov
Copy link
Contributor

Precedence. Does that mean that numbers?.ToArray()!.ToArray() is interpreted as (numbers?.ToArray()!).ToArray() because of precedence?

@BillWagner
Copy link
Member

BillWagner commented Apr 21, 2020

Sort of. It's both precedence and associativity. The ! operator is a post-fix unary operator. Its operand is whatever is left of it. Because ToArray() needs a receiver, its operand becomes (numbers?.ToArray()). Because ! has higher precedence than . (primary vs. unary), the expression numbers?.ToArray()! is considered non-null and then becomes the receiver to the final ToArray.

It took my a few reads of the sections I cited above, and I'm still not completely certain I'm analyzing it correctly. (I first thought it might be a compiler bug. Now I don't think so.) I'd like to wait for @jcouv to weigh in as well before we suggest a fix.

@pkulikov
Copy link
Contributor

ok, then we need to write such code (it compiles!) :)

numbers?.ToArray()!?.ToArray();

@jcouv
Copy link
Member

jcouv commented Apr 21, 2020

Thanks @BillWagner for the analysis (which I think is correct).

@gafter, Would this warrant changing the precedence of post-fix ! (suppression)?
The precedence of ! causes x?.ToArray()!.ToArray() to parse with an implicit grouping (x?.ToArray()!).ToArray(), which affects semantics...

@agocke
Copy link
Member

agocke commented Apr 21, 2020

I don't see what other precedence it could be...

@pkulikov
Copy link
Contributor

pkulikov commented Apr 22, 2020

@BillWagner thank you for the analysis; it helped. However, it's hard to think in terms of precedence because the spec on that doesn't mention null-conditional operators. And it's possible to think that the precedence of ?. is the same as precedence of ., which is primary. So, I'll try to make my own analysis to understand this issue better.

Given (I assume the following two statements are true):

  • x! is a primary expression
  • the C# 6 draft spec for the null-conditional operators is still true for C# 8.0. In particular, the following grammar is valid:
null_conditional_expression
    : primary_expression null_conditional_operations
    ;

null_conditional_operations
    : null_conditional_operations? '?' '.' identifier type_argument_list?
    | null_conditional_operations? '?' '[' argument_list ']'
    | null_conditional_operations '.' identifier type_argument_list?
    | null_conditional_operations '[' argument_list ']'
    | null_conditional_operations '(' argument_list? ')'
    ;

numbers?.ToArray().ToArray() is a null-conditional expression with numbers as primary_expression and ?.ToArray().ToArray() as null_conditional_operations

We can't do the similar split for numbers?.ToArray()!.ToArray() because ! cannot be included into null_conditional_operations (it's not identifier, I guess). That means that numbers?.ToArray()!.ToArray() is NOT a null-conditional expression. According to the grammar, only numbers?.ToArray() is a null-conditional expression, let's mark it A. Then, we get A!.ToArray(), which is equivalent to A.ToArray(), which is equivalent to (numbers?.ToArray()).ToArray()

Furthermore, numbers?.ToArray()!?.ToArray() is a null-conditional expression with numbers?.ToArray()! as primary_expression and ?.ToArray() as null_conditional_operations.

So, if this issue to be fixed, the definition of a null-conditional expression should be updated, not only the precedence of null-forgiving (to unary??).

@BillWagner @jcouv is that correct analysis?

@springy76
Copy link
Author

springy76 commented Apr 22, 2020

Let me go into detail how I (we) spotted this problem yesterday, because it's exactly the other way round than the example I gave in the original post:

It began with a collegue who claimed that inserting a "!" somewhere fixed a NRE. I protested and bet a staple of crates of beer that the null-forgiving operator won't fix any code but only give the code analyzer a hint: "I know it better, just be calm and stop warning me".
But he proved me wrong.

In our case it was exactly the other way round (code without "!" threw NRE) because the LINQ extension method in our code (I used Enumerable.Select(fsi => fsi.Name) in the sample above) was named EmptyIfNull and as the name implies is designed to handle null for input (and will return Array.Empty<T>() in this case):

  var data = mightBeNullObject?.ArrayTypeProperty.EmptyIfNull().Select(x => x.Prop);
  hashset.UnionWith(data); // ArgumentNullException when mightBeNullObject is null (data is then null, too)

So in our case the "?." stripped away the call to .EmptyIfNull<T>() and the resulting IEnumerable variable was still null and with the additional "!" the call to .EmptyIfNull<T>() was forced just the same way as putting the left side of the expression into parenthesis:

  var data = mightBeNullObject?.ArrayTypeProperty!.EmptyIfNull().Select(x => x.Prop);
  hashset.UnionWith(data); // data is never null now

equals

  var data = (mightBeNullObject?.ArrayTypeProperty).EmptyIfNull().Select(x => x.Prop);
  hashset.UnionWith(data); // data is never null now

@AntonLapounov
Copy link
Member

It has no effect at run time; however, it may affect how the expression is parsed at compile time:

int[] a = null;
string s1 = a?.First().ToString();  // null: ToString() is not called
string s2 = a?.First()!.ToString(); // Empty string: ToString() is called on default(int?)

@springy76
Copy link
Author

springy76 commented Apr 23, 2020

Yes, if you decompile it (using ILSpy for example) you'll see that the compiler emitted different code (by adding parenthesis or maybe an intermediary variable): string s2 = ( a?.First() ).ToString();

I just did not expect the compiler to behave different in any way because I only expected reducing of false-positive-warnings.

As already stated in the original post: I don't think the compiler is doing something wrong, it just came unexpected due to the lack of explanation or any mentioning in all the samples.

@pkulikov
Copy link
Contributor

@BillWagner @IEvangelist @springy76 what about adding the following note to the article about the null-forgiving operator:

Note

The compiler interprets expression x?.y!.z as (x?.y)!.z. Because of that, at run time x?.y!.z is evaluated as (x?.y).z.

@IEvangelist
Copy link
Member

IEvangelist commented Apr 23, 2020

@BillWagner @IEvangelist @springy76 what about adding the following note to the article about the null-forgiving operator:

[!NOTE]
The compiler interprets expression x?.y!.z as (x?.y)!.z. Because of that, at run time x?.y!.z is evaluated as (x?.y).z.

I think that note would be appropriate, I'm curious what @BillWagner would think? I'd probably reword it a little (let me know your thoughts):

Note

The compiler interprets the x?.y!.z expression as (x?.y)!.z. Due to this interpretation and the operator precedence, at runtime x?.y!.z is evaluated as (x?.y).z.

@pkulikov
Copy link
Contributor

@IEvangelist I would omit the words "the operator precedence": the parentheses in the first sentence are enough. Also to me, "operator precedence" here is more confusing than explaining (however, I don't know how it is to the remaining audience).

By the way, "runtime" is the noun that is in "Common Language Runtime", while "at run time/at compile time".

@AntonLapounov
Copy link
Member

AntonLapounov commented Apr 23, 2020

I first thought it might be a compiler bug. Now I don't think so.

Have we clarified with the compiler team that it is indeed the intended behavior here? If not, I would create a Roslyn issue asking for clarification. The present behavior is counterintuitive. We might consider changing the grammar to allow null-forgiving operators in null_conditional_operations.

@pkulikov
Copy link
Contributor

@BillWagner @IEvangelist I've thought more about it and can explain why mentioning precedence confuses me (and can confuse others).

We say that a + b * c is evaluated as a + (b * c) because * has a higher precedence. So, a?.b!.c is evaluated as (a?.b)!.c. That looks as if ?. has a higher precedence because it's evaluated still first.

However, I don't think we need to mention precedence here at all. Precedence, for a given operand, decides to which operator the operand binds: to the one with a higher precedence. For example, in a + b * c, b binds to * because * has higher precedence than +.

That said, in expression a?.b.c.d!.e, all operators are unary postfix. Each operand binds to an operator on its right, there is no choice (thus, there is no need in precedence). Thus, all operators are evaluated from left to right in order. So, the compiler tries to evaluate the first ?.. To do that, I guess, it needs to form a null-conditional expression. A null-conditional expression can contain only (null-conditional) member and index access operations. The ! operator breaks that sequence of member and index accesses. That's why a?.b.c.d!.e is interpreted as (a?.b.c.d)!.e.

There are two solutions: (1) to allow ! in a null-conditional expression, (2) update the docs

As for the docs, currently they say:

The null-forgiving operator has no effect at run time. It only affects the compiler's static flow analysis by changing the null state of the expression. At run time, expression x! evaluates to the result of the underlying expression x.

Maybe we should say something like that:

The null-forgiving operator has no effect at run time. Expression x! evaluates to the result of the underlying expression x. However, at compile time, the null-forgiving operator affects the compiler's static flow analysis by changing the null state of the expression. In the case of a null-conditional expression, that changes the way how it's evaluated when you add the ! operator to it. The compiler interprets a?.b!.c as (a?.b)!.c, which evaluates to (a?.b).c at run time.

@AntonLapounov
Copy link
Member

That said, in expression a?.b.c.d!.e, all operators are unary postfix.

@pkulikov I am not sure how you came to this conclusion. Expressions a?. and a. have no meaning by themselves, because those operators require both left- and right- hand sides. For instance, the C++ reference describes the first and the second operands of the a.b member access operator. If you do not wish to use the word operand for syntax constructions, we might agree to call them lhs and rhs. The operators would still be binary though.

Having said that, I do not think it is strictly necessary to distinguish priorities of those special operators (implied by the grammar) in the table in question. Its main idea is that primary expressions are evaluated first (they have the highest precedence), then all other 'normal' operators are evaluated according to their priorities. For that purpose, it seems OK to extend primary expressions with null-conditional operators even if the outdated formal spec is worded slightly differently.

Maybe we should say something like that:

The null-forgiving operator has no effect at run time. Expression x! evaluates to the result of the underlying expression x. However, at compile time, the null-forgiving operator affects the compiler's static flow analysis by changing the null state of the expression. In the case of a null-conditional expression, that changes the way how it's evaluated when you add the ! operator to it. The compiler interprets a?.b!.c as (a?.b)!.c, which evaluates to (a?.b).c at run time.

The words "that changes" imply it is the same effect; however, those are two separate effects:

  • The ! operator may change the null state of the expression. That may suppress a warning.
  • The ! operator may change how the primary expression is parsed and, therefore, evaluated.

@pkulikov
Copy link
Contributor

I am not sure how you came to this conclusion.

@AntonLapounov this is how I understand the language spec. According to it, a null-conditional expression and a member access expression (as it's a primary expression) are unary expressions. That means they have one operand. Indeed, they require right-hand part, but it's not operand (it can be a member identifier or a list of member and indexer access operations). For example, the spec on null-conditional says:

The null-conditional operator applies a list of operations to its operand...

So, there is a distinction between "operand" and "rhs". This distinction is important, because precedence concerns operands.

@pkulikov
Copy link
Contributor

The words "that changes" imply it is the same effect; however, those are two separate effects:

@AntonLapounov thanks for that remark. I agree my suggestion can be improved like that:

At compile time, the null-forgiving operator affects the compiler's static flow analysis by changing the null state of the expression. At run time, expression x! evaluates to the result of the underlying expression x. Typically, the introduction of the ! operator doesn't affect run time evaluation. However, if you introduce the ! operator in a null-conditional expression, you might observe the change in behavior at run time. That happens because the compiler interprets a?.b!.c as (a?.b)!.c, which evaluates to (a?.b).c at run time.

@springy76 @IEvangelist @BillWagner what do you think?

@IEvangelist
Copy link
Member

IEvangelist commented Apr 24, 2020

@pkulikov I agree with your analysis, and I like the updated text you're proposing. The last sentence was tripping me up a bit (this is a minor).

That happens because the compiler interprets a?.b!.c as (a?.b)!.c, which evaluates to (a?.b).c at run time.

Does this read a bit better written this way?

The reason for this, is that the compiler interprets a?.b!.c as (a?.b)!.c, which evaluates to (a?.b).c at run time.

I am thoroughly enjoying this thread, being new to this team - I'm learning a lot from it. Thank you

@pkulikov
Copy link
Contributor

@IEvangelist I would make the intro even shorter (is that grammatically correct?):

That is because the compiler...

@springy76
Copy link
Author

springy76 commented Apr 24, 2020

BTW: Not to oversee what happens on nullable structs while talking about nullable reference types:

static System.TimeSpan T1(DateTime? dt) 
    => dt?.TimeOfDay!.GetValueOrDefault(Timeout.InfiniteTimeSpan);

Without ! the member GetValueOrDefault of the intermediary Nullable<T> struct would not be valid at this place at all since TimeOfDay itself is just a struct (TimeSpan).

@IEvangelist
Copy link
Member

@IEvangelist I would make the intro even shorter (is that grammatically correct?):

That is because the compiler...

I'm good with that, I think the "That happens because" was throwing me off...

@AntonLapounov
Copy link
Member

this is how I understand the language spec. According to it, a null-conditional expression and a member access expression (as it's a primary expression) are unary expressions. That means they have one operand.

That is not correct interpretation of the spec. While every primary_expression matches the unary_expression production, that does not mean that every operator within a primary_expression has to be a unary operator. Consider, for example, this unary_expression: a[1+2].b. It contains three operators:

  • Binary arithmetic operator + taking arguments 1 and 2.
  • Binary indexing operator [] taking arguments a and 3 (the result of 1+2).
  • Binary access operator . taking arguments a[3] and b.

While you might claim that b is not a 'real' operand of the last operator and name it rhs instead, it would be difficult to deny that 3 is a real operand of a[3], which is certainly a unary_expression.

For example, the spec on null-conditional says:

The null-conditional operator applies a list of operations to its operand...

And that list of operations works as the second argument/operand of the operator. That is commonly used terminology. Look, for example, at Safe navigation operator Wiki page:

In object-oriented programming, the safe navigation operator (also known as optional chaining operator, safe call operator, null-conditional operator) is a binary operator that returns null if its first argument is null; otherwise it performs a dereferencing operation as specified by the second argument (typically an object member access or an array index).

So, there is a distinction between "operand" and "rhs". This distinction is important, because precedence concerns operands.

Operator precedence rules dictate how an expression is interpreted by the compiler and translated into the expression tree. That includes determining arguments of each operator. It does not matter whether you call them operands or lhs/rhs — you still have to determine their textual boundaries. If some part of the grammar is more complicated and cannot be expressed as an operator-precedence grammar, that is fine, but then you should avoid assigning precedence to operators in that part of the grammar. That is what you did in #18001 and I came here to explain why it was wrong:

  • moved x?.y and x?[y] (null-conditionals) from the primary precedence row to the unary precedence row (putting it to the primary expressions group was not correct)

@BillWagner BillWagner removed this from the April 2020 milestone Apr 27, 2020
@BillWagner
Copy link
Member

Thanks for all the discussion. I'll make the following proposal:

In light of dotnet/csharplang#3393, dotnet/csharplang#3297 and dotnet/roslyn#43659 which open possible changes o the spec and implementation, I'd like to re-focus this on explaining the current behavior. (Thanks @AntonLapounov @agocke and @jcouv for driving those discussions).

In the short term, I think the best solution is Option 1 in https://github.com/dotnet/docs/pull/18037/files#r414319212 as @AntonLapounov says. In addition, we should add a note as previously discussed. I suggest the following edit to the previous proposal:

Note

In C# 8, The expression x?.y!.z is parsed as (x?.y)!.z. Due to this interpretation (x?.y)! is considered not null. Therefore, if either x or y is null, the expression throws a NullReferenceException.

I specifically mention C# 8 to take into account change that might come from dotnet/csharplang#3393.

In the longer term, as the standardization effort catches up, we should remove this table from the language reference section, and refer to the corresponding table in the spec.

Can we give this is thumbs up or down as a proposed solution?

@AntonLapounov
Copy link
Member

Therefore, if either x or y is null, the expression throws a NullReferenceException.

Nitpicking, it may throw, but not necessary if z is a method call:

int[] a = null;
// s2 is the empty string: ToString() is called on default(int?)
string s2 = a?.First()!.ToString();

What's about the following?

In C# 8 the expression x?.y!.z is parsed as (x?.y)!.z. Due to this interpretation z is evaluated even if x is null, which may result in a NullReferenceException.

@springy76
Copy link
Author

BTW: What is inside LINQ expressions? ?. isn't allowed there but ! is allowed -- but seems to have no impact on the expression.

BillWagner added a commit to BillWagner/docs that referenced this issue Apr 28, 2020
Fixes dotnet#17988.  See dotnet#17988 (comment)

The null conditional operators and null forgiving operators interact in interesting ways. Add notes on both pages to explain those interactions.
@BillWagner
Copy link
Member

I opened #18124 to address this. Thanks for all the discussion.

@springy76 Can you open a new issue for the question on LINQ expressions? I'm not sure what you mean, and I'd like to start a new discussion for that concern.

@springy76
Copy link
Author

@BillWagner

I just wanted to note that

Expression<Func<DirectoryInfo, string>> expr = di => di.Parent?.Name;

does not compile at all (CS8072 An expression tree lambda may not contain a null propagating operator) but

Expression<Func<DirectoryInfo, string>> expr = di => di.Parent!.Name;

is no problem, seems to be ignored by (or transparent to) the expression code builder.

@BillWagner
Copy link
Member

Thanks @springy76 That's a different issue. (And we don't document it well.)

There are a number of new syntax elements that aren't representable in LINQ expression trees. I'll create an issue and we'll address that as well.

BillWagner added a commit that referenced this issue Apr 29, 2020
* Add note about interactions in operators

Fixes #17988.  See #17988 (comment)

The null conditional operators and null forgiving operators interact in interesting ways. Add notes on both pages to explain those interactions.

* respond to feedback

* typos

* rework note based on feedback.

* one final set of changes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dotnet-csharp/svc product-question Product usage related questions [org][type][category]
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants