Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

For-loop should use larger type to avoid overflow exception. #180

Closed
ericmutta opened this issue Oct 5, 2017 · 20 comments
Closed

For-loop should use larger type to avoid overflow exception. #180

ericmutta opened this issue Oct 5, 2017 · 20 comments

Comments

@ericmutta
Copy link

ericmutta commented Oct 5, 2017

Try the following code:

For n As Byte = Byte.MinValue To Byte.MaxValue
  UseByteValue(n)
Next

The code looks perfectly logical: n is a byte variable and it is being iterated through all valid values for its type...but a System.OverflowException will be thrown for reasons that are not obvious unless you know the underlying implementation of the For loop.

Currently, to get around this leakage of implementation details, I have to do this ugly hack (I am implementing crypto/hash algorithms and the tests require iterating through all byte values pretty often, so this hurts):

For n1 As Integer = Byte.MinValue To Byte.MaxValue
   Dim n2 = CByte(n1)
   UseByteValue(n2)
Next

I would love for the compiler to do that automatically (i.e use an Integer typed variable underneath the covers when iterating the full range of Byte and Short values. It might even use Long if compiling for a 64-bit machine but those would be hidden implementation details).

@AdamSpeight2008
Copy link
Contributor

AdamSpeight2008 commented Oct 5, 2017

@ericmutta
We'll still have the issue irrespective of the Integer Type used, due to how the For Loop is lowered
Note: This is the ascending case.

Dm  _Value As T = _Start
Dim _End As T = ...
While _Value <= _End  
  ...
  _Value += _StepSize
End While

That _Value += _StepSize would still throw System.OverflowException , if _End is .MaxValue.

I think a solution would be better handling of the cases that approach the types MinValue and MaxValues.
eg

For idx As Byte = Byte.MinValue To Byte.MaxValue
  ' ...
Next

Is lowered to something like

Dim idx As Byte = Byte.MinValue
Dim _StepSize As Byte = 0
Dim _EndValue As Byte = Byte.MaxValue
While True
  ' code inside loop
  If idx = _EndValue Then Exit While
  Idx += _StepSize
End While

There are other cases to be aware of

For idx As Byte = Byte.MinValue To Byte.MaxValue Step 5
  ...
Next

to

Dim idx As Byte = Byte.MinValue
Dim __EndValue As Byte = Byte.MaxValue
While True
  ...
 Dim _idxDelta = __EndValue - Idx
 If ( _idxDelta =0 ) OrElse ( _idxDelta < StepSize ) Then Exit While
 idx+=1
End While

A side-effect of this approach is the perf of the loop construct is slightly degraded.

@ericmutta
Copy link
Author

@AdamSpeight2008 I think a solution would be better handling of the cases that approach the types MinValue and MaxValues.

That would certainly do the trick...good point too about the Step clause 👍

@AdamSpeight2008 A side-effect of this approach is the perf of the loop construct is slightly degraded.

The suggestion you made to overcome the overflow exception for the variant without a Step clause, looks like it would be OK perf-wise, when reduced to machine code (in fact they may be identical, but I don't have the disassembly window to check).

@AnthonyDGreen
Copy link
Contributor

Hmm... oddly enough this loop is inexpressible in both VB and C# without either an overflow expression or an infinite loop because the overflow wraps back around to Byte.MinValue.

Dim i = Byte.MinValue
Goto Test
Body:
    UseByte(i)

    i += 1
Test:
    If i <= Byte.MaxValue Then Goto Body

But it's 1) difficult to imagine a lowering that can handle this case without sacrificing performance, and 2) a breaking change insofar as that for loop control variables which are captured or whose scope is greater than the For loop it is observable today that the variable was incremented beyond the upper-bound of the array. This information could be used to determine whether the loop terminated early or completed normally.

Hmm...

@AdamSpeight2008
Copy link
Contributor

AdamSpeight2008 commented Oct 6, 2017

@AnthonyDGreen
In the case when the iteration variable idx (in my example) isn't utilise after the loop, hence any side-effected are not observed. The newer lower would work.


Imports System
Public Class C
    Public Sub M()
        ForLoop_V0
        Console.WriteLine()
        Console.WriteLine()
        ForLoop_V1
    End Sub
Sub ForLoop_V0()
        For idx As Byte = Byte.MinValue To Byte.MaxValue
        Console.Write($"{idx}  ")
    Next
    End Sub
Sub ForLoop_V1()
    Dim __Value__ As Byte = Byte.MaxValue
    const __End__ As Byte = Byte.MaxValue
    const __Step__ As Byte = 1
    While True
        Console.Write($"{__Value__}  ")
        Dim __delta__ = __End__ - __Value__
        If __delta__ <__Step__ Then Exit While
                __Value__ +=__Step__
                End While
    End Sub
End Class

ForLoop_V0 becomes in IL (Release)

    .method public 
        instance void ForLoop_V0 () cil managed 
    {
        // Method begins at RVA 0x2070
        // Code size 37 (0x25)
        .maxstack 2
        .locals init (
            [0] uint8
        )

        IL_0000: ldc.i4.0
        IL_0001: stloc.0
        IL_0002: ldstr "{0}  "
        IL_0007: ldloc.0
        IL_0008: box [mscorlib]System.Byte
        IL_000d: call string [mscorlib]System.String::Format(string, object)
        IL_0012: call void [mscorlib]System.Console::Write(string)
        IL_0017: ldloc.0
        IL_0018: ldc.i4.1
        IL_0019: add
        IL_001a: conv.ovf.u1.un
        IL_001b: stloc.0
        IL_001c: ldloc.0
        IL_001d: ldc.i4 255
        IL_0022: ble.un.s IL_0002
        IL_0024: ret
    } // end of method C::ForLoop_V0

ForLoop_V1 becomes in IL (Release)

    .method public 
        instance void ForLoop_V1 () cil managed 
    {
        // Method begins at RVA 0x20a4
        // Code size 46 (0x2e)
        .maxstack 2
        .locals init (
            [0] uint8
        )

        IL_0000: ldc.i4 255
        IL_0005: stloc.0
        IL_0006: ldstr "{0}  "
        IL_000b: ldloc.0
        IL_000c: box [mscorlib]System.Byte
        IL_0011: call string [mscorlib]System.String::Format(string, object)
        IL_0016: call void [mscorlib]System.Console::Write(string)
        IL_001b: ldc.i4 255
        IL_0020: ldloc.0
        IL_0021: sub
        IL_0022: conv.ovf.u1.un
        IL_0023: ldc.i4.1
        IL_0024: blt.un.s IL_002d
        IL_0026: ldloc.0
        IL_0027: ldc.i4.1
        IL_0028: add
        IL_0029: conv.ovf.u1.un
        IL_002a: stloc.0
        IL_002b: br.s IL_0006
        IL_002d: ret
    } // end of method C::ForLoop_V1

The IL implementation is virtually the same, the difference is the handling of the bound-range check.
Which may have a better implementation available.

@AdamSpeight2008
Copy link
Contributor

AdamSpeight2008 commented Oct 6, 2017

The JIT-Asm

C.ForLoop_V0()
    L0000: push ebp
    L0001: mov ebp, esp
    L0003: push esi
    L0004: push ebx
    L0005: xor ebx, ebx
    L0007: mov ecx, 0x728083b0
    L000c: call 0x1bd30c8
    L0011: mov edx, eax
    L0013: mov [edx+0x4], bl
    L0016: mov eax, edx
    L0018: mov ecx, eax
    L001a: xor edx, edx
    L001c: xor esi, esi
    L001e: mov eax, [0x25612d50]
    L0023: push eax
    L0024: push esi
    L0025: push edx
    L0026: push ecx
    L0027: mov edx, [0x1c8da7d4]
    L002d: xor ecx, ecx
    L002f: call System.String.FormatHelper(System.IFormatProvider, System.String, System.ParamsArray)
    L0034: mov ecx, eax
    L0036: call System.Console.Write(System.String)
    L003b: inc ebx
    L003c: test ebx, 0xffffff00
    L0042: jnz L0050
    L0044: cmp ebx, 0xff
    L004a: jbe L0107
    L004c: pop ebx
    L004d: pop esi
    L004e: pop ebp
    L004f: ret
    L0050: call 0x746c0030
    L0055: int3
C.ForLoop_V1()
    L0000: push ebp
    L0001: mov ebp, esp
    L0003: push esi
    L0004: push ebx
    L0005: mov ebx, 0xff
    L000a: mov ecx, 0x728083b0
    L000f: call 0x1bd30c8
    L0014: mov edx, eax
    L0016: mov [edx+0x4], bl
    L0019: mov eax, edx
    L001b: mov ecx, eax
    L001d: xor edx, edx
    L001f: xor esi, esi
    L0021: mov eax, [0x25612d50]
    L0026: push eax
    L0027: push esi
    L0028: push edx
    L0029: push ecx
    L002a: mov edx, [0x1c8da7d4]
    L0030: xor ecx, ecx
    L0032: call System.String.FormatHelper(System.IFormatProvider, System.String, System.ParamsArray)
    L0037: mov ecx, eax
    L0039: call System.Console.Write(System.String)
    L003e: mov eax, ebx
    L0040: neg eax
    L0042: add eax, 0xff
    L0047: test eax, 0xffffff00
    L004c: jnz L0061
    L004e: test eax, eax
    L0050: jz L005d
    L0052: inc ebx
    L0053: test ebx, 0xffffff00
    L0059: jnz L0061
    L005b: jmp L010a
    L005d: pop ebx
    L005e: pop esi
    L005f: pop ebp
    L0060: ret
    L0061: call 0x746c0030
    L0066: int3

@rskar-git
Copy link

I'm throwing this out there, because this talk of changing the internal plumbing of For ... Next is making me nervous.

How about a new syntax concept, such as:

For n As Byte In RangeOf(Byte.MinValue, Byte.MaxValue)
    UseByteValue(n)
Next

So RangeOf here looks like any other IEnumerator or Iterator Function, except it is something recognized by the compiler. Maybe use a syntax like that for the whatever new-and-improved lowered code may be?

It's very Python-ish, too - may be a selling point?

@AdamSpeight2008
Copy link
Contributor

@rskar-git Reread the issue, it being Byte as nothing to do with it.

@rskar-git
Copy link

@AdamSpeight2008 I think I get it, I recognize this is all about an overflow/underflow that may happen after the desired range was in fact processed. It's the side-effect and high-performance of the current For that I'm advocating for, a 'baked-in' legacy behavior I believe must be kept, despite its flaw(s).

@AnthonyDGreen
Copy link
Contributor

I think the body would have to be lowered like this:

Dim i As SByte = -128
If i > 127 Then Goto After
Goto Body
Do
    i += 1
Body:
    ? i
Loop Until i >= 127
After:

That lowering I think might actually preserve the performance of the current loop (need to check JIT optimizations) and would work for upper and lower bounds not known at compile time to encompass the entire range of the SByte or Byte type, and wouldn't suffer an infinite loop due to overflow if such checking were turned off, and wouldn't be observable if the iteration variable were declared in the loop header and was not captured by a lambda or query expression.

If all those things hold true... I like it. As edge as this is I agree that the exception you get today is leaking an implementation detail and it really bothers me when VB doesn't have attention to detail in areas like this. And while there's a simple workaround for Byte there isn't for Long, not that I think a loop over all values of Long is a thing.

@AdamSpeight2008
Copy link
Contributor

@AnthonyDGreen What about the Step case i mentioned above.

@KlausLoeffelmann
Copy link
Member

Changing this behavior would introduce breaking changes as atleast Unit Tests would fail, when they expect exceptions. Please don’t touch this!

@AdamSpeight2008
Copy link
Contributor

AdamSpeight2008 commented Dec 3, 2017

@AnthonyDGreen The following template code should be efficient and prevent the overflow also.

Template ForLoop( @Start, @End, @Step, @Body )
  Dim idx = @Start
  #Template_If( @Start <> @End )
    Dim last = @End - @Step
    #Template_If( @Start < @End )
    While idx <= last
    #Template_Else
    While idx >= last
    #Template_EndIf
      ${ @Body(idx) }$
      idx += @Step
    End While
  #Template_EndIf
  ${ @Body(idx) }$
End Template

@AnthonyDGreen
Copy link
Contributor

If your unit test fails that's great. It means you caught it. That's the value of unit tests.

A bigger concern is code that relies on the overflow looping infinitely when overflow checking is turned off. Some kind of cyclic sine wave thing, I guess. And I know the JIT optimizes certain code patterns in for loops to avoid redundant array bounds checking. If we changed the pattern and lost the optimization the would be an unacceptable performance regression for VB.

@AdamSpeight2008, the step case would still overflow either throwing an exception or looping infinitely. I think the check you suggest where we check for that adds too much to the existing loop so you raise a good point. That kind of inconsistency does lean heavily against any change here.

One last thing we haven't talked about at all is late bound for loops which do some calculations I don't remember off the top of my head.

@AdamSpeight2008
Copy link
Contributor

AdamSpeight2008 commented Dec 3, 2017

@AnthonyDGreen The Step case wouldn't overflow in my template, the only proviso is Step has to be a signed type.

@AdamSpeight2008
Copy link
Contributor

AdamSpeight2008 commented Dec 3, 2017

To following is pretty short and preserves most of the original's efficiency at the cost of a few additional lines of IL

Dim idx = @Start
Dim last = @End - @Step
For idx = @Start To last Step @Step
  @Body(idx)
Next
@Body(idx)

But this has the exist issue, regarding unsigned limits and signed step (negative). Throwing an overflow exception.


@AnthonyDGreen
An alternative is to add a contextual keyword Checked, which makes this an explicit opt-in feature.
Thus all existing For ... loops work as currently (backwards compatible).

For Checked idx = Byte.MaxValue To Byte.MinValue Step -1
'...
Next

@paul1956
Copy link

paul1956 commented Dec 3, 2017

@AdamSpeight2008 The heart was for the Checked, not sure the proposed code works correctly for

       <TestMethod()> Public Sub ForTestAsync()
            Dim count As Integer = 0
            For i As Integer = 1 To 0 Step 1
                count += 1
            Next
            Xunit.Assert.True(count = 0, $"Count = {count}")
        End Sub

@AdamSpeight2008
Copy link
Contributor

@paul1956 just need to add a check for descending with ascending step and visa-versa.

@paul1956
Copy link

paul1956 commented Dec 4, 2017

@AdamSpeight2008 yes with constants its obvious you don't need to generate the For Loop, but the template that generates the code needs to worry about the cases when variables are involved.

<TestMethod()> Public Sub ForTestAsync()
             Dim count As Integer = 0 
            For i As Byte = x To y Step y
                 count += 1
             Next
             Xunit.Assert.True(count = 0, $"Count = {count}")
         End Sub
``

@AnthonyDGreen AnthonyDGreen changed the title Proposal: For-loop should use larger type to avoid overflow exception. For-loop should use larger type to avoid overflow exception. Dec 4, 2017
@AnthonyDGreen
Copy link
Contributor

Alright, I thought on it a bit this weekend and I'm pretty certain at this point that a change in the For loop lowering isn't happening for many of the reasons already mentioned. But, there is a shiny side: Proposal #25 would add a new expression to VB that produces a range, a sequence of values from some x to some y; intuitively it a for loop that yields its iteration variable. However, we can't have an IEnumerable(Of Byte) just throwing exceptions at the end:

For Each n As Byte In CByte(0) To CByte(255)
    ' Isn't expected to throw exceptions
Next

That's not how For Each is supposed to work. When you run out of values you just stop, not blow up. Nor can we really have it looping infinitely if overflow checking is off.

We can build in the overflow avoiding implementation into the iterators without any of the concerns above. I kinda suspect that over time this form of looping will phase out traditional For loops anyway. It also has the benefit of using For Each capturing semantics. So:

Dim list = New List(Of Action)
For Each n In 1 To 10
    ' Each lambda remembers the value of n when it was created.
    list.Add(Sub() Console.WriteLine(n))
Next

' Were the above a For loop this loop would print "11" ten times.
For Each a In list
    a()
Next

So to recap, don't think we're changing the For loop. Might have a sweet workaround in VB16.

@AnthonyDGreen
Copy link
Contributor

Update: The 12/6/2017 LDM rejected this idea. We agree that this sucks but don't think widening the type is the solution; for Long and ULong doing so isn't even an option. Changing the code-gen is fraught with peril, potential performance regressions, and back-compat breaks. #25 is something we're likely going to move forward with and could be used to address this problem as described.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants