Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve the interplay between bounds checking system and effect system #50167

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

aviatesk
Copy link
Member

@aviatesk aviatesk commented Jun 14, 2023

In the current state of the Julia compiler, bounds checking and its related optimization code, such as @boundscheck and @inbounds, pose a significant handicap for effect analysis. As a result, we're encountering an ironic situation where the application of @inbounds annotations, which are intended to optimize performance, instead obstruct the program's optimization, thereby preventing us from achieving optimal performance.

This PR is designed to resolve this situation. It aims to enhance the relationship between bounds checking and effect analysis, thereby correctly improving the performance of programs that have @inbounds annotations.

In the following, I'll first explain the reasons that have led to this situation for better understanding, and then I'll present potential improvements to address these issues. This commit is a collection of various improvement proposals. It's necessary that we incorporate all of them simultaneously to enhance the situation without introducing any regressions.

Core of the Problem

There are fundamentally two reasons why effect analysis of code containing bounds checking is difficult:

  1. The evaluation value of Expr(:boundscheck) is influenced by the @inbounds macro and the --check-bounds flag. Hence, when performing a concrete evaluation of a method containing Expr(:boundscheck), it's crucial to respect the @inbounds macro context and the --check-bounds settings, ensuring the method's behavior is consistent across the compile time concrete evaluation and the runtime execution.
  2. If the code, from which bounds checking has been removed due to @inbounds or --check-bounds=no, is unsafe, it may lead to undefined behavior due to uncertain memory access.

Current State

The current Julia compiler handles these two problems as follows:

Current State 1

Regarding the first problem, if a code or method call containing Expr(:boundscheck) is within an @inbounds context, a concrete evaluation is immediately prohibited. For instance, in the following case, when analyzing bar(), if you simply perform concrete evaluation of foo(), it wouldn't properly respect the @inbounds context present in bar(). However, since the concrete evaluation of foo() is prohibited, it doesn't pose an issue:

foo() = (r = 0; @boundscheck r += 1; return r)

bar() = @inbounds foo()

Conversely, in the following case, there is no need to prohibit the concrete evaluation of A1_inbounds due to the presence of @inbounds. This is because the execution of the @boundscheck block is determined by the presence of local @inbounds Expr(:boundscheck) within a local @inbounds context does not need to block concrete evaluation:

function A1_inbounds()
    r = 0
    @inbounds begin
        @boundscheck r += 1
    end
    return r
end

However, currently, we prohibit the concrete evaluation of such code as
well. Moreover, we are not handling such local @inbounds contexts
effectively, which results in incorrect execution of A1_inbounds()
(even our test is incorrect for this example:
https://github.com/JuliaLang/julia/blob/834aad4ab409f4ba65cbed2963b9ab6fa2770354/test/boundscheck_exec.jl#L34)

EDIT It is an expected behavior as pointed out by Jameson.

Furthermore, there is room for improvement when the --check-bounds flag is specified. Specifically, when the --check-bounds flag is set, the evaluation value of Expr(:boundscheck) is determined irrespective of the @inbounds context. Hence, there is no need to prohibit concrete evaluations due to inconsistency in the evaluation value of Expr(:boundscheck).

Current State 2

Next, we've ensured that concrete evaluation isn't performed when there's potentially unsafe code that may have bounds checking removed, or when the --check-bounds=no flag is set, which could lead to bounds checking being removed always.
For instance, if you perform concrete evaluation for the function call baz((1,2,3), 4) in the following example, it may return a value accessed from illicit memory and introduce undefined behaviors into the program:

baz(t::Tuple, i::Int) = @inbounds t[i]

baz((1,2,3), 4)

However, it's evident that the above code is incorrect and unsafe program and I believe undefined behavior in such programs is deemed, as explicitly stated in the @inbounds documentation:

Warning

Using @inbounds may return incorrect results/crashes/corruption for
out-of-bounds indices. The user is responsible for checking it
manually. Only use @inbounds when it is certain from the information
locally available that all accesses are in bounds.

Actually, the @inbounds macro is primarily an annotation to "improve performance by removing bounds checks from safe programs". Therefore, I opine that it would be more reasonable to utilize it to alleviate potential errors due to bounds checking within @inbounds contexts.

To bring up another associated concern, in the current compiler implementation, the :nothrow modelings for getfield/arrayref/arrayset is a bit risky, and :nothrow-ness is assumed when their bounds checking is turned off by call argument.
If our intended direction aligns with the removal of bounds checking based on @inbounds as proposed in issue #48245, then assuming :nothrow-ness due to @inbounds seems reasonable. However, presuming :nothrow-ness due to bounds checking argument or the --check-bounds flag appears to be risky, especially considering it's not documented.

This Commit

This commit implements all proposed improvements against the current issues as mentioned above. In summary, the enhancements include:

  • allowing concrete evaluation within a local @inbounds context
  • folding out Expr(:boundscheck) when the --check-bounds flag is set (and allow concrete evaluation)
  • changing the :nothrow effect bit to UInt8 type, and refining :nothrow information when in an @inbounds context
  • removing dangerous assumptions of :nothrow-ness for built-in functions when bounds checking is turned off
  • replacing the @_safeindex hack with @inbounds

@aviatesk aviatesk requested review from vchuravy, vtjnash and Keno June 14, 2023 17:59
@vtjnash
Copy link
Member

vtjnash commented Jun 14, 2023

This is because the execution of the @BoundsCheck block is determined by the presence of local @inbounds

The test for this is not wrong, but this claim is. The @boundscheck is controlled by the caller, and is not scope-based to where it is in, so the presence of @inbounds in the same scope does not have an impact on the @boundscheck. Indeed, a user could make some sort of sparse-dictionary-array with a pattern like this:

function getindex(arr::MyArray, idx::Int)
    @boundscheck (boundscheck(arr.idsx, idx) && @inbounds(arr[idx.idxs] !== missing)) || throw_boundserror(arr, idx)
    return @inbounds arr.data[arr.idxs[idx]]
end

@vtjnash
Copy link
Member

vtjnash commented Jun 14, 2023

How do we deal with the "foo" problem from the current state 1 in this change?

@aviatesk
Copy link
Member Author

The test for this is not wrong, but this claim is. The @boundscheck is controlled by the caller, and is not scope-based to where it is in, so the presence of @inbounds in the same scope does not have an impact on the @boundscheck. Indeed, a user could make some sort of sparse-dictionary-array with a pattern like this:

Thanks for pointing this out. After double-checking with https://docs.julialang.org/en/v1/devdocs/boundscheck/#Eliding-bounds-checks, I just agree with you that my claim was incorrect. I'll go ahead and remove that part from the PR.

How do we deal with the "foo" problem from the current state 1 in this change?

With this adjustment, A1_inbounds can be concrete evaluated when the caller context isn't @inbounds, something that wasn't possible before. We can't fold out Expr(:boundscheck) using a local @inbounds context (as you noted earlier) but we don't have to block concrete evaluation in these scenarios either.

@aviatesk aviatesk force-pushed the avi/inbounds-nothrow branch 3 times, most recently from b07cd82 to 7d6579e Compare June 15, 2023 10:46
aviatesk added 2 commits June 15, 2023 20:12
In the current state of the Julia compiler, bounds checking and its
related optimization code, such as `@boundscheck` and `@inbounds`, pose
a significant handicap for effect analysis. As a result, we're
encountering an ironic situation where the application of `@inbounds`
annotations, which are intended to optimize performance, instead
obstruct the program's optimization, thereby preventing us from
achieving optimal performance.

This PR is designed to resolve this situation. It aims to enhance the
relationship between bounds checking and effect analysis, thereby
correctly improving the performance of programs that have `@inbounds`
annotations.

In the following, I'll first explain the reasons that have led to this
situation for better understanding, and then I'll present potential
improvements to address these issues. This commit is a collection of
various improvement proposals. It's necessary that we incorporate all
of them simultaneously to enhance the situation without introducing
any regressions.

\## Core of the Problem

There are fundamentally two reasons why effect analysis of code
containing bounds checking is difficult:
1. The evaluation value of `Expr(:boundscheck)` is influenced by the
  `@inbounds` macro and the `--check-bounds` flag. Hence, when
  performing a concrete evaluation of a method containing
  `Expr(:boundscheck)`, it's crucial to respect the `@inbounds` macro
  context and the `--check-bounds` settings, ensuring the method's
  behavior is consistent across the compile time concrete evaluation
  and the runtime execution.
1. If the code, from which bounds checking has been removed due to
  `@inbounds` or `--check-bounds=no`, is unsafe, it may lead to
  undefined behavior due to uncertain memory access.

\## Current State

The current Julia compiler handles these two problems as follows:

\### Current State 1

Regarding the first problem, if a code or method call containing
`Expr(:boundscheck)` is within an `@inbounds` context, a concrete
evaluation is immediately prohibited. For instance, in the following
case, when analyzing `bar()`, if you simply perform concrete evaluation
of `foo()`, it wouldn't properly respect the `@inbounds` context present
in `bar()`. However, since the concrete evaluation of `foo()` is
prohibited, it doesn't pose an issue:
```julia
foo() = (r = 0; @BoundsCheck r += 1; return r)

bar() = @inbounds foo()
```

Conversely, in the following case, there is _no need_ to prohibit the
concrete evaluation of `A1_inbounds` due to the presence of `@inbounds`.
This is because ~~the execution of the `@boundscheck` block is determined
by the presence of local `@inbounds`~~ `Expr(:boundscheck)` within a
local `@inbounds` context does not need to block concrete evaluation:
```julia
function A1_inbounds()
    r = 0
    @inbounds begin
        @BoundsCheck r += 1
    end
    return r
end
```

However, currently, we prohibit the concrete evaluation of such code as
well. ~~Moreover, we are not handling such local `@inbounds` contexts
effectively, which results in incorrect execution of `A1_inbounds()`
(even our test is incorrect for this example:
`https://github.com/JuliaLang/julia/blob/834aad4ab409f4ba65cbed2963b9ab6fa2770354/test/boundscheck_exec.jl#L34`)~~
EDIT: It is an expected behavior as pointed out by Jameson.

Furthermore, there is room for improvement when the `--check-bounds`
flag is specified. Specifically, when the `--check-bounds` flag is set,
the evaluation value of `Expr(:boundscheck)` is determined irrespective
of the `@inbounds` context. Hence, there is no need to prohibit concrete
evaluations due to inconsistency in the evaluation value of
`Expr(:boundscheck)`.

\### Current State 2

Next, we've ensured that concrete evaluation isn't performed when
there's potentially unsafe code that may have bounds checking removed,
or when the `--check-bounds=no` flag is set, which could lead to bounds
checking being removed always.
For instance, if you perform concrete evaluation for the function call
`baz((1,2,3), 4)` in the following example, it may return a value
accessed from illicit memory and introduce undefined behaviors into the
program:
```julia
baz(t::Tuple, i::Int) = @inbounds t[i]

baz((1,2,3), 4)
```

However, it's evident that the above code is incorrect and unsafe
program and I believe undefined behavior in such programs is deemed,
as explicitly stated in the `@inbounds` documentation:

> │ Warning
> │
> │  Using @inbounds may return incorrect results/crashes/corruption for
> │  out-of-bounds indices. The user is responsible for checking it
> │  manually. Only use @inbounds when it is certain from the information
> │  locally available that all accesses are in bounds.

Actually, the `@inbounds` macro is primarily an annotation to
"improve performance by removing bounds checks from safe programs".
Therefore, I opine that it would be more reasonable to utilize it to
alleviate potential errors due to bounds checking within `@inbounds`
contexts.

To bring up another associated concern, in the current compiler
implementation, the `:nothrow` modelings for `getfield`/`arrayref`/`arrayset`
is a bit risky, and `:nothrow`-ness is assumed when their bounds checking
is turned off by call argument.
If our intended direction aligns with the removal of bounds checking
based on `@inbounds` as proposed in issue #48245, then assuming
`:nothrow`-ness due to `@inbounds` seems reasonable. However, presuming
`:nothrow`-ness due to bounds checking argument or the `--check-bounds`
flag appears to be risky, especially considering it's not documented.

\## This Commit

This commit implements all proposed improvements against the current
issues as mentioned above. In summary, the enhancements include:
- making `Expr(:boundscheck)` within a local `@inbounds` context not
  block concrete evaluation
- folding out `Expr(:boundscheck)` when the `--check-bounds` flag is set
  (and allow concrete evaluation)
- changing the `:nothrow` effect bit to `UInt8` type, and refining
  `:nothrow` information when in an `@inbounds` context
- removing dangerous assumptions of `:nothrow`-ness for built-in
  functions when bounds checking is turned off
- replacing the `@_safeindex` hack with `@inbounds`
@aviatesk aviatesk force-pushed the avi/inbounds-nothrow branch from 7d6579e to 538a511 Compare June 15, 2023 11:13
@Keno
Copy link
Member

Keno commented Jun 17, 2023

Actually, the @inbounds macro is primarily an annotation to "improve performance by removing bounds checks from safe programs". Therefore, I opine that it would be more reasonable to utilize it to alleviate potential errors due to bounds checking within @inbounds contexts.

The real problem here is that the compiler may make up values at function boundaries. In order for inbounds to be safe in the presence of concrete evaluation, it needs to be true for all values the compiler could possibly make up. For example, consider the following:

struct LessThan5
i::UInt
LessThan5(i::UInt) = (i < 5 || error("It's not less than 5"); new(i)
end

function index_into_array(i::LessThan5)
   a = collect(1:5)
   # Safe, because i.i is guaranteed globally to be less than 5
   @inbounds a[i.i]
end

function boom()
    index_into_array(LessThan5(6))
end

I think this use of inbounds is reasonable, and yet it will blow up when compiling boom.

@Keno
Copy link
Member

Keno commented Jun 17, 2023

We can't fold out Expr(:boundscheck) using a local @inbounds context (as you noted earlier) but we don't have to block concrete evaluation in these scenarios either.

I don't understand this point. At the very least, this breaks :consistent, which is a requisite for concrete-eval.

@vtjnash
Copy link
Member

vtjnash commented Jun 17, 2023

Isn't that already a bug introduced into the user code with use of that command line flag, and not unique to this though?

@martinholters
Copy link
Member

Considering @Keno's example, can't we just unconditionally enable bounds-checking during concrete-eval? If it then hits a bounds error, it can just synthesize throwing that. It the error is never hit at runtime, no-one will notice. If it is hit, we're in UB land, and throwing an exception is allowed to happen just as anything else. A niche problem would be bounds checks with side effects; but I think if the semantics of @inbounds is just that it allows the compiler to elide bounds checks, not a guarantee that it will, that might be ok? Clearly I'm missing something why this can't work?

@Keno
Copy link
Member

Keno commented Jun 19, 2023

Considering @Keno's example, can't we just unconditionally enable bounds-checking during concrete-eval?

Yes, that has always been an option, but we don't really have the capability to do this, because we'd need to compile two versions of every function, which even if we're ok with the perf overhead, the system isn't set up to do.

@vtjnash
Copy link
Member

vtjnash commented Jun 19, 2023

It also is not currently considered a bug for boundschecking to have other side effects, which might be removed by inbounds. That should block concrete eval anyways though. The main issue is divergent results. The semantics might suggest it may be okay to unconditionally run the code with boundschecking on, even if that means the result might not be consistent? (Of course, we can't do that if the user passes the flag, since that flag forces execution to be inconsistent then)

@aviatesk
Copy link
Member Author

aviatesk commented Jun 19, 2023

This PR introduces two main modifications: 1.) enhances the analysis of divergent results (inconsistencies) due to the use of @inbounds and @boundscheck, and 2.) accepts undefined behavior caused by @inbounds, even during compile time.
As for 1.) I believe there's no issue, as it purely boosts the accuracy of the analysis.
@Keno 's concern pertains to the second point. Indeed accepting this PR would mean that @inbounds, which is valid and does not trigger undefined behavior (UB) at runtime, could potentially lead to UB during compile time. From what I understand, @martinholters 's proposal is to use a Julia runtime that doesn't permit UB (e.g. a process with --check-bounds=yes set) during concrete-eval, in order to prevent UB at compile time.
Nonetheless, isn't the chance of such UB posing a problem relatively low? For example, as for Keno's earlier mentioned example, an error within the constructor would be detected first, thus Const(LessThan5(6)) would not emerge during compile time.
To induce UB, one would need to write in the following way:

@eval code_typed() do
    index_into_array($(Expr(:new, LessThan5, 6)))
end

@aviatesk
Copy link
Member Author

I don't understand this point. At the very least, this breaks :consistent, which is a requisite for concrete-eval.

The @inbounds blk does not affect Expr(:boundscheck) that syntactically appears within blk. In this sense, I believe we could handle such case by only tainting :noinbounds=false caused by Expr(:boundscheck), not tainting :consistent-cy. In specific terms, I think we can enhance the precision of the analysis in part 1 by deleting the following code:

Given the cache is already divided because of the --check-bounds flag, I assume there's no need to taint consistency.

aviatesk added a commit that referenced this pull request Jul 15, 2023
`Core._svec_ref` has accepted `boundscheck`-value as the first argument
since it was added in #45062. Nonetheless, `Core._svec_ref` simply calls
`jl_svec_ref` in either the interpreter or the codegen, and thus the
`boundscheck` value isn't utilized in any optimizations. Rather, even
worse, this `boundscheck`-argument negatively influences the effect
analysis (xref #50167 for details) and has caused type inference
regressions as reported in #50544.

For these reasons, this commit simply eliminates the `boundscheck`
argument from `Core._svec_ref`. Consequently,
`getindex(::SimpleVector, ::Int)` is now being concrete-eval eligible.

closes #50544
aviatesk added a commit that referenced this pull request Jul 15, 2023
`Core._svec_ref` has accepted `boundscheck`-value as the first argument
since it was added in #45062. Nonetheless, `Core._svec_ref` simply calls
`jl_svec_ref` in either the interpreter or the codegen, and thus the
`boundscheck` value isn't utilized in any optimizations. Rather, even
worse, this `boundscheck`-argument negatively influences the effect
analysis (xref #50167 for details) and has caused type inference
regressions as reported in #50544.

For these reasons, this commit simply eliminates the `boundscheck`
argument from `Core._svec_ref`. Consequently,
`getindex(::SimpleVector, ::Int)` is now being concrete-eval eligible.

closes #50544
aviatesk added a commit that referenced this pull request Jul 15, 2023
`Core._svec_ref` has accepted `boundscheck`-value as the first argument
since it was added in #45062. Nonetheless, `Core._svec_ref` simply calls
`jl_svec_ref` in either the interpreter or the codegen, and thus the
`boundscheck` value isn't utilized in any optimizations. Rather, even
worse, this `boundscheck`-argument negatively influences the effect
analysis (xref #50167 for details) and has caused type inference
regressions as reported in #50544.

For these reasons, this commit simply eliminates the `boundscheck`
argument from `Core._svec_ref`. Consequently,
`getindex(::SimpleVector, ::Int)` is now being concrete-eval eligible.

closes #50544
aviatesk added a commit that referenced this pull request Jul 17, 2023
`Core._svec_ref` has accepted `boundscheck`-value as the first argument
since it was added in #45062. Nonetheless, `Core._svec_ref` simply calls
`jl_svec_ref` in either the interpreter or the codegen, and thus the
`boundscheck` value isn't utilized in any optimizations. Rather, even
worse, this `boundscheck`-argument negatively influences the effect
analysis (xref #50167 for details) and has caused type inference
regressions as reported in #50544.

For these reasons, this commit simply eliminates the `boundscheck`
argument from `Core._svec_ref`. Consequently,
`getindex(::SimpleVector, ::Int)` is now being concrete-eval eligible.

closes #50544
aviatesk added a commit that referenced this pull request Jul 17, 2023
`Core._svec_ref` has accepted `boundscheck`-value as the first argument
since it was added in #45062. Nonetheless, `Core._svec_ref` simply calls
`jl_svec_ref` in either the interpreter or the codegen, and thus the
`boundscheck` value isn't utilized in any optimizations. Rather, even
worse, this `boundscheck`-argument negatively influences the effect
analysis (xref #50167 for details) and has caused type inference
regressions as reported in #50544.

For these reasons, this commit simply eliminates the `boundscheck`
argument from `Core._svec_ref`. Consequently,
`getindex(::SimpleVector, ::Int)` is now being concrete-eval eligible.

closes #50544
@brenhinkeller brenhinkeller added the compiler:effects effect analysis label Aug 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:effects effect analysis
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants