Propagate array dimensions from allocations #43487
Conversation
Since my branches are part of a fork, I've created a PR comparing this one and the allocation hoisting PR immediately before it here: pchintalapudi#4
Force-pushed from a351c5e to 6e3b743
Now that #43057 has been merged, this PR is ready to be reviewed for rebasing onto the master branch.
Compile time benchmarks:
Master:
PR:
Other notable examples of optimization improvement:

Example 2: doc/src/manual/performance-tips.md:952-963 loopinc
Master: Godbolt: https://godbolt.org/z/oT9Wa4784
PR: Godbolt: https://godbolt.org/z/r1or75rzK

Example 3: doc/src/manual/performance-tips.md:969-984 loopinc_prealloc
Master: Godbolt: https://godbolt.org/z/7Ed4xd3o9
PR: Godbolt: https://godbolt.org/z/G6adnc69f

Both examples show complete removal of the bounds check due to length propagation, and LLVM then recognizes that all of the loads from the array can be forwarded and eliminates them all. When combined with #43548 and its ability to remove arrays that are never loaded from, and if xinc(!) is not marked noinline, LLVM will either vectorize loopinc (resulting in a massive speedup) or outright precompute the sum in loopinc_prealloc (another O(n) -> O(1) transformation).
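For reference, the loopinc/loopinc_prealloc pair discussed above comes from the "Pre-allocating outputs" section of the manual. Paraphrased here as a sketch (the authoritative code lives at the cited lines of doc/src/manual/performance-tips.md):

```julia
# Paraphrased from doc/src/manual/performance-tips.md ("Pre-allocating outputs").
xinc(x) = [x, x+1, x+2]          # allocates a fresh 3-element array on every call

function loopinc()
    y = 0
    for i = 1:10^7
        ret = xinc(i)
        y += ret[2]              # with length propagation, this bounds check folds away
    end
    return y
end

function xinc!(ret::AbstractVector{T}, x::T) where T
    ret[1] = x; ret[2] = x+1; ret[3] = x+2
    nothing
end

function loopinc_prealloc()
    ret = Vector{Int}(undef, 3)  # dimensions are known at the allocation site
    y = 0
    for i = 1:10^7
        xinc!(ret, i)
        y += ret[2]
    end
    return y
end
```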
@nanosoldier
Your package evaluation job has completed - possible new issues were detected. A full report can be found here.
Force-pushed from 1f3cb02 to 5c5f559
@nanosoldier
Your package evaluation job has completed - possible new issues were detected. A full report can be found here.
CI failed with:
@nanosoldier
Your package evaluation job has completed - possible new issues were detected. A full report can be found here.
Force-pushed from 4a18af9 to cc79187
@nanosoldier
@nanosoldier
If nanosoldier comes back clean I am going to merge this.
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
Your package evaluation job has completed - possible new issues were detected. A full report can be found here.
@nanosoldier
I am not certain if we want to move forward with tricks like this. The goal for Array was to replace it with something simpler implemented in Julia, which would hopefully allow the compiler to "see" this, without needing to make it a special case again in codegen (we had some code like this many years ago).
I think both Prem and I agree with that notion. But it is also the case that the move from
I think there's another point to consider, which is that if Array is implemented in Julia as something like:
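A hypothetical sketch of what such a pure-Julia Array might look like (the Buffer type and the field names are invented here purely to illustrate the aliasing concern, and are not an actual proposal):

```julia
# Hypothetical sketch only: `Buffer`, the field names, and the layout are
# made up for illustration. The key property is that the storage is a
# separately allocated object referenced by a field of the array struct.
struct Array{T,N} <: AbstractArray{T,N}
    buffer::Buffer{T}      # separately allocated storage
    size::NTuple{N,Int}    # dimensions recorded at allocation time
end
```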
we cannot stack-allocate the internal buffer of the array even with interprocedural escape analysis. The reason is tied to a possible reshape method like the one below:
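For example, a reshape along these lines would let the buffer outlive the array it was allocated for (again a hypothetical sketch building on the invented struct above):

```julia
# Hypothetical: the returned array aliases a.buffer, so the buffer's
# lifetime is now tied to the new object rather than to `a`. If `a`'s
# buffer had been stack-allocated because `a` itself does not escape,
# this alias could still escape the frame and dangle.
function Base.reshape(a::Array{T}, dims::NTuple{N,Int}) where {T,N}
    prod(dims) == length(a) || throw(DimensionMismatch())
    return Array{T,N}(a.buffer, dims)
end
```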
Here, the lifetime of
That is not really a new point, but is already a problem today: just replace "struct Array" with any existing wrapper type, such as SubArray, Ref, or Tuple, and you have the same aliasing situation that escape analysis already has to handle.
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
I think the crux of stack allocating the buffer data relies on the lifetime of the data pointer always being <= the lifetime of the containing object. If we had a way of communicating that sort of relationship within the language to the compiler, we could extend stack allocation to subfields of generic objects as well. That seems fairly risky though, as there isn't really a clear way to warn someone when they're about to violate that assumption, and it's really easy to violate it.

Of course, this is also a pretty niche case that relies on a lot of different conditions and analyses being present (we don't have either stack allocation for arrays yet or IPO escape analysis, although I believe both are coming soon), so it's hard to say whether such an optimization is even worth the effort yet. That being said, as Valentin says, this might be more worthwhile as an optimization now, while we don't have a full-Julia implementation of Array, especially since basic array stack allocation is 4-5 PRs down the line from this one (#43573) and depends on the array identification machinery proposed here. In the future, if Array is entirely specified in Julia, we might end up purging the arraysize/arraylen codegen functions regardless, so we can drop the codegen modifications as they get deleted.
This has been approved by 1/4 reviewers. Is it ok to merge? Are the "3 failing checks" false alarms? This seems like important work that is blocking a very important stack-allocation PR.
2D and 3D array dimension lengths cannot change, and thus we can replace loads of the array length and array dimension sizes with their values from when the array was allocated. 1D array dimensions might change, but we can determine whether they might change using escape analysis.
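As a minimal illustration of what this enables (a hypothetical example, not code from the PR): once the pass can see the allocation, dimension queries fold to constants.

```julia
# Hypothetical example: `m` is a freshly allocated 3x4 matrix whose
# dimensions cannot change, so the pass can rewrite length(m) to the
# constant 12 and size(m, 1) to 3, eliminating the loads from the
# array header and the bounds checks that depend on them.
function f()
    m = Matrix{Float64}(undef, 3, 4)
    s = 0.0
    for i in 1:length(m)   # trip count becomes the constant 12
        m[i] = 1.0         # bounds check folds away after propagation
        s += m[i]
    end
    return s
end
```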
Currently, the mechanism for identifying array allocations does not work on Windows, due to the insertion of a trampoline function on Windows, which prevents identification by function pointer equivalence from working as intended (this data was collected from #43107). Fixing this for Windows would require overhauling how we handle ccalls during codegen, which is not part of this PR.

Identification of arrays is now done through metadata set during codegen, which should theoretically get around the Windows issue.
Depends on #43057, which creates the llvm-alloc-helpers.cpp file.
Performance benchmark:
Function:
Master:
Godbolt: https://godbolt.org/z/cj7h3e3Gr
Benchmark:
PR:
Godbolt: https://godbolt.org/z/zKWz8dxev
Benchmark: