range indexing should produce a subarray, not a copy #3701
Comments
Thanks for making the issue. We should definitely do this. |
+1 |
At the moment, the issue is not that it would reduce Matlab compatibility, but that it would almost certainly reduce performance---for many operations SubArrays have nowhere near the performance of Arrays. #3503 will help when it can be merged (but there are a few "deep" issues that need to be addressed first). |
@timholy, there is no technical reason (as far as I can tell) why accessing a SubArray should not generate the same performance (and in fact, almost exactly the same code) as accessing the same data directly in the original Array. If that's not the case now, then this is a performance bug that should be fixed, but I see that as a somewhat orthogonal issue to this one. (There will be cases where it is advantageous for cache reasons to make a contiguous copy of a subarray that is discontiguous in memory, but of course the user can always call |
As @timholy mentioned, we are working on an improved version of subarray -- the current implementation is way too slow. However, the main obstacle is performance degradation. Even a very simple immutable wrapper over |
@stevengj, there are two issues. Dahua mentioned one, inlining the wrapper. The other principal hurdle is linear indexing, which is heavily used. Suppose Heck, even getting good performance from @StefanKarpinski has been doing some work on the linear indexing problem, hopefully this will help. |
Regarding inlining, that is a compiler issue; there is no technical reason why subarray accesses could not be inlined with zero overhead. It seems like a bad idea to decide fundamental language semantics based on temporary performance issues. Regarding linear indexing, exactly the same issue applies to looping over a portion of the original |
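To make the linear-indexing point concrete, here is a small sketch of how current Julia 1.x reports the indexing style of two views. It is only an illustration under that assumption and says nothing about the SubArray implementation discussed at the time; the names `col`, `blk`, and `total` are made up for this example.

```julia
A = rand(4, 5)

col = view(A, :, 2)     # one contiguous column of A
blk = view(A, 2:3, :)   # rows 2:3 of every column, not contiguous in memory

# A contiguous view can be addressed with a single linear index;
# the strided block needs one index per dimension.
println(IndexStyle(col))   # IndexLinear()
println(IndexStyle(blk))   # IndexCartesian()

# eachindex picks whichever index type is efficient for the argument,
# so the same loop works for Arrays and for both kinds of view.
function total(v)
    s = zero(eltype(v))
    for i in eachindex(v)
        s += v[i]
    end
    return s
end

println(total(blk) ≈ sum(A[2:3, :]))
```

Writing loops against `eachindex` rather than `1:length(v)` is one way generic code can avoid the cost of linear indexing into a view that does not support it cheaply.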
Absolutely. We're all for range slices producing views. These performance issues are clearly surmountable. |
The linear indexing issue with subarrays was actually one reason that I started to work on the |
I strongly agree with having a good design and improving the compiler to match it. I know it will be possible to generate much better code for subarrays than we do now. |
Since we're in the chaos period, I'd be down to just rebase that branch and merge it as soon as @lindahua gets a chance to do so. |
We can add it, but we can't remove SubArrays until the AbstractArray problem is solved. That's basically waiting on stagedfunctions. |
Is this issue the reason why the v0.5 release will be called "Arraymaddon"? |
One of several. |
@stevengj: Since you initiated this issue, it would be very interesting to hear your opinion on the recent discussion of this topic in JuliaLang/LinearAlgebra.jl#255. |
Current plan is to revisit this after all immutables can be inlined. See #9150 |
I'm pretty sure this was decided against and the thought now is to keep returning copies but with a more convenient syntax for views ( |
@KristofferC, is there a discussion/rationale leading to this decision available somewhere online? From several issues here in this repo I got an impression that people were generally in favor of returning views by default as soon as some general performance issues are fixed. I am really curious as to why the opposite decision was finally made. |
Probably some useful discussion in #7941 and JuliaLang/LinearAlgebra.jl#255. We also have |
@timholy, thanks. I knew about |
There are two major issues:
As it stands, every single indexing expression in Julia can be performed as a view. You can opt-in to using views over large sections of code with
|
One of the best reasons for returning views was to minimize the amount of copying in complex expressions; fusion is really a better solution to the same problem. |
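As a rough illustration of the fusion argument, the sketch below compares a vectorized expression that builds intermediate arrays with a fused dot-call that writes straight into a preallocated output. The variable names are hypothetical and the example assumes Julia 1.x dot-broadcast semantics.

```julia
x = rand(1000)
y = rand(1000)

# Unfused: 2 * x allocates a temporary, and adding y allocates the result.
unfused = 2 * x + y

# Fused: the whole right-hand side is evaluated element by element in one
# loop and stored into `fused`, with no intermediate arrays.
fused = similar(x)
fused .= 2 .* x .+ y

println(fused ≈ unfused)
```

Fusion removes the temporaries created by arithmetic; it does not by itself remove the copy made by a slice such as `x[1:10]` on the right-hand side.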
One thing that could help there is hammering out a definition for |
The options seem to be
I've found myself wanting both on occasion. IIRC, there's a discussion about this somewhere. |
Yes, as have I. I've wanted them both exclusively of each other (within a few lines) when working with nested JSON-like data. #19169. |
To clarify my usage pattern, I'm talking about something like this (a stencil-type computation, essentially). Omitting
I guess you can look at it that way. The problem is that loop fusion and views prevent the creation of temporary arrays resulting from two separate expression types: arithmetic and slicing respectively. If fusion worked for the latter type of expressions too (e.g. as @mbauman suggested), it would indeed be a better solution to this problem. |
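A stencil-type update is a case where both mechanisms are needed together. The following is a minimal sketch, assuming Julia 1.x and a hypothetical three-point averaging function `smooth!`; it is not the code elided from the comment above.

```julia
# Average the two neighbours of each interior point of u into out.
# @views turns every slice in the body into a view, and the dots fuse the
# arithmetic into a single loop, so no temporary arrays are created.
@views function smooth!(out, u)
    n = length(u)
    out[2:n-1] .= (u[1:n-2] .+ u[3:n]) ./ 2
    return out
end

u = [1.0, 2.0, 4.0, 8.0]
out = zeros(4)
smooth!(out, u)
println(out)   # boundary entries are left untouched
```

Without `@views`, the two slices `u[1:n-2]` and `u[3:n]` would each be copied before the fused loop runs.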
I find myself wanting to use this pattern all the time:

a = zeros(1000)
b = zeros(1000,10)
function f1(a,b,i)
a[:] = b[:,i]
nothing
end
function f2(a,b,i)
a .= b[:,i]
nothing
end
function f3(a,b,i)
a .= view(b,:,i)
nothing
end
@time f1(a,b,1)
0.000011 seconds (6 allocations: 8.109 KiB)
@time f2(a,b,1)
0.000006 seconds (6 allocations: 8.109 KiB)
@time f3(a,b,1)
0.000002 seconds (4 allocations: 160 bytes)

Preserving the semantics of copying on slicing, could the compiler optimise away the copy in I find the idea of module-wide |
It is entirely possible that the compiler could at some point recognize that it doesn't actually need to make a copy in these cases. We've also considered the syntax |
@views function f2(a,b,i)
a .= b[:,1]
nothing
end

isn't so bad and that way you don't have to put it on the entire module. Optimization would, of course, be better. |
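One way to check the effect of `@views` without reading `@time` output is to compare allocation counts directly. This is a sketch under the assumption of Julia 1.x; `copy_col!` and `view_col!` are hypothetical names, and the exact byte counts depend on the Julia version, so none are claimed here.

```julia
a = zeros(1000)
b = rand(1000, 10)

# Slicing on the right-hand side copies column i before broadcasting.
function copy_col!(a, b, i)
    a .= b[:, i]
    return nothing
end

# Same body under @views: b[:, i] becomes view(b, :, i), so no column copy.
@views function view_col!(a, b, i)
    a .= b[:, i]
    return nothing
end

# Call each once first so compilation is not counted.
copy_col!(a, b, 1)
view_col!(a, b, 1)

bytes_copy = @allocated copy_col!(a, b, 1)
bytes_view = @allocated view_col!(a, b, 1)
println("copy: ", bytes_copy, " bytes, view: ", bytes_view, " bytes")
```

The copying version has to allocate at least the 1000 `Float64` values of the column, which is roughly the 8 KiB reported by `@time` above.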
From the doc:
How about a similar rule for the RHS, eg. indexing produces a view whenever the array is an operand of a dot call? |
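For reference, the asymmetry this proposal would remove can be sketched as follows, assuming current Julia 1.x behaviour: indexing on the left of `.=` writes into the original array, while the same indexing on the right produces an independent copy. The variable names are illustrative only.

```julia
A = collect(1:6)

# Left-hand side of a dot-assignment: the slice is updated in place in A.
A[2:4] .= 0

# Right-hand side (or plain assignment): the slice is a fresh array.
B = A[5:6]
B .= 9          # changes B only, A is unaffected

println(A)      # [1, 0, 0, 0, 5, 6]
println(B)      # [9, 9]
```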
Out of curiosity, is there any chance the default behavior will switch to views instead of copies in a future version of the language? |
If it were to happen, it can't happen before 2.0. |
Cool cool. Thanks to everyone for their great work on the language! It's very impressive :) |
I would prefer that range indexing of an Array (e.g. X[1:10, :]) created a SubArray (a view/reference), not a copy. This seems more in the spirit of Julia's pass-by-reference semantics, and would eliminate some of the performance gotchas with range indexing.
It might also make future loop-devectorization optimization easier, because subarray references can be devectorized back into references to the original array without worrying that you will be changing the semantics.
It would reduce Matlab compatibility, but we already do that with pass-by-reference and so I doubt many users will be surprised by having the same reference semantics for range indexing.
This is something that has come up informally several times on the mailing list, but I didn't see any issue for it.
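The semantic difference at stake can be shown in a few lines. This is a minimal sketch assuming Julia 1.x, where range indexing still copies and `view` is the explicit way to get a SubArray; the names `c` and `v` are just for illustration.

```julia
X = reshape(collect(1.0:20.0), 4, 5)   # 4×5 matrix filled column by column

c = X[1:2, :]          # range indexing: allocates a new 2×5 Array (a copy)
v = view(X, 1:2, :)    # explicit view: a SubArray sharing memory with X

c[1, 1] = -1.0         # modifies only the copy
v[2, 1] = -2.0         # writes through to X[2, 1]

println(X[1, 1])       # unchanged by the write to c
println(X[2, 1])       # changed by the write to v
println(v isa SubArray)
```

Under the proposal in this issue, `X[1:2, :]` would behave like `v` rather than like `c`.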