Skip to content

Commit

Permalink
doc: indexing by different numbers of indices (#30736)
Browse files Browse the repository at this point in the history
- document indexing by different numbers of indices
- remove/supercede an outdated section on the same in devdocs/subarray
  • Loading branch information
mbauman authored and fredrikekre committed Jan 28, 2019
1 parent 8fa0645 commit 1c1fc03
Show file tree
Hide file tree
Showing 2 changed files with 116 additions and 39 deletions.
31 changes: 3 additions & 28 deletions doc/src/devdocs/subarrays.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,34 +3,9 @@
Julia's `SubArray` type is a container encoding a "view" of a parent [`AbstractArray`](@ref). This page
documents some of the design principles and implementation of `SubArray`s.

## Indexing: cartesian vs. linear indexing

Broadly speaking, there are two main ways to access data in an array. The first, often called
cartesian indexing, uses `N` indices for an `N` -dimensional `AbstractArray`. For example, a
matrix `A` (2-dimensional) can be indexed in cartesian style as `A[i,j]`. The second indexing
method, referred to as linear indexing, uses a single index even for higher-dimensional objects.
For example, if `A = reshape(1:12, 3, 4)`, then the expression `A[5]` returns the value 5. Julia
allows you to combine these styles of indexing: for example, a 3d array `A3` can be indexed as
`A3[i,j]`, in which case `i` is interpreted as a cartesian index for the first dimension, and
`j` is a linear index over dimensions 2 and 3.

For `Array`s, linear indexing appeals to the underlying storage format: an array is laid out as
a contiguous block of memory, and hence the linear index is just the offset (+1) of the corresponding
entry relative to the beginning of the array. However, this is not true for many other `AbstractArray`
types: examples include [`SparseMatrixCSC`](@ref) from the `SparseArrays` standard library
module, arrays that require some kind of
computation (such as interpolation), and the type under discussion here, `SubArray`.
For these types, the underlying information is more naturally described in terms of
cartesian indices.

The `getindex` and `setindex!` functions for `AbstractArray` types may include automatic conversion
between indexing types. For explicit conversion, [`CartesianIndices`](@ref) can be used.

While converting from a cartesian index to a linear index is fast (it's just multiplication and
addition), converting from a linear index to a cartesian index is very slow: it relies on the
`div` operation, which is one of the slowest low-level operations you can perform with a CPU.
For this reason, any code that deals with `AbstractArray` types is best designed in terms of
cartesian, rather than linear, indexing.
One of the major design goals is to ensure high performance for views of both [`IndexLinear`](@ref) and
[`IndexCartesian`](@ref) arrays. Furthermore, views of `IndexLinear` arrays should themselves be
`IndexLinear` to the extent that it is possible.

## Index replacement

Expand Down
124 changes: 113 additions & 11 deletions doc/src/manual/arrays.md
Original file line number Diff line number Diff line change
Expand Up @@ -397,17 +397,6 @@ julia> x[1, [2 3; 4 1]]
13 1
```

Empty ranges of the form `n:n-1` are sometimes used to indicate the inter-index location between
`n-1` and `n`. For example, the [`searchsorted`](@ref) function uses this convention to indicate
the insertion point of a value not found in a sorted array:

```jldoctest
julia> a = [1,2,5,6,7];
julia> searchsorted(a, 4)
3:2
```

## Assignment

The general syntax for assigning values in an n-dimensional array `A` is:
Expand Down Expand Up @@ -648,6 +637,119 @@ julia> x[mask]
16
```

### Number of indices

#### Cartesian indexing

The ordinary way to index into an `N`-dimensional array is to use exactly `N` indices; each
index selects the position(s) in its particular dimension. For example, in the three-dimensional
array `A = rand(4, 3, 2)`, `A[2, 3, 1]` will select the number in the second row of the third
column in the first "page" of the array. This is often referred to as _cartesian indexing_.

#### Linear indexing

When exactly one index `i` is provided, that index no longer represents a location in a
particular dimension of the array. Instead, it selects the `i`th element using the
column-major iteration order that linearly spans the entire array. This is known as _linear
indexing_. It essentially treats the array as though it had been reshaped into a
one-dimensional vector with [`vec`](@ref).

```jldoctest linindexing
julia> A = [2 6; 4 7; 3 1]
3×2 Array{Int64,2}:
2 6
4 7
3 1
julia> A[5]
7
julia> vec(A)[5]
7
```

A linear index into the array `A` can be converted to a `CartesianIndex` for cartesian
indexing with `CartesianIndices(A)[i]` (see [`CartesianIndices`](@ref)), and a set of
`N` cartesian indices can be converted to a linear index with
`LinearIndices(A)[i_1, i_2, ..., i_N]` (see [`LinearIndices`](@ref)).

```jldoctest linindexing
julia> CartesianIndices(A)[5]
CartesianIndex(2, 2)
julia> LinearIndices(A)[2, 2]
5
```

It's important to note that there's a very large assymmetry in the performance
of these conversions. Converting a linear index to a set of cartesian indices
requires dividing and taking the remainder, whereas going the other way is just
multiplies and adds. In modern processors, integer division can be 10-50 times
slower than multiplication. While some arrays — like [`Array`](@ref) itself —
are implemented using a linear chunk of memory and directly use a linear index
in their implementations, other arrays — like [`Diagonal`](@ref) — need the
full set of cartesian indices to do their lookup (see [`IndexStyle`](@ref) to
introspect which is which). As such, when iterating over an entire array, it's
much better to iterate over [`eachindex(A)`](@ref) instead of `1:length(A)`.
Not only will the former be much faster in cases where `A` is `IndexCartesian`,
but it will also support OffsetArrays, too.

#### Omitted and extra indices

In addition to linear indexing, an `N`-dimensional array may be indexed with
fewer or more than `N` indices in certain situations.

Indices may be omitted if the trailing dimensions that are not indexed into are
all length one. In other words, trailing indices can be omitted only if there
is only one possible value that those omitted indices could be for an in-bounds
indexing expression. For example, a four-dimensional array with size `(3, 4, 2,
1)` may be indexed with only three indices as the dimension that gets skipped
(the fourth dimension) has length one. Note that linear indexing takes
precedence over this rule.

```jldoctest
julia> A = reshape(1:24, 3, 4, 2, 1)
3×4×2×1 reshape(::UnitRange{Int64}, 3, 4, 2, 1) with eltype Int64:
[:, :, 1, 1] =
1 4 7 10
2 5 8 11
3 6 9 12
[:, :, 2, 1] =
13 16 19 22
14 17 20 23
15 18 21 24
julia> A[1, 3, 2] # Omits the fourth dimension (length 1)
19
julia> A[1, 3] # Attempts to omit dimensions 3 & 4 (lengths 2 and 1)
ERROR: BoundsError: attempt to access 3×4×2×1 reshape(::UnitRange{Int64}, 3, 4, 2, 1) with eltype Int64 at index [1, 3]
julia> A[19] # Linear indexing
19
```

When omitting _all_ indices with `A[]`, this semantic provides a simple idiom
to retrieve the only element in an array and simultaneously ensure that there
was only one element.

Similarly, more than `N` indices may be provided if all the indices beyond the
dimensionality of the array are `1` (or more generally are the first and only
element of `axes(A, d)` where `d` is that particular dimension number). This
allows vectors to be indexed like one-column matrices, for example:

```jldoctest
julia> A = [8,6,7]
3-element Array{Int64,1}:
8
6
7
julia> A[2,1]
6
```

## Iteration

The recommended ways to iterate over a whole array are
Expand Down

0 comments on commit 1c1fc03

Please sign in to comment.