Added some inbounds annotations in array.jl #24901

chethega · 2017-12-03T17:50:23Z

As far as I gathered concurrent resizes of arrays are forbidden, but concurrent modifications of values are allowed.

I added some obvious inbounds, and commented why some others cannot be safely added.

alyst · 2017-12-03T23:45:55Z

base/array.jl

@@ -1532,7 +1532,7 @@ julia> findnext(isodd, A, 2)
 function findnext(testf::Function, A, start::Integer)
    l = endof(A)
    i = start
-    while i <= l
+    @inbounds while i <= l


This will apply @inbounds to the user-provided testf as well.
There are also a few cases like this below.
For the Base it's better to use @inbounds really "atomically".

This will apply @inbounds to the user-provided testf as well.

Is this bad?

For the Base it's better to use @inbounds really "atomically".

Ok, I was not aware of this coding standard (I thought the code looks nicer if we pull out the inbounds as far as possible, instead of replicating it inside).

I'll change to comply. For cases a[i] = b[j], is there a preferred way of annotating inbounds separately for the getindex and setindex! ? The obvious way is to write bj=b[j];a[i]=bj and then separately annotate it; but this looks somewhat ugly.

Is this bad?

It's quite bad, because you are applying @inbounds to an unknown code. E.g. if it's fresh untested code, it'll crash Julia session.

I thought the code looks nicer if we pull out the inbounds as far as possible

Then it's easy to overlook. If the code within the loop gets modified, the @inbounds might no longer apply safely.
"Fine-grained" @inbounds might look pedantic in the user code, but Base has higher standards.

For cases a[i] = b[j], is there a preferred way of annotating inbounds separately for the getindex and setindex! ?

If from the context it follows that both a[i] and b[j] accesses are safe, you can just do @inbounds a[i] = b[j], of course.

See discussion at #24335.

alyst · 2017-12-03T23:58:57Z

base/array.jl

@@ -1749,7 +1749,7 @@ function find(A)
    cnt = 1
    inds = _index_remapper(A)
    warned = false
-    for (i,a) in enumerate(A)
+    for (i,a) in enumerate(A) #not inbounds in mulithreaded if A is concurrently mutated


What do you mean by the comment exactly?
A is read-only here (well, I is not), and the loop is not multi-threaded IIUC.
If any other thread modifies A it can lead to any kind of weird effects in any Base and non-Base function anyway.

We can safely corrupt memory if users decide to concurrently resize arrays (unless someone wants to go over array.c and check every single function).

On the other hand, if one thread writes to an array (increasing nnz) while another runs find, then we can return nonsense or throw an exception, but are not supposed to corrupt memory. Well, that is the previous behavior. Being conservative, I thought it better not to introduce security vulns in existing buggy-but-unexploitable programs, especially in my first PR to base, and for rather small performance gains.

Since it was not entirely obvious which accesses are inbounds in multithreaded, and I already spent the effort of thinking about it, I thought to better document it.

Or what is the intended behavior? Do you think memory-corrupting in this case is OK?

In practice, the two relevant changes are for pop! and push!; an always-predicted branch is not very expensive compared to the ccall, but it clears up the code_native / code_llvm (for people who like to see what their julia code actually produces), and push!/pop! are used quite extensively.

What I'm missing is why you think @inbounds while i <= l above is ok and this enumerate(A) is not.

The enumerate is inbounds, the later write I[cnt] = inds[i] has an inbounds getindex and a possibly out-of-bounds setindex!, if cnt>nnzA (because some other thread mutated A).

Thanks for the clarification. Personally, I would not add the "concurrent mutation" comment here (and in the other cases below):

as you see from our conversation, it's not explicit enough: it's not obvious that you are concerned about I[cnt] inbounds access (so it's rather should go to that line)

annotating for-loops with inbounds in Base code is a dangerous thing anyway

concurrent mutation of A is possible, but it's quite exotic scenario. It's a nice thing to take into account from security perspective, but we need some core developers opinion whether the code should be systematically annotated with these comments.

OTOH, you can still use @inbounds Aij = A[i,j] in the loops below

I placed the comments better now. If you think they are superfluous or misleading (because similar comments are possibly missing at other places), or that memory-corrupting in this case is admissible, feel free to modify my commit. I would not have commented / modified if the original version had been missing the bounds-check there, only raised an eyebrow. As I said, this PR tries to be conservative.

A less conservative modifiation in iterators.jl would be to make the getindex in enumerate(A::AbstractArray) inbounds. If the array is length-modified during the lifetime of the iterator, then we produce garbage anyway; the only question is whether we are willing to segfault / infoleak (oob reads typically don't lead to code exec, but this depends on the context and exploitable situations are possible). As you said, that's for some core dev to decide.

vtjnash · 2017-12-04T15:21:21Z

I think each of these needs a performance benchmark showing a significant (>25%) improvement, or we shouldn't do that one.

vtjnash · 2017-12-04T15:22:27Z

base/array.jl

@@ -1533,7 +1533,8 @@ function findnext(testf::Function, A, start::Integer)
    l = endof(A)
    i = start
    while i <= l
-        if testf(A[i])
+        @inbounds ai = A[i]


this isn't inbounds on the first iteration

i < 1 kills us? Sorry, will revert.

nalimilan · 2017-12-04T15:41:16Z

base/array.jl

-            I[cnt] = i
+        @inbounds aij = A[i,j]
+        if aij != 0
+            I[cnt] = i #not inbounds in mulithreaded if A is concurrently mutated


"multithreaded"

chethega · 2017-12-04T16:56:49Z

Re performance: The effect of inbounds on performance is mostly negligible (always predicted branch).

I think the most relevant benchmarkeable effect is on codesize; a predicted branch is so much cheaper than a ccall. Here I am again not so much worried about instruction-cache / decoding, but rather about legibility of the resulting code_llvm and code_native.

Due to the mostly negative reactions, discussions, and me making mistakes, I'd propose a compromise: Just stick to the four most important and most obvious @inbounds (shift!, unshift!, pop!, push!)?

KristofferC · 2017-12-04T17:17:56Z

Regarding legibility of the generated code. I rather we add some sort of filtering to the generated code than putting @inbounds in places that are not needed for performance.

Do you have any benchmarks for the shift!, unshift!, pop!, push! functions?

chethega · 2017-12-04T18:52:04Z

Sure, but take them with a mountain of salt (the variability is gigantic, the test is unrealistic, the functions are usually inlined and everything depends on context). But remember: push!/pop! are used in a lot of places, both in base and in user-code, and are typically inlined. The code looks so weird because I need a lot of calls in order to get the timing at least semi-stable, and sequentially writing a big percentage of my memory rather tests my OS at swapping.

_growend!(a::Vector, delta::Integer) =
    ccall(:jl_array_grow_end, Void, (Any, UInt), a, delta)


function push_nib!(a::Array{T,1}, item) where T
    # convert first so we don't grow the array if the assignment won't work
    itemT = convert(T, item)
    _growend!(a, 1)
    a[end] = itemT
    return a
end
    
function push_ib!(a::Array{T,1}, item) where T
    # convert first so we don't grow the array if the assignment won't work
    itemT = convert(T, item)
    _growend!(a, 1)
    @inbounds a[end] = itemT
    return a
end

@noinline function pushtest_nib(n)
    A = Vector{Int}(n)
    for i=1:n
        push_nib!(A, i)
    end
    A
end

@noinline function pushtest_ib(n)
    A = Vector{Int}(n)
    for i=1:n
        push_ib!(A, i)
    end
    A
end


function pushtestouter_nib(n)
   for i in 1:n
       pushtest_nib(n)
   end
end

function pushtestouter_ib(n)
   for i in 1:n
       pushtest_ib(n)
   end
end


pushtestouter_ib(5)
pushtestouter_nib(5)

pushtestouter_ib(5); pushtestouter_nib(5);
const Nm = 10_000
gc()

@time pushtestouter_ib(Nm);@time pushtestouter_ib(Nm);@time pushtestouter_ib(Nm);

gc()

@time pushtestouter_nib(Nm);@time pushtestouter_nib(Nm);@time pushtestouter_nib(Nm);

yielding

  1.414111 seconds (30.08 k allocations: 1.491 GiB, 17.40% gc time)
  1.249303 seconds (30.00 k allocations: 1.491 GiB, 6.28% gc time)
  1.238480 seconds (30.00 k allocations: 1.491 GiB, 6.02% gc time)
  1.263285 seconds (30.00 k allocations: 1.491 GiB, 6.09% gc time)
  1.257711 seconds (30.00 k allocations: 1.491 GiB, 6.07% gc time)
  1.254880 seconds (30.00 k allocations: 1.491 GiB, 6.08% gc time)

If you squint long enough you could guess at 1% improvement, tops, but realistically irrelevant (how would you test this reliably? Normally you just measure allocator state and background noise).

If you look at the code_native of push_ib! for a vector of Int, you see:

	pushq	%rbp
	movq	%rsp, %rbp
	pushq	%r14
	pushq	%rbx
	movq	%rsi, %r14
	movq	%rdi, %rbx
Source line: 16
	movabsq	$jl_array_grow_end, %rax
	movl	$1, %esi
	callq	*%rax
Source line: 17
	movq	(%rbx), %rax
Source line: 64
	movq	24(%rbx), %rcx
	xorl	%edx, %edx
Source line: 17
	testq	%rcx, %rcx
	cmovnsq	%rcx, %rdx
	movq	%r14, -8(%rax,%rdx,8)
Source line: 18
	movq	%rbx, %rax
	popq	%rbx
	popq	%r14
	popq	%rbp
	retq
	nopl	(%rax,%rax)

whereas push_nib!:

	pushq	%rbp
	movq	%rsp, %rbp
	pushq	%r14
	pushq	%rbx
	movq	%rsi, %r14
	movq	%rdi, %rbx
Source line: 8
	movabsq	$jl_array_grow_end, %rax
	movl	$1, %esi
	callq	*%rax
Source line: 64
	movq	24(%rbx), %rcx
	xorl	%eax, %eax
Source line: 9
	testq	%rcx, %rcx
	cmovnsq	%rcx, %rax
	leaq	-1(%rax), %rdx
	cmpq	%rcx, %rdx
	jae	L72
	movq	(%rbx), %rcx
	movq	%r14, -8(%rcx,%rax,8)
Source line: 10
	movq	%rbx, %rax
	leaq	-16(%rbp), %rsp
	popq	%rbx
	popq	%r14
	popq	%rbp
	retq
Source line: 9
L72:
	movq	%rsp, %rcx
	leaq	-16(%rcx), %rsi
	movq	%rsi, %rsp
	movq	%rax, -16(%rcx)
	movabsq	$jl_bounds_error_ints, %rax
	movl	$1, %edx
	movq	%rbx, %rdi
	callq	*%rax
	nopw	(%rax,%rax)

push! is inline and used often, hence we get a lot of bounds-check code emitted. The offending setindex! is literally one code-line below the ccall that makes it inbounds.

added some inbounds annotations

6e8c5bd

alyst reviewed Dec 3, 2017

View reviewed changes

chethega mentioned this pull request Dec 4, 2017

push!/pop! always does a ccall #24909

Closed

refined inbounds annotations

a4b8cba

vtjnash reviewed Dec 4, 2017

View reviewed changes

nalimilan reviewed Dec 4, 2017

View reviewed changes

reverted wrong inbounds in findnext

39cf181

kshyatt added the arrays [a, r, r, a, y, s] label Dec 18, 2017

oscardssmith closed this Dec 27, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added some inbounds annotations in array.jl #24901

Added some inbounds annotations in array.jl #24901

chethega commented Dec 3, 2017

alyst Dec 3, 2017

chethega Dec 4, 2017 •

edited

Loading

alyst Dec 4, 2017

nalimilan Dec 4, 2017

alyst Dec 3, 2017 •

edited

Loading

chethega Dec 4, 2017

alyst Dec 4, 2017

chethega Dec 4, 2017

alyst Dec 4, 2017

chethega Dec 4, 2017

vtjnash commented Dec 4, 2017

vtjnash Dec 4, 2017

chethega Dec 4, 2017

nalimilan Dec 4, 2017

chethega commented Dec 4, 2017

KristofferC commented Dec 4, 2017

chethega commented Dec 4, 2017

Added some inbounds annotations in array.jl #24901

Added some inbounds annotations in array.jl #24901

Conversation

chethega commented Dec 3, 2017

Choose a reason for hiding this comment

chethega Dec 4, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alyst Dec 3, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vtjnash commented Dec 4, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

chethega commented Dec 4, 2017

KristofferC commented Dec 4, 2017

chethega commented Dec 4, 2017

chethega Dec 4, 2017 •

edited

Loading

alyst Dec 3, 2017 •

edited

Loading