Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added some inbounds annotations in array.jl #24901

Closed
wants to merge 3 commits into from
Closed

Added some inbounds annotations in array.jl #24901

wants to merge 3 commits into from

Conversation

chethega
Copy link
Contributor

@chethega chethega commented Dec 3, 2017

As far as I gathered concurrent resizes of arrays are forbidden, but concurrent modifications of values are allowed.

I added some obvious inbounds, and commented why some others cannot be safely added.

base/array.jl Outdated
@@ -1532,7 +1532,7 @@ julia> findnext(isodd, A, 2)
function findnext(testf::Function, A, start::Integer)
l = endof(A)
i = start
while i <= l
@inbounds while i <= l
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will apply @inbounds to the user-provided testf as well.
There are also a few cases like this below.
For the Base it's better to use @inbounds really "atomically".

Copy link
Contributor Author

@chethega chethega Dec 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will apply @inbounds to the user-provided testf as well.

Is this bad?

For the Base it's better to use @inbounds really "atomically".

Ok, I was not aware of this coding standard (I thought the code looks nicer if we pull out the inbounds as far as possible, instead of replicating it inside).

I'll change to comply. For cases a[i] = b[j], is there a preferred way of annotating inbounds separately for the getindex and setindex! ? The obvious way is to write bj=b[j];a[i]=bj and then separately annotate it; but this looks somewhat ugly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this bad?

It's quite bad, because you are applying @inbounds to an unknown code. E.g. if it's fresh untested code, it'll crash Julia session.

I thought the code looks nicer if we pull out the inbounds as far as possible

Then it's easy to overlook. If the code within the loop gets modified, the @inbounds might no longer apply safely.
"Fine-grained" @inbounds might look pedantic in the user code, but Base has higher standards.

For cases a[i] = b[j], is there a preferred way of annotating inbounds separately for the getindex and setindex! ?

If from the context it follows that both a[i] and b[j] accesses are safe, you can just do @inbounds a[i] = b[j], of course.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See discussion at #24335.

base/array.jl Outdated
@@ -1749,7 +1749,7 @@ function find(A)
cnt = 1
inds = _index_remapper(A)
warned = false
for (i,a) in enumerate(A)
for (i,a) in enumerate(A) #not inbounds in mulithreaded if A is concurrently mutated
Copy link
Contributor

@alyst alyst Dec 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by the comment exactly?
A is read-only here (well, I is not), and the loop is not multi-threaded IIUC.
If any other thread modifies A it can lead to any kind of weird effects in any Base and non-Base function anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can safely corrupt memory if users decide to concurrently resize arrays (unless someone wants to go over array.c and check every single function).

On the other hand, if one thread writes to an array (increasing nnz) while another runs find, then we can return nonsense or throw an exception, but are not supposed to corrupt memory. Well, that is the previous behavior. Being conservative, I thought it better not to introduce security vulns in existing buggy-but-unexploitable programs, especially in my first PR to base, and for rather small performance gains.

Since it was not entirely obvious which accesses are inbounds in multithreaded, and I already spent the effort of thinking about it, I thought to better document it.

Or what is the intended behavior? Do you think memory-corrupting in this case is OK?

In practice, the two relevant changes are for pop! and push!; an always-predicted branch is not very expensive compared to the ccall, but it clears up the code_native / code_llvm (for people who like to see what their julia code actually produces), and push!/pop! are used quite extensively.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'm missing is why you think @inbounds while i <= l above is ok and this enumerate(A) is not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The enumerate is inbounds, the later write I[cnt] = inds[i] has an inbounds getindex and a possibly out-of-bounds setindex!, if cnt>nnzA (because some other thread mutated A).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification. Personally, I would not add the "concurrent mutation" comment here (and in the other cases below):

  1. as you see from our conversation, it's not explicit enough: it's not obvious that you are concerned about I[cnt] inbounds access (so it's rather should go to that line)
  2. annotating for-loops with inbounds in Base code is a dangerous thing anyway
  3. concurrent mutation of A is possible, but it's quite exotic scenario. It's a nice thing to take into account from security perspective, but we need some core developers opinion whether the code should be systematically annotated with these comments.

OTOH, you can still use @inbounds Aij = A[i,j] in the loops below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I placed the comments better now. If you think they are superfluous or misleading (because similar comments are possibly missing at other places), or that memory-corrupting in this case is admissible, feel free to modify my commit. I would not have commented / modified if the original version had been missing the bounds-check there, only raised an eyebrow. As I said, this PR tries to be conservative.

A less conservative modifiation in iterators.jl would be to make the getindex in enumerate(A::AbstractArray) inbounds. If the array is length-modified during the lifetime of the iterator, then we produce garbage anyway; the only question is whether we are willing to segfault / infoleak (oob reads typically don't lead to code exec, but this depends on the context and exploitable situations are possible). As you said, that's for some core dev to decide.

@vtjnash
Copy link
Member

vtjnash commented Dec 4, 2017

I think each of these needs a performance benchmark showing a significant (>25%) improvement, or we shouldn't do that one.

@@ -1533,7 +1533,8 @@ function findnext(testf::Function, A, start::Integer)
l = endof(A)
i = start
while i <= l
if testf(A[i])
@inbounds ai = A[i]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't inbounds on the first iteration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i < 1 kills us? Sorry, will revert.

base/array.jl Outdated
I[cnt] = i
@inbounds aij = A[i,j]
if aij != 0
I[cnt] = i #not inbounds in mulithreaded if A is concurrently mutated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"multithreaded"

@chethega
Copy link
Contributor Author

chethega commented Dec 4, 2017

Re performance: The effect of inbounds on performance is mostly negligible (always predicted branch).

I think the most relevant benchmarkeable effect is on codesize; a predicted branch is so much cheaper than a ccall. Here I am again not so much worried about instruction-cache / decoding, but rather about legibility of the resulting code_llvm and code_native.

Due to the mostly negative reactions, discussions, and me making mistakes, I'd propose a compromise: Just stick to the four most important and most obvious @inbounds (shift!, unshift!, pop!, push!)?

@KristofferC
Copy link
Member

Regarding legibility of the generated code. I rather we add some sort of filtering to the generated code than putting @inbounds in places that are not needed for performance.

Do you have any benchmarks for the shift!, unshift!, pop!, push! functions?

@chethega
Copy link
Contributor Author

chethega commented Dec 4, 2017

Sure, but take them with a mountain of salt (the variability is gigantic, the test is unrealistic, the functions are usually inlined and everything depends on context). But remember: push!/pop! are used in a lot of places, both in base and in user-code, and are typically inlined. The code looks so weird because I need a lot of calls in order to get the timing at least semi-stable, and sequentially writing a big percentage of my memory rather tests my OS at swapping.

_growend!(a::Vector, delta::Integer) =
    ccall(:jl_array_grow_end, Void, (Any, UInt), a, delta)


function push_nib!(a::Array{T,1}, item) where T
    # convert first so we don't grow the array if the assignment won't work
    itemT = convert(T, item)
    _growend!(a, 1)
    a[end] = itemT
    return a
end
    
function push_ib!(a::Array{T,1}, item) where T
    # convert first so we don't grow the array if the assignment won't work
    itemT = convert(T, item)
    _growend!(a, 1)
    @inbounds a[end] = itemT
    return a
end

@noinline function pushtest_nib(n)
    A = Vector{Int}(n)
    for i=1:n
        push_nib!(A, i)
    end
    A
end

@noinline function pushtest_ib(n)
    A = Vector{Int}(n)
    for i=1:n
        push_ib!(A, i)
    end
    A
end


function pushtestouter_nib(n)
   for i in 1:n
       pushtest_nib(n)
   end
end

function pushtestouter_ib(n)
   for i in 1:n
       pushtest_ib(n)
   end
end


pushtestouter_ib(5)
pushtestouter_nib(5)

pushtestouter_ib(5); pushtestouter_nib(5);
const Nm = 10_000
gc()

@time pushtestouter_ib(Nm);@time pushtestouter_ib(Nm);@time pushtestouter_ib(Nm);

gc()

@time pushtestouter_nib(Nm);@time pushtestouter_nib(Nm);@time pushtestouter_nib(Nm);

yielding

  1.414111 seconds (30.08 k allocations: 1.491 GiB, 17.40% gc time)
  1.249303 seconds (30.00 k allocations: 1.491 GiB, 6.28% gc time)
  1.238480 seconds (30.00 k allocations: 1.491 GiB, 6.02% gc time)
  1.263285 seconds (30.00 k allocations: 1.491 GiB, 6.09% gc time)
  1.257711 seconds (30.00 k allocations: 1.491 GiB, 6.07% gc time)
  1.254880 seconds (30.00 k allocations: 1.491 GiB, 6.08% gc time)

If you squint long enough you could guess at 1% improvement, tops, but realistically irrelevant (how would you test this reliably? Normally you just measure allocator state and background noise).

If you look at the code_native of push_ib! for a vector of Int, you see:

	pushq	%rbp
	movq	%rsp, %rbp
	pushq	%r14
	pushq	%rbx
	movq	%rsi, %r14
	movq	%rdi, %rbx
Source line: 16
	movabsq	$jl_array_grow_end, %rax
	movl	$1, %esi
	callq	*%rax
Source line: 17
	movq	(%rbx), %rax
Source line: 64
	movq	24(%rbx), %rcx
	xorl	%edx, %edx
Source line: 17
	testq	%rcx, %rcx
	cmovnsq	%rcx, %rdx
	movq	%r14, -8(%rax,%rdx,8)
Source line: 18
	movq	%rbx, %rax
	popq	%rbx
	popq	%r14
	popq	%rbp
	retq
	nopl	(%rax,%rax)

whereas push_nib!:

	pushq	%rbp
	movq	%rsp, %rbp
	pushq	%r14
	pushq	%rbx
	movq	%rsi, %r14
	movq	%rdi, %rbx
Source line: 8
	movabsq	$jl_array_grow_end, %rax
	movl	$1, %esi
	callq	*%rax
Source line: 64
	movq	24(%rbx), %rcx
	xorl	%eax, %eax
Source line: 9
	testq	%rcx, %rcx
	cmovnsq	%rcx, %rax
	leaq	-1(%rax), %rdx
	cmpq	%rcx, %rdx
	jae	L72
	movq	(%rbx), %rcx
	movq	%r14, -8(%rcx,%rax,8)
Source line: 10
	movq	%rbx, %rax
	leaq	-16(%rbp), %rsp
	popq	%rbx
	popq	%r14
	popq	%rbp
	retq
Source line: 9
L72:
	movq	%rsp, %rcx
	leaq	-16(%rcx), %rsi
	movq	%rsi, %rsp
	movq	%rax, -16(%rcx)
	movabsq	$jl_bounds_error_ints, %rax
	movl	$1, %edx
	movq	%rbx, %rdi
	callq	*%rax
	nopw	(%rax,%rax)

push! is inline and used often, hence we get a lot of bounds-check code emitted. The offending setindex! is literally one code-line below the ccall that makes it inbounds.

@kshyatt kshyatt added the arrays [a, r, r, a, y, s] label Dec 18, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants