Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminating allocation of ccall root in simple cases #22734

Merged
merged 3 commits into from
Jul 15, 2017
Merged

Conversation

yuyichao
Copy link
Contributor

Applied on top of #22684, this should eliminate most if not all of the allocation in ccall that was previously only non-allocationg when using the to be deprecated & syntax.

@yuyichao
Copy link
Contributor Author

As shown in the test, this PR along should elminate the allocation for Ref calling convention on a mutable type. Together with #22684, it should always be possible to transform custom ptr_arg_cconert and ptr_arg_unsafe_convert to somthing like cconvert(Ref{T}, ...) = (ptr_arg_cconvert(T, ...), Ref{T}(ptr_arg_unsafe_convert(T, ...))) and unsafe_convert(T, obj) = unsafe_convert(Ref{T}, obj[2]) with appropriate inlining without additional allocations.

@yuyichao
Copy link
Contributor Author

The one merged later in this pr and #22684 should include the following test case. It doesn't currently pass on any of the PR but pass with the two combined.

diff --git a/test/codegen.jl b/test/codegen.jl
index 83ceb9b937..fb73e8ae83 100644
--- a/test/codegen.jl
+++ b/test/codegen.jl
@@ -146,6 +146,25 @@ Base.unsafe_convert(::Type{Ptr{BadRef}}, ar::BadRef) = Ptr{BadRef}(pointer_from_

 breakpoint_badref(a::MutableStruct) = ccall(:jl_breakpoint, Void, (Ptr{BadRef},), a)

+struct PtrStruct
+    a::Ptr{Void}
+    b::Int
+end
+
+mutable struct RealStruct
+    a::Float64
+    b::Int
+end
+
+function Base.cconvert(::Type{Ref{PtrStruct}}, a::RealStruct)
+    (a, Ref(PtrStruct(pointer_from_objref(a), a.b)))
+end
+Base.unsafe_convert(::Type{Ref{PtrStruct}}, at::Tuple) =
+    Base.unsafe_convert(Ref{PtrStruct}, at[2])
+
+breakpoint_ptrstruct(a::RealStruct) =
+    ccall(:jl_breakpoint, Void, (Ref{PtrStruct},), a)
+
 if opt_level > 0
     @test !contains(get_llvm(isequal, Tuple{Nullable{BigFloat}, Nullable{BigFloat}}), "%gcframe")
     @test !contains(get_llvm(pointer_not_safepoint, Tuple{}), "%gcframe")
@@ -161,6 +180,10 @@ if opt_level > 0
     breakpoint_badref_ir = get_llvm(breakpoint_badref, Tuple{MutableStruct})
     @test !contains(breakpoint_badref_ir, "%gcframe")
     @test !contains(breakpoint_badref_ir, "jl_gc_pool_alloc")
+
+    breakpoint_ptrstruct_ir = get_llvm(breakpoint_ptrstruct, Tuple{RealStruct})
+    @test !contains(breakpoint_ptrstruct_ir, "%gcframe")
+    @test !contains(breakpoint_ptrstruct_ir, "jl_gc_pool_alloc")
 end

 function two_breakpoint(a::Float64)

@yuyichao yuyichao requested review from vtjnash and Keno July 10, 2017 19:58
@andyferris
Copy link
Member

Cool. Getting rid of & as a special case will be nice. :)

The allocation that had bothered me previously was when I wanted to ccall a pointer to an object on the (Julia) caller's stack, which gets written to by the callee. Basically I wanted a C function to initialise an isbits immutable cheaply, but as is common in C/C++/fortran/etc, the function I was calling did not return the value but rather wrote to a reference. In certain simple cases the allocation/GC costs can be a bit inconvenient, just for an integer or something.

My question is - are we closer to removing allocations to local Refs which get passed to a ccall and unwrapped afterwards? (Or something else to this effect)

E.g.

function f()
    x = Ref{Int}() # heap allocation?
    ccall((:foo, :bar), Void, Tuple{Ptr{Int}}, x)
    return x[]
end

@yuyichao
Copy link
Contributor Author

#22684

@andyferris
Copy link
Member

duh, thanks

Copy link
Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need any docs changes?

elseif head === :foreigncall
args = e.args
nccallargs = args[5]::Int
# Only arguments escape, GC root arguments do not escape.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

most arguments also don't escape (get copied to the stack or passed in registers). AFAIK, only those with a declared type of Any can escape?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"escape" here refer to the structure/layout. If it's used as an argument then we do need to allocate memory to make it have the correct layout. If that use doesn't escape the pointer, it'll be the job of llvm-alloc-opt.cpp to remove the heap allocation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, the reference to gc made this confusing. but also because the main purpose of the gc-root arguments is to indicate what object(s) need(s) to be pinned in memory during the call. Perhaps a clearer comment might be to mention that this function is computing whether the structure layout is observable from the ccall target, but that the gc-root arguments are not passed through to the child. But also, is that actually true?

Copy link
Contributor Author

@yuyichao yuyichao Jul 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, adding that comment.

But also, is that actually true?

It is not true just from the ccall itself. E.g. I think it is perfectly valid if one write

function cconvert(....)
    a = Ref(b)
    (a, pointer_from_objref(a))
end
function unsafe_convert(..., v)
    v[2]
end

For the ccall, a will only appear in the root slot but it cannot be deconstructed since there are escaped use of a's address. OTOH, if we scan through all the uses and there isn't any escaped use, then I think it is valid that we can elide the allocation. I mean, the user is not even using it.....

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, yes, that sounds correct.

next_i += 1

symequal(args[i], tupname) || continue
splice!(args, i, vals)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment that this splice is removing gc roots

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*replacing gc roots.

Sure.

@@ -1380,7 +1380,7 @@ static std::pair<CallingConv::ID, bool> convert_cconv(jl_sym_t *lhd)
if (lhd == jl_symbol("stdcall")) {
return std::make_pair(CallingConv::X86_StdCall, false);
}
else if (lhd == jl_symbol("cdecl")) {
else if (lhd == jl_symbol("cdecl") || lhd == jl_symbol("ccall")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ccc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defaultcc ? ccc will probably just be cdecl. I just want to make it clear that this is the default one for ccall. Or I can just use cdecl if we will never make anything else the default in any platform/target dependent way. That's also why this is intentionally left out as valid calling convention in the parser.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, ok. mhm, this seems fine as-is

@vtjnash
Copy link
Member

vtjnash commented Jul 14, 2017

A note in https://github.com/JuliaLang/julia/blob/69a0886c3906f8fee1f83b52f5c948d01855147b/doc/src/manual/calling-c-and-fortran-code.md#garbage-collection-safety

and the foreigncall head in list at https://github.com/JuliaLang/julia/blob/69a0886c3906f8fee1f83b52f5c948d01855147b/doc/src/devdocs/ast.md#expr-types

We will want to add an entry to NEWS at some point too, but maybe just open an issue to deprecate & syntax and we can add the NEWS there.

@yuyichao
Copy link
Contributor Author

A note in https://github.com/JuliaLang/julia/blob/69a0886c3906f8fee1f83b52f5c948d01855147b/doc/src/manual/calling-c-and-fortran-code.md#garbage-collection-safety

What do you think should be added there? I don't think this actually change any user visible API for ccall.

We will want to add an entry to NEWS at some point too, but maybe just open an issue to deprecate &syntax and we can add the NEWS there.

This should be the less significant one. If any, news entry should probably go to #22684

@yuyichao
Copy link
Contributor Author

Does this need any docs changes?

Apart from the reduction in allocation the only user visible change should be the format of the internal Expr(:foreigncall). I searched through all mentions of it and updated accordingly (being different from ccall makes this a great deal easier...) There is one mention of it in the doc that's updated and I couldn't find the actual meaning of the argument mentioned anywhere other than code comments. We probably should at some point but that's another PR....

ah, ok. mhm, this seems fine as-is

I'll add a comment about it there...

@vtjnash
Copy link
Member

vtjnash commented Jul 14, 2017

Just a reminder / example of correct rooting in ccall and a (additional) warning / example that the argument pointer memory is destroyed immediately when the ccall returns, unless the user made sure to also leak the object elsewhere global.

For instance, what can we say about the pointer seen by consecutive ccalls to the same Ref object?

I think experience has shown that we can never say enough about how to allocate objects correctly for use in a ccall.

@vtjnash
Copy link
Member

vtjnash commented Jul 14, 2017

Apart from the reduction in allocation

Yep, that's the part I was thinking should have an explicit note :)

@yuyichao
Copy link
Contributor Author

yuyichao commented Jul 14, 2017

the argument pointer memory is destroyed immediately when the ccall returns, unless the user made sure to also leak the object elsewhere global.

This actually haven't changed. The change in this PR is purely an optimization since it is based on escape info. If the pointer is take for the object (or if the object is used in any way that is not ccall root or getfield (and setfield! and isdefined in furture PR)) the allocation won't be eliminated.

@yuyichao
Copy link
Contributor Author

Added more comment for each of the cases above. Also added a commit attempting to make the relation between cconvert and unsafe_convert more clare (it was not mentioned in the docstring that the allocation should be done in cconvert and that it's result should be kept alive for as long as the result of unsafe_convert needs to be valid). Also mention that tuple can be used to return multiple objects that needs to be alive.

[`unsafe_convert`](@ref) to handle. The result of this function should be kept valid (for the GC)
until the result of [`unsafe_convert`](@ref) is not needed anymore.
This can be used to allocate memory that will be accessed by the `ccall`.
If multiple object needs to be allocated, a tuple of the objects can be used as return value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

multiple objects need to

@@ -881,7 +884,7 @@ trunc
"""
unsafe_convert(T,x)

Convert `x` to a value of type `T`
Convert `x` to a C argument of type `T` where `x` is the return value of `cconvert(T, ...)`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this mean? x is whatever the user inputs, isn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it isn't.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh? the signature says unsafe_convert(T,x)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, depend on what do you mean by user input. x is the input to this function, sure, but it's basically never what the user inputs. It must always be the return value of cconvert.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not very clear what you mean. x is the input to the unsafe_convert function in this docstring. So how is cconvert relevant to that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should elaborate on that a bit in the docstring then, probably with a cross-reference to the full ccall docs which I hope explain this in more detail. Something like "This function should not usually be called directly, it gets called on the output from cconvert"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is currently the full document of the function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh? I'm saying the part that you're adding here is unclear and would be better if phrased differently. "where x is the ..." doesn't make sense in this context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I think I've explained what I wanted to say clear enough so I can take any suggestions as long as the first sentense doesn't make it sound like this is doing a simple convertion and make it clear that the x is the output of cconvert.

By "the full document of the function" I mean I don't think it is better explained in the manual and I think it makes sense to describe it here in the doc string since it's a property of the function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, you're saying there's nothing more detailed to cross-reference to then? My interpretation or suggestion is to explain the intended usage as another sentence after the sort-of-obvious Convert `x` to a C argument of type `T` . The "where" phrasing doesn't make that restriction very clear to me.

# GC root arguments do not.
# Also note that only being used in the root slot for this ccall itself
# does **not** mean that the object is not needed during the ccall.
# However, if it's address is never taken
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its address

# Also note that only being used in the root slot for this ccall itself
# does **not** mean that the object is not needed during the ccall.
# However, if it's address is never taken
# and the object is never used in a way that escapes it's layout, we can be sure
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its layout

next_i += 1

symequal(args[i], tupname) || continue
# Replace the gc root arguments with it's fields
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

argument singular or plural? if singular, "root argument with its fields" - if plural, "root arguments with their fields"

@yuyichao yuyichao force-pushed the yyc/foreigncall branch 2 times, most recently from 496a09d to 71a09d9 Compare July 15, 2017 13:12
# If the slot is not leaf type, it could be either isbits or not.
# If it is actually an isbits type and the type inference can infer that
# it can rebox the `b.x` if we simply call `pointer_from_objref(b.x)` on it.
# Instead, explictly load the pointer from the `RefValue` so that the pointer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

explicitly (missing second i)

yuyichao added 3 commits July 15, 2017 14:53
Explicitly specify the number of ccall arguments and put the gc root arguments at the end.
This way we can specify variable number of roots, making it easier for typeinf to split the root.
into smaller allocations.
Also fix and test an invalid `unsafe_convert` definition in Base.
The result of `pointer_from_objref` on isbits types is never valid to use
across safepoint.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants