ccall should automatically convert integers #132

JeffBezanson · 2011-07-18T22:20:34Z

This can be done for arguments whose types can be inferred. It will make things more convenient in general, but the cost is that it won't be easy to predict when an explicit conversion needs to be written. At the very least it will improve the situation with literals, so you don't need to write int32(0).

The text was updated successfully, but these errors were encountered:

StefanKarpinski · 2011-07-19T03:14:01Z

This makes me think: can we use type inference with numeric literals? This is sort of the opposite of what other languages do, but it would probably give us 99% of what we want: when a numeric literal is used in a context where we can figure out what type it ought to be — e.g. x + 0 — can we just make it be of the right type automatically? When inference is impossible, numeric literals are just of the default type for that kind of literal. Sure, it doesn't work if you have a function or something, but whatever. It handles most of the cases and requires writing zero(x) and one(x) a million times fewer.

StefanKarpinski · 2011-07-19T03:15:20Z

In fact, maybe this is just specific case of that — that would make it less confusing: variables have to be cast, literals will automatically be converted to the right type.

JeffBezanson · 2011-07-19T03:36:13Z

Actually that is exactly what Haskell does. + is defined only for arguments of the same type, so one argument can determine the type of the operation and force a literal to be of the same type. I don't see how we can do that though. Both arguments determine what method to apply, and any method signature can be defined, so there isn't a unique answer to what type an argument "ought" to be.

StefanKarpinski · 2011-07-19T19:34:26Z

Back to the original issue: can we have ccall always convert(T,v) before applying? That would make the way ccall works a lot simpler and more transparent.

JeffBezanson · 2011-07-19T19:42:35Z

The big problem here is the way we handle pointers. If you pass an Int32 to a ccall expecting Ptr{Int32}, it passes a pointer to your integer value. But converting an integer to a pointer uses the bits of the integer as the value of the pointer.

Another issue is when to insert the calls to convert. Since ccall uses normal function call syntax, it's hard for the front end to know exactly what is and isn't a ccall. I guess we could make ccall a reserved word.

StefanKarpinski · 2011-07-19T20:11:39Z

The big problem here is the way we handle pointers. If you pass an Int32 to a ccall expecting Ptr{Int32}, it passes a pointer to your integer value. But converting an integer to a pointer uses the bits of the integer as the value of the pointer.

See, I didn't know that. ccall is kind of a big mystery and has all sorts of magical behaviors. It would be nice to make it less magical. I think it's acceptable to require calling pointerto(1) if you want a pointer to an integer. You have to do one = 1; and later use &1 in C, so this is not too difficult. Maybe we want a different function than just convert for this — cconvert or something like that — so that it can have different behavior than normal convert.

Another issue is when to insert the calls to convert. Since ccall uses normal function call syntax, it's hard for the front end to know exactly what is and isn't a ccall. I guess we could make ccall a reserved word.

I would be completely find with ccall being a reserved word. It's a pretty key piece of infrastructure and its not like it's actually reasonable for anyone to add methods to it or redefine it or anything like that.

JeffBezanson · 2011-07-19T20:31:24Z

If a garbage collection occurs between calling pointerto() and the ccall that uses it, you could get a segfault. There's a reason for the magic. Actually, exec in process.j seems to have this bug already.
It seems we would need to expose the needed object pinning to julia, like

@pin begin
    ...
    p = pointerto(pin(x))
    ...
    ccall(...)
end

StefanKarpinski · 2011-07-19T21:23:48Z

Hmm. That's nasty. How about preventing GC for the duration of the function instead?

JeffBezanson · 2011-07-19T21:27:46Z

Preventing GC during the ccall doesn't help, since the problem is GC between pointerto and ccall. Preventing GC during the function that uses ccall is a complete non-starter.

StefanKarpinski · 2011-07-20T14:00:48Z

How about this: pointerto doesn't actually create a pointer to an object immediately, but rather just wraps it to indicate to ccall that it should take a pointer to the object in question, rather than applying convert to it as it normally would. In fact, in this case, we should just write pointerto(obj) as Ref(obj) — since that already essentially does that. Then we just need to make the conversion of a Ref to a C pointer value be a pointer to its referent, which seems sane. Since the actual dereference is under ccall's control, GC can be stopped before it happens. Does this make sense?

JeffBezanson · 2011-07-20T15:56:22Z

I think that would do it, since we need some way to tag that a value needs to be converted differently. Doing heap allocation for this is unfortunate though. Actually you can already do this by wrapping numbers in arrays, x => [x].

I can rewrite the ccall to keep the original arguments so the code generator can make sure they are gc roots:

ccall(:f, R, (A, B), arg1, arg2)

to

ccall(:f, R, (A, B), convert(A,arg1), convert(B,arg2), arg1, arg2)

So there is no need to disable GC.

StefanKarpinski · 2011-07-20T16:17:19Z

Do you mean this:

ccall(:f, R, (A, B), convert(A,arg1), convert(B,arg2))

? Any chance that the compiler can figure out that heap allocation is unnecessary because the Ref object doesn't escape the function? This, of course, requires a special understanding that passing something to ccall doesn't actually cause it to escape, while at the same time causing it to become a GC root during the execution of the ccall.

JeffBezanson · 2011-07-20T16:27:46Z

Well, that's probably possible with enough of my time :)

I think the bigger problem is that a lapack call would go from this:

int32(m), int32(n), QR, int32(m), jpvt, tau, work, lwork, info

to this:

Ref(m), Ref(n), QR, Ref(m), jpvt, tau, work, Ref(lwork), info

which is not the kind of improvement I was hoping for by filing this issue :)

StefanKarpinski · 2011-07-20T18:38:02Z

But the LAPACK calling convention is clearly not the norm — it's very ugly. Optimizing for it seems silly. I think that the latter version is actually much simpler to understand: it's shorter and the fields that should be passed as pointers are instead passed as Refs. That's a pretty simple thing for users to wrap their heads around:

Convert is called to convert each argument to the type expected by the function
Refs are used for values passed by reference (i.e. pointer)

This would be infinitely more explainable and usable than what we currently have, IMO.

JeffBezanson · 2012-02-10T03:31:41Z

You know, the syntax &x is available. We could use that in ccall to mark which things get their address passed, while everything else simply has convert called on it. And it can be syntactic, i.e. no particular function or object type is involved, so there are no efficiency or escape analysis issues.

StefanKarpinski · 2012-02-10T04:08:22Z

Yes!!

- conversions are inserted for the arguments, to the ccall argument types - syntax &x is used to pass a pointer to a scalar

JeffBezanson · 2012-02-16T06:10:51Z

Ok, I have this change basically implemented on a new branch. However, it is causing a lot of unexpected difficulty. Mostly, there are some performance regressions of about 10-15% in randmatstat and printfd. There were some changes to inlining behavior, but those did not seem to be the cause of the regressions, and I'm still not sure what's going on.

The other problem is GC roots. The reason I wrote

ccall(:f, R, (A, B), convert(A,arg1), convert(B,arg2), arg1, arg2)

is so the code generation for ccall has access to the original arguments before conversion, in case they are new arrays that get converted to pointers. Now, this might seem silly, since lowering has to pull expressions into variables anyway to avoid duplicating side-effects:

temp = [1,2]
ccall(..., convert(A, temp), temp)

But what if we have

f(x,y) = ccall(:f, Void, (Ptr,Ptr), x,y)
g() = f([1,2],[3,4])

We have to avoid lowering and inlining this to

g() = ccall(:f, Void, (Ptr,Ptr), convert(Ptr,[1,2]), convert(Ptr,[3,4]))

Oops!

I don't have this fully worked out. If lowering emits the argument twice, that at least signals to the inliner that it needs to generate a temporary variable. I guess I could alternate real/dummy arguments:

g() = (t1=[1,2];t2=[3,4];ccall(:f, Void, (Ptr,Ptr), convert(Ptr,t1), t1, convert(Ptr,t2), t2))

Every other argument is completely superfluous, but has to be there!

Needless to say, this problem isn't unique to ccall, and any use of array-to-pointer conversion is a potential ticking time bomb. But in ccall we have to deal with it since we insert the conversions.

Yet another problem is conversion inside &:

i = 0
ccall(:f, Void, (Ptr{Int16},), &i)

The current implementation doesn't support this, and you have to write

ccall(:f, Void, (Ptr{Int16},), &int16(i))

To insert the conversion automatically, we'd have to take apart the type Ptr{Int16}. And we don't know what the type is until later. Maybe a solution is to define a special function ptr_arg_convert used for & arguments instead of convert, defined as follows:

ptr_arg_convert{T}(::Type{Ptr{T}}, x) = convert(T, x)

StefanKarpinski · 2012-02-16T14:00:38Z

I can't comment on the performance issue since I'm not really in a position to advise. I'm not clear on how gc can possibly occur during a ccall and cause the gc rooting thing to be a problem. No gc will happen while the ccall is executing and by the time it returns, its safe to free the memory unless it's rooted somewhere else. The ptr_arg_convert approach seems reasonable since only we have to use it and it really only needs that one method definition. It's really just a way of leveraging our dispatch system to avoid writing C code that looks into the parameter of the Ptr.

JeffBezanson · 2012-02-16T16:37:05Z

Just think of the example f(convert(Ptr,[1,2]),convert(Ptr,[3,4])).

StefanKarpinski · 2012-02-17T01:03:58Z

Spell it out. I'm not getting it. The ccall takes the pointers and computes something with them. As long as it doesn't return the pointers, expecting them to still point to valid memory, it's fine. If f does something like that, then you shouldn't pass it un-rooted memory.

JeffBezanson · 2012-02-17T01:36:23Z

When [3,4] is allocated, a GC can be triggered, freeing [1,2], then both pointers are passed in but the first one is now invalid. And it can be hard to tell what is "rooted" because of code transformations.

StefanKarpinski · 2012-02-17T21:00:57Z

A crappy hack would be to temporarily root all literals while evaluating the expression they appear in.

JeffBezanson · 2012-02-17T21:27:13Z

Well, it has nothing to do with literals. It could be any expression f(x). The fundamental problem is that you can make a reference to an object that is not visible as such. As long as that situation exists, any fixes will just be pushing the problem elsewhere.

hand_grenade() = pointer([1,2])

There is no problem if you don't call pointer. In the case of ccall, there is no problem as long as you pass the underlying julia object to the ccall --- that way I can be sure it stays rooted no matter what.

StefanKarpinski · 2012-02-19T02:34:06Z

Right, I get that pointer can still be used to make "hand grenades" like this, but my hack would take care of the specific case of ccall, no? That was all I was proposing. We can either do away with pointers in Julia altogether or allow them and make sure people know they're dangerous like they are in C. There's a lot fewer places where they can and should be used — almost exclusively to pass to ccall, in fact. Maybe we can design a mechanism that's pretty specific to ccall and is safe. Like using an ArrayPointer object that contains the array and an index and is converted specially by ccall into a pointer. That still leaves the problem of pointers returned from C. Maybe we can have a BarePointer type that can be returned and converted into an ArrayPointer by combining it with the appropriate array object. If the BarePointer doesn't point into the array object given, an exception is thrown; if it does point into the array object given, it can safely be converted into an ArrayPointer object. You can still do arithmetic, etc. with an ArrayPointer object, but a BarePointer is completely opaque and cannot be operated on.

JeffBezanson · 2012-03-07T01:45:55Z

This is still in progress! I put the branch back in sync. How do you feel about merging into master even if the mysterious printfd perf regression remains? Perhaps we can address it over time while enjoying the benefits of the better ccall?

StefanKarpinski · 2012-03-07T18:17:25Z

I'm cool with that. I think at this point shaking out the issues in the new-and-improved ccall is more important than printf performance. So, yeah, let's do that. Better send a message to the dev list explaining the change though.

choffstein · 2012-03-07T21:04:29Z

Pardon the potentially stupid qustion, but why, in your above example f(convert(Ptr,[1,2]),convert(Ptr,[3,4])), JeffBezanson, is [1,2] being freed during the GC? What is causing it to be marked? You have an anonymous object pointing to your literal object. It seems to me, given the way the AST is set up, that there should be an existing reference to it on the current stack-frame.

I guess unless you are tracing your AST in realtime to see if the reference is still needed, maybe you can just say "don't mark or sweep anything on the current stack-frame"? Which, thinking out-loud, would lead to some incredibly unwieldy memory usage for poorly written applications with lots of single-use, large, temporary variables.

Or really, isn't the issue just with anonymous variables -- especially those pointing to literals? It seems like you might have a nice corner case for root objects pointing to the current stack-frames literal pool?

StefanKarpinski · 2012-03-07T21:25:45Z

For one thing, the AST doesn't exist anymore at run-time because everything is JITed to machine code before running. Not sure about the rest...

JeffBezanson · 2012-03-07T21:27:20Z

Don't worry about it. Ignore the man behind the curtain. Everything will be fine.

StefanKarpinski · 2012-03-07T21:28:09Z

Hahahahaha. Well, I'll trust in that.

choffstein · 2012-03-07T21:30:03Z

Well, the lack of AST makes sense at run-time if everything is JITed (I forgot that pleasant detail of Julia -- still getting acquainted over here). I still don't understand why the literal is getting marked by the GC.

I'd prefer to not ignore the man behind the curtain and rather understand his seemingly crazy behavior.

JeffBezanson · 2012-03-07T21:46:41Z

The core issue is that a Ptr is basically an integer type, as in C. Currently we assume that a Ptr points to a random address, and so we don't do anything to interpret that address. So if a Ptr happens to refer to a julia object, it will not prevent that object from being freed. Conceivably, we could determine that the Ptr refers to a julia object, but that would be expensive.

JeffBezanson · 2012-03-07T21:48:42Z

Also, you will want to re-read your sources on mark/sweep GC; marking is what happens to live objects.

choffstein · 2012-03-07T22:29:51Z

Are you saying that the literal isn't live? I'm assuming you're saying it isn't live because it is unreachable. I would say that it SHOULD be live, or at least that is what I would expect in writing that code. Treating Ptr as an integer type and not checking if it is referring to a julia object certainly leads to "unexpected" behavior from a code-writers point of view.

I understand that checking if Ptr refers to a julia object definitely creates overhead, but it seems like it the right solution to match coder expectation. Plus, and maybe this is just my naiveté, but it seems to me like the overhead isn't in the speed-critical areas of the code I would write anyway -- just in conversions with ccall.

JeffBezanson · 2012-03-07T22:36:22Z

The important thing to remember is that you don't really need to explicitly convert objects to Ptr. The only real use of that is calling C, and if you use ccall normally (which converts for you, as in ccall(:foo, Void, (Ptr,), array)) there is no problem.

StefanKarpinski · 2012-03-07T22:43:41Z

Are you saying that the literal isn't live? I'm assuming you're saying it isn't live because it is unreachable. I would say that it SHOULD be live, or at least that is what I would expect in writing that code. Treating Ptr as an integer type and not checking if it is referring to a julia object certainly leads to "unexpected" behavior from a code-writers point of view.

I understand that checking if Ptr refers to a julia object definitely creates overhead, but it seems like it the right solution to match coder expectation. Plus, and maybe this is just my naiveté, but it seems to me like the overhead isn't in the speed-critical areas of the code I would write anyway -- just in conversions with ccall.

It's not a matter of checking anything about the Ptr object itself. Making the existence of a Ptr object that points to a Julia object prevent that Julia object from being garbage collected requires scanning all live Ptr objects each time a Julia object is considered for GC to see if that pointer happens to point to the object. That's really expensive.

StefanKarpinski · 2012-03-07T22:49:58Z

One could potentially make Ptrs more like some kind of Ref object that explicitly references another Julia object:

type Ref{T}
  value::T
end

This kind of construct interacts just fine with GC. However, then the problem comes when you want a pointer that doesn't refer to any Julia object: an opaque pointer returned from ccall or a pointer into the middle of an array.

add rem(::Integer, ::Type{T<:Integer}}) functions

…JuliaLang#132) grabbag of other style fixes

* Delete v1 track icons * Delete svg version of track icon

The track repo no longer contains an icon (JuliaLang#132)

* Functions for trimming outliers and robust statistics (JuliaLang#132). - Added trim, trim!, winsor, winsor!, trimvar. - Deprecated trimmean. * remove dep warns * finished docs for winsor, trim, etc

Enclose signatures_at in try/catch (fixes #132)

ghost assigned JeffBezanson Jul 18, 2011

JeffBezanson added a commit that referenced this issue Feb 16, 2012

ccall improvements for issue #132

a378b75

- conversions are inserted for the arguments, to the ccall argument types - syntax &x is used to pass a pointer to a scalar

JeffBezanson closed this as completed Mar 8, 2012

StefanKarpinski pushed a commit that referenced this issue Feb 8, 2018

Merge pull request #132 from JuliaLang/remint

426114c

add rem(::Integer, ::Type{T<:Integer}}) functions

KristofferC added a commit to KristofferC/julia that referenced this issue Feb 18, 2018

refactor parsing using enums and structs and use enums in more places (…

ccde7dd

…JuliaLang#132) grabbag of other style fixes

cmcaine pushed a commit to cmcaine/julia that referenced this issue Sep 24, 2020

Delete v1 track icons (JuliaLang#132)

6d10216

* Delete v1 track icons * Delete svg version of track icon

cmcaine pushed a commit to cmcaine/julia that referenced this issue Sep 24, 2020

Remove v1 icon notice from README

e1d2658

The track repo no longer contains an icon (JuliaLang#132)

Keno pushed a commit that referenced this issue Oct 9, 2023

Enclose signatures_at in try/catch (fixes #132)

7454432

Keno pushed a commit that referenced this issue Oct 9, 2023

Merge pull request #135 from JuliaDebug/teh/fix_132

ab60978

Enclose signatures_at in try/catch (fixes #132)

ccall should automatically convert integers #132

ccall should automatically convert integers #132

Comments

JeffBezanson commented Jul 18, 2011

StefanKarpinski commented Jul 19, 2011

StefanKarpinski commented Jul 19, 2011

JeffBezanson commented Jul 19, 2011

StefanKarpinski commented Jul 19, 2011

JeffBezanson commented Jul 19, 2011

StefanKarpinski commented Jul 19, 2011

JeffBezanson commented Jul 19, 2011

StefanKarpinski commented Jul 19, 2011

JeffBezanson commented Jul 19, 2011

StefanKarpinski commented Jul 20, 2011

JeffBezanson commented Jul 20, 2011

StefanKarpinski commented Jul 20, 2011

JeffBezanson commented Jul 20, 2011

StefanKarpinski commented Jul 20, 2011

JeffBezanson commented Feb 10, 2012

StefanKarpinski commented Feb 10, 2012

JeffBezanson commented Feb 16, 2012

StefanKarpinski commented Feb 16, 2012

JeffBezanson commented Feb 16, 2012

StefanKarpinski commented Feb 17, 2012

JeffBezanson commented Feb 17, 2012

StefanKarpinski commented Feb 17, 2012

JeffBezanson commented Feb 17, 2012

StefanKarpinski commented Feb 19, 2012

JeffBezanson commented Mar 7, 2012

StefanKarpinski commented Mar 7, 2012

choffstein commented Mar 7, 2012

StefanKarpinski commented Mar 7, 2012

JeffBezanson commented Mar 7, 2012

StefanKarpinski commented Mar 7, 2012

choffstein commented Mar 7, 2012

JeffBezanson commented Mar 7, 2012

JeffBezanson commented Mar 7, 2012

choffstein commented Mar 7, 2012

JeffBezanson commented Mar 7, 2012

StefanKarpinski commented Mar 7, 2012

StefanKarpinski commented Mar 7, 2012