Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Lazier and more interface-based conversion #617

Open
tkf opened this issue Nov 17, 2018 · 16 comments
Open

RFC: Lazier and more interface-based conversion #617

tkf opened this issue Nov 17, 2018 · 16 comments

Comments

@tkf
Copy link
Member

tkf commented Nov 17, 2018

Rather than trying to convert and copy values between Python and Julia, I propose to:

  1. Enhance pyjlwrap and implement most of Python data model such that virtually all Julia values are usable with pyjlwrap. For example, we can translate getindex to __getitem__ etc. such that we can use an AbstractArray without translating it to a Numpy array. We can also implement __array__ in Julia side to make numpy.asarray fast.

  2. Restrict automatic conversion to Julia types which supports ("practically") lossless round-trip (Julia -> Python -> Julia). Maybe something like:

Union{
    Nothing,
    Integer,
    NpyNumber,  # Int64, Float16, Float32, Float64, ...
    Array{<: NpyNumber},
    String,
    Dates.AbstractTime,  # or maybe only the concrete subtypes
    IO,                  # ditto
}

Here is a reference implementation that shows this approach works: https://github.com/tkf/PyBase.jl

It would solve, e.g.,:
#11
#175
#507
#555
#616
JuliaPy/PyPlot.jl#391
JuliaPy/PyPlot.jl#400
JuliaPy/pyjulia#122
JuliaPy/pyjulia#123
SciML/diffeqpy#21

There are many details that can be refined (e.g., rather than using a single Python class, maybe define a few base class? Or even create Python class on-demand?) but first I'd like to know if it is a reasonable direction for PyCall.jl.

@stevengj
Copy link
Member

I'm in favor of making pyjlwrap more full-featured and being at least somewhat more conservative about automatic conversions, although the exact boundaries are debatable.

@tkf
Copy link
Member Author

tkf commented Nov 17, 2018

Good to know! I'll look into Python C API to translate what I did to Julia/PyCall code.

@bicycle1885
Copy link

Is there anything I can help here? If this will happen soon (say, within a few months), I'd like to make it as soon as possible for some reason.

@tkf
Copy link
Member Author

tkf commented Feb 12, 2019

The roadmap is explained in #629. This feature would be a breaking change so I think we need to wait until 2.0. Currently I think there is just one issue #636 before releasing 1.19. My understanding is that, once 1.19 is released, we can start implementing it. But I'm not super confident that this can happen in a few months. Sorry about non-definitive answer...

@bicycle1885
Copy link

Thank you. So, maybe I should wait for #636 and v1.19 at least. Then, if you need some human resources for this and #629, please let me know!

@tkf
Copy link
Member Author

tkf commented Feb 15, 2019

So #636 is now closed and 1.19 would probably be released soon. But I think the only type of human resources required here is the ones who write the code (I know it's a bit harsh to say but that's kind of always true in OSS...). I'm not working on it right now and I guess there are no one else. If you want to implement it please go ahead and do it. I think there are some useful code in my PyBase.jl so I can point to how to integrate that.

By the way, I always assume that @stevengj wants to use C-API for implementing it. If using Python "shim" class as I did PyBase.jl is OK, adding this feature would be pretty easy (basically just copying the code from PyBase.jl). So convincing him to go to this direction, maybe at least for the first implementation, is another way to speed it up. (But using C-API may not be so hard. It's just that how difficult it would be is more uncertain to me.)

@bicycle1885
Copy link

Thanks, maybe I need to familiarize myself with the code of PyCall.jl. I've glanced over PyBase.jl and found that approach is really elegant and promising! I'm going to spare my time to understand more about these two packages this weekend.

@tkf
Copy link
Member Author

tkf commented Apr 8, 2019

@marius311 Answering your comments in #676

From a quick glance it looks like the way to proceed is to give the jlwrap object __add__, __mul__, etc... via some mechanism or another somewhere near https://github.com/JuliaPy/PyCall.jl/blob/master/src/pytype.jl#L379, either by adding to that if-clause, or something else.

This probably would work (not 100% certain as it's a C API) but it would need to create a Python callable object so it wouldn't be as efficient as it can be. If we want to squeeze out performance, we may need to use https://docs.python.org/3/c-api/typeobj.html#number-object-structures

But we can also start from slow implementation based on PyBase.jl and try to make it correct by adding tests as much as possible first. See my previous comment #617 (comment)

@tkf
Copy link
Member Author

tkf commented Apr 8, 2019

@stevengj Are you OK with start implementing this feature based on Python shim class as in PyBase.jl? That is to say, having a Python class with a bunch of methods like this:

    def __add__(self, other):
        return self.__add(self.__jlwrap, other)

    def __sub__(self, other):
        return self.__sub(self.__jlwrap, other)

    def __mul__(self, other):
        return self.__mul(self.__jlwrap, other)

    def __matmul__(self, other):
        return self.__matmul(self.__jlwrap, other)

where __jlwrap is the Julia object that is wrapped and __add is the Julia function implementing the Python/Numpy semantics.

See: https://github.com/tkf/PyBase.jl/blob/bfaa050362204d913c41294d8092a10ffe097b0a/src/pybase_jl/wrappers.py#L206-L491

We can always move things to use C API after implementing enough test set to make sure the implementation gets the right semantics.

@c42f
Copy link

c42f commented Jul 3, 2019

I started looking at what it would take to solve #507, which eventually led to this issue. For what it's worth I think the PyBase approach looks very promising and I'd prefer to wait for that rather than solving these various issues independently.

@tkf
Copy link
Member Author

tkf commented Jul 26, 2019

I just discussed with @stevengj in hackathon. He prefers to avoid indirection (Python shim class) and implements everything using C API.

@cjdoris
Copy link

cjdoris commented Sep 8, 2019

Is this being actively worked on? If not, I have started adding some methods to pyjlwrap myself. Now I've started it shouldn't take long to do most of the data model.

@tkf
Copy link
Member Author

tkf commented Sep 8, 2019

I think nobody is working on it. It would be great if you can send a PR.

@cjdoris
Copy link

cjdoris commented Sep 9, 2019

Ok I'll work on it.

@cjdoris
Copy link

cjdoris commented Dec 20, 2019

I have just made a PR for jlwrap.

On the topic of automatic conversion, my two cents is:

  • From Julia to Python, we can be fairly liberal and try to convert as much as we can. It's easy for the user to specify a different conversion by simply coercing the value to a PyObject themselves.
  • We should only convert to types in the Python standard library, so e.g. no automatic conversion to a numpy array. I'd say that vectors should translate to lists, matrices to lists of lists, etc. We should provide alternatives, such as converting arrays to bytes or bytearray or array.array or numpy.ndarray...
  • From Julia to Python, it is harder for the user to specify what they want, so conversion here should be much more conservative.
  • I think only simple, scalar, immutable types should be converted to their native Julia type (so None, strings, numbers, tuples (?), dates). We could also automatically apply wrappers, so sequences get wrapped in PyVector, buffers in PyArray, mappings in PyDict etc, since it is at least easy to convert these back to PyObject or to a different wrapper if that's what you want.
  • Conversions should depend only on types and not on values. Currently, if a list of lists happens to all be the same length, it is converted to a matrix, or otherwise to a vector of vectors. This makes it hard to predict what will happen.

@TomDeWeer
Copy link

Is there any progress on this so far? I'm having performance problems when working with sparse arrays and I think this might be the cause.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants