Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[discussion] Should functions like spawn take a coroutine object, or the classic fn+args+kwargs tuple? #122

Closed
njsmith opened this issue Nov 20, 2016 · 15 comments

Comments

@njsmith
Copy link
Contributor

njsmith commented Nov 20, 2016

Right now, curio.run and curio.spawn take a coroutine object, i.e., the return value from calling an async function:

async def my_async_function(...):
    ...
curio.spawn(my_async_function(arg1, arg2))

This is fairly elegant and I'm not sure it's wrong, but I've run into a few situations where it's feeling a bit awkward, so I wanted to raise for the discussion the possibility that they should instead take an async callable + arguments:

curio.spawn(my_async_function, arg1, arg2)

The arguments for taking a callable:

  • It's more parallel to the non-async/await way of doing things, which is nice for teaching and general simplicity. (I think this is more important than it might look: if it weren't for curio.run and curio.spawn, you could get a long way with beginners pretending that await f(...) is a single indivisible piece of syntax, an "async call". Obviously you want to break this down eventually, but curio.run and curio.spawn force you to do it immediately.)
  • In some cases, the function that takes the coroutine might actually decide whether or not to actually run it based on some logic. But the CPython runtime is quite certain that if you ever create a coroutine object, you are definitely going to run it, and it will yell at you if you don't. (A case that I ran into is a supervisor class's spawn_supervised(coro) method, which might error out if the supervisor is not running. Of course my tests have to invoke this, so get an unavoidable warning...) For consistency, it would be nice if these functions used the same calling convention as curio.spawn.
  • In some cases, the function that takes the coroutine might need to be able to run it multiple times. Examples would include curio.tcp_server, or the spawn_supervised method on a task supervisor that supports automatically restarted crashed tasks (like Erlang supervisors do). For consistency, it would be nice if these functions used the same calling convention as curio.spawn.

The arguments against taking a callable:

  • Breaks compatibility, but eh, that's why this is "experimental".

  • It's nice to collect up the fn/args/kwargs into a single object, so as to avoid confusion between arguments-for-the-function versus arguments-for-run-or-spawn. In particular, if we actually made this change, then probably the best way to do it would be to make the signature of run and spawn be:

    def run(fn, *args, *, ... kwargs for run ...):
        ...

    i.e., positional arguments belong to the function, but kwargs are reserved for run or spawn. Or we might not even want to allow positional args at all, and just make it a bare zero-args callable:

    def run(fn, *, ... kwargs for run ...):
       ...

    The counter-argument to this is that we already have perfectly nice and idiomatic ways in Python to bundle up fn/args/kwargs together:

    curio.run(partial(my_async_function, arg1, arg2))

    And we have to use this style anyway for things like tcp_server, so we might as well use it everywhere.

@imrn
Copy link

imrn commented Nov 20, 2016

On 11/20/16, Nathaniel J. Smith [email protected] wrote:

  1. curio.spawn(my_async_function(arg1, arg2))
  2. curio.spawn(my_async_function, arg1, arg2)

spawn(co) is much better. Control over the creation and manupulation
of the 'co' is at my hands. Not behind an event horizon.

The arguments for taking a callable:

  • It's more parallel to the non-async/await way of doing things, which is
    nice for teaching

I dont think it is an important reason. Current form is also handy.

  • CPython runtime is quite certain that if you ever create a coroutine object,
    you are definitely going to run it, and it will yell at you if you don't.

It is the awkwardness of Python.

  • In some cases, the function that takes the coroutine might need to be able
    to run it multiple times. Examples would include curio.tcp_server

It is the nature of such functions. Like calling threads. No better solution
exists for them. If such a 'need' exists in a function then, there is no choice,
use it. Apart from this, the convention similar to 'spawn(co)' is still better.

I think no generalization is possible for all cases. That's why we
have diversified (sometimes partially redundant) forms,
grammars, syntaxes, linguistic elements..

@dabeaz
Copy link
Owner

dabeaz commented Nov 20, 2016

I like the current approach.

@njsmith
Copy link
Contributor Author

njsmith commented Nov 21, 2016

I like the current approach.

OK. But, I mean... I liked the current approach too, until I started using it more and started running into some issues. Obviously it's your project and you don't owe me anything, but I admit it's a bit frustrating to write down some concrete analyses of the issues and then get two replies that brush these aside without engaging with the trade-offs.

@dabeaz
Copy link
Owner

dabeaz commented Nov 21, 2016

Sorry. Didn't mean to be curt--I'm often very scattered for time with kids and other things going on. I've been meaning to get back here. A few thoughts...

I like the current approach because it is extremely minimal and simple. You provide a coroutine and curio runs it. One of my major complaints with asyncio is that it is just too complicated for me to remember how to do anything. Getting the event loop. A about a half-dozen methods all related to running things, all slightly different. I don't like it. Saying run(coro) seems really simple to me. I think it's pretty telling that they're now thinking about adding a similar feature to asyncio.

I also like the current approach because it is easy to have optional keyword arguments attached to the run() and spawn() functions without worrying about conflicts with arguments being passed to the coroutine. If we're going to take positional and keyword arguments with *args and **kwargs, it gets more complicated. I've also never really liked having to explicitly pass arguments via a tuple and dictionary either. Admittedly, this is mostly personal preference. However, I really do like how you can just import curio and start doing things immediately without doing anything too strange involving calling conventions.

I must admit that the fact that coroutines don't immediately evaluate until awaited is something that takes some getting used to though. There are all sorts of crazy ways of passing them around in an unevaluated state and then doing things later.

@dabeaz
Copy link
Owner

dabeaz commented Nov 28, 2016

So, I was recently doing a bunch of coding involving Curio and threads. You're right, the calling conventions are different. Naturally this issue makes it impossible to "unsee" the difference. I'm not entirely sure I'm going to do anything about it (yet), but I see what you mean here. Need to think about it more.

@dabeaz
Copy link
Owner

dabeaz commented Dec 26, 2016

Confession: I'm still thinking about this issue. You're absolutely right, there is an inconsistency here. There are a couple of different ways of spawning work in Python (in the library).

There are threads:

t = threading.Thread(target=func, args=(x,y,z), kwargs={ 'name': val })

There is multiprocessing:

pool.apply(func, (x,y,z), {'name':val})

And there are executors (from concurrent.futures):

pool.submit(func, x, y, z, name=val)

Asyncio has a run_in_executor() function that takes positional arguments only:

loop.run_in_executor(exc, func, x, y, z)

If you want keyword arguments, you'd have to partial it:

loop.run_in_executor(exc, partial(func, x, y, z, name=val))

Of these, I personally like the concurrent.futures approach of taking *args and **kwargs. This is the approach taken by the curio run_in_thread() and run_in_process() functions already. It avoids a lot of the line noise associated with threads and multiprocessing pools (i.e., having to give argument tuples and dicts).

Getting back to spawn(), I'm still wresting with the idea of making it more consistent with run_in_thread() and run_in_process(). Maybe it would make sense to have it work like this:

await spawn(func, x, y, z, name=val)

I could make spawn() recognize both this and the current approach. For example, if you passed it a coroutine object, it could do what it does now. If you passed it a callable and arguments, it could call func(args) to create the coroutine.

I'm just not sure what to do with additional keyword arguments to spawn(). Perhaps they should be eliminated in favor of attributes on the created task instead. For example:

task = await spawn(func, x, y, z, name=val)
task.daemon = True

Or there could simply be "reserved" keyword arguments to curio recognizes and that you'd have to avoid in called code.

Anyways, I'm still thinking about this. Wondering if anyone has any renewed ideas about it here.

@imrn
Copy link

imrn commented Dec 27, 2016

Apart from consistency what exactly we will gain from this?
Coroutine calling convention makes much sense..

co = cofunc(...)
await co # If you want serial execution
await spawn(co) # If you want parallel execution

Infact on the contrary bringing such a convention to
processes/threads is more elegant. But it may not
be meaningful for them.

pco = Process(...)
await pco # This would not meaningful for a process.
proc = await spawn(pco)
await proc.cancel()
Or something like that...

I remember we also had been talking about
killing processes at another issue..

@imrn
Copy link

imrn commented Dec 27, 2016

Coroutines and Threads/Process DO NOT share same
conceptually foundation. Although they 'seem' to be
of the same form at first sight..

@dabeaz
Copy link
Owner

dabeaz commented Dec 27, 2016

Basically, it's just consistency. It's not an entirely unimportant thing in my experience, but maybe it's impossible to get right in all cases. I was just thinking that if spawn() changed, then you might also turn to changing functions such as timeout_after() as well. Hmmm. Now I'm not so sure.

@njsmith
Copy link
Contributor Author

njsmith commented Dec 28, 2016 via email

@imrn
Copy link

imrn commented Dec 28, 2016

Wait... How didn't I see it?

pco = Process/Thread(...)
await pco # This would not meaningful for a process.

Of course it is meaningful: It is equivalent to launching
AND JOINING the process/thread..

Himmm...
Now, I'm seriously thinking about Process/Thread
wrappers making them indistinguishable from
coroutines..

Anyway currently we don't have much api to control and
query threads/processes although python provides them.
(i.e. killing)

As we make progress on them lets keep this issue alive..

@dabeaz
Copy link
Owner

dabeaz commented Dec 28, 2016

As an aside, I do not hold the standard library sacred with respect to how curio might potentially interact with threads or processes. So, all experimentation is encouraged :-).

@imrn
Copy link

imrn commented Dec 28, 2016 via email

@dabeaz
Copy link
Owner

dabeaz commented Mar 3, 2017

I've continued to think about this issue for awhile. Curio has grown some new functions for launching processes, performing work, and so forth. Some consistency is order. I'm going to modifiy curio to use a common signature for "launching" work whether it be a task, a thread, a process, or whatever:

async def coro(x, y, z):
        ...

await some_kind_of_work(coro, x, y, z, options=value)

Keyword arguments, if any, are reserved for options related to some_kind_of_work function, not the coro/func being called. If keyword arguments need to be passed to the target, partial() can be used.

Because of some backwards compatibility concerns with existing code, passing an instantiated coroutine to certain functions (spawn, run, etc.) will still work. However, documentation and examples will prefer the new approach.

@dabeaz
Copy link
Owner

dabeaz commented Mar 26, 2017

I have modified curio to fairly universally adopt an API such as:

 await spawn(coro, arg1, arg2, ... , argn)

The older API such as:

await spawn(coro(arg1, arg2, ..., argn))

will continue to work however. Basically, you can either pass an instantiated coroutine or a callable with arguments will be executed to produce a coroutine.

@dabeaz dabeaz closed this as completed Mar 26, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants