Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to define a struct without exposing pyo3_GilGuard to user ? #1808

Closed
dbsxdbsx opened this issue Aug 18, 2021 · 25 comments
Closed

How to define a struct without exposing pyo3_GilGuard to user ? #1808

dbsxdbsx opened this issue Aug 18, 2021 · 25 comments
Labels

Comments

@dbsxdbsx
Copy link

win10

[dependencies.pyo3]
version = "0.14.2"
features = ["auto-initialize"]

At present, I could define a struct with pyo3 like this:

struct PyEnv<'py> {
    py: Python<'py>,
}

impl<'py> PyEnv<'py> {
    fn new(gil: &'py GILGuard) -> Self {

        let py = gil.python();

        Self {
            py,
        }
    }
}

And what I want is like this:

pub struct PyEnv<'py> {
    gil:GILGuard,
    py: Python<'py>,
    env_name: String,
}

impl<'py> PyEnv<'py> {
    pub fn new() -> Self {
        let gil = Python::acquire_gil();
        let py = gil.python();//.to_owned();
        //
        let torch = PyModule::import(py, "xxx").expect("wrong import torch");
        let distributions = PyModule::import(py, "torch.distributions").unwrap();
        let kl = PyModule::import(py, "torch.distributions.kl").unwrap();

        Self {
            gil,
            py,
        }
    }
}

But due to the lifetime issue, I can not do this. Is there a workaround?

@davidhewitt
Copy link
Member

davidhewitt commented Aug 19, 2021

Usually the solution to this kind of problem in Rust is to push control of the lifetime to the user.

So instead of acquiring the GIL inside your function, ask the user to acquire and pass the Python token in:

pub struct PyEnv<'py> {
    py: Python<'py>,
    env_name: String,
}

impl<'py> PyEnv<'py> {
    pub fn new(py: Python<'py>) -> Self {
        let torch = PyModule::import(py, "xxx").expect("wrong import torch");
        let distributions = PyModule::import(py, "torch.distributions").unwrap();
        let kl = PyModule::import(py, "torch.distributions.kl").unwrap();

        Self {
            py,
        }
    }
}

@mejrs
Copy link
Member

mejrs commented Aug 19, 2021

If you want to produce a struct holding a Python object(s), you probably want this:

struct Foo{
    field: Py<...>
}

Note that both your structs as written are bound to the 'py lifetime, which means they'll only be valid for as long as you are holding the gil. Is that actually what you want?

@dbsxdbsx
Copy link
Author

dbsxdbsx commented Aug 20, 2021

@mejrs , I want to create a struct, that user would not need to ever realize it is something with python. So the solution @davidhewitt said is not what I want. What I want is like this:

pub struct PyEnv {
    gil:GILGuard,//maybe no needed
    py: Py<Python>,//error:missing lifetime
    env_name: String,
}

impl PyEnv {
    pub fn new(env_name: &str) -> Self {
        let gil = Python::acquire_gil();
        let py = gil.python();//.to_owned();
        //
        let torch = PyModule::import(py, "xxx").expect("wrong import torch");
        let distributions = PyModule::import(py, "torch.distributions").unwrap();
        let kl = PyModule::import(py, "torch.distributions.kl").unwrap();

        Self {
            gil,
            py,
            env_name: env_name.into(),
        }
    }
}

As the code above , field py is error due to lifetime issue. Actually, I wonder why need to setting lifetime for type Python in Pyo3. With crate rust-cpython, there is no lifetime needed for type Python.

With the lifetime issue, I've also googled to check if lifetime of a field could be related to another field of the same struct---unfortunately, it seems this is not supported in rust.

@davidhewitt
Copy link
Member

davidhewitt commented Aug 21, 2021

How about not storing the GIL at all, and instead using Python::with_gil each time you need to interact with Python code?

Unless you're doing something in a tight loop which would lead to lots of GIL locking & unlocking, this would also give your users the flexibility to multithread. If you create a struct like you're attempting above, you'll end up locking the Python code to the same thread as this struct.

If you really need to store the GIL token in the struct (and lock it to a single thread), then store GILGuard without Python:

pub struct PyEnv {
    gil:GILGuard,
    env_name: String,
}

impl PyEnv {
    pub fn new(env_name: &str) -> Self {
        let gil = Python::acquire_gil();
        let py = gil.python();;
        //
        let torch = PyModule::import(py, "xxx").expect("wrong import torch");
        let distributions = PyModule::import(py, "torch.distributions").unwrap();
        let kl = PyModule::import(py, "torch.distributions.kl").unwrap();

        Self {
            gil,
            env_name: env_name.into(),
        }
    }
}

@davidhewitt
Copy link
Member

@mejrs - maybe this is a suitable reason why we shouldn't deprecate / remove GILGuard in #1788 and instead try to find a way to fix its issues...

@dbsxdbsx
Copy link
Author

dbsxdbsx commented Aug 21, 2021

How about not storing the GIL at all, and instead using Python::with_gil each time you need to interact with Python code?

Unless you're doing something in a tight loop which would lead to lots of GIL locking & unlocking, this would also give your users the flexibility to multithread. If you create a struct like you're attempting above, you'll end up locking the Python code to the same thread as this struct.

If you really need to store the GIL token in the struct (and lock it to a single thread), then store GILGuard without Python:

pub struct PyEnv {
    gil:GILGuard,
    env_name: String,
}

impl PyEnv {
    pub fn new(env_name: &str) -> Self {
        let gil = Python::acquire_gil();
        let py = gil.python();;
        //
        let torch = PyModule::import(py, "xxx").expect("wrong import torch");
        let distributions = PyModule::import(py, "torch.distributions").unwrap();
        let kl = PyModule::import(py, "torch.distributions.kl").unwrap();

        Self {
            gil,
            env_name: env_name.into(),
        }
    }
}

@davidhewitt , the solution you recommended may be similar to what I've done with rust-cpython crate, like the step function in this example---each time it has to acquire something with python, but not directly manipulate on a (reference of) python object.

In my project, there is only one struct need to deal with python. And also unfortunately, just like you mentioned, it would have to loop quite a long time with python(maybe more than 1,000,000 steps). So, with the original solution I mentioned(the very 1st post), it would be both efficent and freindely enough to struct user, if there is no exposure of GILGuard when newing the struct--- and with such a struct, it is easy for user to know that the number of instances of the struct, is the number of python intepreter process(or GIL) held at present, with which it would be easy to extend to multi-thread code with just rust code.

@davidhewitt
Copy link
Member

I think there's a misunderstanding here: it's not possible to have more than one GIL. If your struct holds a GILGuard inside it, then no other Rust threads will ever complete a call to Python::acquire_gil, unless you temporarily release the GIL with py.allow_threads

This is why I suggest making you user pass py: Python into the original struct - then they are able to control when threads are locked to Python or not.

@dbsxdbsx
Copy link
Author

@davidhewitt ,then how to launch multi python interpreter processes with pyo3? As far as I know, py.allow_threads is not real parallism.

@mejrs
Copy link
Member

mejrs commented Aug 23, 2021

@mejrs - maybe this is a suitable reason why we shouldn't deprecate / remove GILGuard in #1788 and instead try to find a way to fix its issues...

Storing the gilguard in a struct doesn't really strike me as idiomatic. Is there a good use for it? I don't see people storing MutexGuard or cell::Ref in structs anywhere.

We can deprecate it (or add something like "this will be deprecated in the future") and see what pops up.

@mejrs
Copy link
Member

mejrs commented Aug 23, 2021

@davidhewitt ,then how to launch multi python interpreter processes with pyo3? As far as I know, py.allow_threads is not real parallism.

Pyo3 doesn't support running different subprocesses like Python's multiprocessing module. It does allow it in other ways, by letting you use multithreaded Rust on the Rust side or by releasing the GIL while Rust code is running. See the paralellism chapter of the book for more about that.

@dbsxdbsx
Copy link
Author

@mejrs , I've checked the paralellism chapter before, but I am still a little confused. That releasing the GIL while Rust code is running --- just like the example in the mentioned chapter, I think this means paralellism between processes of python and rust, NOT REAL paralellism among python threads(instead, it should be called concurrency), right? And That multithreaded Rust on the Rust side refers to the rayon example,right? But this example shown is used for building a python module. And what I want ,on the contrary, is to take several pure python module in rust, so I guess this is not the case to use rayon.

@birkenfeld
Copy link
Member

There is no "paralellism among python threads". It can't exist.

@birkenfeld
Copy link
Member

is to take several pure python module in rust,

Please explain what you mean by that.

@dbsxdbsx
Copy link
Author

@birkenfeld , I want to call a python function in paralellism with rust just like with pure python code--- In python , due to GIL, multi-process has to be used to do so, instead module thread in python can only used for concurrecy( for IO bound project), not paralellism(for CPU bound project). I want to know how to do the same thing in rust with pyo3.

@birkenfeld
Copy link
Member

Well, that should be doable, as long as only one Rust thread (at a time) tries to acquire the GIL, you can have true parallelism between the Rust threads and the thread running Python.

@dbsxdbsx
Copy link
Author

Well, that should be doable, as long as only one Rust thread (at a time) tries to acquire the GIL, you can have true parallelism between the Rust threads and the thread running Python.

@birkenfeld , so I think this is the confusing part. Refering to " true parallelism", you mean threads between rust and python, but not threads between different python processes, right? The latter is what I need .

@birkenfeld
Copy link
Member

birkenfeld commented Aug 24, 2021

If by "processes" you don't mean processes but threads, correct. PyO3 cannot disable/work around the GIL for you, sadly.

@dbsxdbsx
Copy link
Author

If by "processes" you don't mean processes but threads, correct. PyO3 cannot disable/work around the GIL for you, sadly.

@birkenfeld , sorry for the mix using of "process" and "thread".
Do you mean that for example... I have 3 rust threads, and in each thread I have an instance inited with pyo3 respectively and doing some task by calling some python code, then these instances are actually running in true parallelism(just like using multi processes in pure python project for true parallelism)?

@birkenfeld
Copy link
Member

No. As I said, only one Rust thread can call into Python at a time. Rust/PyO3 does not allow you to get around the GIL.

@davidhewitt
Copy link
Member

To add to @birkenfeld's answer: No, only one thread running Python code can ever make progress. That's the whole design of the GIL.

Think of it as a high contention Mutex - if multiple Rust threads are trying to access data from a Mutex, they have to wait for each other to make progress. The same is true for the GIL: if multiple Rust threads are trying to run Python code, they must wait for each other to finish so that they can hold the GIL in turn.

If you want what you call "true parallelism like multiprocessing" then you can do exactly what multiprocessing does: use std::process::Command or fork() to split your program into multiple processes and use IPC to communicate between them. Each process will have its own GIL, and so you can have one Python thread per process.

However, I must then ask why not just use multiprocessing? The whole point of looking at a language like Rust for parallelism is that you don't need to go through all the pain of multiple processes and IPC. Rust really does let you run threads in parallel, unlike Python.

Looping this back to the original question, this is why we don't recommend storing GILGuard in a struct. You say you want to make it so the user doesn't know you use Python, however if they ever try to acquire the Python GIL themselves in a different thread their program will deadlock and they won't understand why.

If you want to have multiple Rust threads running in parallel, the right design for interacting with Python is to use Python::with_gil so that each thread only holds the GIL for as little time as possible.

If using Python::with_gil inside your struct's implementation is too much of a performance overhead (I suggest you benchmark this before you assume it is), then make your user pass py: Python to the struct themselves, so that they are aware of the Python GIL constraint and can control it as suits them.

@dbsxdbsx
Copy link
Author

dbsxdbsx commented Aug 24, 2021

@davidhewitt , of course I don't ever want to think of multiprocessing,nor storing GIL, as rust already offers a nice way to do task in parallism. But when code is interacting with python, I have to.

Besides, after discussion, now I know it is impossible to make the python code run in true parrallism with pyo3 in a rust program of a single process .

In addition, though using Python::with_gil can make code more readable than using Python::acquire_gil (because no need to care about locking), it is not suitable for my project, as I have to store status of python object.

@davidhewitt
Copy link
Member

Just as a follow-up you may want to watch #576, however this is a far-off future feature still.

@dbsxdbsx
Copy link
Author

@davidhewitt , thanks for the link, and hopefully, it coulde be one of pyo3 feature, also may become an advantage over other similar crates.
And thanks for all of your reply, @birkenfeld @mejrs.

@LegionMammal978
Copy link

LegionMammal978 commented Nov 19, 2021

Storing the gilguard in a struct doesn't really strike me as idiomatic. Is there a good use for it? I don't see people storing MutexGuard or cell::Ref in structs anywhere.

@mejrs, personally, I had been sharing an Rc<GILGuard> across all of my wrapper structs, so as to avoid putting 'py lifetimes everywhere and having to use a callback to create my global context struct. I later refactored the code to call Python::acquire_gil at every place I actually needed to use the GIL.

To answer your question, the difference here is that from a Mutex<T> one can obtain a MutexGuard<'a, T>, and from a RefCell<T> one can obtain a Ref<'b, T>, but from a Python<'py> one cannot obtain a GILGuard; in fact, from a GILGuard one can obtain a Python<'py>. That makes name of GILGuard seem somewhat odd to me, but I'm no expert in Rust naming conventions.

@dbsxdbsx
Copy link
Author

dbsxdbsx commented Nov 20, 2021

Storing the gilguard in a struct doesn't really strike me as idiomatic. Is there a good use for it? I don't see people storing MutexGuard or cell::Ref in structs anywhere.

@mejrs, personally, I had been sharing an Rc<GILGuard> across all of my wrapper structs, so as to avoid putting 'py lifetimes everywhere and having to use a callback to create my global context struct. I later refactored the code to call Python::acquire_gil at every place I actually needed to use the GIL.

To answer your question, the difference here is that from a Mutex<T> one can obtain a MutexGuard<'a, T>, and from a RefCell<T> one can obtain a Ref<'b, T>, but from a Python<'py> one cannot obtain a GILGuard; in fact, from a GILGuard one can obtain a Python<'py>. That makes name of GILGuard seem somewhat odd to me, but I'm no expert in Rust naming conventions.

@LegionMammal978 , thanks for reply. Do you mean that GILGuard has to be initialized first? If that is the case, I would say "Ok, it is fine, because I don't care the order of initialization." And I just want to hide GILGuard in my struct.

And different from what you did in your project, since I don't want extra overhead, I don't use Python::acquire_gil, but put a struct member of Python<'py>---WIth this, the struct instance would always hold key of a python interpreter, no need to call Python::acquire_gil.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants