Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use 'static lifetime in BoxFuture for all object-store APIs #6587

Closed
kylebarron opened this issue Oct 18, 2024 · 10 comments · Fixed by #6619
Closed

Use 'static lifetime in BoxFuture for all object-store APIs #6587

kylebarron opened this issue Oct 18, 2024 · 10 comments · Fixed by #6619
Labels
enhancement Any new improvement worthy of a entry in the changelog

Comments

@kylebarron
Copy link
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

I'm writing a new Python binding for the object-store crate (differences from the existing one detailed here).

I'm trying to present streaming APIs to the user instead of materializing an entire result stream upfront. This worked for get, where we can expose the stream returned by GetResult::into_stream as a Python async iterable.

When I tried to do present a streaming result API for list, I tried to cast the result of ObjectStore::list to

let stream: BoxStream<'static, object_store::Result<ObjectMeta>> = store.list(prefix);

and got that the store does not live long enough.

error[E0597]: `store` does not live long enough
   --> object-store-rs/src/list.rs:175:18
    |
168 |     store: PyObjectStore,
    |     ----- binding `store` declared here
...
175 |     let stream = store.as_ref().list(prefix.map(|s| s.into()).as_ref());
    |                  ^^^^^ borrowed value does not live long enough
176 |     let stream2: BoxStream<'static, object_store::Result<ObjectMeta>> = stream;
    |                  ---------------------------------------------------- type annotation requires that `store` is borrowed for `'static`
...
189 | }
    | - `store` dropped here while still borrowed

It's not possible to use a lifetime other than 'static in a struct exported to Python, so I'm not sure if it's possible to wrap list as a stream currently.

@tustvold mentioned in discord that it may be possible to change other methods to use 'static in the next major object-store release.

Describe the solution you'd like

The most straightforward solution would be to change the ObjectStore trait to use the 'static lifetime. But I'm open to other solutions. I don't fully understand async lifetimes well enough to know all the implications here.

Describe alternatives you've considered

Additional context

@kylebarron kylebarron added the enhancement Any new improvement worthy of a entry in the changelog label Oct 18, 2024
@tustvold
Copy link
Contributor

I'm curious how python handles the lifetimes present in the futures and if it possible to do something similar for streams?

Currently all the futures have a non-static lifetime, but your comments indicate it is only the streams where this causes issue?

@kylebarron
Copy link
Contributor Author

kylebarron commented Oct 19, 2024

To the extent I understand it's okay with futures because they're evaluated immediately, either asynchronously via https://github.com/PyO3/pyo3-async-runtimes or synchronously via a tokio runtime. It matters whether the result of the future can be materialized as data and presented to Python or whether it should be preserved as a stream.

With the stream returned from ObjectStore::get, I'm storing the BoxStream itself in a struct exported to Python https://github.com/developmentseed/object-store-rs/blob/c157fe2cb8da14c1e8eee297014bb4695d683cc3/object-store-rs/src/get.rs#L111-L115. Then when the Python async iteration calls __aiter__, that creates a new future based on polling the stream.

But I think this is only possible because BoxStream is 'static. For another lifetime, you'd have to annotate the struct containing the stream with that lifetime, which pyo3 immediately rejects.

@tustvold
Copy link
Contributor

Ok so if I follow, this would mean that in order for tokio to poll the stream, the python async code would need to invoke it in a timely manner? Or to put it differently, if the python code is busy doing something else, the tokio work would get starved. I wonder if this is going to lead to the same sorts of issues we've run into multiplexing CPU bound tasks on the same threadpool? Perhaps there needs to be some sort of buffering between the two regardless, which would obviate any lifetime shenanigans? The same issue would also potentially apply to regular futures

TBC I am not opposed to changing the signature, just trying to ensure the python bindings work as well as possible 😅

@kylebarron
Copy link
Contributor Author

kylebarron commented Oct 22, 2024

I'm not fully clear on all the details on how the pyo3-async-runtimes integration works. From its readme:

in pyo3-asyncio, we decided the best way to handle Rust/Python interop was to just surrender the main thread to Python and run Rust's event loops in the background.

And I know there's revamped async work going on in PyO3/pyo3#1632 and https://github.com/wyfo/pyo3-async, which is suppose to remove some overhead of integrating rust and python async.


To answer your questions:

Ok so if I follow, this would mean that in order for tokio to poll the stream, the python async code would need to invoke it in a timely manner? Or to put it differently, if the python code is busy doing something else, the tokio work would get starved.

I think so, yes. At least in my current implementation of this, if __anext__ is never called, then the tokio future's next() is never called. So it relies on Python to drive this async.

I wonder if this is going to lead to the same sorts of issues we've run into multiplexing CPU bound tasks on the same threadpool?

Potentially, but I'd say this is up to the user to ensure the Python async runtime doesn't get blocked, and that they're not running CPU bound tasks on the main thread.

Perhaps there needs to be some sort of buffering between the two regardless, which would obviate any lifetime shenanigans?

I'm not sure how to implement this 😅 , or else I'd try to implement it now.

@kylebarron
Copy link
Contributor Author

just trying to ensure the python bindings work as well as possible

Trying to think of other things on my wish list to mention for the Python bindings...

One thing that would be great but I certainly don't expect to change is if object_store used something like an arrow::buffer::Buffer, which permits externally-allocated memory. Right now in a put we always have to make a copy from Python-backed memory to Rust-backed memory first. Or is there some way to use external memory with a bytes::Bytes?

@tustvold
Copy link
Contributor

Or is there some way to use external memory with a bytes::Bytes?

There have been various discussions about exposing the bytes vtable externally, but they've not gotten anywhere AFAICT. This would really be my preferred path, it seems unfortunate to break from the ecosystem with our own wrapper

@kylebarron
Copy link
Contributor Author

it seems unfortunate to break from the ecosystem with our own wrapper

Absolutely agree, even though it's unfortunate for my use case.

@tustvold
Copy link
Contributor

There's actually some recent activity on exposing the ability to construct Bytes for "foreign" memory -tokio-rs/bytes#742

@kylebarron
Copy link
Contributor Author

tokio-rs/bytes#437 (comment)

My implementation takes a simpler approach of just keeping an owner object around for the duration that is necessary.

Seems much like how I understand arrow::buffer::Buffer to work

@kylebarron
Copy link
Contributor Author

bytes 1.9 was released with Bytes::from_owner!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Any new improvement worthy of a entry in the changelog
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants