-
Notifications
You must be signed in to change notification settings - Fork 273
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Assess integrating ownership #77
Comments
One concrete option is to put this as a v.next goal, and use it as motivation to cut a sane version of timely that we can leave out there before breaking all sorts of things. |
Historical note: this is almost how timely used to work, back when it was just getting off the ground. If you dig into the operator construction that It wouldn't be too hard to sneak this back in, if only as a hack, by enriching the There is some technical complexity to worry about, in that the progress tracking logic currently assumes (and should probably be corrected) that all uses of an output produce the same number of messages. If we sneak a |
I like the idea, as it moves timely closer to Rust's ideal of providing zero-cost abstractions. I have to admit that I struggled making a similar transition, while working on carboxyl – it still clones a lot internally. All the lifetime handling suddenly becomes much harder to get right. However, once it's done it should allow to write more efficient code. |
I've pondered this a bit more, with my hands in the guts for other reasons, and here are some thinkings: The main place that ownership is mis-used is in It seems totally reasonable to have one listener who gets owned data, and an arbitrary number of listeners who implement a different trait (rather than Another advantage to this is that we could have the type of the owning listener as part of the We could also try and be fancier and have two types of I think the two types of |
A general question about this: at the moment, I have not really processed any large data. However, it is quite possible I might end up with allocations of a couple of megabytes in dataflow streams. An extreme case would be a data cube with a volume of, say, 128^3 elements each containing a few dozen bytes or so. Thus, eventually I really want to make sure there is no cloning, unless I say so in the code. Can this be entirely controlled under this proposal or would there be any points left, where data is cloned implicitly? |
At the moment the only time you should experience clones is if you re-use a stream of data. That is, if you have let stream_of_big_things = ...;
stream_of_big_things
.map( ... )
.other_stuff( ... );
stream_of_big_things
.different_stuff( ... ); In this case, your data would be cloned by timely. But if you comment out either of the uses (leaving just one) there shouldn't be any clones. We would have to talk about serialization and stuff like that, in which case there will be some effective copying when the serialization and possible deserialization happens. In this proposal (that of the issue) I would love to remove the Edit: Yup, just checked and if I remove the Edit 2: There is also a |
Okay, so while one has to be careful, it is possible to avoid the clones today. That's good to know. Serialization is a different story. Usually with problems like this, either way you try to keep as much data local as possible and keep the communication to other workers at a minimum. Does serialization happen on its own without calls to About ergonomics, it seems similar to an iterator, does it not? One consumer being able to take ownership sounds like passing the stream by value (like I am sure, there are some implementation details, that might make that change tricky. But in an ideal world it would be awesome to simply use the same kind of API as an iterator. Is it possible to help out with experimentation or implementation in a useful way? |
I think it could end up totally tolerable, but it breaks an abstraction barrier and asks users to be aware of whether they are the final consumer of a stream or not. Proposal later on in this comment. First, though: I think it is a bit different from iterators, in that if you call e.g. In particular, if you hand out a stream of Instead, I think there is a pretty easy stop-gap (which may address most cases). Right now there are lots of methods on streams that take a At the same time, we can add a new method on fn flat_map_ref<I, F>(stream: &Stream<G,D>, func: F) -> Stream<G, I::Item>
where
I: IntoIterator,
I::Item: Data,
F: Fn(G::Timestamp, &[D])->I; which allows pretty much anyone to look at Perhaps we add convenience methods like Changing most of the methods to take a I do really like the idea that timely exposes ownership and lets you be certain about it, even if it means that you occasionally have to write let stream_of_big_things = ...;
stream_of_big_things
.cloned()
.map( ... )
.other_stuff( ... );
stream_of_big_things
.different_stuff( ... ); which, doesn't really seem all that horrible written like that. |
Right now it is done automatically, but only when data move between processes (that is, only for operators that require the There is a separate trait,
What do you think of me prototyping For example, even with |
Independently, it is a fine question to ask: "could we have a stream of I think this gets into some sticky details, where the operators that can use these references will be fairly restricted, and probably couldn't be able to stash the references for example. But I could imagine you wanting to define a dataflow fragment that can observe and fully retire a bunch of reference, with the expectation that batch-by-batch it runs to completion before ownership is handed of to the consumer of the stream. I'm not clear if there are dataflow fragments that (i) make sense, and (ii) can't be expressed with something like stream.flat_map_ref(|time, data|
data.iter()
.filter(...)
.map(...)
.cycle()
.take(100)
.zip(0 .. 100)
); but if there are better idioms, we could try and present them. I don't know that we'll actually be able to expose an actual iterator type that you act on, because we need to eventually go monadic on it and turn it back to a stream. |
Sounds good. Happy to provide feedback. The The question of non-'static streams still may be relevant though, I think. Continuing with the large data cube use case, you mentioned just sending boundaries, which is exactly what needs to happen. While I think it would generally be tolerable to clone just a small subset of the data, it would likely still benefit the scaling efficiency of a particular algorithm on a data set of a given size not to. To be more concrete, it could look like this with cloning: struct Cube {...};
struct Boundary {...};
fn boundaries(cube: &Cube) -> Boundary { /* copies data internally */ }
stream
.flat_map_ref(|_time, data| data.iter().map(boundaries))
.exchange(|boundary| neighbour_index(boundary)) And I think this is fine already and everything that comes now, may be me getting carried away with premature optimization… That said, if cloning could be avoided, then it may look like this: struct Boundary<'a> {...};
fn boundaries<'a>(cube: &'a Cube) -> Boundary<'a> { /* borrows */ } The same streaming code as above would then produce a There probably still is a case for being able to send part of a data structure over the network without copying and only materializing it into an owned structure on the receiving end. But instead of streams containing non-'static entities, it may be more to solve this with a specialized operator and a more involved serialization method. I guess one would have to express a relation between two datatypes like It sounds fairly complicated to build this and implementing that new trait for users probably will be fairly unsafe… Perhaps we should not bother too much. Still I'm curious what you think about this. |
Ahaha. You can totally do this with Abomonation. :) (( unfun caveat: would only be safe if you use lifetimes bound by that of the But I think realistically, as soon as you hit an To flesh out your example a bit, do you imagine wanting: struct Cube {...};
struct BoundaryRef<'a> {...};
cube_stream
.as_ref()
.map(|x: &Cube| BoundaryRef::from(x))
.do_some_stuff()
.maybe_eventually_exchange();
cube_stream
.take_ownership()
.exchange(|boundary| neighbour_index(boundary)) The cube_stream
.flat_map_ref(|cube_refs|
cube_refs
.map(|cube_ref| BoundaryRef::from(cube_ref))
.do_some_stuff()
)
.maybe_eventually_exchange(); I could imagine some |
For what it is worth, the way this works at the moment is that you can serialize anything implementing On the receive side things are a bit better. When you deserialize a |
I could totally imagine providing an This removes the opportunistic ability to exploit ownership and avoid serialization when within a process (e.g. if you are running at a smaller scale), but ensures that you don't do a clone just to serialize the data (and then drop the clone). I could also imagine something like It isn't conceptually hard to do this, but it does break a few abstractions and makes you wish things were clearer. E.g. when you write your Edit: Right now you can do this in custom timely operators by doing a |
Okay, I am starting to understand the scary name :)
No, as I said, for this case the |
It's hard to tell this without benchmarking, but I could well imagine that this ability is worth more than avoiding the clone for serialization. In a typical HPC setup you have single nodes with dozens of CPU cores and you likely want to use multi-threading on each node. So communication between processes should be somewhat rarer than communication within a process.
This sounds more promising. Maybe combine fn exchange_cow<R, L, T, U, V>(stream: &Stream<G, T>, route: R, logic: L) -> Stream<G, V>
where R: Fn(&T) -> (u64, &U),
L: Fn(Cow<U>) -> V,
U: ExchangeData, T: Data, V: Data On the sending worker, with Could this be implemented as a custom operator in today's timely? More radically, it would still be useful to just leave out fn exchange_cow<'a, R, T, U>(stream: &'a Stream<G, T>, route: R) -> Stream<G, Cow<'a, U>>
where R: Fn(&T) -> (u64, &U),
T: ExchangeData, U: Data But here goes the non-'static stream again… |
I am starting to see the complexity here. What I am trying to get to, is roughly like this: fn combine<G: Scope>(Stream<G, Cube>, Stream<G, Boundary>) -> Stream<G, Cube> { ... }
let cube: Stream<G, Cube> = ...;
let boundaries = cube.map(get_boundary).exchange(route);
let new_cube = combine(cube, boundaries); I guess, what I am worried about is that I can write the non-stream logic for Anyway, the essence is, I can take the exchanged data by reference for my domain logic. By the time the data has been exchanged, the clone has already happened though, so this does not buy me anything. The kind of operator that I would need to avoid the clones is probably like this: fn exchange_and_combine<G: Scope, T: Data, U: ExchangeData, V: Data>(
stream: &Stream<G, T>,
route_and_borrow: impl Fn(&T) -> (u64, &U),
combine: impl Fn(&[T], Cow<[U]>) -> V
)
-> Stream<G, V> ( The more I think about this, the more feasible it seems to write custom operators like above. The open question is whether I can exploit the cow semantics that way. PS: I hope I am not making this too much about my own use case, but I feel there is a general concern to be addressed here. |
Let's look at your example and see what I would imagine doing: n combine<G: Scope>(Stream<G, Cube>, Stream<G, Boundary>) -> Stream<G, Cube> { ... }
let cube: Stream<G, Cube> = ...;
let boundaries = cube.map(get_boundary).exchange(route);
let new_cube = combine(cube, boundaries); I think what you would probably go for is replacing the I think it will be tricky to avoid a Edit: I may have misunderstood. Based on the signature // update in place
fn(&mut Cube, &[Boundary]) which would make it more clear that we would need to copy some
No, this is great. A variety of requirements are helpful! |
Ah, yes, there is a trade-off here, even in an ideal case. When I make a new Workers not being coordinated means, that the computation happens in detached background threads? I imagine, it would be hard to use scoped threads instead, would it not? |
Well, the clone isn't much more expensive than allocating a new I guess I would say, if you are allocating a new
Yes, scoped threads make little sense because the scope that (I think) you are hoping for is somehow based on a borrow of some In the dataflow setting, relatively little of the data are rooted on some common stack, but rather are rooted in the stacks of the worker threads. A worker with a |
Okay, makes sense. I guess I am underestimating the cost of allocation there. Well, the clones are probably not as bad, as I think. ;) |
I have some more observations about ownership and cloning in particular. I've started playing around with this in a branch, and there are some other places where clone was used, and I'll see if I can explain them and whether it's workable or not (they were missed because they get noticed in a different rustc pass).
There are possibly some other cases hiding out there. Removing One first step might be to try and make |
Another observation, possibly related: in writing various interfaces to the outside world, I often find that I start with a If we insist that data ingestion happens in an operator, we need to hand owned data somewhere just to get things going, which means we've fundamentally cloned all of the This suggests that we might want the type |
Would this mean the inner type is now the container of events within a timestamp? And would it allow streams that are statically guaranteed to only have a single piece of data per timestamp, in this context a Anecdotally, I occasionally find myself mentally stumbling over the nested event structure of timely's streams. There are multiple timestamps, each of which contain multiple events. If I understand the implication your suggestion as above, such a change in type signature would make the structure somewhat more obvious. |
I think it is roughly this (the "container"), but I don't believe that It's good to note what the stumbling blocks are. I'm not sure what can be done about this one, short of a type of stream that is known to have single elements per time (and .. maybe no data exchange, otherwise hard to ensure?). I think some things might be getting worse for you in the future, too; there is a good mathematical reason that each "batch" Perhaps the right way to think about single-elements/batcher per time are wrapper around Alternately, one could certainly write a (thin) layer on top of timely stuffs in which all operators stash their inputs until they are complete, and then deliver one burst of data at each time. E.g. you have a notificator that holds on to This wouldn't be too hard an API to prototype; what isn't immediately clear is what the API is. Perhaps similar to how we have now, but with the implied guarantee that each time |
Oh, I was just curious about the superficial similarity to my mental model. But it seems I was wrong about that. I would probably rather file this under getting used to the concept rather than a problem with the API itself. It did make sense to me though, as soon as I had to think about non-trivial data exchange patterns. |
Timely dataflow exclusively moves owned data around. Although operators are defined on
&Stream
references, the closures and such they take as arguments can rely on getting buffers of owned data. Themap
operator will, no matter what, be handed owned data, and if this means that the contents of a stream need to be cloned, timely will do that.This seems a bit aggressive in cases where you want to determine the length of some strings, or where you have a stream of large structs (e.g. 75 field DB records) and various consumers want to pick out a few fields that interest them.
It seems possible (though not obviously a good idea) that this ownership could be promoted so that timely dataflow programmers can control cloning and such by having
Owned streams,
Stream<Data>
with methods that consume the stream and provide owned elements of typeData
.Reference streams,
&Stream<Data>
whose methods provide only access to&Data
elements, which the observer could then immediately clone if they want to reconstruct the old behavior. Maybe a.cloned()
method analogous toIterator
s.cloned()
method?I think this sounds good in principle, but it involves a lot of careful detail in the innards of timely. Disruption is fine if it fixes or improves things, but there are several consequences of this sort of change. For example,
Right now
Exchange
pacts require shuffling the data, and this essentially requires owned data, when we want to shuffle to other threads in process. It does not require it for single thread execution (because the exchange is a no-op) nor for inter-process exchange (because we serialize the data, which we can do from a reference).Similarly,
exchanged
data emerges as "unowned" references when it comes out of Abomonation. Most operators will still hit this with aclone
, though for large value types the compiler can in principle optimize this down. To determine the length of aString
, the code would almost certainly still clone the string to check its length (it does this already, so this isn't a defect so much as a missable opportunity).Anything we do with references needs to happen as the data are produced at their source, or involve
Rc<_>
types wrapping buffers and preventing the transfer of owned data. This is because once we enqueue buffers for other operators, we shift the flow of control upward and lose an understanding of whether the references remain valid. This could be fine, because we can register actions the way we currently register "listeners" with theTee
pusher, but would need some thought.Operators like
concat
currently just forward batches of owned data. If instead they wanted to move references around, it seems like they would need to act as a proxy for any interested listeners, forwarding their interests upstream to each of their sources (requiring whatever the closure used to handle data be cloneable or something, if we want this behavior).Such a "register listeners" approach could work very well for operators like
map
,filter
,flat_map
and maybe others, where their behavior would be to wrap a supplied closure with another closure, so that a sequence of such operators turn in to a (complicated) closure that the compiler could at least work to optimize down.Mostly, shifting around where computation happens, as well as whether parts of the timely dataflow graph are just fake elements that are actually a stack of closures, is a bit of a disruption for timely dataflow, but it could be that doing it sanely ends up with a system where we do less copying, more per-batch work with data in the cache, eager filtering and projection, and lots of other good stuff.
The text was updated successfully, but these errors were encountered: