Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First draft of generic concurrency algorithm #73

Merged
merged 16 commits into from
Nov 29, 2020
Merged

First draft of generic concurrency algorithm #73

merged 16 commits into from
Nov 29, 2020

Conversation

jonhoo
Copy link
Owner

@jonhoo jonhoo commented Nov 22, 2020

Implements #45 (comment).

Fixes #45 and #70 by allowing separate implementations of the data structure that re-use the concurrency primitive.
Fixes #67 by making ReadGuard::map public.

/cc @jeff-hiner, @comath, @mattdm -- you've all expressed interest in having a single-map version in the past.


This change is Reviewable

@jonhoo
Copy link
Owner Author

jonhoo commented Nov 22, 2020

Note to self: I also got an email pointing out that it's probably not okay for us to alias a Box, even if we never cause a data race specifically. We should probably just store the raw pointer (probably through a NonNull) instead. I should fix that before landing this.

@p7g
Copy link

p7g commented Nov 22, 2020

I feel that "publish" could possibly be a more descriptive name for WriteHandle::refresh.

(Hello from YouTube by the way lol)

@ratmice
Copy link

ratmice commented Nov 24, 2020

I feel that "publish" could possibly be a more descriptive name for WriteHandle::refresh.

"flush" seems like it might be okay too

@jeff-hiner
Copy link

I feel that "publish" could possibly be a more descriptive name for WriteHandle::refresh.

"flush" seems like it might be okay too

Yeah, flush seems like a good name, as it's synonymous with network/file sync operations that already carry the meaning of "this is expensive and may block your current thread for a bit".

@jonhoo jonhoo mentioned this pull request Nov 26, 2020
@code-elf
Copy link
Contributor

Just wanted to say that this is awesome, and a similar idea to what I had in mind in my follow-up proposal in #48, just far more elegant. Excited for this one, thank you! 💜

This partially addresses #74. There is still a violation in implementing
`ShallowCopy` for `Box`, but that one we'll tackle separately.
It makes the code so much nicer.
A nod to #76 is also worthwhile here.
We are constantly removing items from the beginning of it and pushing
things onto the end, which causes a lot of reallocations for a `Vec`,
but not for a `VecDeque`.

Fixes #79.
@code-elf
Copy link
Contributor

code-elf commented Nov 29, 2020

I finally got a chance to take a proper look at this! Tomorrow I'm going to see about implementing maps with different value storages with it and seeing what else comes up, but one thing I can see immediately:

I think ShallowCopy will need to be a part of left-right. It's a concern that's going to come up and need a solution in almost any serious usage of the package. At the very least, I think it needs to provide it; ideally it would also solve some of the boilerplate for the implementation.

I haven't tried any of this out, but just to throw out some ideas for this: Absorb's type parameter could probably require ShallowCopy (which with the derive macro should be no problem to achieve) so both absorb_* functions could receive an owned operation, and maybe it could abstract away the usage of ManuallyDrop in absorb_first in some way.

Other than that, this all seems awesome and I'll report back after having given it a proper try!

Edit to add: Alternatively, if the goal is truly to keep left-right to be as generic as possible, maybe ShallowCopy (and some helpers?) is worth extracting into its own package?

@jonhoo jonhoo marked this pull request as ready for review November 29, 2020 03:51
Repository owner deleted a comment from codecov bot Nov 29, 2020
@jonhoo
Copy link
Owner Author

jonhoo commented Nov 29, 2020

@MayaWolf That's exciting — I look forward to seeing what you come up with (though it may take me a little while to get back to you — apologies in advance).

As far as ShallowCopy goes, I think it probably belongs in its own crate more so than in left-right since there's nothing specific to left-right about it. That said, I also don't know that it'd be useful anywhere else, so 🤷 Unfortunately, I think ShallowCopy will also have to change quite a bit to deal with the unsoundness identified in #74 and that arises from rust-lang/unsafe-code-guidelines#35 (comment). I've started sketching out a plan in #81, but it is ugly. Would love some more eyes on it!

I'll also note that I don't think ShallowCopy can appear in left-right's API specifically. The reason for that is that you do not want to shallow copy the oplog entries, nor the entire wrapped data structure. Instead, you generally want to shallow copy the individual values stored in the wrapped data structure (such as the values in evmap). But left-right knows nothing about "values". All it knows is that you have two copies of some T that can absorb opaque operations, and at that level of abstraction ShallowCopy isn't really useful.

Repository owner deleted a comment from codecov bot Nov 29, 2020
Repository owner deleted a comment from codecov bot Nov 29, 2020
@jonhoo jonhoo merged commit 81d84ee into master Nov 29, 2020
@jonhoo
Copy link
Owner Author

jonhoo commented Nov 29, 2020

@MayaWolf Actually, it looks like ShallowCopy may go away entirely! 🎉 Take a look at https://github.com/jonhoo/rust-evmap/blob/787ffb755b8b76536b1a9650f5cc0d21b8abb5b3/evmap/src/aliasing.rs from #81 for what I came up with instead.

@code-elf
Copy link
Contributor

@jonhoo Awesome, I'll look into this!

For now, I've implemented basic versions of both a single-value map, and a map using a hashset as the value storage, and left-right really makes this incredibly easy and convenient. The only unpleasant thing is really the boilerplate of ShallowCopy and ManuallyDrop, but I've not been able to come up with a way around this while also keeping left-right as generic as it is.

There are, however, two things about the public API I find curious/questionable:

  • Why is operation passed mutably? Not a huge issue, but for something that absolutely, positively has to stay the same between runs, that seems weird. If it's just because of retain, does predicate really need to be an FnMut?
  • Similarly, it seems very strange to be given other in the absorb methods. It seems that this is primarily because of JustCloneRHandle - and honestly, I believe direct_write is something that would make sense to have built into left_right itself. What do you think?

@jonhoo
Copy link
Owner Author

jonhoo commented Nov 29, 2020

@MayaWolf Hopefully the change in #83 that gets rid of ShallowCopy should make your life a whole lot easier!

Out of curiosity, what use do you have for a map that has a HashSet as the value storage?

As for your API questions:

  • operation is passed mutably to absorb_first in case the implementation needs to store information in it to ensure that absorb_second has the exact same effect as absorb_first. For example, to fix #78, it's likely that we need to store things like hashes and indices in the operation so that absorb_second does not invoke user-defined functions like PartialEq and Hash a second time (since they may be non-deterministic).
  • other is passed in mainly because it is easy to provide, and so I'd rather provide it in case a consumer needs it (like evmap does). The other reason it is passed in is, again, because it may be necessary for some implementations to ensure that their absorption is deterministic between the two maps (e.g., by ensuring that the same hasher is used).
  • (I split your direct_write into its own point) I don't think left-right can implement the direct_write optimization directly, since it does not have a safe way (I don't think) to produce a duplicate of T that will deterministically apply operations the same as T. For example, in evmap, we need to ensure that any HashBags we construct in the new Values use the same hasher as the originals. Clone won't do, since it does not promise to have the same initialized state as the T it clones. For example, it could be that if you clone a HashMap it gets its own dedicated hasher, with a different random seed, which would yield different iteration orders. Like #78 indicates, we cannot generally rely on "sane" implementations of user-defined traits.

@jonhoo
Copy link
Owner Author

jonhoo commented Nov 29, 2020

@MayaWolf I'll add to ^ that if you come across any part of the left-right documentation that could be better, please let me know/file an issue/file a PR. Concurrency primitives are hard enough that they warrant very good documentation, and the only way to get to that is to document things as we discover holes :)

@code-elf
Copy link
Contributor

code-elf commented Nov 29, 2020

@jonhoo I'm working with at-least-once delivery guarantees and a multitude of unique has-many relationships - so for something like member lists, a HashSet is just easiest to work with. To be clear, this would be entirely possible with what evmap provides out of the box, but HashSet allows me to add values without having to check whether they already exist first, and is probably a little more efficient than evmap's general-purpose value bag.

While I get the arguments of passing operation mutably and other in case they're needed, I'm not sure with it helping to minimize user errors - if anything, it just moves the potential for them into the implementations of Absorb. But if we consider that better, then yeah, it makes a lot of sense!

This is more of a question because I may very well be missing something - but does calling other.data.hasher(); actually have an advantage over calling self.data.hasher()? Unless I'm missing something, all this does is ensure the maps use each other's hasher, not that they actually use the same one.

As for direct_write, I believe all we would need for this is to add a clone_first method (with perhaps a better name. create_second?) to the Absorb trait - it could have a default impl of calling self.clone() if Self: Clone (edit: actually I'm not sure whether Rust allows default method implementations to have type guards? Either way if it does I think this would be good!), and unsafe implementations like evmap could override it. It would also have the added benefit of making new's type parameter not require Clone!

So far, the docs seemed on point and very easy to understand - but that's admittedly coming from someone already quite familiar with the internals of evmap!

@jonhoo
Copy link
Owner Author

jonhoo commented Nov 30, 2020

@MayaWolf Ah, yes, that makes sense! Neat to see that this already makes sense in more contexts :D

So, I think that operation must be passed mutably, because the alternative is to pass &O, which would mean implementations cannot stash deterministic info in the O, and would be left without any way to do it. Implementors of Absorb already have to be quite careful, so I'm completely happy with also requiring them to not abuse the &mut O.

Hehe, no, you're right, it doesn't. What it does buy you is the ability to get at a &Hasher while mutating self. Normally, that's super annoying. I think evmap could probably be tweaked a bit to use the hasher from self instead, but it'd likely require a bit more complex code to satisfy the borrow checker. The way we ensure that the two maps have the same hasher is that we construct one with the hasher of the other. The reason we need to also pass the &Hasher around is so that when we construct new HashBags, those also use the same hasher.

Hmm, I'm conflicted about adding direct_write to left-right directly. I feel like it adds complexity to the trait for something that can be fairly easily implemented externally. Now there's an additional method on Absorb that implementors need to make sure they got right, and an additional optimization they must understand. We also couldn't lift the Clone requirement on new easily, since we still need to have a read copy even if is initially empty and will just use clone_into to "fill" initially. I'll have to think on that one a little.

@jonhoo jonhoo deleted the left-right branch December 21, 2020 02:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Why not make map_ref of ReadGuard public? Interest in a single value map?
7 participants