[#3821] Support `Set` for exact buffers in `Flux.buffer` #3822
Hey, @Sage-Pierce! I apologize it took some time to respond. I reworked the `bufferTimeout` operator with fair backpressure recently, that's the reason for it.
Considering all existing buffer* implementations I believe their behaviour should be consistent across the offering. I wonder how this requirement would influence other implementations as it feels a bit risky in the face of concurrency. Can you share your thoughts?
After looking at the codebase it seems the authors didn't consider this scenario and assumed there's no filtering happening and once an item is added, the collection's size increases. I wonder whether including this check and handling is the way to go or perhaps limiting the possible
Let me know, thanks!
Hi @chemicL 👋 No worries about the delay 😄 I didn't immediately look into the other
I don't think that there's any more risk with this change regarding concurrency, since
I don't think that there is currently another way to implement "give me |
I came up with this:
Just as a conversation starter :) I do imagine this doesn't look as nice and the performance would be incomparable. I'll try to digest the rest of the comments and review the other implementations next.
For now, can you also prepare a few sample {input, output} sets so that we know what the end goal is? I mean a sequence 1, 2, 1, 3 would yield [1, 2] and [1, 3] for n == 2, but would yield [1, 2, 3] for n == 3. Is that desired?
Can you share some real world scenarios that come to mind that this would benefit? I tend to first try to understand the need and then try to work towards a solution that matches the expectations. This potential mismatch regarding expected supplied aggregator types is puzzling and it would be neat if we could comprehensively address this.
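The sample {input, output} sets mentioned above can be modelled without Reactor at all. The following is a minimal plain-Java sketch, assuming the intended rule is "emit a buffer once it holds n distinct items, and flush whatever is left on completion". The class and method names (`DistinctBufferSketch`, `bufferDistinct`) are hypothetical and are not part of Reactor or of this PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not Reactor code: models "n distinct items per buffer"
// over an in-memory list instead of a reactive stream.
final class DistinctBufferSketch {

    static <T> List<Set<T>> bufferDistinct(List<T> source, int n) {
        List<Set<T>> emitted = new ArrayList<>();
        Set<T> current = new LinkedHashSet<>();
        for (T item : source) {
            // Set#add returns false for a duplicate and leaves the buffer unchanged,
            // so only a successful add can complete the buffer.
            if (current.add(item) && current.size() == n) {
                emitted.add(current);
                current = new LinkedHashSet<>();
            }
        }
        // Assumption: a trailing partial buffer is flushed when the source completes.
        if (!current.isEmpty()) {
            emitted.add(current);
        }
        return emitted;
    }
}
```

Under these assumptions, `bufferDistinct(List.of(1, 2, 1, 3), 2)` gives the buffers [1, 2] and [1, 3], while the same input with n == 3 gives the single buffer [1, 2, 3].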
Ah nice, that was abstractly what I had in my head, but I couldn't come up with that
The test I wrote for this changeset covers the basic expectation I think, and it looks like you already understand my intent quite well. I'll just format those and a few more below: Given
In my use case, I am iterating over time-bucketed data elements (from a database) and executing an I/O-bound process on them (a service call). That service call is maximally efficient when passed |
Thanks. For Replace
with
and you can observe the same outcome. Out of the others, |
reactor-core/src/main/java/reactor/core/publisher/FluxBuffer.java
For the Given
Due to that significant difference in how Given that there isn't actually a "hanging" issue with Thoughts? |
Thanks for following up. I agree that
In my view, the current behaviour when presented with a I think in order to merge something we'd need to cover all As this currently doesn't work correctly nor consistently we should make an effort to bring more clarity in the docs and tests. For
For
I understand this requires more work so please let me know if you're still keen to contribute. I'd just like us to have a consistent UX across similar operators and that requires a holistic approach. I'll be away for a week but if you make any progress, please do commit and I'll review the changes when I'm back. Thanks again @Sage-Pierce and I look forward to where this discussion leads us :) |
@chemicL I don't mind taking a stab at all of that 😄 May take some time, but may have an updated review next week. |
- Make `Collection` behavior consistent among all FluxBuffer* operators
- Added several more tests for all FluxBuffer* operators covering usage of `Set`
…hat take a `bufferSupplier`
@chemicL I believe I have addressed your feedback, and I look forward to your re-review when you return. I will be on vacation for the first half of July, so it may take me a bit to follow up on further feedback. |
Thanks for the follow-up. Please have a look at my comments. For the `skip != maxSize` case I have different expectations for the results in the test cases after reading the javadoc. These should:
- a) Either comply with the existing specification (new buffer after `skip` items were emitted by the source).
- b) Or adjust the spec to the special case.
With a) it would be necessary to emit the current buffers despite their size being smaller than max but according to considered emitted items so far that fall into the window observed by a particular buffer... And `maxSize == skip` should not be a special case. So this would lead to an inefficiency in just emitting smaller buffers and that's not your goal.
I think we're left with b) and I guess that in such a case the overlapping buffers would need to be smaller in size once they fall out of scope. For the disjoint case (`skip >= maxSize`) we can just pretend that the discarded items were never emitted.
Additionally, please note the "Discard support" section in the relevant Javadocs -> they should also explain what's happening here.
Let me know your thoughts and thanks for the effort so far.
@@ -3123,6 +3129,10 @@ public final Flux<List<T>> bufferTimeout(int maxSize, Duration maxTime) {
* will be emitted by the returned {@link Flux} each time the buffer reaches a maximum
* size OR the maxTime {@link Duration} elapses.
* <p>
* Note that if buffers provided by bufferSupplier may return {@literal false} upon invocation
* of {@link Collection#add(Object)}, buffer emission may be triggered when the buffer size is
* less than the specified max size.
* less than the specified max size.
* less than the specified max size. The element will be discarded in such a case.
@@ -3163,6 +3173,10 @@ public final Flux<List<T>> bufferTimeout(int maxSize, Duration maxTime, Schedule
* will be emitted by the returned {@link Flux} each time the buffer reaches a maximum
* size OR the maxTime {@link Duration} elapses, as measured on the provided {@link Scheduler}.
* <p>
* Note that if buffers provided by bufferSupplier may return {@literal false} upon invocation
* of {@link Collection#add(Object)}, buffer emission may be triggered when the buffer size is
* less than the specified max size.
Same suggestion as above.
@@ -3230,6 +3244,10 @@ public final Flux<List<T>> bufferTimeout(int maxSize, Duration maxTime,
* will be emitted by the returned {@link Flux} each time the buffer reaches a maximum
* size OR the maxTime {@link Duration} elapses.
* <p>
* Note that if buffers provided by bufferSupplier may return {@literal false} upon invocation
* of {@link Collection#add(Object)}, buffer emission may be triggered when the buffer size is
* less than the specified max size.
Same suggestion as above.
@@ -3254,6 +3272,10 @@ public final <C extends Collection<? super T>> Flux<C> bufferTimeout(int maxSiz
* will be emitted by the returned {@link Flux} each time the buffer reaches a maximum
* size OR the maxTime {@link Duration} elapses, as measured on the provided {@link Scheduler}.
* <p>
* Note that if buffers provided by bufferSupplier may return {@literal false} upon invocation
* of {@link Collection#add(Object)}, buffer emission may be triggered when the buffer size is
* less than the specified max size.
And the same suggestion as above.
reactor-core/src/main/java/reactor/core/publisher/FluxBuffer.java
// It should never be the case that an element can be added to the first open
// buffer and not all of them. Otherwise, the buffer behavior is non-deterministic,
// and this operator's behavior is undefined.
if (!b.add(t) && b == b0) {
    Operators.onDiscard(t, actual.currentContext());
    s.request(1);
    return;
}
I imagine it can be the case. In a recent buffer the item is not a duplicate, but in the older buffers it can appear to be a duplicate. Therefore, an item can be discarded only provided that it is a duplicate in all currently tracked buffers. That also necessitates that the old buffers get emitted despite being smaller in size, otherwise there is a risk of leaking memory. Consider the following:
maxSize = 3
skip = 1
input: 1|2|1|2|1|2
step1: 1
buf1 = 1 // ok
step2: 2
buf1 = 1|2 // ok
buf2 = 2 // ok
step3: 1
buf1 = 1|2 // fail(1)
buf2 = 2|1 // ok
buf3 = 1 // ok
step4: 2
buf1 = 1|2 // fail(2)
buf2 = 2|1 // fail(2)
buf3 = 1|2 // ok
buf4 = 2 // ok
and so on.
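The trace above can be replayed with a small plain-Java simulation. This is only a sketch under stated assumptions: a new `Set` buffer is opened every `skip` items, every item is offered to all open buffers, and a buffer leaves memory only when it reaches `maxSize`. The class and method names are hypothetical, and none of this is Reactor code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of overlapping buffers backed by Sets (not Reactor code).
final class OverlappingSetBufferSketch {

    // Returns how many buffers are still open once the whole input is consumed.
    static int openBuffersAfter(List<Integer> input, int maxSize, int skip) {
        Deque<Set<Integer>> open = new ArrayDeque<>();
        int index = 0;
        for (Integer item : input) {
            if (index++ % skip == 0) {
                open.addLast(new LinkedHashSet<>()); // a new buffer every `skip` items
            }
            for (Set<Integer> buffer : open) {
                buffer.add(item); // may return false; the buffer then does not grow
            }
            Set<Integer> oldest = open.peekFirst();
            if (oldest != null && oldest.size() == maxSize) {
                open.pollFirst(); // the oldest buffer is full and would be emitted
            }
        }
        return open.size();
    }
}
```

With `maxSize = 3`, `skip = 1` and the input 1|2|1|2|1|2, no buffer ever collects three distinct items, so all six buffers are still open at the end. With the duplicate-free input 1..6 only two remain open, which is the memory concern raised in the comment.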
Yes, this is a good catch. I rather implemented this as-is to satisfy an invariant of, "an item is discarded if it is not add-able to all in-flight buffers" (hence the comment). However, I can see how this should rather be, "an item is discarded if it is not add-able to any in-flight buffers".
Ah and regarding the build failure -> make sure to run |
After giving it some thought, I wonder whether it makes sense to introduce a new operator for this particular purpose, like
which would emit the buffer once the condition is met and then start a new buffer. Usage:
And for the existing nuance with Collections that can return false, just document that it's not supported for buffer operators. WDYT? |
The more I think about it the more doubts I have :) I consulted RxJava's codebase, which shares the implementation of the To add to that, the suggestion I made with using Consider the following idea for your use case (with artificial delays introduced):
I feel I've re-invented the |
Heya @chemicL, I am back from vacation and catching up on your feedback 😄
This is true, and I think this makes sense, if you have any client that is choosing to use a
True, the examples I've provided so far imply some form of monotonically increasing emission, but I don't know that it's an assumption that affects the desired behavior. Even if the emission is more random, I think the desired generic behavior still makes sense, and again, I think the onus would be on the client to understand that providing a
I certainly see how this approach could be used as a workaround, but in all honesty, if that's the only way to accomplish what I'm looking for, I would be inclined to abandon my goal, since, IMO the resulting code complexity doesn't warrant the benefit. I would feel a bit awkward trying to explain to my teammates that this much code was responsible for implementing "give me N distinct items at a time". And as you are also pointing out, this proposed workaround may have some missing considerations.
I'm not immediately opposed to a new operator, though I know there is a low appetite for new operators to maintain in Reactor. After I update the code to incorporate your latest feedback, I'd like to see how you're feeling about these thoughts. At the very least, I think something should be changed to address the cases where a
Hey, thanks for getting back on this. Can you also reflect on the notes I made about the potential use of |
Been a busy week 😓
I do believe this could work for me, though I think we both agree it's not obvious that this would be the correct operator to use for the more generic version of my use case. In particular, there isn't really a notion of a "stale" buffer for what I'm working on. I just care that there's "N distinct items", and not that it could take a long time to emit any given buffer because there are a bunch of duplicates. In fact, this would be great for my use in order to absolutely minimize load.
After a few rounds I'm quite convinced that the exact buffer case (`maxSize == skip`) is something that can be handled and makes sense. For `maxSize != skip` I find inconsistencies in
- breaking semantics from existing implementation about opening the buffer every `skip` item emitted from the source
- risks of keeping incomplete buffers in memory
- risks of stalling processing
Do you agree that `bufferTimeout` is a good fit and can be documented as one that supports this type of `Collection` together with `buffer` with exact size buffers where this is an exceptional case where the `skip` argument is ignored as buffers are created upon completion of the current buffer?
If so:
- I think we can discard changes in the `BufferSkipSubscriber` and `BufferOverlappingSubscriber`
- I wondered if we should implement abrupt `onError` signalled to downstream in case `Collection#add` returns `false` and document the requirement for `Collection` to accept all signals, but I'm hesitant to do so. I think it might be better to leave the gate open in case we need to reconsider. For now the undefined behaviour should not impact the performance in any case with additional conditional statements.
- Javadoc for `buffer` and `bufferWhen` can explain the "exact" case exception which allows `Collection` to return `false` and will request more from upstream.
reactor-core/src/main/java/reactor/core/publisher/FluxBuffer.java
if (!added) {
    Operators.onDiscard(t, actual.currentContext());
    s.request(1);
I do have doubts here - the demand flow without Set is that every buffer that is initiated has a chance to be fulfilled. If the downstream requests 1 it means a full buffer should be completed. As the buffers are delivered, the downstream can request more. With potentially non-deterministic implementations of Collection which can unpredictably return false some assumptions might not hold. There are a few assumptions here that dictate the flow that follows. I'd probably prefer to back away from implementations for `skip != maxSize` as they become complicated to reason about and explain to users what are the risks (e.g. keeping a growing list of undelivered buffers). I think we could terminate early instead of stalling and deliver an error to the downstream and cancel the source in case the buffer is unable to accept the item.
Regarding the early termination, I clarified in the general comment - better to leave the undefined behaviour in such case.
> the demand flow without Set is that every buffer that is initiated has a chance to be fulfilled. If the downstream requests 1 it means a full buffer should be completed. As the buffers are delivered, the downstream can request more. With potentially non-deterministic implementations of Collection which can unpredictably return false some assumptions might not hold.

Indeed, this is one of the reasons I originally implemented this with "if the data can't be added to all in-flight buffers, then discard it", which might avoid some of this weirdness.

> I'd probably prefer to back away from implementations for skip != maxSize as they become complicated to reason about and explain to users what are the risks ... better to leave the undefined behaviour in such case.

I agree 😄
reactor-core/src/test/java/reactor/core/publisher/FluxBufferTest.java
Yep, at this point, I am in agreement. For overlapping or skipping buffers that use collections that might return
Agreed
Will do!
I'm on board with not doing anything about this for the cases where
I will include this in my updates. However, I don't think anything should change about |
In the end I'm not sure if it makes sense to touch `bufferTimeout` in this PR as per comment in the test suite and the fact that discard is not implemented. The same can be achieved with just eliminating duplicates on each buffer emission.
Sorry to have dragged you all the way and now we're back to square one.
I think that if we work on #3774 it could be possible to rework the `bufferTimeout` operator in such a way to make this work too. For now, let's just make an exceptional case for buffer with `maxSize == skip` and merge your changes.
@@ -3123,6 +3126,10 @@ public final Flux<List<T>> bufferTimeout(int maxSize, Duration maxTime) {
* will be emitted by the returned {@link Flux} each time the buffer reaches a maximum
* size OR the maxTime {@link Duration} elapses.
* <p>
* Note that if buffers provided by bufferSupplier may return {@literal false} upon invocation
* of {@link Collection#add(Object)}, buffer emission may be triggered when the buffer size is
* less than the specified max size. The element will be discarded in such a case.
This is false.
@@ -3163,6 +3170,10 @@ public final Flux<List<T>> bufferTimeout(int maxSize, Duration maxTime, Schedule
* will be emitted by the returned {@link Flux} each time the buffer reaches a maximum
* size OR the maxTime {@link Duration} elapses, as measured on the provided {@link Scheduler}.
* <p>
* Note that if buffers provided by bufferSupplier may return {@literal false} upon invocation
* of {@link Collection#add(Object)}, buffer emission may be triggered when the buffer size is
* less than the specified max size. The element will be discarded in such a case.
This is false as well.
@@ -3230,6 +3241,10 @@ public final Flux<List<T>> bufferTimeout(int maxSize, Duration maxTime,
* will be emitted by the returned {@link Flux} each time the buffer reaches a maximum
* size OR the maxTime {@link Duration} elapses.
* <p>
* Note that if buffers provided by bufferSupplier may return {@literal false} upon invocation
* of {@link Collection#add(Object)}, buffer emission may be triggered when the buffer size is
* less than the specified max size. The element will be discarded in such a case.
And this one is also false. Discard support is not present in `bufferTimeout` for `buffer.add` returning `false`.
@@ -3254,6 +3269,10 @@ public final <C extends Collection<? super T>> Flux<C> bufferTimeout(int maxSiz
* will be emitted by the returned {@link Flux} each time the buffer reaches a maximum
* size OR the maxTime {@link Duration} elapses, as measured on the provided {@link Scheduler}.
* <p>
* Note that if buffers provided by bufferSupplier may return {@literal false} upon invocation
* of {@link Collection#add(Object)}, buffer emission may be triggered when the buffer size is
* less than the specified max size. The element will be discarded in such a case.
And also false here.
Flux.just(1, 1, 1, 1, 1, 1, 1)
    .<Set<Object>>bufferTimeout(3, Duration.ofSeconds(2), HashSet::new)
    .as(it -> StepVerifier.create(it, 3))
    .expectNext(Collections.singleton(1), Collections.singleton(1), Collections.singleton(1))
I find it of little practical use in the end -> the result is that a buffer is emitted every time `maxSize` items are consumed from the source, not when the buffer size is at the `maxSize` limit.
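The behaviour described in this comment can be illustrated with a plain-Java model. It is a sketch only: it assumes emission is driven purely by the count of consumed items and ignores the time-based trigger entirely. The names `CountBasedEmissionSketch` and `emitEveryMaxSizeItems` are hypothetical and do not exist in Reactor.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model (not Reactor code): a buffer is emitted after every
// maxSize consumed items, regardless of how many of them the Set accepted.
final class CountBasedEmissionSketch {

    static <T> List<Set<T>> emitEveryMaxSizeItems(List<T> source, int maxSize) {
        List<Set<T>> emitted = new ArrayList<>();
        Set<T> current = new HashSet<>();
        int consumed = 0;
        for (T item : source) {
            current.add(item); // a rejected duplicate still counts as consumed
            if (++consumed == maxSize) {
                emitted.add(current);
                current = new HashSet<>();
                consumed = 0;
            }
        }
        // Assumption: the trailing partial buffer is flushed on completion.
        if (!current.isEmpty()) {
            emitted.add(current);
        }
        return emitted;
    }
}
```

For seven `1`s and `maxSize = 3` this model yields three singleton buffers, in line with the `expectNext` in the test above. A rule based on the buffer's own size would instead hold a single `{1}` until the source completes.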
Sounds good to me! I'll revert the changes related to
And also sounds good 😄 I will go ahead and update this PR with your recommendations! |
Thanks for the contribution @Sage-Pierce :) I labelled it as an enhancement. This actually has a "It's a feature, not a bug" kind of vibe to me and I feel we are not really addressing a common flaw but added support for a special case as demonstrated by our lengthy discussion. It is a grey area, unfortunately. Anyways, thanks for all the back and forth, glad we arrived at the final destination 🚢 |
`FluxBuffer` to request 1 when buffer is not modified
`Set` for exact buffers in `Flux.buffer`
If a Set is used as the destination in `Flux.buffer`, the stream will not hang if/when there are duplicates in a given buffer.
Previously, `FluxBuffer` was not taking the result of adding to the buffer into account. If adding to the buffer does not result in modifying it, an extra `request(1)` should be issued.
Fixes #3821
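The demand accounting behind this description can be sketched in plain Java. The model below rests on assumptions: a single downstream request for one buffer becomes an upstream demand of `maxSize` items, and the `replenish` flag stands in for the extra `request(1)` issued when `add` returns `false`. The class and method names are hypothetical, and the real operator is asynchronous; this loop only tracks the counters.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical demand model for an exact-size buffer backed by a Set.
final class ExactBufferDemandSketch {

    // Returns true if one buffer of maxSize distinct items is completed after
    // the downstream has requested a single buffer.
    static boolean completesOneBuffer(List<Integer> source, int maxSize, boolean replenish) {
        Iterator<Integer> upstream = source.iterator();
        Set<Integer> buffer = new HashSet<>();
        long demand = maxSize; // assumption: one requested buffer -> maxSize upstream items
        while (demand > 0 && upstream.hasNext()) {
            demand--;
            if (!buffer.add(upstream.next()) && replenish) {
                demand++; // stands in for the extra request(1) on a rejected add
            }
            if (buffer.size() == maxSize) {
                return true;
            }
        }
        // Demand is exhausted (or the source ended) with an incomplete buffer.
        return false;
    }
}
```

For the source 1, 1, 2, 3 and `maxSize = 3`, the model without replenishment runs out of demand with only {1, 2} buffered, which corresponds to the hang. With replenishment the buffer reaches {1, 2, 3}.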