Skip to content
This repository has been archived by the owner on Nov 15, 2019. It is now read-only.

[WIP] [COMMENTS?] [QUARKS-230] Add timer triggered window aggregations #167

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

dlaboss
Copy link
Member

@dlaboss dlaboss commented Jul 15, 2016

No description provided.

*
* @see #aggregate(BiFunction)
*/
<U> TStream<U> timedAggregate(long period, TimeUnit unit, BiFunction<List<T>, K, U> aggregator);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the issues I had with this in a previous system was that with many partitions the behaviour was not desired in that if the partition did not change the window still fired, thus wasting cpu cycles to produce the same result. Thus I wonder if it should be more along the lines of:

Aggregation of window partition on any partition change with a minimum period of period between aggregations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A reasonable question.
In the particular use case that came up, it was OK/desirable that an unchanged partition still yielded an aggregation. e.g., the, less than perfect, interface that was desired between the device and the iothub was to publish events on the "current location" of the device even if it hadn't moved a meaningful distance.
A more efficient, less chatty, device/iothub interface would have been to only publish under that condition.
So maybe a timed-aggregator interface that only supported timed-trigger-if-changed semantics might not be acceptible?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering if the continue sending updates even if nothing has changed might be better handled by a separate operation, then it could be applied to anything, rather than just a window.

Something like pass any input tuple to the output, but send the last tuple if nothing has been received for the declared period.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... per-partition. Sounds like that separate operation could be a count(1)-timedAggregate-evenIfDidntChange. Does it make sense to force such a user to have to use a timed-trigger-if-changed window followed by this separate operation rather than just use a count(N)-timedAggregate-evenIfDidntChange? (I can imagine it would be ok, just want to be sure)

@ddebrunner
Copy link
Contributor

On timed batch:

I think part of my concern is that the api is meant to separate out what defines the window contents (TWindow) from how it is processed (methods on TWindow).

timedBatch seems to be mixing the concepts of what is contained in the window and when it is processed, by say evicting tuples in last(10) window after a second, so that the contents of the window no longer match its definition.

It seems that it's really trying to define a window like:

last(long period, TimeUnit unit, int max)

where the contents of the window is the last period seconds with a maximum of the max most recent tuples.

Then batch is applied to this window as normal.

@ddebrunner
Copy link
Contributor

Maybe a better question for timedBatch(3, SECONDS, func), is when would it be used over

last(3, SECONDS).batch(func)

?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants