feat: Implement latest_by_offset() UDAF #4782
Conversation
Thanks @purplefox, awesome to see this!
Some thoughts (apologies if these have been covered already):
- Null handling: always ignoring `null` values seems arbitrary. Some users may want the nulls, others may not. Choosing one means the UDAF doesn't work for the others. Can we not support both? (It may have to be two separate UDAFs until we enhance our UDAF framework ... not sure.)
- Is there any reason why you've not added variants for the complex types `ARRAY`, `MAP` and `STRUCT`? Or is this not possible with the current UDAF framework? Ah... missing a UDAF variant of SchemaProvider... boo! ;)
I have a couple of points of confusion to raise here:

First (easy one?) is naming: please let's not call this "latest" anything. It may seem like a pedantic linguistic quibble, but "late" means something to do with time, which this function is not really concerned with. I find a useful way to think about this is based on the observation that aggregate functions usually come in pairs, like min/max or lag/lead for example. Once we actually get around to having the long-asked-for latest-record-based-on-timestamp (see e.g. #1128, 2 years old already) then we will want to keep the "latest" name for that.

Second: I'm not sure this function is actually even particularly useful in this form. If we expose it as a general UDAF then folks are going to use it with group by clauses and then often not get the result that they expect. Why? Because a group-by on anything other than the key of the input topic messages causes a repartition (and I'm not even 100% certain that we don't repartition for some of those too), and once you repartition, the order of messages in the topic fed to the UDAF is effectively random, because it interleaves messages from the partitions of the input topics, which could be at significantly different offsets from each other and consumed at different speeds. I'm really against introducing a general-purpose function like this which only works as intended under a very narrow set of conditions, when we have no guard-rails in place to prevent a user shooting herself in the foot with it.
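To make the concern concrete, here is a minimal sketch of the kind of query I mean (stream and column names are hypothetical, and the exact syntax varies by ksqlDB version):

```sql
-- Hypothetical stream TRADES whose source topic is keyed by TICKER.
-- Grouping by a NON-key column (REGION) forces an internal repartition,
-- so the offset order seen by the aggregation is no longer the source
-- topic's order; the "latest" value becomes effectively arbitrary.
CREATE TABLE LATEST_PRICE_PER_REGION AS
  SELECT REGION, LATEST_BY_OFFSET(PRICE) AS LAST_PRICE
  FROM TRADES
  GROUP BY REGION;
```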
LGTM, but please put this new function in the docs-md content.
Hey Nick, not sure I'd agree here.

Thinking about the kinds of use cases where a "latest" aggregate is useful: latest stock ticker prices, latest reading from an IoT sensor, latest position of a ride-share driver... All these require consistency, and for that the data needs to be partitioned on the id (stock_id, sensor_id, etc.). If the data isn't partitioned on the id, and we implement an aggregate based on rowtime rather than offset order, then data will get repartitioned to the key, resulting in an aggregate that is only eventually consistent, as different partitions may lag one another. This is unlikely to be the behaviour the user requires for a stock price update!

Imho, latest by offset is what most users are going to need. Yes, that means the data needs to be originally partitioned by the id, but that's just the right thing to do for use cases that require consistency (not eventual).
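As a minimal sketch of that intended usage (illustrative names; the exact DDL syntax depends on the ksqlDB version), note that the group-by column is the very key the source topic is partitioned on, so no repartition occurs and offset order is the source order:

```sql
-- Source topic 'trades' is already partitioned by TICKER (the message key).
CREATE STREAM TRADES (TICKER VARCHAR, PRICE DOUBLE)
  WITH (KAFKA_TOPIC='trades', VALUE_FORMAT='JSON', KEY='TICKER');

-- GROUP BY on the existing key: no repartition happens, so offsets
-- reflect the original order and the aggregate is the true latest price.
CREATE TABLE LATEST_PRICE AS
  SELECT TICKER, LATEST_BY_OFFSET(PRICE) AS PRICE
  FROM TRADES
  GROUP BY TICKER;
```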
Imo, for the kinds of use cases this is intended for (stock price updates, IoT sensor updates) I suspect ignoring nulls is ok for now (most probably they won't send nulls). I'd suggest going with this for now and creating a new variant which doesn't ignore nulls if there is user demand for it.
Yeah, I think the UDAF framework doesn't support this.
Hey Tim, thanks for taking time to address my comments. I think perhaps I didn't explain it well: I am in no way arguing that folks don't often need the "latest whatever", or that their data won't sometimes already be keyed and partitioned in such a way that this is possible with the proposed function. However, the blessing and the curse of SQL is that it's built of standardized blocks, and the user rightly expects that those blocks can be assembled in any grammatically-correct combination and provide predictable results. Let's not add something which we know ahead of time will break that promise in a variety of not-so-corner cases.

Re your counter-example: please don't think I propose that latest-by-rowtime would be a cure-all for this, I don't! (Although I do think what you describe as "eventual consistency" is substantially better than "often just plain wrong", as it always ends up producing the desired answer :-) )

The whole question of "give me the latest record for some key (for any given definition of latest)" is, I suggest, a separate question that's going to require a more holistic approach than a UDAF. I acknowledge that's not what you're trying to fix here, just emphasizing where my comments are coming from.
I'm a pragmatist: I prefer to spend my time finding a useful solution that solves the issue for the 90% in a short amount of time, rather than spending 10 times as long finding a solution that will benefit only a further 10% of users. (Numbers are figurative, obviously ;) ) I think that's particularly important for a project like ksqlDB, where we need to encourage adoption above all else. I think we should merge this, as I think it will provide value. We can always provide further functions if there is demand at a later date.

Nick - I think you've made valid points, even though I don't necessarily agree with them. If you have a strong objection to this then please use your right to veto it by requesting changes on a review, and it can go back to the debate stage. Otherwise I will merge it.
I see this is merged anyway - I understand it's the end of the working day now where you are @purplefox. I only came back here to say two quick things: (1) you didn't reply to my request for a change in the name of the UDAF - did that get overlooked? (2) re:
I won't engage in this game of "you're either with us or you're against us". It's clear from the originating issue (#3985) that there's a variety of evolving opinions around this, including from yesterday. By this standard I could go ahead and find any GitHub issue where there are a variety of opinions, hack up a solution I personally preferred, and then "dare" other community members to stop me merging it.
I think that there was a long discussion on the GitHub issue pointing out pros and cons of different aspects, and I don't think that @purplefox just "hacked up a solution he prefers" -- it was more or less the consensus on the issue to do this PR. However, asking somebody to "vote it down" is not really helpful for finding consensus either. Also, not giving people time to respond (also mind the time difference) and just merging a PR seems undesirable. Just my 2 cents.
@blueedgenick You raise an interesting point about the reordering. I'd not thought of that! As you say, if KSQL is doing an internal repartition then the ordering by offset may become nondeterministic if there is more than one partition. Hmm....
The problem is... how is the user supposed to know this by looking at the query?!? It's not very intuitive.

The problem is, to support a UDAF that actually saves the latest value based on some timestamp, e.g. ROWTIME, we need to enhance our UDAF framework, which is more work. Depending on this enhancement to implement a better/alternative 'latest' style UDAF would delay its release, and is probably the reason it's been 2 years and there is still no such function.

Ideally, we'd have a timestamp-based 'latest' as well. But of course, this is more work. Though once we've done that work we'd not really want to keep the oddly behaving latest_by_offset around. Options I see are: ...
@blueedgenick regarding the naming ... do you have any alternative suggestions?
That's making some big assumptions about people's data IMHO.
The reordering issue was discussed on the GitHub issue, too: #3985 (comment). The main problem is that even if ... Hence, we can only make it deterministic if we introduce a proper operation that upserts a stream into a table without changing the key (ie, something like the ...).

Personally, I would never ship a "broken" feature, but some people are more pragmatic than I am.
Indeed.
The PR was already approved before Nick commented. The process, as I understand it, is that if it's approved it can be merged. If someone subsequently has strong objections then they can start a new debate and submit new PRs. If folks don't like that process, let's start a discussion about changing it. Moreover, I specifically gave Nick the opportunity to prevent the merge, and a further grace period of a full working day before actually merging it, which I didn't really need to do as it was already approved, but I did that anyway. If you don't want a PR merged then please make that clear, either in a comment or by requesting changes. If that had happened I also wouldn't have merged it. It didn't happen.
Good to know - thanks.
Agreed. However, this seems much less of an issue than the nondeterminism of dealing with offsets. Offsets are an artefact of Kafka, whereas the event-time is an artefact of the system being modeled. Hence, in my mind, having a nondeterministic result because the system allows, for example, two IoT sensor readings with the same id and the same event time but different values seems acceptable: how would we be able to choose between them with no other data? However, having nondeterminism introduced because we're using offsets and a repartition has transparently happened behind the scenes: that's less than ideal IMHO. Personally, I'd prefer not to have such UDAFs released, as they're just going to confuse people or damage their view of KSQL when the result doesn't turn out to be 'correct' as they see it.

Would this fix the situation where an IoT sensor stream wasn't partitioned by the sensor id? If it does, can you explain, as I don't follow.

I hear you. There's a cost involved in maintaining this going forward. Even if we deprecate the UDAF for new queries we'll need to continue ongoing support for historic queries already using it. I guess we could drop it at v1.0.
Luckily, that's not what happened, and your interpretation seems way off the mark, and a little condescending.
Even if a PR is approved, IMHO, if somebody raises a concern it should not be ignored.
I disagree. IIRC, you left a comment and merged the PR about 9 to 12 hours later (if this is your definition of "a further grace period of a full working day").
That is what @blueedgenick did IMHO... He explicitly requested to at least change the name of the UDAF. I agree that offset non-determinism is worse than timestamp non-determinism -- and I also think that for this case, we can actually resolve the timestamp non-determinism if we avoid repartitioning.
Well, if both messages have the same ID and the same timestamp and land in the same partition, the offset can be used as a "tie breaker". If you reprocess the same data from the input topic, this tie breaker will give you a deterministic result (and it would be "correct" assuming that no re-ordering happened when the sensor sent the data to the topic). However, using the offset as a tie breaker in a repartition topic is non-deterministic, because each time the query is re-run the repartition topic is re-populated and thus the offsets between runs change. Does this make sense?
It does not directly. However, for this case I would recommend a different pattern. Instead of doing auto-repartitioning, a user must repartition the input explicitly, ie, use two queries; see the sketch below. For this case, the repartitioning would be done only once (by the first query) -- and if the second "to-table" query is repeated (but not the first repartitioning query), the result would be deterministic again, as the input to the query did not change. For the case that the repartition query is repeated, we can explain to users that this results in a different input data stream for the second query (because repartitioning is non-deterministic) and thus, if the input data is different, it is not reasonable to ask for the same result. Hence, splitting it into two queries gives us a better way to explain to users what happens, while an internal repartitioning (which a user might be unaware of) introduces some non-determinism the user cannot control. Does this make sense?
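A rough sketch of that two-query pattern, with hypothetical stream and column names (exact syntax depends on the ksqlDB version):

```sql
-- Query 1: explicit repartition, written once to its own topic.
CREATE STREAM READINGS_BY_SENSOR AS
  SELECT * FROM READINGS
  PARTITION BY SENSOR_ID;

-- Query 2: aggregate the already-repartitioned stream into a table.
-- Re-running only this query reads the same repartitioned topic, so
-- its offsets (and thus the result) are deterministic.
CREATE TABLE LATEST_READING AS
  SELECT SENSOR_ID, LATEST_BY_OFFSET(READING) AS READING
  FROM READINGS_BY_SENSOR
  GROUP BY SENSOR_ID;
```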
Description
Please see #3985
Implements the latest_by_offset() UDAF, which computes the latest value for a column, where "latest" is defined by offset order.
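For example, a minimal sketch of the UDAF in use (illustrative stream and column names):

```sql
-- For each SENSOR_ID, keep the READING from the record with the
-- highest offset processed so far.
SELECT SENSOR_ID, LATEST_BY_OFFSET(READING) AS LAST_READING
FROM READINGS
GROUP BY SENSOR_ID
EMIT CHANGES;
```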
Testing done
Added new unit test and QTT test
Reviewer checklist