Reinstate CheckpointAsync #616
Comments
@bartelink Is your request to add support for Manual Checkpointing?
Yes, in essence; that's the key facility that's missing compared to the CFP v2 API. To do the full port, I will also need a way to do an equivalent of
I really want to start migrating various apps of ours to the V3 SDK wrt CFP logic. I really can't use it as it is. Is there any rough roadmap re when there'll be space to consider this?
I'm not sure there will be a point to the CheckpointAsync method if we move forward with enabling a higher degree of parallelization beyond the partition (i.e., multiple threads can be reading the same partition, each thread with a different range of partition key values within the same partition). Won't this higher degree of parallelism solve the need for buffering? Exposing the partition information won't happen, as this is something that is not exposed anywhere in the V3 SDK. At most, we could think about exposing the LeaseToken for context. While it currently works as 1 lease per partition, we are looking into expanding that as mentioned in the first paragraph, so the partition cannot be inferred from the LeaseToken. Regarding your Document point, you have multiple options:
That depends on exactly what you have up your sleeves ;) The benefits of being able to decouple checkpointing from read/process/write cycle include:
Unfortunately, I could go on. The V2 API provides a very powerful scheme; archiving homegrown solutions was possible as a direct result. Can you share some more information as to the design of this scheme please in order to allay my concerns? While it was not my first choice to end up implementing a scheme leaning on this facility, it ultimately provides a very high throughput facility which would be a significant loss. Inferring things from LeaseTokens definitely does not interest me. My desire for information as to which partition a received batch comes from arises from:
Thanks for the serialization suggestions. Might I suggest the CFP migration example show Document parsing vs doing that with Dynamic (have not tried to attack it or looked at the code, but it would seem that it would be relatively easy yet valuable to demo?)?
@ealsur still really interested in this - we're looking to move to V4 but can't even get off V2 until this API comes back
This is coming back in V3; right now we got jumped by high-priority work. We want to have that and Context and Estimator with all leases for March.
Hi @ealsur, just wanted to check if the ETA is still sometime in March?
Sadly it got pushed back a bit, but it is still prioritized. We are working to release Change Feed pull support and this will be worked on right after.
@ealsur, aside from bringing
V4 won't have Change Feed Processor for the time being (short time); we are working on the base API surface. V4 is not ready for production and first needs to pass review of the base APIs. Once the base APIs are approved, we can start to onboard features.
I see. By "short time", what do you estimate?
Now that the RU cost discrepancies in V3 are finally resolved, the most significant blocker for moving to V3 for transactional processing has been removed from my perspective. This brings this Issue back into focus for me; @ealsur
For my roadmap purposes, indicative dates are naturally always welcome, but I guess I'm also wondering when all 3 concerns (RU consumption (#990 (comment)), the reintroduction of I'm ready to validate all of these in the context of V4 when they land.
Once Change Feed pull model goes GA, I have one PR to enable Estimator per lease, and then another PR to introduce Checkpoint. The blocker for all this is GA of Change Feed pull model.
Any chance of a quick update on how this is all looking in terms of dependencies? We've been long-fingering various CFP issues on the basis that we'll be moving from V2 to V3 within a reasonable timeframe. (CheckpointAsync and source context for delivered batches are the critical items of interest that represent blockers)
Is your feature request related to a problem? Please describe.
I've decoupled my consumption from my checkpointing (i.e., I don't necessarily synchronously process all change feed items, instead letting reading get ahead of the checkpointable position in order that I can manage buffering and retries for performance)
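To make that decoupling concrete, here is a minimal sketch of the bookkeeping involved. It is written in plain Python rather than against any Cosmos SDK, and everything in it is an assumption for illustration: the `CheckpointTracker` class, its method names, the integer positions, and the `persist` callback (standing in for a CheckpointAsync-style call) are all hypothetical. The idea it shows is that reading can run ahead and batches can complete out of order, while the checkpoint only ever advances to the newest position whose predecessors have all been processed.

```python
from collections import deque


class CheckpointTracker:
    """Hypothetical helper (not part of any Cosmos SDK).

    Tracks batches in read order and works out the newest position that is
    safe to checkpoint, i.e. one with no unprocessed batch before it.
    """

    def __init__(self, persist):
        self._persist = persist      # stand-in for a CheckpointAsync-style call
        self._in_flight = deque()    # batch positions, in the order they were read
        self._done = set()           # positions processed, possibly out of order

    def on_batch_read(self, position):
        # Reading is allowed to get ahead of processing; just record the order.
        self._in_flight.append(position)

    def on_batch_processed(self, position):
        # Buffered or retried work may finish in any order.
        self._done.add(position)

    def checkpoint(self):
        """Persist and return the newest safe position, or None if there is none."""
        safe = None
        while self._in_flight and self._in_flight[0] in self._done:
            safe = self._in_flight.popleft()
            self._done.discard(safe)
        if safe is not None:
            self._persist(safe)
        return safe


# Usage: three batches are read, the second finishes before the first.
persisted = []
tracker = CheckpointTracker(persisted.append)
for position in (10, 20, 30):
    tracker.on_batch_read(position)

tracker.on_batch_processed(20)
first = tracker.checkpoint()     # batch 10 is still outstanding, nothing persisted

tracker.on_batch_processed(10)
second = tracker.checkpoint()    # 10 and 20 are both done, so 20 is persisted
```

With a callback-style API that checkpoints implicitly when the handler returns, this kind of tracker has nowhere to call `persist`, which is the gap this request is about.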
Describe the solution you'd like
Provide an overload that exposes something akin to the IChangeFeedProcessorContext.CheckpointAsync method which the v2 CFP API formerly exposed (it's present but internal atm)
Describe alternatives you've considered
Only alternative is to use the v2 CFP SDK, but that closes off tonnes of options and is not tenable from my perspective.
Additional context
I flagged this to @ealsur some time ago in the context of some other work in this repo. While Port to Azure.Cosmos / v3 SDK jet/equinox#144 illustrates that the V3 SDK provides some very nice cleanup in general for a relatively complex use case, not being able to port the CFP aspect easily presents a problem in the medium term with having client teams adopt the V3 SDK.
Consumption code
- CheckpointAsync
- IChangeFeedProcessorContext (see related: Provide partitionid to ChangeFeedObserverFactory.Create #400)
Document parsing code
- Document class has become internal in V3 - it'd be nice to have a sample illustrate how one might most cheaply probe documents to determine whether they are parseable as a given type as was formerly possible.
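For the probing point, the kind of sample I have in mind looks roughly like the sketch below. It is plain Python over raw JSON, not the V3 SDK's serialization API, and the `Event` type, its fields, and `try_parse_event` are hypothetical names made up for illustration. It checks for the required fields and their JSON types before constructing the target type, and returns `None` for anything that does not match instead of throwing.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """Hypothetical target type; the field names are assumptions."""
    id: str
    stream: str
    index: int


# Field name -> exact type that json.loads must have produced for it.
REQUIRED_FIELDS = {"id": str, "stream": str, "index": int}


def try_parse_event(raw: str) -> Optional[Event]:
    """Return an Event if the raw JSON document has the expected shape, else None."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(doc, dict):
        return None
    for name, expected in REQUIRED_FIELDS.items():
        # Exact type check so that JSON true/false is not accepted as an int.
        if type(doc.get(name)) is not expected:
            return None
    # Extra properties (system metadata and the like) are simply ignored.
    return Event(**{name: doc[name] for name in REQUIRED_FIELDS})


# Usage: one document of the expected shape, one of some other shape.
matching = try_parse_event('{"id": "a", "stream": "s", "index": 1, "_etag": "x"}')
other = try_parse_event('{"id": "a", "kind": "snapshot"}')
```

A container that mixes several document shapes can then route each item through a few such probes and skip whatever matches none of them.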