Add begin and wait API for LRO #811

tadelesh · 2022-04-26T10:20:22Z

For mgmt. plane SDK, we want to add an API to create poller and poll final result simultaneously. It is an addition for current create poller API (BeginCreateOrUpdate). The possible name for the API is the same with the rest operation name (CreateOrUpdate). The reason is in mgmt. scenario, user always emphasis on the resource result regardless the operation is LRO or not. The problem is duplicity and may involve confusion. Since this should be a non-breaking feature, we may gather more user feedback first.

The text was updated successfully, but these errors were encountered:

jhendrixMSFT · 2022-04-26T15:22:41Z

Another option as discussed offline is to provide a wrapper that customers use to get the same behavior.

type OperationWaiter[TResult any] struct {
    poller *runtime.Poller[TResult]
    err    error
}

func (ow OperationWaiter[TResult]) Wait(ctx context.Context, freq time.Duration) (TResult, error) {
    if ow.err != nil {
        return *new(TResult), ow.err
    }
    return ow.poller.PollUntilDone(ctx, freq)
}

func NewOperationWaiter[TResult any](poller *runtime.Poller[TResult], err error) OperationWaiter[TResult] {
    return OperationWaiter[TResult]{poller: poller, err: err}
}

This would allow callers to condense an LRO into a one-line call.

result, err := NewOperationWaiter(client.BeginSomeLRO(context.Background(), nil)).Wait(context.Background(), time.Second)

xboxeer · 2022-05-07T03:24:45Z

Based on our user study, true feelings of the LRO and none LRO operation is that they make the code look inconsistent. I know we have customer who write infra level code that would be ok with detailed control but there is customer who write none infra level code, consistency and easy to understand is one of their priorities, for those LRO operation, can we provide a none LRO-look-like call as an alternative?

tadelesh · 2022-08-18T07:46:27Z

We've noticed that it usually happened that service change an operation from synchronized to asynchronized which cause Go SDK breaking changes. In order to prevent such breaking changes, I'd prefer to generate both synchronized and asynchronized function for LRO. Besides preventing breaking changes, it could also give a simple usage for customer who does not care of the LRO undelaying logic. @jhendrixMSFT @JeffreyRichter What do you think?

jhendrixMSFT · 2022-08-18T16:00:27Z

It's an interesting idea. Do you have an idea of how many breaking changes are caused by this?

tadelesh · 2022-08-19T02:09:25Z

Among current 7 Go mgmt. SDK packages which have v2+ version, 2 of them have the breaking of such change. And I have met more during my swagger review work. Here are some examples:

tadelesh · 2022-08-25T04:05:27Z

Got another one from SQL. Azure/azure-rest-api-specs#19953

JeffreyRichter · 2022-08-29T22:00:16Z

So first, let me say that services should not make this specific breaking change and so there should not be a lot of these.
However, they do happen, and I appreciate the effort to not break Go SDK customers.
I'll also add that the service team is making this break for a reason and not just to make the customer experience worse.

So, my concern is that if the SDK initiates an LRO and this succeeds but then the polling fails, we return an error to the customer and now the customer doesn't know (since we put multiple operations into a single method) what to do: Did initiating the operation fail and so it never got started? Or, did it get started and now there is a problem with knowing its state?

The error recovery for these 2 scenarios is VERY different and a customer would have to write the proper code to deal with them separately/correctly. Having a wrapper function makes this much harder.

tadelesh · 2022-09-06T15:39:28Z

Can we add a new PollingError to distinguish two steps?
Here is some prototype:
Custom error to wrap inner error:

type PollingError[T any] struct {
	inner  error
	Poller *Poller[T]
}

func (e *PollingError[T]) Error() string {
	return e.inner.Error()
}

func (e *PollingError[T]) Unwrap() error {
	return e.inner
}

func NewPollingError[T any](err error, poller *Poller[T]) error {
	return &PollingError[T]{
		inner:  err,
		Poller: poller,
	}
}

Synchronized LRO:

func (client *LinkedServerClient) Delete(ctx context.Context, resourceGroupName string, name string, linkedServerName string, options *LinkedServerClientDeleteOptions) (LinkedServerClientDeleteResponse, error) {
	lroOptions := LinkedServerClientBeginDeleteOptions(*options)
	poller, err := client.BeginDelete(ctx, resourceGroupName, name, linkedServerName, &lroOptions)
	if err != nil {
		return LinkedServerClientDeleteResponse{}, err
	}
	return poller.PollUntilDone(ctx, nil)
}

Usage:

func TestLinkedServerClient_Delete(t *testing.T) {
	cred, err := azidentity.NewDefaultAzureCredential(nil)
	if err != nil {
		log.Fatalf("failed to obtain a credential: %v", err)
	}
	ctx := context.Background()
	client, err := NewLinkedServerClient("subid", cred, nil)
	if err != nil {
		log.Fatalf("failed to create client: %v", err)
	}
	_, err = client.Delete(ctx, "rg1", "cache1", "cache2", nil)
	if err != nil {
		var pollingErr *runtime.PollingError[LinkedServerClientDeleteResponse]
		if errors.As(err, &pollingErr) {
			poller := pollingErr.Poller
			_ = poller // retry with poller
			log.Fatalf("Inner error: %+v", errors.Unwrap(err))
		} else {
			log.Fatalf("Other error: %+v", err)
		}
	}
}

The generic type is a problem for now. But if this way could work, it should be some more elegant implementation.

JeffreyRichter · 2022-09-06T15:52:27Z

Your sample codes shows how to detect the error but not how to properly recover from the error.
If the client gets a PollingError, how does it continue polling without initiating the operation again?
Also, if the error is a networking error, then was it due to the initiating call or a polling call?
In theory, you could wrap any underlying error with PollingError but the proper error handling becomes much more complex now.

xboxeer · 2022-09-07T03:20:22Z

@JeffreyRichter I think in Chenjie's code

if errors.As(err, &pollingErr) {
			poller := pollingErr.Poller
			_ = poller 
			log.Fatalf("Inner error: %+v", errors.Unwrap(err))
		} else {
			log.Fatalf("Other error: %+v", err)
		}

If the error is a polling error, developers can get that poller from the error and try poll until done again
If the error is not a polling, developers can try to re-launch the request, or just simply throw that as an exception
I understand there will be situation that the poller failed again, however that would be some policy setting in customer's code, both our old way of LRO and the new way of LRO cannot handle the failure situation happened in error recovering code

tadelesh · 2022-09-07T07:50:16Z

Yes. Here is the comparison:
Async:

	poller, err := client.BeginDelete(ctx, "rg1", "cache1", "cache2", nil)
	if err != nil {
		// retry for start operation failure
	}
	_, err=poller.PollUntilDone(ctx, nil)
	if err != nil {
		// retry for polling failure
	}

Sync:

	_, err = client.Delete(ctx, "rg1", "cache1", "cache2", nil)
	if err != nil {
		var pollingErr *runtime.PollingError[LinkedServerClientDeleteResponse]
		if errors.As(err, &pollingErr) {
			poller := pollingErr.Poller
			// retry for polling failure
		} else {
			// retry for start operation failure
		}
	}

tadelesh · 2022-09-26T06:34:13Z

@jhendrixMSFT To answer your concern about the execution time increasement if we provide a synchronized method for LRO, I list the LRO execution time distribution as follows. Most operations can be done in 20 seconds. Regarding we also provide asynchronized begin method for customer to choose, it will not block the customer and I think the change is worthy.

Some statistics related to total time for LRO based on CLI telemetry data:
88% of the LRO calls could be completed in 32sec
92% in ~1min (64 sec)
95% in 2min+ (130 sec)
The raw data could be found in polling time consumption analysis - Azure CLI.xlsx

Time(ms)	operations	percentage
1	92	0.00
2	94	0.004265747
4	118	0.005354874
8	146	0.006625522
16	479	0.021737157
32	1235	0.056044654
64	2325	0.105509167
128	3911	0.177482302
256	6025	0.273416228
512	8143	0.369531675
1024	10524	0.477582138
2048	12760	0.57905246
4096	14827	0.672853512
8192	16653	0.755717916
16384	18200	0.82592122
32768	19482	0.884098748
65536	20356	0.923761118
131072	21003	0.953122164
262144	21433	0.972635687
524288	21669	0.983345435
1048576	21827	0.99051552
2097152	21930	0.99518969

tadelesh added the design-discussion An area of design currently under discussion and open to team and community feedback. label Apr 26, 2022

tadelesh self-assigned this Aug 18, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add begin and wait API for LRO #811

Add begin and wait API for LRO #811

tadelesh commented Apr 26, 2022

jhendrixMSFT commented Apr 26, 2022

xboxeer commented May 7, 2022

tadelesh commented Aug 18, 2022

jhendrixMSFT commented Aug 18, 2022

tadelesh commented Aug 19, 2022

tadelesh commented Aug 25, 2022

JeffreyRichter commented Aug 29, 2022

tadelesh commented Sep 6, 2022 •

edited

Loading

JeffreyRichter commented Sep 6, 2022

xboxeer commented Sep 7, 2022

tadelesh commented Sep 7, 2022 •

edited

Loading

tadelesh commented Sep 26, 2022 •

edited

Loading

Add begin and wait API for LRO #811

Add begin and wait API for LRO #811

Comments

tadelesh commented Apr 26, 2022

jhendrixMSFT commented Apr 26, 2022

xboxeer commented May 7, 2022

tadelesh commented Aug 18, 2022

jhendrixMSFT commented Aug 18, 2022

tadelesh commented Aug 19, 2022

tadelesh commented Aug 25, 2022

JeffreyRichter commented Aug 29, 2022

tadelesh commented Sep 6, 2022 • edited Loading

JeffreyRichter commented Sep 6, 2022

xboxeer commented Sep 7, 2022

tadelesh commented Sep 7, 2022 • edited Loading

tadelesh commented Sep 26, 2022 • edited Loading

tadelesh commented Sep 6, 2022 •

edited

Loading

tadelesh commented Sep 7, 2022 •

edited

Loading

tadelesh commented Sep 26, 2022 •

edited

Loading