
Remove context as a parameter the user provides #474

Closed

Conversation

@rylev rylev (Contributor) commented Nov 1, 2021

As a continuation of the discussion started here:

This removes the Context parameter from all pipeline-based operations while keeping the context as an internal detail: all policies still have access to the Context and can use it as a key/value store (though this functionality still needs to be implemented).

We originally included Context as a parameter for all operations because we believed it would be essential for cancelation just like in other SDKs. However, #457 and the related discussion showed that we don't need to send an explicit struct for doing cancelation since Rust's futures allow for cancelation through other means. The only remaining feature that Context is used for is as a key/value store which is a secondary feature that is not widely used. Requiring the context argument in every operation call is a heavy price to pay to support this feature.

This change makes it so that, by default, users don't have to pass a Context object, since that would require typing Context::new() when they almost never need a custom context object. Instead, the context is constructed internally so that the pipeline (and thus all individual policies) can still take advantage of the Context.

Open questions:

  • Presumably there are cases where the user wants to set a key/value in the Context key/value store when setting up an operation. After this PR this will no longer be possible. Since Rust does not offer default args or method overloading, to add this feature back we will need to add a new method for each operation that allows setting Context. Before we add this back, I'd like to understand what use cases there are for the user to set a key/value pair in the Context from outside of the pipeline.
  • Right now each operation is required to construct a Context object that gets passed to a PipelineContext object. We may be able to simply construct the Context from inside the PipelineContext constructor.
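To make the shape of the change concrete, here is a minimal sketch of the before/after signatures. All types here (Context, GetCollectionOptions, GetCollectionResponse, CollectionClient) are simplified stand-ins, not the real azure_core types:

```rust
// Simplified stand-ins for illustration only.
#[derive(Default)]
pub struct Context; // placeholder for the internal key/value store

pub struct GetCollectionOptions;

pub struct GetCollectionResponse {
    pub name: String,
}

pub struct CollectionClient {
    name: String,
}

impl CollectionClient {
    // Before this PR (hypothetical signature):
    //   fn get_collection(&self, ctx: Context, options: GetCollectionOptions) -> ...
    // After: the Context no longer appears in the signature; the
    // operation constructs it internally and hands it to the pipeline.
    pub fn get_collection(&self, _options: GetCollectionOptions) -> GetCollectionResponse {
        let _ctx = Context::default(); // built internally, not by the caller
        GetCollectionResponse {
            name: self.name.clone(),
        }
    }
}
```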

cc @JeffreyRichter @heaths

@MindFlavor MindFlavor (Contributor) left a comment

Oh, yes! I've hated that useless Context since we added it a few months ago! 👍😀

I wholeheartedly agree with this PR. ❤️

I would go further, though: we now have an aptly named PipelineContext that instantiates a useless Context (I've shown an example below).

Can we remove the Context altogether? Any required pipeline-related context information should go in PipelineContext anyway.

    options: GetCollectionOptions,
) -> crate::Result<GetCollectionResponse> {
    let mut request = self.prepare_request_with_collection_name(http::Method::GET);

-   let mut pipeline_context = PipelineContext::new(ctx, ResourceType::Collections.into());
+   let mut pipeline_context =
+       PipelineContext::new(Context::new(), ResourceType::Collections.into());

Maybe we can tackle this in a following PR but I think we can get rid of the Context here as well: PipelineContext is already enough (no need to instantiate an inner, useless Context).
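That suggestion could look something like the following sketch, with the Context created inside the PipelineContext constructor so no caller ever instantiates one. The types are simplified stand-ins for the real azure_core ones:

```rust
// Simplified stand-ins for illustration only.
#[derive(Default)]
struct Context; // placeholder for the key/value store

struct PipelineContext {
    context: Context,
    resource_type: String,
}

impl PipelineContext {
    // Callers supply only the resource type; the Context becomes an
    // implementation detail created right here.
    fn new(resource_type: impl Into<String>) -> Self {
        Self {
            context: Context::default(),
            resource_type: resource_type.into(),
        }
    }
}
```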

@yoshuawuyts yoshuawuyts (Contributor) left a comment

I like how much lighter these API calls feel after removing the required Context param. I think this will feel a lot nicer for end-users. And because individual methods still take both *Options instances and required arguments, it doesn't seem like we're losing much in terms of configurability or expressivity.

I'm in favor of this!

@rylev rylev force-pushed the remove-context-parameter branch from 39ff91d to acd0feb on November 1, 2021 12:36
@JeffreyRichter (Member)

I'm not sure what happened with the discussion here. The Context still serves a purpose; it was never just about cancellation. Customers need a way to specify these key/value pairs and pass them down into the pipeline. This functionality is still a requirement, so we can't just get rid of Context without having some way to pass the key/value pairs through the system. The key/value pairs are required to make distributed tracing work and to override policy behavior on a per-operation basis, such as changing the number of retries.

So, if we can't come up with a replacement for passing the key/value pairs down, then we still need Context.

@MindFlavor (Contributor)

Customers need a way to specify these key/value pairs and pass them down into the pipeline. This functionality is still a requirement, so we can't just get rid of Context without having some way to pass the key/value pairs through the system. The key/value pairs are required to make distributed tracing work and to override policy behavior on a per-operation basis, such as changing the number of retries.

Ok, that makes sense.
If its only remaining purpose is to influence pipeline execution, I suggest renaming it to something like PipelineOptions. Context is misleading, especially since it no longer represents a cancellation context.

Also I would prefer to make it optional.

What do you think?

@yoshuawuyts yoshuawuyts (Contributor) commented Nov 2, 2021

@MindFlavor Now that we know we should keep exposing the ability for customers to pass key-value pairs [1] into API calls, it opens up the opportunity to revisit the design of our endpoint calls as a whole.

Another ergonomic challenge we face, similar to the "empty context", is "empty options" instances:

// no options set
let options = CreateDatabaseOptions::new();
let database = client.create_database(Context::new(), "my_database", options).await?;

// various options set
let options = CreateDatabaseOptions::new()
    .consistency_level(ConsistencyLevel::Strong);
let mut cx = Context::new();
cx.insert("key", "value");
let database = client.create_database(cx, "my_database", options).await?;

I believe we could experiment with the async builder pattern to instead enable us to do something along the lines of:

// no options set
let database = client.create_database("my_database").await?;

// various options set
let database = client
    .create_database("my_database")
    .consistency_level(ConsistencyLevel::Strong)
    .insert("key", "value")
    .await?;

Adopting an API like this would take a lot of figuring out. In particular I'm uncertain how far we can get without .await respecting IntoFuture. But I think now that we have a sharper understanding of the requirements for our APIs, it's worth taking a step back and looking at our designs as a whole.

Footnotes

  1. For this use I think we'd be better served using a typemap, but for simplicity let's assume we'd use a hashmap-like interface for now.
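The builder-plus-.await idea above can be sketched with std's IntoFuture trait, which was not yet stable at the time of this discussion (it stabilized later, in Rust 1.64). CreateDatabaseBuilder and ConsistencyLevel are hypothetical names, and the body just echoes the database name instead of calling a real pipeline:

```rust
use std::future::{Future, IntoFuture};
use std::pin::Pin;

#[derive(Clone, Copy)]
enum ConsistencyLevel {
    Strong,
}

struct CreateDatabaseBuilder {
    name: String,
    consistency_level: Option<ConsistencyLevel>,
}

impl CreateDatabaseBuilder {
    fn new(name: impl Into<String>) -> Self {
        Self {
            name: name.into(),
            consistency_level: None,
        }
    }
    // Each optional setting is a chainable method; unset options
    // simply stay at their defaults.
    fn consistency_level(mut self, level: ConsistencyLevel) -> Self {
        self.consistency_level = Some(level);
        self
    }
}

// Implementing IntoFuture lets callers `.await` the builder directly.
impl IntoFuture for CreateDatabaseBuilder {
    type Output = String;
    type IntoFuture = Pin<Box<dyn Future<Output = String>>>;

    fn into_future(self) -> Self::IntoFuture {
        Box::pin(async move {
            // A real implementation would send the request through the
            // pipeline; here we just return the database name.
            self.name
        })
    }
}
```

With this in place, `CreateDatabaseBuilder::new("my_database").consistency_level(ConsistencyLevel::Strong).await` reads exactly like the proposed API.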

@MindFlavor (Contributor)

I believe we could experiment with the async builder pattern to instead enable us to do something along the lines of:

// no options set
let database = client.create_database("my_database").await?;

// various options set
let database = client
    .create_database("my_database")
    .consistency_level(ConsistencyLevel::Strong)
    .insert("key", "value")
    .await?;

❤️

This is basically the same API I was aiming for before we switched to the current approach. Notice the similarities with this old example:

// Let's get 3 entries at a time.
let response = client
    .list_documents()
    .consistency_level(response.unwrap())
    .max_item_count(3i32)
    .execute::<MySampleStruct>()
    .await?;

So I really like your proposal 😃. In my opinion the fluent syntax is much easier to read than spreading the parameters across multiple structs (context, options, and so on), but this is fundamentally different from what we are doing right now, so 🤷 ...

@heaths heaths (Member) commented Nov 2, 2021

So, if we can't come up with a replacement for passing the key/value pairs down, then we still need Context.

@JeffreyRichter what key/value pairs, exactly? If you mean something like a property bag in JS, options model in .NET, etc., I would think each method - if it needs it - would follow the same practice. Typically, that's {MethodName}Options across most languages.

@JeffreyRichter (Member)

About Context: Let's take a step back and look at the overall architecture.

  1. Customers create a client using 3 things: an endpoint, credentials, and options. These 3 things apply to ALL operations performed via this client object. Also, in other languages, clients must be thread safe (usually done by making policy objects immutable) so that multiple threads can share a client; for Rust, it is OK to clone a client (but this can cause a problem with credentials).

  2. Internally, the client ctor creates a pipeline which is an ordered list of policies (the passed-in credential gets turned into a policy). The concept of a "pipeline" is not generally exposed to customers; it is an internal implementation detail. There are many policies in the pipeline such as retry, telemetry, logging, transport, credential, etc. Almost all of these policies can be configured by the customer. Customers accomplish this by setting values in the options structure passed to the client ctor. Once these policy options are set, they are immutable for the lifetime of the client object. If the customer wants to perform some operations with a different transport or different logging, then the customer must create a different client object using the desired options structure.

  3. ALL client operations take inputs, and these are used to create an HTTP request object (using the client's endpoint) which is then sent down the pipeline. This means that ALL policies are applied to ALL client operations in exactly the same way. So, policies are operation agnostic, but they are specific to an Azure service. At the end of the pipeline is the transport policy, which sends the HTTP request to the service. The service responds to the transport policy, which then returns an HTTP response back through the pipeline. If an error occurred, the retry policy sends the request again, and some policies execute once per retry (logging, some credential, and transport). FYI: some policies execute once per operation (telemetry, some credential, distributed tracing). When the client method gets the response, it typically deserializes some response headers and/or JSON body into a structure which is returned to customer code.

As you can see, the pipeline policies are always applied to ALL operations of a client. But, occasionally, a customer wants to override the behavior for a specific call to an operation. For example, if the customer knows they are downloading a 4 terabyte blob, they might like to have different retry policy options. To accomplish this, the customer sets some per-method call "context" with a key like "retry" that has a value like "maxTries=10". The customer sets this key/value context and then passes it to the client's blob download function which passes it to the pipeline causing it to flow through all the policies. The retry policy looks specifically to see if a "retry" key exists in the context and if it does, it reads the value and uses maxTries=10 instead of what was passed to the client ctor's options structure. End result: For this one call to downloadBlob, the retry policy is altered by the customer.
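The retry-override mechanism described above can be sketched as follows, using a simple string map in place of the real Context type; the "retry" key and "maxTries=10" value format come from the example in this thread, and effective_max_tries is a hypothetical helper:

```rust
use std::collections::HashMap;

// Simplified stand-in for the Context key/value store.
#[derive(Default)]
struct Context {
    values: HashMap<&'static str, String>,
}

impl Context {
    fn insert(&mut self, key: &'static str, value: impl Into<String>) {
        self.values.insert(key, value.into());
    }
    fn get(&self, key: &str) -> Option<&str> {
        self.values.get(key).map(String::as_str)
    }
}

// The retry policy prefers a per-call "retry" entry in the Context
// over the client-wide default configured at construction time.
fn effective_max_tries(client_default: u32, ctx: &Context) -> u32 {
    ctx.get("retry")
        .and_then(|v| v.strip_prefix("maxTries="))
        .and_then(|n| n.parse().ok())
        .unwrap_or(client_default)
}
```

A call that sets `ctx.insert("retry", "maxTries=10")` overrides the retry count for that one operation only; all other calls keep the client-wide default.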

Here's another example: it is highly likely that the customer's code is inside a service itself. In this case, another service can be the client of the customer's service and the customer's service is the client of an Azure service. The request INCOMING to the customer's service can have special HTTP headers on it that are used for distributed tracing. These headers must flow from the originating service through the customer's service, and then through to Azure services. Here is how this is accomplished: The customer writes code that receives the incoming HTTP request. Then, the customer creates a "context" and sets a "tracing" key with the headers as its value. The customer MUST pass this context around their code as it represents the INCOMING request. Eventually, the customer service code calls to Azure by invoking a method on a client.
This context is passed to the method and the distributed tracing policy in the pipeline looks for the "tracing" key in the context. If found, this policy extracts the INCOMING headers, applies them to the OUTGOING HTTP request and sends these headers to the Azure service. End result: the distributed tracing information flows from the originating service through the customer's service and then all the way to the Azure services. It's now possible to see everything that happened in response to a single operation invoked by some end-user somewhere on the Internet.
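The tracing flow above reduces to two steps: capture the incoming headers into the context, then have the tracing policy apply them to the outgoing request. A minimal sketch, assuming the W3C Trace Context `traceparent` header and plain string maps in place of real request types:

```rust
use std::collections::HashMap;

type Headers = HashMap<String, String>;

// Step 1: the customer's service captures the incoming tracing header
// from the request it received (this is what gets stored in the context).
fn capture_tracing(incoming: &Headers) -> Option<String> {
    incoming.get("traceparent").cloned()
}

// Step 2: the distributed-tracing policy in the pipeline copies the
// captured header onto the OUTGOING request to the Azure service.
fn apply_tracing(outgoing: &mut Headers, traceparent: Option<String>) {
    if let Some(tp) = traceparent {
        outgoing.insert("traceparent".to_owned(), tp);
    }
}
```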

In Go, you can see some of our code where we set these key/value pairs on a context here. See the call to context.WithValue.

.NET uses AsyncLocal variables (similar to thread-local variables) as a way for customers to set these key/value pairs so they flow through the pipeline without polluting client method signatures. This makes the methods "look nice," but it introduces other problems, so it is not preferred.

Java has an actual Context class for the key/value pairs. If you look here you'll see a bunch of client methods that expose a Context parameter (it's always the last parameter). In C++, Context is also always the last parameter, see here for some examples on the KeyVault KeyClient.

For Go, Context is always the first parameter. See here for some examples in the Cosmos client.

@MindFlavor (Contributor)

Thank you @JeffreyRichter for clearing things up. So you are confirming that the only thing missing is:

As you can see, the pipeline policies are always applied to ALL operations of a client. But, occasionally, a customer wants to override the behavior for a specific call to an operation. For example, if the customer knows they are downloading a 4 terabyte blob, they might like to have different retry policy options. To accomplish this, the customer sets some per-method call "context" with a key like "retry" that has a value like "maxTries=10". The customer sets this key/value context and then passes it to the client's blob download function which passes it to the pipeline causing it to flow through all the policies. The retry policy looks specifically to see if a "retry" key exists in the context and if it does, it reads the value and uses maxTries=10 instead of what was passed to the client ctor's options structure. End result: For this one call to downloadBlob, the retry policy is altered by the customer.

That is exactly what @yoshuawuyts is proposing: an optional way to override the pipeline's configured values on a per-call basis. What I would prefer is to rename it from Context to PipelineOptionsOverride (or something like that). The mere fact that we are debating what Context is, is evidence that it is poorly named, IMHO.

The SDK customer can create a PipelineOptionsOverride (with tracing and whatnot) and pass it to every call they need to customize (besides what they did when creating the pipeline in the first place).

As for how to do it, Rust does not have implicits [1] like Scala does (and it seems C# effectively does too, with AsyncLocal). While something could be cooked up with macros, I'd rather have the explicit, optional parameter or, even better, the builder pattern @yoshuawuyts mentioned.

Footnotes

  1. Discussed here: https://internals.rust-lang.org/t/implicit-parameters/14514.

@cataggar added the labels Azure.Core (the azure_core crate) and design-discussion (an area of design currently under discussion and open to team and community feedback) on Nov 3, 2021
@JeffreyRichter (Member)

I'm open to a different name. I do not think it should have "Pipeline" in the name because the pipeline is not a concept that is immediately exposed to customers. There are also many precedents for tweaking an operation call by way of some "context". For example, in the .NET SDK, this search gives 127 hits for "RequestContext". And, of course, Google calls the context a Context in Go.

My point is that the term Context is frequently used for this purpose.
"PipelineOptionsOverride" is not an ideal name. Also, the distributed tracing example is not about overriding pipeline options at all.
Right now, I'd stick with "Context" but if others want to suggest alternatives, then I'm open to that.

@rylev rylev (Contributor, Author) commented Nov 3, 2021

Right now, I'd stick with "Context" but if others want to suggest alternatives, then I'm open to that.

I'd like to stick with Context for the time being, if only to make sure we're on the same page about everything else.

@yoshuawuyts's suggestion above for using the builder pattern seems ideal to me. It allows the user to specify optional things like options and key/values without being required to write the boilerplate of CreateDatabaseOptions::new() and Context::new() every time (since these are often left in their default states).

Something like the following would look and feel really nice:

database_client
    .create_database("my_database_name") // if the user doesn't have any options or context, they'd just call `.await` here
    .consistency_level(ConsistencyLevel::Strong) // an option the user doesn't need to specify
    .insert(RetryCount::new(5)) // a "key/value" specified as a type and looked up internally by type
    .await?;
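The "key/value specified as a type" idea in the snippet above is essentially a typemap: values are stored and looked up by their Rust type rather than by a string key. A minimal sketch using std's TypeId, with RetryCount as a hypothetical entry type:

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

// A typemap: at most one value per type, looked up by TypeId.
#[derive(Default)]
struct Context {
    values: HashMap<TypeId, Box<dyn Any>>,
}

impl Context {
    fn insert<T: Any>(&mut self, value: T) {
        self.values.insert(TypeId::of::<T>(), Box::new(value));
    }
    fn get<T: Any>(&self) -> Option<&T> {
        self.values
            .get(&TypeId::of::<T>())
            .and_then(|v| v.downcast_ref::<T>())
    }
}

// Hypothetical per-call override that a retry policy could look up.
struct RetryCount(u32);
```

A policy would then call `ctx.get::<RetryCount>()` and fall back to the client-wide default when the entry is absent.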

@JeffreyRichter the above seems able to handle everything you mentioned and, while different from other clients, feels like the idiomatic way Rust usually handles optional arguments. Thoughts?

I'm going to close this PR for now since the number of changes distracts from the topic at hand. As a next step, we'll move one of the operations (likely create_database) over to this builder pattern to get feedback on it.

This is also not too dissimilar to @MindFlavor's original design albeit with some key tweaks.

@rylev rylev closed this Nov 3, 2021
@JeffreyRichter (Member)

Actually, the context is usually NOT kept in its default state. That is true for our simple examples, but if the customer is building a service, then the Context MUST be flowed through in order for distributed tracing to work. In that case, ALL of our methods and the customer's methods MUST pass the Context explicitly. Failure to do this results in no distributed tracing.

@rylev rylev deleted the remove-context-parameter branch November 8, 2021 18:23