-
Notifications
You must be signed in to change notification settings - Fork 302
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Distributed Tracing #261
Distributed Tracing #261
Conversation
Done
TODO
|
@@ -132,7 +132,7 @@ public override string GetStatus() | |||
/// <param name="context">The orchestration context</param> | |||
/// <param name="input">The typed input</param> | |||
/// <returns>The typed output</returns> | |||
public abstract Task<TResult> RunTask(OrchestrationContext context, TInput input); | |||
public abstract Task<TResult> RunTaskAsync(OrchestrationContext context, TInput input); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a breaking change, which cannot be allowed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @cgillum
Sorry, I didn't notice that. However, I might mistake to refactor the name. I recover it.
6b2c65b
to
f3d15b3
Compare
Hi @cgillum The CI fails, it looks the script can't refer the $ServiceBusConnectionString . Hmm I have no idea. Any thoughts? If not, I'll ask the Azure DevOps guys. |
@TsuyoshiUshio I'm not sure. The CI passed earlier today for a different PR. I have not made any changes to the pipeline. |
@cgillum I'm asking the Azure DevOps guys. |
@@ -0,0 +1,38 @@ | |||
using Microsoft.VisualStudio.TestTools.UnitTesting; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using [](start = 0, length = 5)
File header + move using into namespace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this still needs to be done?
@@ -11,6 +11,8 @@ | |||
// limitations under the License. | |||
// ---------------------------------------------------------------------------------- | |||
|
|||
using System.Diagnostics; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using [](start = 0, length = 5)
move into namespace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
{ | ||
|
||
#if NETSTANDARD2_0 | ||
private static AsyncLocal<TraceContext> current = new AsyncLocal<TraceContext>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private [](start = 8, length = 7)
don't use private modifier per rest of repo convention
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @simonporter ,
Before fix this, Could you clarify the convention? Should I use internal or default for the modifier in this case? Why do you forbid private modifier? I'd like to know the reason for the future contribution of this repo. Also want to properly fix this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done that. I change private into default.
One note. I'm adding |
Hi @simonporter , Sounds good for the convention? I'm working with @cgillum for enabling Distributed Tracing for Durable Functions. However, we try to make it available for other teams that use DurableTask. This PR is not definitive version. If you don't like my implementation, you can dispose it. However, the architecture and what I experienced might be helpful at least for the future implementor. Could you give me 30 min to explain this PR in person? I'll come. (Hopefully with @cgillum . ) This PR might be too big for review in here. |
The CI fails are the CI issue. Not code. the lack of permission for public PR. |
@TsuyoshiUshio thank you for bringing this to our notice. We (VsTest team) are working on getting distributed test runs to work with public projects. Will keep you updated. |
Hi @ShreyasRmsft , Thank you for telling me! Sure. Sounds exciting. until then we just add PR contributors some kind of group. |
@ShreyasRmsft Could you tell me the best place to file this issue? I spent several days for this issue. I don't want people have the same experience as me. Also, if it is OK. could you tell me when it is going to be published? I also MSFT guy. However, I can't identify you from your GitHub account. :) |
@TsuyoshiUshio you can raise the issue on https://github.com/Microsoft/azure-pipelines-tasks/issues |
Thank you, @ShreyasRmsft , I've done that! |
Hi @simonporter This is the sample of the raw message with Two Layered OrchestratorTraceContext. Please see the
|
Hi @simonporter , @cgillum , I implement the test for ContinueAsNew. The current behavior is that Orchestration which is created by ContinueAsNew is considered as the same Orchestration. It is because of the |
@@ -612,6 +614,7 @@ static TaskHubInfo GetTaskHubInfo(string taskHub, int partitionCount) | |||
session.StartNewLogicalTraceScope(); | |||
|
|||
List<MessageData> outOfOrderMessages = null; | |||
var current = Activity.Current; // Correlation for checking the state of the Activity.Current. TODO remove |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
src/DurableTask.Core/TraceContext.cs
Outdated
// ---------------------------------------------------------------------------------- | ||
|
||
namespace DurableTask.Core | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed could this be refactored to encapsulate activity, thus reducing the need to decide between both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I introduce the new structure. I encapsulate activity to the TraceContext object. New name is
- TraceContextBase (abstract class)
- W3CTraceContext (implementation of the TraceContextBase)
- HttpCorrelationProtocolTraceContext (Implementation of the TraceContextBase)
For enabling this, I create a Factory to refer the configuration of the Protocol settings.
These two Contexts have limited property. Some property is not serialized to reduce the size of the message. It requires special serialization, It is handled on TraceContextStore
{ | ||
/// <summary> | ||
/// TraceMessages for Distributed Tracing | ||
/// </summary> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a static class? also might make sense to rename to TraceConstants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
/// For the Newton.json 7.0.1 requires JsonSerializerSettings. It will all so requires DurableTask users. | ||
/// To avoid this breaking change, I provide TraceContextStore which support the serialization/deserialization of the <see cref="TraceContext"/> | ||
/// </summary> | ||
public class TraceContextStore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is only used while storing and retrieving from Message Data, would it make sense to handle this simply as a property and constructor on Trace Context? We could also just use a custom JsonConverter attribute on top of the TraceContext since it seems like it needs to be serialized in a special way? we could also handle it via a private JsonProperty on message data. Incase neither of these are possible consider renaming to maybe SerializableTraceContext?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the TraceContextStore and make it available for TraceContextBase. I implement it as property and static method. I tried to use constructor, however, it requires additional logic to move restored object to the target object. It could be error prone. I create a Restore method simply pass the json string it creates TraceContextBase subclass object.
@@ -341,6 +341,11 @@ public void HandleTaskFailedEvent(TaskFailedEvent failedEvent) | |||
var taskFailedException = new TaskFailedException(failedEvent.EventId, taskId, info.Name, info.Version, | |||
failedEvent.Reason, cause); | |||
|
|||
// correlation | |||
var traceContext = CorrelationTraceContext.Current; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are traceContext and activity used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
if (orchestrationState.OrchestrationStatus == OrchestrationStatus.Completed) | ||
{ | ||
CorrelationTraceClient.TrackDepencencyTelemetry(dependencyActivity); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these all separate Ifs? instead of a single if ((orchestrationState.OrchestrationStatus == OrchestrationStatus.Completed) ||
(orchestrationState.OrchestrationStatus == OrchestrationStatus.Failed)) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Guid traceActivityId = Guid.NewGuid(); | ||
var session = new ActivitySession(this.storageAccountName, this.settings.TaskHubName, message, traceActivityId); | ||
session.StartNewLogicalTraceScope(); | ||
|
||
// correlation | ||
string name = $"{TraceMessages.Activity} {((TaskScheduledEvent)session.MessageData.TaskMessage.Event)?.Name?.GetTargetClassName()}"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@simonporter to confirm here, but I don't believe this custom logic to get the class NAme would always hold, particularly in the case of a custom NameVersion object manager. Can this be moved one level up to The TaskOrchestrationManager in Core? In that case you would have access to the object manager and can get the class NAme more simply.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adarsh1 Where can I find the TaskOrchestrationManager? I can't find it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry meant the TaskActivityDispatcher
currentTraceContext = currentTraceContext ?? current.CreateTraceContext(); | ||
|
||
data.TraceContextStore = TraceContextStore.Create(currentTraceContext); | ||
rawContent = await messageManager.SerializeMessageDataAsync(data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like data is the only thing changing between the if and else, should the raw content initialization simply stay out of this if else structure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Test/DurableTask.AzureStorage.Tests/DurableTask.AzureStorage.Tests.csproj
Show resolved
Hide resolved
@@ -35,4 +35,5 @@ | |||
<ItemGroup> | |||
<ProjectReference Include="..\DurableTask.Core\DurableTask.Core.csproj" /> | |||
</ItemGroup> | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Undo if no change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Thank you @adarsh1 ! I'll do it as we discussed. I'll start this week. :) I appreciate your review. |
I fix all your request. Review it again, please. I checked the Scenario Test passes and execute the sample and check if we can see the same tracing on the portal. TODO
|
/// This property is not serialized. | ||
/// </summary> | ||
[JsonIgnore] | ||
public Activity CurrentActivity { get; set; } // TODO make it protected after refactoring is done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make protected, or remove TODO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I double check the dependency. This property depend on the subclass, factory, and test class. I hope only these three can use this property. I madeit internal and removed TODO.
Traceparent = CurrentActivity.GetTraceparent(); | ||
Tracestate = CurrentActivity.GetTracestate(); | ||
ParentSpanId = CurrentActivity.GetParentSpanId(); | ||
// ParentId = activity.Id, // TODO check if it is not necessary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
public override TimeSpan Duration => CurrentActivity?.Duration ?? DateTimeOffset.UtcNow - StartTime; | ||
|
||
/// <inheritdoc /> | ||
public override string TelemetryId => throw new NotImplementedException(); // TODO Implement the HTTPCorrelation Protocol |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this Safe? Would things fail if HttpCorrelationProtocol is used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HttpCorrelationProtocol is not fully tested. App Insights team encourage to use W3C now. So I start implement/test with W3C. However, I implement HttpCorrelationProtocol. It should work, however, to make sure it works, it requires Scenario Test with TelemetryInitializer for HttpCorrelationProtocol.
public string SerializableTraceContext => | ||
JsonConvert.SerializeObject(this, new JsonSerializerSettings() | ||
{ | ||
TypeNameHandling = TypeNameHandling.Objects, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really should be using a custom JsonConverter here, if we need special handling.
If we don't have the time to do that for now maybe a TODO would be ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I add TODO.
Guid traceActivityId = Guid.NewGuid(); | ||
var session = new ActivitySession(this.storageAccountName, this.settings.TaskHubName, message, traceActivityId); | ||
session.StartNewLogicalTraceScope(); | ||
|
||
// correlation | ||
string name = $"{TraceMessages.Activity} {((TaskScheduledEvent)session.MessageData.TaskMessage.Event)?.Name?.GetTargetClassName()}"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry meant the TaskActivityDispatcher
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ad a few more nits around existing TODOs, other than that looks ok to me
…the current users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix the requests
Hi @cgillum , I fix the issues that you requested. However, some need to discuss with you. I leave these as unresovled comments. Could you have a look these three? |
Hi @TsuyoshiUshio - I have reviewed your latest updates and responded to your questions. There are also a few requested fixes that have not been made yet. Please take a look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix issues.
Hi @cgillum , I fixed them. Only one concerns left.
|
Hi @cgillum this is friendly reminder. :) Please have a look. If you are ok, let's merge it. |
Hi. I'm not sure if this is the correct place to ask this question (I can't find a link to some kind of issue or feature request). Is there any kind of limit to number of executions that will be logged (and correlated) to AI? In some cases, I may have tens of thousands of activity function executions within a single orchestrator execution. I'm also curious what an eternal orchestrator would look like. To be clear, I'm really asking about what the limit of the app insights UI might be. I'm sure log analytics would store whatever is sent to it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK - thanks for your patience. I suggest one more small change and then let's merge this.
Hi @cgillum I fixed all of the issues. |
Thanks! I will merge this now. Glad to get this first step finally checked in. :) |
Hi @jkerak You are using tens of thousands of activity functions in a single orchestrator right? I'm not sure about the limitation of the durable side, however, Application Insights has some quota https://docs.microsoft.com/en-us/azure/azure-monitor/app/api-custom-events-metrics#limits This issue might be a good place to talk about correlation. Azure/azure-functions-durable-extension#939 |
Thank you @cgillum for kind support of this PR. I'll go in implementing durable functions side. |
Hi @cgillum
I'm creating this pull request to have this branch to hands off to you.
I open this PR for visibility.
What I've done:
TODO
Now one of the test case will fail for not implementing it for multiple orchestration retry case. However, I open this for share.