Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Moved Performance Counters to its own Telemetry Consumer #2122

Merged
merged 7 commits into from
Sep 12, 2016
Merged

Moved Performance Counters to its own Telemetry Consumer #2122

merged 7 commits into from
Sep 12, 2016

Conversation

galvesribeiro
Copy link
Member

@galvesribeiro galvesribeiro commented Sep 7, 2016

More one step forward on .Net Core move following #368

In this PR, I moved the Perf Counter specific code to its own telemetry consumer and made it a IMetricTelemetryConsumer.

Basically the OrleansCounterControl.exe still be used to install the default Orleans Counters however, now the Statistics are published not just to the perf counter but for all registered IMetricTelemetryConsumer.

To use it install nuget Microsoft.Orleans.OrleansTelemetryConsumers.Counters and configure as:

XML

<Defaults>
  <Telemetry>
    <TelemetryConsumer Assembly="OrleansTelemetryConsumers.Counters" Type="OrleansTelemetryConsumers.Counters.PerfCounterTelemetryConsumer" />
  </Telemetry>
</Defaults>

or

Code

LogManager.TelemetryConsumers.Add(new PerfCounterTelemetryConsumer());

<?xml version="1.0"?>
<package xmlns="http://schemas.microsoft.com/packaging/2010/07/nuspec.xsd">
<metadata>
<id>Microsoft.Orleans.OrleansTelemetryConsumers.Counters</id>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naive question, but do you think there is still value on having a separate nuget package for the perf counters installer? Maybe they can still live in different DLLs, but potentially shipped in a shared nuget?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A package for counter control already exists. Why not have this in that package? This would lead to two interdependent counter packages we'd need to maintain.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do that. I just don't like the idea of have the counter control referenced in the project like we have today.

There is a tools folder on nuget that we can add it. What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, counter control is already updated to use the consumer dll instead of the types that once where in OrleansRuntime.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And, @jason-bragg I think the intent is the user to Add/Register the consumer, and not the .exe from counter control. We should add the .exe inside the consumer nuget IMHO and get rid of the old nuget.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@galvesribeiro, I've no objection with moving the exe to this package. IMO, what is important here is to have the consumer and exe in the same package.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok @jason-bragg will do that. Please let me know if there nothing else to address and I'll make the change, rebase, push. Thanks :)

@jason-bragg
Copy link
Contributor

To preserve behavior in test code and bang on this change more (short of writing tests), I suggest adding the xml line to test configs, and the programmatic code to default test cluster configuration.

internal static class OrleansCounterManager
{
private static readonly Logger logger = LogManager.GetLogger("OrleansPerfCounterManager", LoggerType.Runtime);
//private static readonly List<Tuple<StatisticName, bool>> counterNames = new List<Tuple<StatisticName, bool>>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plz remove commented out code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ops, forgot that.

@jdom
Copy link
Member

jdom commented Sep 7, 2016

Please also add a line in the Changelog.md saying that you now have to opt into perf counters (and potentially link to documentation too once we have that). Remember that this changelog is end-user facing, no need to go into a lot of technical details, just link to the PR (and later on the doc)

@@ -585,6 +586,7 @@ private Silo LoadSiloInNewAppDomain(string siloName, Silo.SiloType type, Cluster
new object[] { });

appDomain.UnhandledException += ReportUnobservedException;
appDomain.DoCallBack(RegisterPerfCountersTelemetryConsumer);
Copy link
Member Author

@galvesribeiro galvesribeiro Sep 8, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify... This hack here and on TestSiloHost is something @jdom and myself agreed to leave for now since we can't configure properly via code config the telemetry consumer inside the test cluster/silo AppDomain from the xunit tester.

I'll revisit it later once I get the next task for the new config and silo host model and this will (finally) get out here.

@galvesribeiro
Copy link
Member Author

Don't know why the CI is failing now... It is building and running all the tests here (even from build.cmd script)... Any ideas?

<file src="OrleansTelemetryConsumers.Counter.XML" target="lib\net451" />
<file src="OrleansTelemetryConsumers.Counter.dll" target="lib\net451" />
<file src="OrleansTelemetryConsumers.Counter.pdb" target="lib\net451" />
<file src="$SRC_DIR$OrleansTelemetryConsumers.Counters\**\*.cs" target="src" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The build error is when building the nugets.

File not found: 'OrleansTelemetryConsumers.Counter.XML'.

There is something odd here. Is there really a 'Counter' and 'Counters' assembly (and related XML doc files)?

@@ -20,7 +20,7 @@
<DefineConstants>DEBUG;TRACE</DefineConstants>
<ErrorReport>prompt</ErrorReport>
<WarningLevel>4</WarningLevel>
<DocumentationFile>bin\Debug\OrleansTelemetryConsumers.Counters.XML</DocumentationFile>
<DocumentationFile>obj\Debug\OrleansTelemetryConsumers.Counters.XML</DocumentationFile>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is wrong, should be bin, not obj

@galvesribeiro
Copy link
Member Author

galvesribeiro commented Sep 8, 2016

@jason-bragg I was talking with @jdom...

Do you know why OrleansCounterControl.exe exist? There is an OrleansPerformanceCounterInstaller installer class that is invoked if you use Installutil.exe and install the same counters the same way... Do you have any idea? If there is no specific reason, I think we can kill that unecessary thing and let people install it using Installutil.exe which is already installed on each machine with .Net Framework... I just tested here and it installed just fine the counters.

@sergeybykov any clues?

Can we remove it?

@jason-bragg
Copy link
Contributor

Unfortunately I don't know the history of Counter control. I do know it loads assemblies and create specific counters for grains found, in addition to the hard coded runtime of counters. I suspect this dynamic counter generation based off of grains is what necessitated the tool.

@galvesribeiro
Copy link
Member Author

AHHH ok. That makes sense @jason-bragg . But in all cases, we could have that code inside the Installer class and avoid that exe IMHO. My point is that Installutil can perform the same thing as well.

@jason-bragg
Copy link
Contributor

Understood.

I think Orleans assemblies need some general way to publish its counters so monitoring integration points requiring upfront initialization (like performance counters, or ETW events, ...) know what metrics to register. This counter publishing mechanism, imo, should be extensible in a way that allows other extension points (providers, serializers, ..) to register their own counters. For instance, the OrleansAzureUtils could register performance counters for its table and blob storage calls. This would allow monitoring integration points to know about metrics in Orleans core as well as third party extensions.

This is a much wider issue than should be addressed in this PR, as the recently added APM/KPI interfaces don't account for systems that require upfront registration, and the existing internal runtime counters don't expose counter data in a general way.

Maybe since we need to replace perf counters for .NetCore port we can reframe the problem as a general metrics publishing problem rather than a monitoring problem?

@galvesribeiro
Copy link
Member Author

galvesribeiro commented Sep 8, 2016

@jason-bragg

This is a much wider issue than should be addressed in this PR, as the recently added APM/KPI interfaces don't account for systems that require upfront registration

I don't agree with that. The telemetry APIs should not care about registration. Each consumer should have its own mechanism to register the metrics/counters on its own.

There is nothing blocking the providers to report metrics. They can use the methods Today, the ugly thing is that metrics are inside LogManager, etc...

Anyway, I think this should be debated in another issue. In fact, one of the things that I plan for 2.0 is a complete revamp on the telemetry, counters, metrics, log, trace, story. In other words... I want to keep us out of those business. We should just provide common set of interfaces and let people plug whatever they want to consume the data pushed from Orleans runtime, providers and even their own data from grains.

Again, I'll open another issue later to discuss that when I start to work in it for 2.0.

Regardless of the future, I still think we can remove that counter control. If not on this PR, in another but before we make a new release. There is no point on have that if the installer class can do the same thing. It is less one thing people need to care about in deployment. I suggest to get it done now since people will start using and will make easier when they read the change in changelog.md.

@jason-bragg
Copy link
Contributor

"I'll open another issue later to discuss that when I start to work in it for 2.0."

Agreed. We can discuss it then.
In the interim, to help explore the problem I failed to articulate (or possibly misunderstand completely), please consider how you'd implement an IMetricTelemetryConsumer that writes to perf counters without knowing what metrics were to be written until runtime when they are being generated?

@galvesribeiro
Copy link
Member Author

The perf counter is a consumer just like any other. In the specific case of the perf counters, there is a requirement to the counter to have been previously created in order to report the values. The counter creation is the reponsibility of the consumer. If it will be created outside runtime, at runtime on first usage, or whatever way they want, this is the consumer implementation. It is also its responsibility to discard any kind of telemetry they don't want. For example, this implementation of IMetricTelemetryConsumer, only care about metrics pushed to previously created counters and discard the other calls.

More open providers like AI, NewRelic, doesn't have any filter at all. It capture what is pushed to it, and push to Azure AI without do any kind of filter. The AI user, will filter that on its dashboard any way they want.

Again, I don't want Orleans to have to deal with this business. The solo purpose of this specific PR is to move Perf Counters code outside the core and there is no better place to leave it today than on its own IMetricTelemetryConsumer. As a bonus, all the stats that were only published to the the PerfCounters, now are published to ALL registered IMetricTelemetryConsumers without caring about their implementation details, filters, registration, it just don't need to care.

Ok, I think we are extending this discussion too much here :) Lets move on. I'm waiting @jdom perform the VSO/Perf tests so we can move forward on .Net Core. Later on, lets discuss that more but thanks for the comments, will think about it for 2.0

case UpdateMode.Increment:
if (value.HasValue)
{
cd.PerfCounter.IncrementBy((long)value.Value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think original code checked if cd was null, then if cd.PerfCounter was null. If cd.PerfCounter was null it would create it, which we don't want to do, but if it is null, we don't want to use it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you are right... Let me fix that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

<file src="$SRC_DIR$OrleansTelemetryConsumers.Counters\**\*.cs" target="src" />
<file src="OrleansCounterControl.exe" target="tools" />
<file src="OrleansCounterControl.exe.config" target="tools" />
<file src="Orleans.dll" target="tools" />
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to install the OrleansCounterControl as it was before. It needs to load application grains.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok... More one reason to remove counter control and put that grain load logic inside the installer class and let people use installUtil.exe from within the server deployment directory... That could be even automated from the silo startup...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm all for using the installer util, instead of maintaining our own. It's just a question of resources. If you want to make that part of this change, great, but if not we need the existing patterns to be preserved.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jason-bragg I`ll add it to the installer class tomorrow. Was just waiting to someone from the core-team to aknowledge that. Will push changes tomorrow. Thanks

@galvesribeiro
Copy link
Member Author

image

Ok, done. Now all you need is call installutil OrleansTelemetryConsumers.Counters.dll to install and the same with the /u parameter to uninstall. The grain types are loaded just as with the old counter control. Will update the PR with that change and removing counter control project/nuget.

<package xmlns="http://schemas.microsoft.com/packaging/2010/07/nuspec.xsd">
<metadata>
<id>Microsoft.Orleans.CounterControl</id>
Copy link
Contributor

@jason-bragg jason-bragg Sep 12, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please leave counter control in place and continue to support until installUtil is used in production environments. Let's phase out counter control over time. 1.3.0 we support counter control or users can use installUtil. 1.3.1+ish we deprecate counter control.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jason-bragg I'm against that... Why should we keep it? The command line is basically the same. There is no need to deploy that .exe anymore and if you need PerfCounter the dll will be there deployed with server code. The user can still use the old CounterControl nuget to register if they want. We are just not going to push anymore new nugets so why keep the project? I though by your comment here that you agree with me and if I add that to this PR we could remove the old one... Its totally pointless to have it if you leave the old nuget alive...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is based on the assumption that InstallUtil will work in all environments as well as the CounterControl. That it has no unexpected dependencies and/or side effects. That when errors occur they will be clear enough to users. These are all probably true, but we've very large and critical services running on Orleans and 1.3 already includes a fair amount of risk. Continuing to support this assembly until the new approach has been vetted in production environments is simply a risk mitigation strategy that costs us little but can save us and our customers hours (or days) should something go wrong.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jason-bragg OK, I'll not arg anymore... I want to finish this work... But just for the sake of information, there is no difference at all from running a console app and using installutil.exe specially the way I did, which uses exactly same code as the console app and installutil.exe is just calling the Install() method the same way the console does. But ok, will not enter in the debate. I just want this .Net Core blocker to go away. I'm going to revert everything and reapply some of the changes.

@galvesribeiro
Copy link
Member Author

@jason-bragg counter control is back and now you are able to install both ways. Please let me know anything else you want.

@jason-bragg jason-bragg merged commit d816ac2 into dotnet:master Sep 12, 2016
@jason-bragg
Copy link
Contributor

Great work.
Thanks!

@galvesribeiro
Copy link
Member Author

Thank you guys for reviewing that @jason-bragg and @jdom

@galvesribeiro galvesribeiro deleted the perfcounters branch September 12, 2016 23:31
@jdom
Copy link
Member

jdom commented Sep 12, 2016

Thanks dude, this unblocks some of the work I'm currently doing in .net core :)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants