Collect metrics around OSIO build pipelines #2778

krishnapaparaju · 2018-03-27T08:56:01Z

We need to collect the following metrics and feed them to Zabbix. They would help us understand the operational side of things and give clear visibility into the quality of the OSIO user experience.

Manual builds initiated from the OSIO pipelines screens (and check that the build started successfully)
Number of times 'view log' has been clicked (and check that the build log showed up successfully)
Determine whether a build log shows success or failure. On failure, store the contents of the failures
When an instance is to be idled, is it actually getting idled?
When an instance is to be unidled, is it actually getting unidled?
How many times webhooks are received; store the source of these webhooks (and check that the build or required action started successfully)

lordofthejars · 2018-03-27T10:21:44Z

The idea of this comment is to enumerate the metrics, add my comments on them, and ask questions about when and how this should be done. I should note that my concerns are purely from a QA point of view, so I may be missing part of the reason why these metrics are important, but I prefer to ask or question them rather than say nothing.

a) Manual builds initiated from OSIO pipelines screens

The only use I see for this metric is learning user preferences, for example whether users rerun builds manually instead of relying on webhooks, but in terms of QA I don't see the benefit right now.

I am not sure if Jenkins provides this data.

b) Number of times 'view log' has been clicked

What is the value of knowing whether a user has clicked to see the build log? Most of the time you view the log not because there is a failure but because you want to know how the build is going, what it is doing, and which stage it is in.

I am not sure if Jenkins provides this data.

c) Determine whether a build log shows success / failure. On failure, store the contents of the failures

You can get this from the build result; there is no need to check anything in the log. This seems to be a good metric, but a build can fail either because the Build team introduced a regression or because the project itself is failing, for example because it contains a flaky test. So what we end up with is something like a flaky metric: we see that something is failing constantly, take a look at what is happening, and find that it is not related to the Build team, which means spending time analyzing each and every failure detected by the metrics.

I think this data can be retrieved using the Jenkins REST API.
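As a rough illustration of what reading the build result over the REST API could look like, here is a minimal Python sketch. It assumes the standard Jenkins remote access API path `/job/<name>/lastBuild/api/json`, whose JSON carries a `result` field (for example `SUCCESS` or `FAILURE`, and null while the build is still running) and a `building` flag. The function names, the base URL, and the `RUNNING` placeholder are hypothetical, and authentication is left out.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def build_outcome(build):
    """Return the Jenkins result string, or "RUNNING" if the build has not finished.

    `build` is the decoded JSON object for a single build.
    """
    result = build.get("result")
    if build.get("building") or result is None:
        # Jenkins reports a null result until the build completes.
        return "RUNNING"
    return result


def fetch_last_build(base_url, job):
    """Fetch the last build of `job` (hypothetical helper, no authentication)."""
    url = f"{base_url}/job/{quote(job)}/lastBuild/api/json?tree=result,building,number"
    with urlopen(url) as response:
        return json.load(response)
```

With this, `build_outcome(fetch_last_build("https://jenkins.example.com", "my-job"))` would return the result string without touching the log, which is the point made above.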

d) When an instance is to be idled, is it actually getting idled?

This might be interesting, since I have found some delays and similar problems. What I don't know is whether this is provided by OSO, by fabric8, or by something developed by us, so I will need some background on it.

e) When an instance is to be unidled, is it actually getting unidled?

Same as before

f) How many times webhooks are received; store the source of these webhooks

Exactly the same as point a). It can be used for statistics, but in terms of QA I don't see much benefit.

krishnapaparaju · 2018-03-27T10:53:17Z

@lordofthejars +1. Not all of these metrics would come from Jenkins; we will need to figure out ways, or add new components as required, to collect these important metrics (this is less about collecting numbers and more about flagging failures).

pradeepto · 2018-03-27T11:40:29Z

Duplicate of #2245

cc @krishnapaparaju @lordofthejars

lordofthejars · 2018-03-27T13:50:28Z

I had a quick call with @jaseemabid, and our first question was whether we should support publishing data to Zabbix or to Prometheus. The question is important because Zabbix uses a push model, where we send data to it, while Prometheus uses a pull model, where we need to provide an HTTP endpoint so Prometheus can fetch the data.

The other thing, which does not cover the metrics requested here but provides some metrics regarding master-slaves, is https://wiki.jenkins.io/display/JENKINS/Metrics+Plugin, so this is something we can also take into consideration.
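To make the pull model concrete, here is a minimal sketch, in Python with only the standard library, of a service that keeps a counter in memory and serves it on `/metrics` in the Prometheus text exposition format. The metric name `webhooks_received_total`, the port, and the class names are assumptions for illustration; a real service would more likely use an official Prometheus client library.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Counter:
    """A monotonically increasing counter guarded by a lock."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self._value = 0
        self._lock = threading.Lock()

    def inc(self, amount=1):
        with self._lock:
            self._value += amount

    def render(self):
        """Render the counter in the Prometheus text exposition format."""
        with self._lock:
            value = self._value
        return (
            f"# HELP {self.name} {self.help_text}\n"
            f"# TYPE {self.name} counter\n"
            f"{self.name} {value}\n"
        )


# Hypothetical metric name, chosen only for this example.
WEBHOOKS_RECEIVED = Counter("webhooks_received_total", "Webhooks received.")


class MetricsHandler(BaseHTTPRequestHandler):
    """Answers scrapes on /metrics; every other path gets a 404."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = WEBHOOKS_RECEIVED.render().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Block forever, waiting for the monitoring system to pull."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

The service only calls `WEBHOOKS_RECEIVED.inc()` where the event happens and never contacts the monitoring system itself. In a push model, the same code path would instead have to send each value out to the collector.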

fabric8io/fabric8-build-team#24

joshuawilson · 2018-03-28T03:10:25Z

@aslakknutsen how much of the metrics work you have been doing can help here?

lordofthejars · 2018-03-28T10:20:13Z

I have been talking with Aslak about monitoring, since he has worked a lot in this area, and we have arrived at some conclusions that affect how to implement this issue.

First of all, we need to communicate with Prometheus, not Zabbix, because OSIO already provides an integration with it and we only need to take care of providing the required endpoints; the rest is done by infrastructure.

As for the best way to proceed, we agreed that for now the best approach is to forget about Jenkins and focus on the easiest things, which are Jenkins Idler (fabric8-services/fabric8-jenkins-idler#168) and Jenkins Proxy.

So the task, or initial task, might be "enable a Prometheus endpoint in your service, collect whatever metrics we need, and enable PCP in the service".

"Enabling PCP alone will give you all CPU/memory/network etc. as standard metrics. Expose a Prometheus endpoint, e.g. https://github.com/fabric8-services/fabric8-wit/blob/master/main.go#L421, to collect basic Go runtime data like heaps/memory/GC etc. Then, using the same client lib, you can track your own metrics wherever you see fit in the code, e.g. https://github.com/fabric8-services/fabric8-wit/tree/master/metric"
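As a sketch of what "track your own metrics" could mean for the idle/unidle checks in this issue, the snippet below keeps one counter per (operation, outcome) pair and renders it with Prometheus-style labels. It is written in Python with the standard library only; the class, the metric name `jenkins_idle_operations_total`, and the label names are all hypothetical, and the real services would use the Prometheus client library mentioned in the quote rather than a hand-rolled class like this.

```python
import threading
from collections import defaultdict


class LabeledCounter:
    """Counter with a fixed set of label names, rendered in Prometheus text format."""

    def __init__(self, name, help_text, label_names):
        self.name = name
        self.help_text = help_text
        self.label_names = tuple(label_names)
        self._values = defaultdict(int)
        self._lock = threading.Lock()

    def inc(self, **labels):
        # Build the key in declaration order so each label set maps to one series.
        key = tuple(labels[name] for name in self.label_names)
        with self._lock:
            self._values[key] += 1

    def render(self):
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        with self._lock:
            snapshot = dict(self._values)
        for key in sorted(snapshot):
            pairs = ",".join(
                f'{name}="{value}"' for name, value in zip(self.label_names, key)
            )
            lines.append(f"{self.name}{{{pairs}}} {snapshot[key]}")
        return "\n".join(lines) + "\n"


# Hypothetical metric for "was the instance actually idled / unidled?".
IDLE_OPERATIONS = LabeledCounter(
    "jenkins_idle_operations_total",
    "Idle and unidle attempts by outcome.",
    ("operation", "outcome"),
)
```

The code that performs an idle would call `IDLE_OPERATIONS.inc(operation="idle", outcome="success")` or `outcome="failure"` once it knows the result, so a rising failure series flags the problem instead of someone having to notice it by hand.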

lordofthejars · 2018-03-28T10:27:03Z

So I will split this task into the following subtasks:

Enable PCP on Jenkins Idler with standard metrics (CPU/memory/...)
Enable PCP on Jenkins Proxy with standard metrics
Collect custom metrics for Jenkins Idler
Collect custom metrics for Jenkins Proxy
Validate whether that is enough or whether we go one step further to the Jenkins service.

aslakknutsen · 2018-03-28T11:32:17Z

"First of all is that we need to sync to Prometheus"

Technically you don't sync to anything. You expose an endpoint in the Prometheus format. The rest is taken care of.

aslakknutsen · 2018-03-28T11:39:37Z

You can get d, e, and f easily via Jenkins Idler/Proxy and the normal OSD PCP monitoring route.

a and b might already be exposed via Woopra telemetry; check with @qodfathr for access, and with @joshuawilson about tracking the events if they are not.

c from Jenkins/OSO is an unknown route at the moment, but in progress. Previous attempts were not possible due to resource issues on the clusters, but it might be possible now that Idler is active. Check with @fche for the 'real Prometheus OSO' route for tenants (last I checked it was not ready; only cluster-level stuff was tracked). And check with @kbsingh and @fche whether the 'old idea' would be a viable option until the 'real' solution is ready. Alternatively, you can avoid all that and track those metrics via Jenkins Idler as well, since it should be getting all build events from the cluster.

lordofthejars · 2018-05-30T12:37:14Z

Currently all of the original metrics are exposed (proxy and idler metrics). Then there is the issue fabric8-jenkins/jenkins-openshift-base#23, about adding the Prometheus plugin to the Jenkins tenant, which I don't know exactly how to proceed with. So maybe this issue could be closed, depending on what we decide with fabric8-jenkins/jenkins-openshift-base#23.

joshuawilson · 2018-05-30T20:09:20Z

If you want to track user telemetry via Woopra, you should talk to @rahulm0101.

krishnapaparaju added team/build-cd type/feature-request labels Mar 27, 2018

This was referenced Mar 27, 2018

Sprint plan for OSIO Build: # 147 #2779

Closed

Integrate with Zabbix fabric8-services/fabric8-jenkins-proxy#167

Open

lordofthejars mentioned this issue Apr 4, 2018

Issue #182 Add promotheus /metrics handler fabric8-services/fabric8-jenkins-idler#190

Closed

lordofthejars mentioned this issue Apr 23, 2018

Create Prometheus metrics for accessing counter fabric8-services/fabric8-jenkins-proxy#231

Closed

xcoulon assigned lordofthejars Apr 23, 2018

lordofthejars mentioned this issue Apr 23, 2018

Enable Prometheus Plugin on Jenkins fabric8-jenkins/jenkins-openshift-base#23

Open

MatousJobanek mentioned this issue Apr 25, 2018

Expose basic go runtime data to Zabbix arquillian/ike-prow-plugins#99

Open

joshuawilson added area/telemetry Woopra and Segment work area/metrics labels May 30, 2018
