-
Notifications
You must be signed in to change notification settings - Fork 12
Conversation
This patch add ability to listen for metric data messages from Ceilometer and prepare them for PRometheus scraping.
@paramite Any updates? We need to wrap up testing and get this reviewed and merged asap. We're getting close to beta. |
- added missing support for AMQP1Connections to main loop for metrics - unify shared logic with main loop for events
fdd45f0
to
cf831e1
Compare
I can't make QDRs work so that overcloud QDRs send metrics to manually spawned QDR on undercloud. I believe that I have relevant qdrouterd.conf setup the same as STF side, so no idea what is wrong (no error except the message send timeout on Ceilometer side). Anyway, this code successfully starts listening on bus channel. I just did not check the message processing is still functional. Considering I did not change anything there, it should still be functional. |
@paramite If I run this on OCP, and my server QDR address is |
A note to other reviewers that porting this code to the new sg2 codebase was agreed to be a future task; for now we need to get this working well, then think about what that will take. |
|
//log.Printf("Cleaned up plugin for %s", collectd.GetKey()) | ||
for _, dataInterface := range shard.plugin { | ||
format := saconfig.DataSourceUniversal.String() | ||
switch metric := dataInterface.(type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just add the String() method to the MetricDataFormat interface so that you don't have to type switch just to get the formatted string? This seems like something this type should always implement (I could be wrong)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, will do.
|
||
//SpawnSignalHandler spawns goroutine which will wait for interruption signal(s) | ||
// and end smart gateway in case any of the signal is received | ||
func SpawnSignalHandler(finish chan bool, watchedSignals ...os.Signal) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this location makes a lot more sense
output := make([]MetricDataFormat, 0) | ||
sanitized := c.sanitize(data) | ||
// parse only relevant data | ||
var json = jsoniter.ConfigCompatibleWithStandardLibrary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice I am going to start using this package for json stuff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't tried running it in system but the code looks good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Untested since I don't have a good setup at the moment.
//GetInterval ... | ||
//GetInterval returns hardcoded defaultCeilometerInterval, because Ceilometer metricDesc | ||
//does not contain interval information (are not periodically sent at all) and any reasonable | ||
//interval might be needed for expiry setting for Prometheus |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
} | ||
|
||
/*************************** tsdb.TSDB interface *****************************/ | ||
|
||
//GetLabels ... | ||
func (c *CeilometerMetric) GetLabels() map[string]string { | ||
return make(map[string]string) | ||
labels := make(map[string]string) | ||
if c.TypeInstance != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just FYI, this re-labeling logic around plugin and type instances has been entirely removed from the collectd side of things in sg2. We are unclear on it's origin or purpose, but it made writing alarms and dashboards a bit more awkward rather than just using exactly what was in the collectd message. I have no opinion yet on whether it should exists here, since I haven't tried working with the resulting ceilometer metrics yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the naming algorithm seems kinda strange to me to. I just wanted to be consistent witn collectd implementation. IMO if we want to remove it from Ceilometer, we should do this also for collectd. If it was removed from SG2, maybe we should do this as part of this PR prior to GA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't bother with the renaming in this repo because once we productize SG.next we won't be using that code path any longer (even if the code remains). I don't want to accidentally cause a regression in the GA code if we need to release in the meantime.
I do think it's a good idea to do the refactoring here for the ceilometer metrics part to better align with SG.next since that's the final location for things to land ultimately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we will have one type of metrics with old style names and the other types of metrics with new style? Fine by me, just don't like inconsistencies :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@paramite no, because you should converge to whatever the SG.next is doing. Once SG.next is ready, we will no longer use this code base to handle metrics delivery for collectd. We could just delete that path in the code base.
Alternatively align the naming conventions against the current collectd code base here to keep things consistent, then when porting the ceilometer metrics support into SG.next, change the naming scheme at that point to align.
I just feel like changing anything in the current collectd code path is really just a waste of time since we'll replace that path with SG.next when it available, and we're more likely to just create a regression for the current GA deployments if we need to release another version of this SG.
If we change collectd now, we'll need to also make sure that any dashboard changes align with this next release, and make sure customers upgrade the dashboard when upgrading SG. Seems like a lot of coordination in a product I'd like to avoid.
So either pick the naming used by SG.next to align and not need to change things later (probably my preference?) or align to what is here to keep all the metrics consistent (whether they be collectd or ceilometer) and then re-align the naming when porting to SG.next.
Not really sure which path is the correct one. Any other thoughts?
AMQP1Connections parameter should be used instead of nonflexible AMQP1Url. Depends-On: infrawatch/smart-gateway#83
AMQP1Connections parameter should be used instead of nonflexible AMQP1Url. Depends-On: infrawatch/smart-gateway#83
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like things pass in CI now using local builds, so this should be good to go!
Closing this in favour of #86 which works with our CI (branch name alignment). Should be same code as here. Thanks! |
AMQP1Connections parameter should be used instead of nonflexible AMQP1Url. Depends-On: infrawatch/smart-gateway#83
* Enable deployment of metric SG for Ceilometer data Depends-On: infrawatch/smart-gateway#83 Depends-On: infrawatch/smart-gateway-operator#48 * Add smoketest for Ceilometer data * Listen on correct channel * Ceilometer smoketest tuning Makes smoketest_ceilometer_entrypoint.sh being executed during smoketest job. * Use data source setting for metrics too # Please enter the commit message for your changes. Lines starting # with '#' will be ignored, and an empty message aborts the commit. # # On branch mmagr-amqp10connections # Changes to be committed: # modified: roles/servicetelemetry/templates/manifest_smartgateway_metrics.j2 # * Finish ceilometer events smoke test * Do not use hardcoded timestamps * Finish ceilometer metrics smoketests * Increase timeout * Add container names * Validate also Ceilometer metrics SG * Update tests/smoketest/smoketest_ceilometer_entrypoint.sh * Update tests/smoketest/smoketest_collectd_entrypoint.sh Co-authored-by: Martin Magr <[email protected]> Co-authored-by: Leif Madsen <[email protected]>
* Use new connection configuration for metrics (#48) AMQP1Connections parameter should be used instead of nonflexible AMQP1Url. Depends-On: infrawatch/smart-gateway#83 * Bump default CSV to 2.0.0 * Fix file rename * Lock operator-courier to v2.1.7 Lock operator-courier to 2.1.7 until we can figure out what is wrong with our CSV/CRD setup or until the operator-courier issue noted in the related issue is resolved. Related: infrawatch/service-telemetry-operator#108 * Update roles/smartgateway/templates/ceilometer-metrics-configmap.yaml.j2 Co-authored-by: Chris Sibbitt <[email protected]> Co-authored-by: Martin Mágr <[email protected]> Co-authored-by: Chris Sibbitt <[email protected]>
* Enhance CI automation (#106) * Add ServiceTelemetry overrides Allows ServiceTelemetry overrides to be expressed via Ansible extra-vars. Adds the four main overrides you would expect in a ServiceTelemetry object, with appropriate defaults set. Will also allow passing in the service_telemetry_manifest as a whole object like what we do with the Service Telemetry Operator. * Allow for per-repo branch overrides Allow for per-repo branch overrides for the Smart Gateway Operator and Smart Gateway repositories via sgo_branch and sg_branch (respectively). * Add functionality around quickstart.sh Add some functionality that was replaced when I dropped the quickstart.sh. Adds some of this functionality back in and also adds some new stuff. * Fix syntax error * Add back quickstart.sh Adds back a quickstart.sh that simulates the same result as the old quickstart.sh * Better CSV modification support Also adds some tags to make skipping over builds for testing much easier. * Debugging ci.yml firing * Make sure namespace is set before using it * Fix syntax and documentation * Test locally first kids * Drop CI debug lines * Clean up working repo clones * Copy CSV into working directory On subsequent runs the in-place modification of the CSV can cause issues either in the development environment, or re-runs of the CI system. Copying the CSV out of the in-place repo into a working location, and then modifying in-place results in a cleaner setup. By doing the copy of the CSV files, we can drop the need to force clone the supporting repositories. Also cleans up some shell commands that were commented out now that they are being dealt with via the replace module. Removes the extra commands added to ci.yml. * Changes to infrared-openstack.sh for OSP13 (#102) * Migrate OSP16 script to OSP13 directory Uses a multi-cloud stf-connectors.yaml style configuration which directly loads the resource lists rather than a list of environment files. Uses the same script as used in OSP16 but subs out the network configuration for a vlan type setup and the latest paths for async puddle. * Working deployment of OSP13 * Migrate changes to align to existing docs Update PR to align to existing documentation and testing the group has been working on. Adjust the stf-connectors.yaml.template to better reflect what we've been testing. Deployment by default will result in presettle: true which is bad for reliability of message delivery. * Get closer alignment to OSP16 setup * Enable deployment of metric SG for Ceilometer data (#93) * Enable deployment of metric SG for Ceilometer data Depends-On: infrawatch/smart-gateway#83 Depends-On: infrawatch/smart-gateway-operator#48 * Add smoketest for Ceilometer data * Listen on correct channel * Ceilometer smoketest tuning Makes smoketest_ceilometer_entrypoint.sh being executed during smoketest job. * Use data source setting for metrics too # Please enter the commit message for your changes. Lines starting # with '#' will be ignored, and an empty message aborts the commit. # # On branch mmagr-amqp10connections # Changes to be committed: # modified: roles/servicetelemetry/templates/manifest_smartgateway_metrics.j2 # * Finish ceilometer events smoke test * Do not use hardcoded timestamps * Finish ceilometer metrics smoketests * Increase timeout * Add container names * Validate also Ceilometer metrics SG * Update tests/smoketest/smoketest_ceilometer_entrypoint.sh * Update tests/smoketest/smoketest_collectd_entrypoint.sh Co-authored-by: Martin Magr <[email protected]> Co-authored-by: Leif Madsen <[email protected]> * Implement CI updates for SG3 * Correct value for SG bridge image path * Lock operator-courier to 2.1.7 Lock operator-courier to 2.1.7 until we can figure out what is wrong with our CSV/CRD setup or until the operator-courier issue noted in the related issue is resolved. Related: #108 * Update build/stf-run-ci/tasks/main.yml Co-authored-by: Chris Sibbitt <[email protected]> * Adjust README to match run-ci.yaml methods Co-authored-by: Martin Mágr <[email protected]> Co-authored-by: Martin Magr <[email protected]> Co-authored-by: Chris Sibbitt <[email protected]>
* Changes for sg2 * Added new containers images for sg2 and bridge * Added socket-dir for them to communicate * Added more mode-specific code to deployment template * Removed now-unused metrics-configmap * Use best bridge args from testing * Updated CSV and container values for new SG * Updated core image name & scorecard * fixed "replaced" line in csv * Adjust template to support ceilometer metrics too * Update description in CSV (#53) Update the CSV description so that we could pass upstream Community Operator requirements should we ever publish there. * amqpDataSource should be amqp_data_source * Misplaced conditionals broke volume * Setup SG3 CI system (#59) * Use new connection configuration for metrics (#48) AMQP1Connections parameter should be used instead of nonflexible AMQP1Url. Depends-On: infrawatch/smart-gateway#83 * Bump default CSV to 2.0.0 * Fix file rename * Lock operator-courier to v2.1.7 Lock operator-courier to 2.1.7 until we can figure out what is wrong with our CSV/CRD setup or until the operator-courier issue noted in the related issue is resolved. Related: infrawatch/service-telemetry-operator#108 * Update roles/smartgateway/templates/ceilometer-metrics-configmap.yaml.j2 Co-authored-by: Chris Sibbitt <[email protected]> Co-authored-by: Martin Mágr <[email protected]> Co-authored-by: Chris Sibbitt <[email protected]> Co-authored-by: Chris Sibbitt <[email protected]> Co-authored-by: Martin Mágr <[email protected]>
* Changes to match new internal SG metrics names * Changing exported_instance to host * plugin_instance relabelling no longer being done in sg2 * type becomes type_instance * Applied relevant changes to the new alerts * Fixing plugin vs type instance on memory alarm * Adjusted order of labels in smoketest * Need longer interval for rates with default 10s scrape interval * OCP metrics label change from pod_name to pod * Raise the linkCapacity of QDR->bridge * Adds a new edgeListener on 5673 with linkCapacity 25000 * Adjusts metrics SG bridge to connect to new listener * Minimizes presettled metrics loss during throughput testing * Should also provide performance improvements in unsettled mode bursts * Other SG modes can keep using 5672 until we converge the SG code * New CRD and CSV for v1.0.3 * Updated scorecard paths * Added amqpDataSource to metrics SG template * This is so I can test changes here: infrawatch/smart-gateway-operator#54 * In preparation to merge with: https://github.com/infrawatch/service-telemetry-operator/pull/93/files#diff-33675527951f20f9727fb4be5c84a746R8 * Putting quickstart.sh back until build/run-ci.yaml is a true replacement * No ability to deploy published artifacts without building * No support for ephemeral storage * Adding back quickstart configs * Setup SG3 CI system (#107) * Enhance CI automation (#106) * Add ServiceTelemetry overrides Allows ServiceTelemetry overrides to be expressed via Ansible extra-vars. Adds the four main overrides you would expect in a ServiceTelemetry object, with appropriate defaults set. Will also allow passing in the service_telemetry_manifest as a whole object like what we do with the Service Telemetry Operator. * Allow for per-repo branch overrides Allow for per-repo branch overrides for the Smart Gateway Operator and Smart Gateway repositories via sgo_branch and sg_branch (respectively). * Add functionality around quickstart.sh Add some functionality that was replaced when I dropped the quickstart.sh. Adds some of this functionality back in and also adds some new stuff. * Fix syntax error * Add back quickstart.sh Adds back a quickstart.sh that simulates the same result as the old quickstart.sh * Better CSV modification support Also adds some tags to make skipping over builds for testing much easier. * Debugging ci.yml firing * Make sure namespace is set before using it * Fix syntax and documentation * Test locally first kids * Drop CI debug lines * Clean up working repo clones * Copy CSV into working directory On subsequent runs the in-place modification of the CSV can cause issues either in the development environment, or re-runs of the CI system. Copying the CSV out of the in-place repo into a working location, and then modifying in-place results in a cleaner setup. By doing the copy of the CSV files, we can drop the need to force clone the supporting repositories. Also cleans up some shell commands that were commented out now that they are being dealt with via the replace module. Removes the extra commands added to ci.yml. * Changes to infrared-openstack.sh for OSP13 (#102) * Migrate OSP16 script to OSP13 directory Uses a multi-cloud stf-connectors.yaml style configuration which directly loads the resource lists rather than a list of environment files. Uses the same script as used in OSP16 but subs out the network configuration for a vlan type setup and the latest paths for async puddle. * Working deployment of OSP13 * Migrate changes to align to existing docs Update PR to align to existing documentation and testing the group has been working on. Adjust the stf-connectors.yaml.template to better reflect what we've been testing. Deployment by default will result in presettle: true which is bad for reliability of message delivery. * Get closer alignment to OSP16 setup * Enable deployment of metric SG for Ceilometer data (#93) * Enable deployment of metric SG for Ceilometer data Depends-On: infrawatch/smart-gateway#83 Depends-On: infrawatch/smart-gateway-operator#48 * Add smoketest for Ceilometer data * Listen on correct channel * Ceilometer smoketest tuning Makes smoketest_ceilometer_entrypoint.sh being executed during smoketest job. * Use data source setting for metrics too # Please enter the commit message for your changes. Lines starting # with '#' will be ignored, and an empty message aborts the commit. # # On branch mmagr-amqp10connections # Changes to be committed: # modified: roles/servicetelemetry/templates/manifest_smartgateway_metrics.j2 # * Finish ceilometer events smoke test * Do not use hardcoded timestamps * Finish ceilometer metrics smoketests * Increase timeout * Add container names * Validate also Ceilometer metrics SG * Update tests/smoketest/smoketest_ceilometer_entrypoint.sh * Update tests/smoketest/smoketest_collectd_entrypoint.sh Co-authored-by: Martin Magr <[email protected]> Co-authored-by: Leif Madsen <[email protected]> * Implement CI updates for SG3 * Correct value for SG bridge image path * Lock operator-courier to 2.1.7 Lock operator-courier to 2.1.7 until we can figure out what is wrong with our CSV/CRD setup or until the operator-courier issue noted in the related issue is resolved. Related: #108 * Update build/stf-run-ci/tasks/main.yml Co-authored-by: Chris Sibbitt <[email protected]> * Adjust README to match run-ci.yaml methods Co-authored-by: Martin Mágr <[email protected]> Co-authored-by: Martin Magr <[email protected]> Co-authored-by: Chris Sibbitt <[email protected]> Co-authored-by: Chris Sibbitt <[email protected]> Co-authored-by: Martin Mágr <[email protected]> Co-authored-by: Martin Magr <[email protected]>
This patch add ability to listen for metric data messages from Ceilometer
and prepare them for Prometheus scraping.