Skip to content
This repository has been archived by the owner on May 4, 2022. It is now read-only.

Ceilometer metrics support #83

Closed
wants to merge 14 commits into from
Closed

Conversation

paramite
Copy link
Member

@paramite paramite commented Apr 24, 2020

This patch add ability to listen for metric data messages from Ceilometer
and prepare them for Prometheus scraping.

This patch add ability to listen for metric data messages from Ceilometer
and prepare them for PRometheus scraping.
@pkilambi
Copy link

pkilambi commented May 5, 2020

@paramite Any updates? We need to wrap up testing and get this reviewed and merged asap. We're getting close to beta.

- added missing support for AMQP1Connections to main loop for metrics
- unify shared logic with main loop for events
@paramite paramite force-pushed the mmagr-ceilometer-metrics branch from fdd45f0 to cf831e1 Compare May 5, 2020 12:20
@paramite paramite changed the title [WIP] Ceilometer metrics support Ceilometer metrics support May 5, 2020
@paramite
Copy link
Member Author

paramite commented May 5, 2020

I can't make QDRs work so that overcloud QDRs send metrics to manually spawned QDR on undercloud. I believe that I have relevant qdrouterd.conf setup the same as STF side, so no idea what is wrong (no error except the message send timeout on Ceilometer side). Anyway, this code successfully starts listening on bus channel. I just did not check the message processing is still functional. Considering I did not change anything there, it should still be functional.

@csibbitt
Copy link
Contributor

csibbitt commented May 6, 2020

@paramite If I run this on OCP, and my server QDR address is stf-default-interconnect-5671-service-telemetry.apps.myhost.com:5671; can you show me the config I need to add to the OSP side to point ceilometer at my STF server?

@csibbitt
Copy link
Contributor

csibbitt commented May 6, 2020

A note to other reviewers that porting this code to the new sg2 codebase was agreed to be a future task; for now we need to get this working well, then think about what that will take.

@paramite
Copy link
Member Author

paramite commented May 7, 2020

@csibbitt:

{
        "AMQP1Connections": [
                {"Url": "amqp://stf-default-interconnect-5671-service-telemetry.apps.myhost.com:5671/anycast/ceilometer/metering.sample", "DataSource": "ceilometer"}
        ],
        "Exporterhost": "localhost",
        "Exporterport": 8081,
        "CPUStats": false,
        "DataCount": -1,
        "UseSample": false,
        "UseTimeStamp": true,
        "Debug": true,
        "Prefetch": 102
}

//log.Printf("Cleaned up plugin for %s", collectd.GetKey())
for _, dataInterface := range shard.plugin {
format := saconfig.DataSourceUniversal.String()
switch metric := dataInterface.(type) {
Copy link
Contributor

@pleimer pleimer May 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just add the String() method to the MetricDataFormat interface so that you don't have to type switch just to get the formatted string? This seems like something this type should always implement (I could be wrong)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, will do.


//SpawnSignalHandler spawns goroutine which will wait for interruption signal(s)
// and end smart gateway in case any of the signal is received
func SpawnSignalHandler(finish chan bool, watchedSignals ...os.Signal) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this location makes a lot more sense

output := make([]MetricDataFormat, 0)
sanitized := c.sanitize(data)
// parse only relevant data
var json = jsoniter.ConfigCompatibleWithStandardLibrary
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice I am going to start using this package for json stuff

Copy link
Contributor

@pleimer pleimer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't tried running it in system but the code looks good

Copy link
Contributor

@csibbitt csibbitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Untested since I don't have a good setup at the moment.

//GetInterval ...
//GetInterval returns hardcoded defaultCeilometerInterval, because Ceilometer metricDesc
//does not contain interval information (are not periodically sent at all) and any reasonable
//interval might be needed for expiry setting for Prometheus
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

}

/*************************** tsdb.TSDB interface *****************************/

//GetLabels ...
func (c *CeilometerMetric) GetLabels() map[string]string {
return make(map[string]string)
labels := make(map[string]string)
if c.TypeInstance != "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FYI, this re-labeling logic around plugin and type instances has been entirely removed from the collectd side of things in sg2. We are unclear on it's origin or purpose, but it made writing alarms and dashboards a bit more awkward rather than just using exactly what was in the collectd message. I have no opinion yet on whether it should exists here, since I haven't tried working with the resulting ceilometer metrics yet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the naming algorithm seems kinda strange to me to. I just wanted to be consistent witn collectd implementation. IMO if we want to remove it from Ceilometer, we should do this also for collectd. If it was removed from SG2, maybe we should do this as part of this PR prior to GA.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't bother with the renaming in this repo because once we productize SG.next we won't be using that code path any longer (even if the code remains). I don't want to accidentally cause a regression in the GA code if we need to release in the meantime.

I do think it's a good idea to do the refactoring here for the ceilometer metrics part to better align with SG.next since that's the final location for things to land ultimately.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we will have one type of metrics with old style names and the other types of metrics with new style? Fine by me, just don't like inconsistencies :).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paramite no, because you should converge to whatever the SG.next is doing. Once SG.next is ready, we will no longer use this code base to handle metrics delivery for collectd. We could just delete that path in the code base.

Alternatively align the naming conventions against the current collectd code base here to keep things consistent, then when porting the ceilometer metrics support into SG.next, change the naming scheme at that point to align.

I just feel like changing anything in the current collectd code path is really just a waste of time since we'll replace that path with SG.next when it available, and we're more likely to just create a regression for the current GA deployments if we need to release another version of this SG.

If we change collectd now, we'll need to also make sure that any dashboard changes align with this next release, and make sure customers upgrade the dashboard when upgrading SG. Seems like a lot of coordination in a product I'd like to avoid.

So either pick the naming used by SG.next to align and not need to change things later (probably my preference?) or align to what is here to keep all the metrics consistent (whether they be collectd or ceilometer) and then re-align the naming when porting to SG.next.

Not really sure which path is the correct one. Any other thoughts?

@leifmadsen leifmadsen added the needs testing Needs to be tested prior to merge label May 12, 2020
paramite added a commit to infrawatch/smart-gateway-operator that referenced this pull request May 19, 2020
AMQP1Connections parameter should be used instead of nonflexible AMQP1Url.

Depends-On: infrawatch/smart-gateway#83
paramite added a commit to infrawatch/service-telemetry-operator that referenced this pull request May 19, 2020
paramite added a commit to infrawatch/service-telemetry-operator that referenced this pull request Jun 29, 2020
paramite added a commit to infrawatch/smart-gateway-operator that referenced this pull request Jun 29, 2020
AMQP1Connections parameter should be used instead of nonflexible AMQP1Url.

Depends-On: infrawatch/smart-gateway#83
@leifmadsen leifmadsen removed do-not-merge needs testing Needs to be tested prior to merge labels Jul 7, 2020
Copy link
Member

@leifmadsen leifmadsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like things pass in CI now using local builds, so this should be good to go!

@leifmadsen
Copy link
Member

Closing this in favour of #86 which works with our CI (branch name alignment). Should be same code as here. Thanks!

@leifmadsen leifmadsen closed this Jul 7, 2020
@leifmadsen leifmadsen deleted the mmagr-ceilometer-metrics branch July 7, 2020 23:20
leifmadsen pushed a commit to infrawatch/smart-gateway-operator that referenced this pull request Jul 10, 2020
AMQP1Connections parameter should be used instead of nonflexible AMQP1Url.

Depends-On: infrawatch/smart-gateway#83
leifmadsen added a commit to infrawatch/service-telemetry-operator that referenced this pull request Jul 10, 2020
* Enable deployment of metric SG for Ceilometer data

Depends-On: infrawatch/smart-gateway#83
Depends-On: infrawatch/smart-gateway-operator#48

* Add smoketest for Ceilometer data

* Listen on correct channel

* Ceilometer smoketest tuning

Makes smoketest_ceilometer_entrypoint.sh being executed during smoketest job.

* Use data source setting for metrics too
	# Please enter the commit message for your changes. Lines starting
	# with '#' will be ignored, and an empty message aborts the commit.
	#
	# On branch mmagr-amqp10connections
	# Changes to be committed:
	#	modified:   roles/servicetelemetry/templates/manifest_smartgateway_metrics.j2
	#

* Finish ceilometer events smoke test

* Do not use hardcoded timestamps

* Finish ceilometer metrics smoketests

* Increase timeout

* Add container names

* Validate also Ceilometer metrics SG

* Update tests/smoketest/smoketest_ceilometer_entrypoint.sh

* Update tests/smoketest/smoketest_collectd_entrypoint.sh

Co-authored-by: Martin Magr <[email protected]>
Co-authored-by: Leif Madsen <[email protected]>
leifmadsen added a commit to infrawatch/smart-gateway-operator that referenced this pull request Jul 13, 2020
* Use new connection configuration for metrics (#48)

AMQP1Connections parameter should be used instead of nonflexible AMQP1Url.

Depends-On: infrawatch/smart-gateway#83

* Bump default CSV to 2.0.0

* Fix file rename

* Lock operator-courier to v2.1.7

Lock operator-courier to 2.1.7 until we can figure out what is wrong with our CSV/CRD setup
or until the operator-courier issue noted in the related issue is resolved.

Related: infrawatch/service-telemetry-operator#108

* Update roles/smartgateway/templates/ceilometer-metrics-configmap.yaml.j2

Co-authored-by: Chris Sibbitt <[email protected]>

Co-authored-by: Martin Mágr <[email protected]>
Co-authored-by: Chris Sibbitt <[email protected]>
leifmadsen added a commit to infrawatch/service-telemetry-operator that referenced this pull request Jul 13, 2020
* Enhance CI automation (#106)

* Add ServiceTelemetry overrides

Allows ServiceTelemetry overrides to be expressed via Ansible extra-vars. Adds the four
main overrides you would expect in a ServiceTelemetry object, with appropriate defaults
set.

Will also allow passing in the service_telemetry_manifest as a whole object like what we
do with the Service Telemetry Operator.

* Allow for per-repo branch overrides

Allow for per-repo branch overrides for the Smart Gateway Operator and Smart Gateway
repositories via sgo_branch and sg_branch (respectively).

* Add functionality around quickstart.sh

Add some functionality that was replaced when I dropped the quickstart.sh. Adds
some of this functionality back in and also adds some new stuff.

* Fix syntax error

* Add back quickstart.sh

Adds back a quickstart.sh that simulates the same result as the old
quickstart.sh

* Better CSV modification support

Also adds some tags to make skipping over builds for testing much
easier.

* Debugging ci.yml firing

* Make sure namespace is set before using it

* Fix syntax and documentation

* Test locally first kids

* Drop CI debug lines

* Clean up working repo clones

* Copy CSV into working directory

On subsequent runs the in-place modification of the CSV can cause issues either
in the development environment, or re-runs of the CI system. Copying the CSV out
of the in-place repo into a working location, and then modifying in-place results
in a cleaner setup.

By doing the copy of the CSV files, we can drop the need to force clone the supporting
repositories.

Also cleans up some shell commands that were commented out now that they are being
dealt with via the replace module. Removes the extra commands added to ci.yml.

* Changes to infrared-openstack.sh for OSP13 (#102)

* Migrate OSP16 script to OSP13 directory

Uses a multi-cloud stf-connectors.yaml style configuration which directly loads
the resource lists rather than a list of environment files. Uses the same script
as used in OSP16 but subs out the network configuration for a vlan type setup and the
latest paths for async puddle.

* Working deployment of OSP13

* Migrate changes to align to existing docs

Update PR to align to existing documentation and testing the group has been working on.
Adjust the stf-connectors.yaml.template to better reflect what we've been testing.

Deployment by default will result in presettle: true which is bad for reliability of
message delivery.

* Get closer alignment to OSP16 setup

* Enable deployment of metric SG for Ceilometer data (#93)

* Enable deployment of metric SG for Ceilometer data

Depends-On: infrawatch/smart-gateway#83
Depends-On: infrawatch/smart-gateway-operator#48

* Add smoketest for Ceilometer data

* Listen on correct channel

* Ceilometer smoketest tuning

Makes smoketest_ceilometer_entrypoint.sh being executed during smoketest job.

* Use data source setting for metrics too
	# Please enter the commit message for your changes. Lines starting
	# with '#' will be ignored, and an empty message aborts the commit.
	#
	# On branch mmagr-amqp10connections
	# Changes to be committed:
	#	modified:   roles/servicetelemetry/templates/manifest_smartgateway_metrics.j2
	#

* Finish ceilometer events smoke test

* Do not use hardcoded timestamps

* Finish ceilometer metrics smoketests

* Increase timeout

* Add container names

* Validate also Ceilometer metrics SG

* Update tests/smoketest/smoketest_ceilometer_entrypoint.sh

* Update tests/smoketest/smoketest_collectd_entrypoint.sh

Co-authored-by: Martin Magr <[email protected]>
Co-authored-by: Leif Madsen <[email protected]>

* Implement CI updates for SG3

* Correct value for SG bridge image path

* Lock operator-courier to 2.1.7

Lock operator-courier to 2.1.7 until we can figure out what is wrong with our CSV/CRD setup
or until the operator-courier issue noted in the related issue is resolved.

Related: #108

* Update build/stf-run-ci/tasks/main.yml

Co-authored-by: Chris Sibbitt <[email protected]>

* Adjust README to match run-ci.yaml methods

Co-authored-by: Martin Mágr <[email protected]>
Co-authored-by: Martin Magr <[email protected]>
Co-authored-by: Chris Sibbitt <[email protected]>
leifmadsen added a commit to infrawatch/smart-gateway-operator that referenced this pull request Jul 13, 2020
* Changes for sg2

* Added new containers images for sg2 and bridge
* Added socket-dir for them to communicate
* Added more mode-specific code to deployment template
* Removed now-unused metrics-configmap

* Use best bridge args from testing

* Updated CSV and container values for new SG

* Updated core image name & scorecard

* fixed "replaced" line in csv

* Adjust template to support ceilometer metrics too

* Update description in CSV (#53)

Update the CSV description so that we could pass upstream Community Operator requirements
should we ever publish there.

* amqpDataSource should be amqp_data_source

* Misplaced conditionals broke volume

* Setup SG3 CI system (#59)

* Use new connection configuration for metrics (#48)

AMQP1Connections parameter should be used instead of nonflexible AMQP1Url.

Depends-On: infrawatch/smart-gateway#83

* Bump default CSV to 2.0.0

* Fix file rename

* Lock operator-courier to v2.1.7

Lock operator-courier to 2.1.7 until we can figure out what is wrong with our CSV/CRD setup
or until the operator-courier issue noted in the related issue is resolved.

Related: infrawatch/service-telemetry-operator#108

* Update roles/smartgateway/templates/ceilometer-metrics-configmap.yaml.j2

Co-authored-by: Chris Sibbitt <[email protected]>

Co-authored-by: Martin Mágr <[email protected]>
Co-authored-by: Chris Sibbitt <[email protected]>

Co-authored-by: Chris Sibbitt <[email protected]>
Co-authored-by: Martin Mágr <[email protected]>
leifmadsen added a commit to infrawatch/service-telemetry-operator that referenced this pull request Jul 13, 2020
* Changes to match new internal SG metrics names

* Changing exported_instance to host

* plugin_instance relabelling no longer being done in sg2

* type becomes type_instance

* Applied relevant changes to the new alerts

* Fixing plugin vs type instance on memory alarm

* Adjusted order of labels in smoketest

* Need longer interval for rates with default 10s scrape interval

* OCP metrics label change from pod_name to pod

* Raise the linkCapacity of QDR->bridge

* Adds a new edgeListener on 5673 with linkCapacity 25000
* Adjusts metrics SG bridge to connect to new listener
* Minimizes presettled metrics loss during throughput testing
  * Should also provide performance improvements in unsettled mode bursts
* Other SG modes can keep using 5672 until we converge the SG code

* New CRD and CSV for v1.0.3

* Updated scorecard paths

* Added amqpDataSource to metrics SG template

* This is so I can test changes here:
infrawatch/smart-gateway-operator#54

* In preparation to merge with:
https://github.com/infrawatch/service-telemetry-operator/pull/93/files#diff-33675527951f20f9727fb4be5c84a746R8

* Putting quickstart.sh back until build/run-ci.yaml is a true replacement

* No ability to deploy published artifacts without building
* No support for ephemeral storage

* Adding back quickstart configs

* Setup SG3 CI system (#107)

* Enhance CI automation (#106)

* Add ServiceTelemetry overrides

Allows ServiceTelemetry overrides to be expressed via Ansible extra-vars. Adds the four
main overrides you would expect in a ServiceTelemetry object, with appropriate defaults
set.

Will also allow passing in the service_telemetry_manifest as a whole object like what we
do with the Service Telemetry Operator.

* Allow for per-repo branch overrides

Allow for per-repo branch overrides for the Smart Gateway Operator and Smart Gateway
repositories via sgo_branch and sg_branch (respectively).

* Add functionality around quickstart.sh

Add some functionality that was replaced when I dropped the quickstart.sh. Adds
some of this functionality back in and also adds some new stuff.

* Fix syntax error

* Add back quickstart.sh

Adds back a quickstart.sh that simulates the same result as the old
quickstart.sh

* Better CSV modification support

Also adds some tags to make skipping over builds for testing much
easier.

* Debugging ci.yml firing

* Make sure namespace is set before using it

* Fix syntax and documentation

* Test locally first kids

* Drop CI debug lines

* Clean up working repo clones

* Copy CSV into working directory

On subsequent runs the in-place modification of the CSV can cause issues either
in the development environment, or re-runs of the CI system. Copying the CSV out
of the in-place repo into a working location, and then modifying in-place results
in a cleaner setup.

By doing the copy of the CSV files, we can drop the need to force clone the supporting
repositories.

Also cleans up some shell commands that were commented out now that they are being
dealt with via the replace module. Removes the extra commands added to ci.yml.

* Changes to infrared-openstack.sh for OSP13 (#102)

* Migrate OSP16 script to OSP13 directory

Uses a multi-cloud stf-connectors.yaml style configuration which directly loads
the resource lists rather than a list of environment files. Uses the same script
as used in OSP16 but subs out the network configuration for a vlan type setup and the
latest paths for async puddle.

* Working deployment of OSP13

* Migrate changes to align to existing docs

Update PR to align to existing documentation and testing the group has been working on.
Adjust the stf-connectors.yaml.template to better reflect what we've been testing.

Deployment by default will result in presettle: true which is bad for reliability of
message delivery.

* Get closer alignment to OSP16 setup

* Enable deployment of metric SG for Ceilometer data (#93)

* Enable deployment of metric SG for Ceilometer data

Depends-On: infrawatch/smart-gateway#83
Depends-On: infrawatch/smart-gateway-operator#48

* Add smoketest for Ceilometer data

* Listen on correct channel

* Ceilometer smoketest tuning

Makes smoketest_ceilometer_entrypoint.sh being executed during smoketest job.

* Use data source setting for metrics too
	# Please enter the commit message for your changes. Lines starting
	# with '#' will be ignored, and an empty message aborts the commit.
	#
	# On branch mmagr-amqp10connections
	# Changes to be committed:
	#	modified:   roles/servicetelemetry/templates/manifest_smartgateway_metrics.j2
	#

* Finish ceilometer events smoke test

* Do not use hardcoded timestamps

* Finish ceilometer metrics smoketests

* Increase timeout

* Add container names

* Validate also Ceilometer metrics SG

* Update tests/smoketest/smoketest_ceilometer_entrypoint.sh

* Update tests/smoketest/smoketest_collectd_entrypoint.sh

Co-authored-by: Martin Magr <[email protected]>
Co-authored-by: Leif Madsen <[email protected]>

* Implement CI updates for SG3

* Correct value for SG bridge image path

* Lock operator-courier to 2.1.7

Lock operator-courier to 2.1.7 until we can figure out what is wrong with our CSV/CRD setup
or until the operator-courier issue noted in the related issue is resolved.

Related: #108

* Update build/stf-run-ci/tasks/main.yml

Co-authored-by: Chris Sibbitt <[email protected]>

* Adjust README to match run-ci.yaml methods

Co-authored-by: Martin Mágr <[email protected]>
Co-authored-by: Martin Magr <[email protected]>
Co-authored-by: Chris Sibbitt <[email protected]>

Co-authored-by: Chris Sibbitt <[email protected]>
Co-authored-by: Martin Mágr <[email protected]>
Co-authored-by: Martin Magr <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

5 participants