This repository has been archived by the owner on Jun 25, 2024. It is now read-only.

add the telemetry to the default list of dataplane-operator services #392

Merged


jlarriba
Contributor

@jlarriba jlarriba commented Sep 6, 2023

No description provided.

@jlarriba
Contributor Author

jlarriba commented Sep 6, 2023

This depends on PR openstack-k8s-operators/edpm-ansible#312 to merge.

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/8151a725b11c4c99bf51ebdcda9723d0

✔️ dataplane-operator-docs-preview SUCCESS in 1m 48s
✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 36m 56s
podified-multinode-edpm-deployment-crc FAILURE in 1h 17m 42s
dataplane-operator-crc-podified-edpm-baremetal FAILURE in 56m 43s

@slagle
Collaborator

slagle commented Sep 6, 2023

/hold

We are holding dataplane-operator PRs until the CI stabilizes after merging #303. The /hold will be dropped here once all PRs are in to support 303.

@slagle
Collaborator

slagle commented Sep 8, 2023

/unhold

rdoproject.org/github-check jobs should be passing now.

Collaborator

@slagle slagle left a comment

+2

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/03dd263239eb4553942872acce19798a

✔️ dataplane-operator-docs-preview SUCCESS in 1m 47s
✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 34m 59s
podified-multinode-edpm-deployment-crc FAILURE in 1h 16m 47s
✔️ dataplane-operator-crc-podified-edpm-baremetal SUCCESS in 1h 02m 31s

@fao89
Collaborator

fao89 commented Sep 11, 2023

recheck

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/38b3d45e31294dcbb040cb8ccc65ce9f

✔️ dataplane-operator-docs-preview SUCCESS in 1m 48s
✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 34m 30s
podified-multinode-edpm-deployment-crc FAILURE in 1h 17m 09s
✔️ dataplane-operator-crc-podified-edpm-baremetal SUCCESS in 57m 14s

@slagle
Collaborator

slagle commented Sep 11, 2023

The CI failure here needs to be fixed; I think it's on the edpm-ansible side.

From:
https://logserver.rdoproject.org/92/392/d745e1167b79e620f901fdeca7d75dc655d66c05/github-check/podified-multinode-edpm-deployment-crc/5e2e43d/controller/ci-framework-data/artifacts/must-gather.local.8574441275466136731/quay-io-openstack-k8s-operators-openstack-must-gather-sha256-070a98749455b097c66a1a25ff827a3c0ee7385f1f0c8bf54f5b15438501f2bd/namespaces/openstack/pods/dataplane-deployment-telemetry-openstack-edpm-kfgz7/logs/dataplane-deployment-telemetry-openstack-edpm-kfgz7.log

[pod/dataplane-deployment-telemetry-openstack-edpm-kfgz7/openstackansibleee] failed: [edpm-compute-0] (item={'src': '/var/lib/config-data/merged/ceilometer.conf', 'dest': '/var/lib/openstack/config/ceilometer/ceilometer.conf'}) => {"ansible_loop_var": "item", "changed": false, "item": {"dest": "/var/lib/openstack/config/ceilometer/ceilometer.conf", "src": "/var/lib/config-data/merged/ceilometer.conf"}, "msg": "Could not find or access '/var/lib/config-data/merged/ceilometer.conf' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}
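
For context on the message above: Ansible's copy-style actions resolve `src` on the controller (here, the ansibleEE pod) unless `remote_src` is set, so the file has to be readable inside that pod. A minimal Python sketch of a preflight check, meant to run in the same place as the playbook, might look like this. The helper name `missing_on_controller` is hypothetical and is not part of edpm-ansible or dataplane-operator.

```python
import os
from typing import Iterable, List


def missing_on_controller(sources: Iterable[str]) -> List[str]:
    """Return the source paths that are not readable where this code runs.

    Hypothetical helper: it only mirrors the idea that `src` is looked up
    on the controller side, not on the managed node.
    """
    return [path for path in sources if not os.access(path, os.R_OK)]


if __name__ == "__main__":
    # Path taken from the error quoted above.
    print(missing_on_controller(["/var/lib/config-data/merged/ceilometer.conf"]))
```

An empty result would suggest that the sources are mounted where the controller expects them; a non-empty one points at the mounts rather than at the managed node.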

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/5bd7dc26000a4a06b8054cef891e49da

✔️ dataplane-operator-docs-preview SUCCESS in 1m 47s
✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 40m 53s
podified-multinode-edpm-deployment-crc FAILURE in 1h 18m 55s
dataplane-operator-crc-podified-edpm-baremetal FAILURE in 1h 20m 10s

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/4947d2d7daa9431b96b778e69e9d1d53

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 35m 56s
podified-multinode-edpm-deployment-crc FAILURE in 1h 16m 53s
cifmw-crc-podified-edpm-baremetal FAILURE in 1h 10m 03s
✔️ dataplane-operator-docs-preview SUCCESS in 2m 00s

@jlarriba
Contributor Author

recheck

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/caa4f45fea9b43688662baf2b9ec3e29

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 37m 08s
podified-multinode-edpm-deployment-crc FAILURE in 1h 18m 22s
cifmw-crc-podified-edpm-baremetal FAILURE in 1h 14m 00s
✔️ dataplane-operator-docs-preview SUCCESS in 1m 54s

@openshift-ci openshift-ci bot added the lgtm label Sep 25, 2023
@openshift-ci
Contributor

openshift-ci bot commented Sep 25, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fao89, jlarriba

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/01b7e7af94244616a6926669a2e2fad1

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 48m 20s
podified-multinode-edpm-deployment-crc FAILURE in 1h 18m 25s
cifmw-crc-podified-edpm-baremetal FAILURE in 1h 13m 19s
✔️ dataplane-operator-docs-preview SUCCESS in 2m 08s

@jlarriba
Contributor Author

The error is:

Could not find or access '/var/lib/openstack/config/telemetry/polling.yaml' on the Ansible Controller

However, the polling.yaml file is actually mounted at /var/lib/openstack/configs/telemetry/polling.yaml. It's a small typo: config vs. configs.

I have submitted the appropriate fix to edpm_ansible: https://github.com/openstack-k8s-operators/edpm-ansible/pull/368/files
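
A one-character difference like this is easy to miss by eye. As a rough illustration (not something the PR itself contains), the two paths can be compared component by component in Python; `differing_components` is a hypothetical helper name.

```python
from itertools import zip_longest
from pathlib import PurePosixPath
from typing import List, Tuple


def differing_components(expected: str, actual: str) -> List[Tuple[int, str, str]]:
    """Return (index, expected_part, actual_part) for every path component that differs.

    A missing component on either side is reported as an empty string.
    """
    left = PurePosixPath(expected).parts
    right = PurePosixPath(actual).parts
    return [
        (index, a, b)
        for index, (a, b) in enumerate(zip_longest(left, right, fillvalue=""))
        if a != b
    ]


if __name__ == "__main__":
    # Paths quoted in the comment above: the one Ansible looked for,
    # and the one where the file is mounted.
    print(
        differing_components(
            "/var/lib/openstack/config/telemetry/polling.yaml",
            "/var/lib/openstack/configs/telemetry/polling.yaml",
        )
    )
```

For the two paths in this comment, the only differing component is the directory name `config` versus `configs`.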

@jlarriba
Contributor Author

recheck

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/4b225abc1d8b45b2a938a04a9cdf9a34

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 36m 42s
podified-multinode-edpm-deployment-crc FAILURE in 1h 18m 47s
cifmw-crc-podified-edpm-baremetal FAILURE in 1h 11m 48s
✔️ dataplane-operator-docs-preview SUCCESS in 1m 50s

@rabi
Contributor

rabi commented Sep 26, 2023

recheck

Looks like openstack-k8s-operators/edpm-ansible#368 has been merged.

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/d0682813f64b406092c91edd240cd616

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 35m 28s
podified-multinode-edpm-deployment-crc FAILURE in 1h 18m 49s
cifmw-crc-podified-edpm-baremetal FAILURE in 1h 17m 54s
✔️ dataplane-operator-docs-preview SUCCESS in 1m 52s

@jlarriba
Contributor Author

jlarriba commented Sep 26, 2023

Indeed, working on it. The fix: openstack-k8s-operators/edpm-ansible#370

@jlarriba
Contributor Author

recheck

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/46b5127b19494ab2b378883157247c2e

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 44m 43s
podified-multinode-edpm-deployment-crc FAILURE in 1h 18m 58s
cifmw-crc-podified-edpm-baremetal FAILURE in 1h 13m 16s
✔️ dataplane-operator-docs-preview SUCCESS in 1m 48s

@jlarriba
Contributor Author

Working on the error

@jlarriba
Contributor Author

podman_containers behaves strangely when executed in ansibleEE containers. The job fails because node_exporter is not being created. According to the logs:

Sep 26 06:13:59 np0004183008 podman[111824]: 2023-09-26 10:13:59.871219738 +0000 UTC m=+0.052653626 container remove 75d88fed233c08224028184bc4872aff498e0afbfdf7f7627c226831875b8bd5 (image=quay.io/prometheus/node-exporter:v1.5.0, name=node_exporter, maintainer=The Prometheus Authors <[email protected]>)
Sep 26 06:13:59 np0004183008 python3[111766]: ansible-containers.podman.podman_container PODMAN-CONTAINER-DEBUG: podman rm -f node_exporter
Sep 26 06:13:59 np0004183008 systemd[3966]: Started podman-111837.scope.
Sep 26 06:13:59 np0004183008 podman[111837]: 2023-09-26 10:13:59.94917157 +0000 UTC m=+0.058407299 container create 375c510283d02c7ccb6e088c5d6563df3d06e083a576b6790e248df00c83d331 (image=quay.io/prometheus/node-exporter:v1.5.0, name=node_exporter, maintainer=The Prometheus Authors <[email protected]>)
Sep 26 06:13:59 np0004183008 podman[111837]: 2023-09-26 10:13:59.926524629 +0000 UTC m=+0.035760348 image pull 0da6a335fe1356545476b749c68f022c897de3a2139e8f0054f6937349ee2b83 quay.io/prometheus/node-exporter:v1.5.0
Sep 26 06:13:59 np0004183008 python3[111766]: ansible-containers.podman.podman_container PODMAN-CONTAINER-DEBUG: podman create --name node_exporter --publish 9100:9100 quay.io/prometheus/node-exporter:v1.5.0

As we can see, it runs podman rm -f node_exporter and then podman create. When I run the same playbook with standalone Ansible, the podman rm does not happen and the container is created without problems.

As we are on the verge of DP2, I have submitted a PR (openstack-k8s-operators/edpm-ansible#372) to edpm_ansible that replaces podman_containers with command until we can figure out what configuration in ansibleEE makes it behave differently.
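
The ordering described above can be read off the journal by keeping only the commands that the module logs after the `PODMAN-CONTAINER-DEBUG:` marker. A rough Python sketch, assuming the log format quoted in this comment; `podman_commands` is a hypothetical helper.

```python
import re
from typing import Iterable, List

# Marker taken from the journal lines quoted above.
DEBUG_CMD = re.compile(r"PODMAN-CONTAINER-DEBUG: (podman .+)$")


def podman_commands(journal_lines: Iterable[str]) -> List[str]:
    """Return the podman command lines logged by the module, in log order."""
    commands = []
    for line in journal_lines:
        match = DEBUG_CMD.search(line.rstrip())
        if match:
            commands.append(match.group(1))
    return commands


if __name__ == "__main__":
    sample = [
        "Sep 26 06:13:59 np0004183008 python3[111766]: ansible-containers.podman.podman_container PODMAN-CONTAINER-DEBUG: podman rm -f node_exporter",
        "Sep 26 06:13:59 np0004183008 systemd[3966]: Started podman-111837.scope.",
        "Sep 26 06:13:59 np0004183008 python3[111766]: ansible-containers.podman.podman_container PODMAN-CONTAINER-DEBUG: podman create --name node_exporter --publish 9100:9100 quay.io/prometheus/node-exporter:v1.5.0",
    ]
    for command in podman_commands(sample):
        print(command)
```

Running the same filter over the journal from a standalone Ansible run would make it easy to compare the two command sequences side by side.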

@jlarriba
Contributor Author

recheck

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/3cfbc8b733a24c0f8a1b3194449933a1

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 04m 53s
podified-multinode-edpm-deployment-crc FAILURE in 1h 21m 43s
cifmw-crc-podified-edpm-baremetal FAILURE in 1h 19m 20s
✔️ dataplane-operator-docs-preview SUCCESS in 1m 47s

@jlarriba
Contributor Author

This is still broken. Now, with the new code to just create node_exporter, the log says:

[pod/dataplane-deployment-telemetry-openstack-edpm-qdpp2/openstackansibleee] TASK [osp.edpm.edpm_telemetry : deploy node_exporter container] ****************
[pod/dataplane-deployment-telemetry-openstack-edpm-qdpp2/openstackansibleee] Wednesday 27 September 2023  11:30:27 +0000 (0:00:01.376)       0:00:26.382 *** 
[pod/dataplane-deployment-telemetry-openstack-edpm-qdpp2/openstackansibleee] fatal: [edpm-compute-0]: FAILED! => {"changed": true, "cmd": ["podman", "create", "--name", "node_exporter", "--publish", "9100:9100", "quay.io/prometheus/node-exporter:v1.5.0"], "delta": "0:00:00.090099", "end": "2023-09-27 11:30:28.070138", "msg": "non-zero return code", "rc": 125, "start": "2023-09-27 11:30:27.980039", "stderr": "Error: creating container storage: the container name \"node_exporter\" is already in use by 99cf83813c820394b589c87530fa30aa577c9fd2b11849021a859abb4a62fd49. You have to remove that container to be able to reuse that name: that name is already in use", "stderr_lines": ["Error: creating container storage: the container name \"node_exporter\" is already in use by 99cf83813c820394b589c87530fa30aa577c9fd2b11849021a859abb4a62fd49. You have to remove that container to be able to reuse that name: that name is already in use"], "stdout": "", "stdout_lines": []}

How is it possible that a node_exporter container already exists if we have not deployed it before? Is the job also testing Ansible idempotence?

In any case, I created a new PR on edpm_ansible to make the playbook idempotent: openstack-k8s-operators/edpm-ansible#377
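
One common way to make a `podman create` step safe to rerun is to check for the container first. The Python sketch below only illustrates that idea and is not taken from edpm-ansible#377; `ensure_container` and the injectable `runner` are hypothetical. It relies on `podman container exists`, which exits 0 when the container exists and 1 when it does not.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes an argv list and returns the exit status.
Runner = Callable[[Sequence[str]], int]


def run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit status without raising on failure."""
    return subprocess.run(list(cmd), check=False).returncode


def ensure_container(name: str, create_args: Sequence[str], runner: Runner = run) -> bool:
    """Create the container only when no container with that name exists.

    Returns True if a container was created, False if one was already present.
    """
    status = runner(["podman", "container", "exists", name])
    if status == 0:
        return False  # already there: nothing to do, so a rerun is harmless
    if status != 1:
        raise RuntimeError(f"could not check for container {name!r} (exit {status})")
    created = runner(["podman", "create", "--name", name, *create_args])
    if created != 0:
        raise RuntimeError(f"podman create failed for {name!r} (exit {created})")
    return True


if __name__ == "__main__":
    # Arguments copied from the failing task above.
    ensure_container(
        "node_exporter",
        ["--publish", "9100:9100", "quay.io/prometheus/node-exporter:v1.5.0"],
    )
```

With a check like this, a second run against a node that already has a `node_exporter` container returns without calling `podman create`, which avoids the "name is already in use" error shown in the log.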

@jlarriba
Contributor Author

recheck

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/5566923735714adda98a99c5c8094185

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 35m 01s
podified-multinode-edpm-deployment-crc FAILURE in 1h 16m 35s
cifmw-crc-podified-edpm-baremetal FAILURE in 1h 16m 27s
✔️ dataplane-operator-docs-preview SUCCESS in 1m 45s

@jlarriba
Contributor Author

recheck

@openshift-merge-robot openshift-merge-robot merged commit d032d41 into openstack-k8s-operators:main Sep 28, 2023
2 checks passed