Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove lightweight modules in packages #3188

Closed
22 tasks done
kaiyan-sheng opened this issue Apr 25, 2022 · 12 comments
Closed
22 tasks done

Remove lightweight modules in packages #3188

kaiyan-sheng opened this issue Apr 25, 2022 · 12 comments
Labels
meta Team:Cloud-Monitoring Label for the Cloud Monitoring team

Comments

@kaiyan-sheng
Copy link
Contributor

kaiyan-sheng commented Apr 25, 2022

Background

Before Elastic Agent and packages existed we introduced lightweight modules in Metricbeat to simplify the development of new metricsets and packages. The goal was to move to pure config defined modules as we have in Filebeat. Over time we have been making more and more use of it, for example in the AWS package. This is great as it reduces the number of code changes we have to make.

With the introduction of elastic packages, it has become even more important to have packages purely based on configurations because that is what is delivered as part of the package. Unfortunately many modules use lightweight modules but the packages we build also depend on the lightweight modules to be there, meaning the code/config files have to be contributed to the beats repository to work. This makes it easier to keep packages and modules in sync but it also means we keep adding to the beats repository. This issue is caused by running the migration script from Metricbeat to integrations.

Actions

  1. For the old migrated data streams (from Bests) like rds: we should move the configuration part from Beats into integrations.
  2. For all data streams: use ingest pipeline instead of rename processor.
  3. For all new packages, we should add configuration directly into integrations.

Impacted Integrations (wip)

@ruflin
Copy link
Contributor

ruflin commented Apr 26, 2022

++ on moving to a fully package based approach. There is another benefit of this that the data collection part on the edge becomes less complex and needs less changes. This might make it easier that also newer version of Elastic Agent can ship data to older versions of Elasticsearch as the data format shipped stays the same. What matters is only the package that is installed.

On side effect of 2 is that we will use the rename processor a lot more (see https://github.com/elastic/beats/blob/main/x-pack/metricbeat/module/aws/rds/manifest.yml#L79 as an example). Lets make sure to involve the es-perf team ( @DJRickyB ) in this. I'm not too concerned on the metrics front about this change as my understanding is the number of events is not very high compared to logs and we do much heavier processing on the logs side but it is worth investigating. It might be useful to have a bulk rename processor that could also make it easier to write these renames.

@kaiyan-sheng
Copy link
Contributor Author

One problem showed up is: before moving configuration from Beats to Integrations, EC2 metrics data stream generates events with field:

    "metricset": {
        "period": 300000,
        "name": "ec2"
    }

With moving the configuration into integrations, EC2 metrics data stream generates events with field:

    "metricset": {
        "period": 300000,
        "name": "cloudwatch"
    }

This is caused by metricsets field in stream.yml.hbs. It changed from ec2 to cloudwatch after moving the configuration to integrations: https://github.com/elastic/integrations/blob/main/packages/aws/data_stream/ec2_metrics/agent/stream/stream.yml.hbs#L1

Should we add a rename processor to rename metricset.name from cloudwatch to ec2 in this case?

@ruflin
Copy link
Contributor

ruflin commented May 31, 2022

The rename processor is the low hanging fruit here. More fundamental, I wonder if we still need the metricset.name field in the first place. Are we using it for any queries? In general it should be replaced by queries on data_stream.dataset as this is much more efficient.

@kaiyan-sheng
Copy link
Contributor Author

@ruflin Agreed. metricset.name is not used on our side (dashboards). Users should also use data_stream.dataset to identify which service these metrics are from.

@endorama
Copy link
Member

Should we add a removal step in the ingest pipeline to remove metricset.name in integrations?

@kaiyan-sheng
Copy link
Contributor Author

Should we add a removal step in the ingest pipeline to remove metricset.name in integrations?

@endorama Good question! This field does tell us which metricset is it using from Metricbeat. This way, user knows we are using the cloudwatch metricset in the background instead of the ec2 metricset for example. Not sure how useful this is but since this field exist across all integrations, maybe we can keep it for now?

@tommyers-elastic
Copy link
Contributor

@kaiyan-sheng @endorama @zmoog i think everything is done here - shall we close this issue once and for all?

@kaiyan-sheng
Copy link
Contributor Author

@tommyers-elastic I just went though our GCP package and found these data streams are added but not with lightweight module removed. I will move these into the issue description.

  • GCP storage
  • GCP pubsub
  • GCP loadbalancing
  • GCP dataproc

@ChrsMark
Copy link
Member

ChrsMark commented Sep 22, 2022

@kaiyan-sheng shall we communicate this effort with the wider audience of integrations' teams? This would help to advertise the recommended way when it comes to the usage (or non usage :)) of leighweight modules.

We discovered this while implementing/reviewing at #4253 (comment).
cc: @gsantoro

Also I guess the long term plan would be to use input packages (when they are ready). Relevant content: elastic/package-spec#319 (comment) and elastic/package-spec#404

@ruflin
Copy link
Contributor

ruflin commented Sep 22, 2022

++, lets make sure it is document in a guide around how to build integrations.

@kaiyan-sheng
Copy link
Contributor Author

@ruflin +1! I also suggested working on documentation for how to build integrations (especially with lightweight module) for our team for 8.6.0 planning.

@kaiyan-sheng
Copy link
Contributor Author

@tommyers-elastic I think this meta issue is ready to be closed now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
meta Team:Cloud-Monitoring Label for the Cloud Monitoring team
Projects
None yet
Development

No branches or pull requests

5 participants