Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable synthetic source for Elastic-Agent datastreams #9826

Closed
wants to merge 4 commits into from

Conversation

belimawr
Copy link
Contributor

@belimawr belimawr commented May 9, 2024

Proposed commit message

This commit enables the synthetic source for all Elastic-Agent datastreams except for cloudbeat_logs and cloud_defend_logs that are not compatible with synthetic source because they define a decision_id field as text.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.

Author's Checklist

  • Update format-version to 3.1.4 in a separated PR

How to test this PR locally

  1. Build the integration and bring up a stack with it
  2. Go to Fleet, there should be no error/warning from Kibana.
  3. Go to DevTools
  4. Run the following query: GET _component_template/logs-elastic_agent.*@package
  5. Ensure all component templates contain:
             "_source": {
               "mode": "synthetic"
             },
    

## Related issues
## Screenshots

This commit enables the synthetic source for all Elastic-Agent
datastreams except for `cloudbeat_logs` and `cloud_defend_logs` that
are not compatible with synthetic source because they define a
`decision_id` field as text.
@belimawr belimawr force-pushed the synthetic-source branch from b6495ea to 1c39f94 Compare May 9, 2024 16:37
description: Collect logs and metrics from Elastic Agents.
type: integration
format_version: 1.0.0
license: basic
categories: ["elastic_stack"]
conditions:
kibana:
version: "^8.11.2"
version: "^8.15.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to be careful not to publish this too far ahead of the release of 8.15.0? 8.14.0 isn't even out yet. Presumably we couldn't bump the minor version if we need to add features for another reason if this goes first?

@jsoriano what's the best practice here? Do we need to mark this as a prerelease version?

Copy link
Member

@jsoriano jsoriano May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the motivation to use synthetic source in this package?

So far synthetic source is only GA for TSDB indexes. APM has been using at least since 8.7.0 on metrics data streams. Around this version also Fleet starting enabling it by default on all TSDB indexes.
But I don't think we are using it for logs data streams anywhere, so it can be a bit risky to enable it here unless there are strong reasons.

If we want to enable it in any case here, I think we should release it as prerelease, adding a prerelease tag to the version, maybe even bumping to a new major, so setting a version like 2.0.0-beta1. This would also allow to keep 1.x for backports to GA versions till synthetic source is GA, or at least more tested, in logs data streams.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the motivation to use synthetic source in this package?

Significantly reducing the storage footprint of the agent monitoring logs data. This is the storage cost for an agent doing nothing useful, merely existing.

Agree it needs to be pre-release, since synthetic source on logs indices is still technical preview.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the PM for synthetic source, this is a good candidate for using synthetic _source prior to GA, given that these are internal logs, we do not maintain a contract for them, and we are not blocked by any synthetic source limitations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, makes sense to try this first in an internal feature. Is there a target version to release this feature as GA for all indexes?

@strawgate
Copy link
Contributor

it was pointed out to me that synthetic source for logs indices is in technical preview. I'll chat with the ES team and see how bad of an idea this is from their perspective

Copy link

@belimawr belimawr marked this pull request as ready for review May 9, 2024 19:18
@belimawr belimawr requested a review from a team as a code owner May 9, 2024 19:18
Comment on lines +8 to +9
_source:
mode: synthetic
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this syntax is not valid in recent versions of the spec. Please use the following syntax, Fleet is aware of the source_mode setting and can handle it in a safer way.

Suggested change
_source:
mode: synthetic
source_mode: synthetic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, it worked well on all my manual testing. But I'll update and test again.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this enables synthetic mode, but Fleet may not be aware of it. There is some logic in Fleet to allow to override this option by the user, not sure if this works with this syntax.
This behavior is described here: elastic/kibana#141211 (comment)

Older versions of the spec, like 1.0.0, were much less restrictive, and allowed things that could lead to problematic or ambiguous situations. In v2 and v3 we have been introducing restrictions to avoid these situations. For example restrict now what can be configured for index templates in the manifest.

You can find here some help for some common issues when upgrading packages to v2 or v3: https://github.com/elastic/elastic-package/blob/main/docs/howto/update_major_package_spec.md

description: Collect logs and metrics from Elastic Agents.
type: integration
format_version: 1.0.0
license: basic
categories: ["elastic_stack"]
conditions:
kibana:
version: "^8.11.2"
version: "^8.15.0"
Copy link
Member

@jsoriano jsoriano May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the motivation to use synthetic source in this package?

So far synthetic source is only GA for TSDB indexes. APM has been using at least since 8.7.0 on metrics data streams. Around this version also Fleet starting enabling it by default on all TSDB indexes.
But I don't think we are using it for logs data streams anywhere, so it can be a bit risky to enable it here unless there are strong reasons.

If we want to enable it in any case here, I think we should release it as prerelease, adding a prerelease tag to the version, maybe even bumping to a new major, so setting a version like 2.0.0-beta1. This would also allow to keep 1.x for backports to GA versions till synthetic source is GA, or at least more tested, in logs data streams.

Comment on lines +11 to +13
decision_id:
type: text
store: true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this field be defined as a normal field under a fields file? I think these definitions are also not allowed on recent versions of the package spec.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this field be defined as a normal field under a fields file?

That was the first thing I tried, but for some reason it had no effect, so I tried here and it worked.

I confess I don't know much about the integrations development, so for things like that I usually go with a trial and error approach.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you share what you tried first?

Actually I don't see any package using store: true and this is not explicitly supported by the spec, so maybe this is the only way to do it, and not sure if this will work in v3, I will take a look to this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I don't see any package using store: true and this is not explicitly supported by the spec, so maybe this is the only way to do it, and not sure if this will work in v3, I will take a look to this.

This is indeed not supported in current spec, adding support for it in elastic/package-spec#748.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you share what you tried first?

I had replaced

by

- name: decision_id
  type: text
  store: true

But it had no effect :/

If I understood correctly, elastic/package-spec#748 should make store: true allowed and working when used in fields.yml, is that right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understood correctly, elastic/package-spec#748 should make store: true allowed and working when used in fields.yml, is that right?

Yes, I tried something like what you mention and indeed this doesn't work.

These changes are both needed:

@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team [elastic/elastic-agent-data-plane] label May 10, 2024
@elasticmachine
Copy link

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@belimawr
Copy link
Contributor Author

Given the backport PR elastic/kibana#183313 I've lowered the Kibana version to 8.14.1

@elasticmachine
Copy link

elasticmachine commented May 13, 2024

💔 Build Failed

Failed CI Steps

History

jsoriano added a commit to elastic/kibana that referenced this pull request May 14, 2024
## Summary

Add support for the `store` mapping parameter in fields defined in
packages.

### Checklist

Delete any items that are not applicable to this PR.

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

### Related issues

- Support in packages added in
elastic/package-spec#748.
- Needed for cases where synthetic source mode is used, like in
elastic/integrations#9826.
@pierrehilbert
Copy link
Contributor

Given the backport PR elastic/kibana#183313 I've lowered the Kibana version to 8.14.1

Won't it be available from 8.14.0?

@botelastic
Copy link

botelastic bot commented Jun 20, 2024

Hi! We just realized that we haven't looked into this PR in a while. We're sorry! We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1. Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Jun 20, 2024
@botelastic botelastic bot removed the Stalled label Jul 19, 2024
@botelastic
Copy link

botelastic bot commented Aug 18, 2024

Hi! We just realized that we haven't looked into this PR in a while. We're sorry! We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1. Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Aug 18, 2024
changes:
- description: Enable synthetic source for Elastic-Agent datastreams
type: enhancement
link: https://github.com/elastic/integrations/pull/42
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
link: https://github.com/elastic/integrations/pull/42
link: https://github.com/elastic/integrations/pull/9826

@botelastic botelastic bot removed the Stalled label Aug 20, 2024
@botelastic
Copy link

botelastic bot commented Sep 19, 2024

Hi! We just realized that we haven't looked into this PR in a while. We're sorry! We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1. Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Sep 19, 2024
@botelastic
Copy link

botelastic bot commented Oct 19, 2024

Hi! This PR has been stale for a while and we're going to close it as part of our cleanup procedure. We appreciate your contribution and would like to apologize if we have not been able to review it, due to the current heavy load of the team. Feel free to re-open this PR if you think it should stay open and is worth rebasing. Thank you for your contribution!

@botelastic botelastic bot closed this Oct 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Integration:elastic_agent Elastic Agent Stalled Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team [elastic/elastic-agent-data-plane]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants