Enable synthetic source for Elastic-Agent datastreams #9826
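This commit enables synthetic source for all Elastic-Agent datastreams except for `cloudbeat_logs` and `cloud_defend_logs`. Those two are not compatible with synthetic source because they define a `decision_id` field as `text`, which synthetic source cannot rebuild unless the value is stored (`store: true`) or the field has a suitable `keyword` subfield.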
packages/elastic_agent/manifest.yml
 description: Collect logs and metrics from Elastic Agents.
 type: integration
 format_version: 1.0.0
 license: basic
 categories: ["elastic_stack"]
 conditions:
   kibana:
-    version: "^8.11.2"
+    version: "^8.15.0"
Do we need to be careful not to publish this too far ahead of the 8.15.0 release? 8.14.0 isn't even out yet. Presumably, if this goes first, we couldn't bump the minor version if we need to add features for another reason?
@jsoriano what's the best practice here? Do we need to mark this as a prerelease version?
What is the motivation to use synthetic source in this package?
So far, synthetic source is only GA for TSDB indexes. APM has been using it on metrics data streams since at least 8.7.0, and around that version Fleet also started enabling it by default on all TSDB indexes.
But I don't think we are using it for logs data streams anywhere, so it can be a bit risky to enable it here unless there are strong reasons.
If we want to enable it here in any case, I think we should release it as a prerelease by adding a prerelease tag to the version, maybe even bumping to a new major, with a version like `2.0.0-beta1`. This would also allow us to keep 1.x for backports to GA versions until synthetic source is GA, or at least better tested, in logs data streams.
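Applied to the package manifest, that suggestion might look like the sketch below. `2.0.0-beta1` is the example version from the comment above, the Kibana constraint is the one from the diff under review, and every other manifest key is omitted.

```yaml
# packages/elastic_agent/manifest.yml (sketch; unrelated keys omitted)
# Prerelease tag marks that synthetic source on logs data streams is not GA yet.
version: "2.0.0-beta1"
conditions:
  kibana:
    version: "^8.15.0"
```

Keeping the new major on a prerelease tag leaves the 1.x line free for backports to GA stack versions, as described in the comment.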
> What is the motivation to use synthetic source in this package?
To significantly reduce the storage footprint of the Agent monitoring logs. This is storage cost for an agent doing nothing useful, merely existing.
Agreed that it needs to be a prerelease, since synthetic source on logs indices is still in technical preview.
Per the PM for synthetic source, this is a good candidate for using synthetic _source prior to GA, given that these are internal logs, we do not maintain a contract for them, and we are not blocked by any synthetic source limitations.
Ok, makes sense to try this first in an internal feature. Is there a target version to release this feature as GA for all indexes?
It was pointed out to me that synthetic source for logs indices is in technical preview. I'll chat with the ES team and see how bad an idea this is from their perspective.
_source:
  mode: synthetic
I think this syntax is not valid in recent versions of the spec. Please use the following syntax instead; Fleet is aware of the `source_mode` setting and can handle it in a safer way.
- _source:
-   mode: synthetic
+ source_mode: synthetic
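In context, the suggested setting sits under the `elasticsearch` key of each data stream's manifest. This is a minimal sketch, assuming a v2 or v3 package spec where `elasticsearch.source_mode` is available; the `<data_stream>` path segment is a placeholder, not a real data stream name.

```yaml
# data_stream/<data_stream>/manifest.yml (sketch; other keys omitted)
elasticsearch:
  # Spec-level setting instead of a raw `_source` mapping, so Fleet knows
  # about it and can let users override it.
  source_mode: synthetic
```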
Interesting, it worked well on all my manual testing. But I'll update and test again.
Yes, this enables synthetic mode, but Fleet may not be aware of it. Fleet has some logic that lets the user override this option, and I'm not sure it works with this syntax.
This behavior is described here: elastic/kibana#141211 (comment)
Older versions of the spec, like 1.0.0, were much less restrictive and allowed things that could lead to problematic or ambiguous situations. In v2 and v3 we have been introducing restrictions to avoid these situations. For example, we now restrict what can be configured for index templates in the manifest.
You can find here some help for some common issues when upgrading packages to v2 or v3: https://github.com/elastic/elastic-package/blob/main/docs/howto/update_major_package_spec.md
decision_id:
  type: text
  store: true
Could this field be defined as a normal field in a fields file? I think these definitions are also not allowed in recent versions of the package spec.
> Could this field be defined as a normal field under a fields file?
That was the first thing I tried, but for some reason it had no effect, so I tried it here and it worked.
I confess I don't know much about integrations development, so for things like this I usually take a trial-and-error approach.
Could you share what you tried first?
Actually, I don't see any package using `store: true`, and this is not explicitly supported by the spec, so maybe this is the only way to do it. I'm not sure it will work in v3; I'll take a look.
> Actually I don't see any package using `store: true` and this is not explicitly supported by the spec, so maybe this is the only way to do it, and not sure if this will work in v3, I will take a look to this.
This is indeed not supported in the current spec; I'm adding support for it in elastic/package-spec#748.
> Could you share what you tried first?
I had replaced lines 1 to 2 of integrations/packages/elastic_agent/data_stream/cloud_defend_logs/fields/fields.yml (in 5620eac):
- name: decision_id
  type: text
with:
- name: decision_id
  type: text
  store: true
But it had no effect :/
If I understood correctly, elastic/package-spec#748 should make `store: true` allowed and working when used in `fields.yml`, is that right?
> If I understood correctly, elastic/package-spec#748 should make `store: true` allowed and working when used in `fields.yml`, is that right?
Yes. I tried something like what you describe, and indeed it doesn't work yet.
These changes are both needed: elastic/package-spec#748 in the package spec, and support for the `store` mapping parameter in Fleet (Kibana).
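Once both changes are available, a data stream that opts into synthetic source and declares `decision_id` with `store: true` would be expected to end up with a mapping along the lines of this sketch. It is rendered as YAML for readability (Elasticsearch itself takes JSON), and the exact shape of the template Fleet generates is an assumption here.

```yaml
# Expected index mapping (sketch rendered as YAML, not Fleet's literal output)
mappings:
  _source:
    mode: synthetic        # from `source_mode: synthetic` in the data stream manifest
  properties:
    decision_id:
      type: text
      store: true          # stored value lets synthetic _source rebuild this text field
```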
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
Given the backport PR elastic/kibana#183313, I've lowered the Kibana version to
Co-authored-by: Jaime Soriano Pastor
💔 Build Failed
## Summary
Add support for the `store` mapping parameter in fields defined in packages.
### Checklist
- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
### Related issues
- Support in packages added in elastic/package-spec#748.
- Needed for cases where synthetic source mode is used, like in elastic/integrations#9826.
Won't it be available from 8.14.0?
Hi! We just realized that we haven't looked into this PR in a while. We're sorry! We're labeling this issue as
changes:
  - description: Enable synthetic source for Elastic-Agent datastreams
    type: enhancement
    link: https://github.com/elastic/integrations/pull/42
- link: https://github.com/elastic/integrations/pull/42
+ link: https://github.com/elastic/integrations/pull/9826
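For reference, a complete entry in the package's `changelog.yml` with the corrected link might look like the sketch below. The `version` value is a placeholder: the thread discusses a prerelease such as `2.0.0-beta1` but does not settle on one.

```yaml
# changelog.yml (sketch; the version is a placeholder, not a released version)
- version: "2.0.0-beta1"
  changes:
    - description: Enable synthetic source for Elastic-Agent datastreams
      type: enhancement
      link: https://github.com/elastic/integrations/pull/9826
```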
Hi! This PR has been stale for a while and we're going to close it as part of our cleanup procedure. We appreciate your contribution and would like to apologize if we have not been able to review it, due to the current heavy load of the team. Feel free to re-open this PR if you think it should stay open and is worth rebasing. Thank you for your contribution!
## Proposed commit message
This commit enables synthetic source for all Elastic-Agent datastreams except for `cloudbeat_logs` and `cloud_defend_logs`, which are not compatible with synthetic source because they define a `decision_id` field as `text`.
## Checklist
- I have added an entry to my package's `changelog.yml` file.
## Author's Checklist
- Update `format-version` to `3.1.4` in a separate PR
## How to test this PR locally
GET _component_template/logs-elastic_agent.*@package
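When checking the result, the `@package` component template of each data stream that opted in should carry a synthetic `_source` mapping. Below is an abridged sketch of the relevant part of the response, shown as YAML and assuming Fleet writes the setting into `mappings._source`; the template name is a hypothetical example.

```yaml
# Abridged GET _component_template response (sketch, rendered as YAML)
component_templates:
  - name: logs-elastic_agent.filebeat@package   # hypothetical example name
    component_template:
      template:
        mappings:
          _source:
            mode: synthetic
```

The templates for `cloudbeat_logs` and `cloud_defend_logs`, which this PR leaves out, would be expected to lack this setting.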
## Related issues
## Screenshots