
Added enhancements/cluster-logging/content-filter.md #1513

Merged

Conversation

alanconway
Contributor

No description provided.

The openshift-ci bot requested review from jcantrill and periklis on November 3, 2023 15:54.

@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

The openshift-ci bot added the lifecycle/stale label on Jan 5, 2024.
@alanconway
Contributor Author

/remove-lifecycle stale

The openshift-ci bot removed the lifecycle/stale label on Jan 5, 2024.
alanconway force-pushed the content-filter-enhancement branch from 9959e1c to 249287a on January 5, 2024 22:33.
@alanconway
Contributor Author

@jcantrill @r2d2rnd thanks for the excellent comments, and sorry for the slow response. I think the latest push addresses them all; re-comment if not.

alanconway force-pushed the content-filter-enhancement branch from 249287a to a5f5dbd on January 5, 2024 22:34.

@periklis (Contributor) left a comment

This is a very good starting proposal for enhancing log record control in many constrained production environments. It offers two important tools: dropping entire records, and pruning fields from each record.

After a thorough reading I have a couple of questions as food for thought, in case we intend to leave room for future improvements:

  1. What is the inspiration for naming the feature filters, and accordingly the first two prune (related to fields) and drop (related to entire records)? Is the inspiration how similar features work in the supported collector implementations (i.e. Vector, fluentd)? If so, can you point to them in a related-work section?
  2. Although the proposal aims to give users the means to control log record size and log volume, it is not clear whether we want to go a step further in future, e.g. relabeling records or constructing calculated fields. The latter is explicitly stated as a non-goal, but is this a definitive no, that we will never do it? If not, what can we learn from sibling tools like Prometheus' relabeling-config, which combines dropping/pruning with relabeling in one config block?
  3. Considering filter validation, it seems that each filter, and both together, have a serious impact on the ClusterLogForwarder status field, which must inform the user about broken and/or invalid rules as well as the impact on log records and volume. I believe error reporting deserves a separate section to make the proposal complete. The reader should be told how we report invalid rules/filters (support and docs will have an easier enablement in this regard).
  4. Do we need to provide default filters from the start? This looks like extra baggage that will require special handling once we support more formats than the traditional Viaq model, e.g. OpenTelemetry. My suggestion is to drop defaults until we understand the impact of supporting filters across multiple log record formats.
  5. Speaking of OpenTelemetry, I believe this proposal deserves a section elaborating on the impact on OpenTelemetry-formatted log records. Could we miss cases that render OpenTelemetry log records invalid if a user prunes/drops fields from them? The proposed section should provide a first analysis in that regard.
  6. Can we simplify/condense the drop filter matchers into a form like the Prometheus relabeling-config regex matchers (a sketch of that style follows this list)? Do we need numerical comparison early on? Supporting numerical comparison opens, IMHO, the desire for more than less/greater, e.g. lessOrEqual/greaterOrEqual, modulus, etc. Can we keep the matchers handling only text?
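
For readers unfamiliar with that style, here is a minimal sketch of the Prometheus relabel_config rules referenced in items 2 and 6 (the job, namespace, and label names are invented for illustration):

``` yaml
# prometheus.yml fragment: drop/keep and relabel combined in one config block
scrape_configs:
- job_name: kubernetes-pods
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace]
    regex: very-important
    action: keep                  # keep only targets from this namespace, drop the rest
  - source_labels: [__meta_kubernetes_pod_label_team]
    regex: (.+)
    target_label: team
    action: replace               # relabeling: copy the pod label into a "team" label
```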

@jcantrill
Copy link
Contributor

  1. What is the inspiration for naming the feature filters, and accordingly the first two prune (related to fields) and drop (related to entire records)? Is the inspiration how similar features work in the supported collector implementations (i.e. Vector, fluentd)? If so, can you point to them in a related-work section?

Mostly this stems from the "fluentd" naming, and we already have an API in place called "filters". I believe that as part of the v2.0 API we should consider renaming to "transforms", which agrees with Vector. I think it is also more open-ended and implies a range of activities: data modification, dropping, etc.

  2. Although the proposal aims to give users the means to control log record size and log volume, it is not clear whether we want to go a step further in future, e.g. relabeling records or constructing calculated fields. The latter is explicitly stated as a non-goal, but is this a definitive no, that we will never do it? If not, what can we learn from sibling tools like Prometheus' relabeling-config, which combines dropping/pruning with relabeling in one config block?

This specific API is related only to reducing log size and dropping records. The filter framework allows us to add more if required but we are refraining from doing so at this time.

  3. Considering filter validation, it seems that each filter, and both together, have a serious impact on the ClusterLogForwarder status field, which must inform the user about broken and/or invalid rules as well as the impact on log records and volume. I believe error reporting deserves a separate section to make the proposal complete. The reader should be told how we report invalid rules/filters (support and docs will have an easier enablement in this regard).
  4. Do we need to provide default filters from the start? This looks like extra baggage that will require special handling once we support more formats than the traditional Viaq model, e.g. OpenTelemetry. My suggestion is to drop defaults until we understand the impact of supporting filters across multiple log record formats.

I can imagine us modifying the internal implementation to make these special filters, which would simplify dropping them in future as needed (e.g. Viaq). I think this is also the correct direction for OpenTelemetry, but this enhancement is strictly for these very specific purposes.

  5. Speaking of OpenTelemetry, I believe this proposal deserves a section elaborating on the impact on OpenTelemetry-formatted log records. Could we miss cases that render OpenTelemetry log records invalid if a user prunes/drops fields from them? The proposed section should provide a first analysis in that regard.
  6. Can we simplify/condense the drop filter matchers into a form like the Prometheus relabeling-config regex matchers? Do we need numerical comparison early on? Supporting numerical comparison opens, IMHO, the desire for more than less/greater, e.g. lessOrEqual/greaterOrEqual, modulus, etc. Can we keep the matchers handling only text?

I would agree. I don't believe we need numerical comparison for now, and I have not seen any requirements for it.

@alanconway
Contributor Author

1. What is the inspiration for naming the feature `filters`

Vector uses 'filter' to drop records and 'transform' to modify them, but then blurs the distinction by allowing a 'transform' to also drop records using "abort".
I decided to call everything a 'filter'; the distinction doesn't seem all that helpful.
E.g. the "drop" and "prune" filters do just that (drop records, prune fields), but the apiAudit filter does both, since the kube-api audit policy is written that way.

2. Although the proposal aims to give users means to control log record size and log volume, it is not clear if we want to go a step further in future, i.e. relabeling records, constructing calculated fields. The latter is explicitly stated as a non-goal, but is this a definitive no that we won't ever do this?

A non-goal only for the current work; it could be extended in future.

If not what can we learn from other sibling tools like Prometheus' relabeling-config that combines dropping/pruning with relabeling in one config block?

The "filter" type is an open ended YAML struct so we have scope to add new filters that do whatever we want.

3. Considering filter validation, it seems that each filter, and both together, have a serious impact on the `ClusterLogForwarder` status field to inform the user about broken and/or invalid rules

I think this fits in the existing scheme of having a list of text error messages associated with each broken input/output/pipeline.
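
Purely as an illustration of that existing scheme (the exact layout and message text below are hypothetical, not part of the proposal), a broken filter referenced from a pipeline might be reported roughly like this:

``` yaml
status:
  pipelines:
    bar:                         # hypothetical pipeline name
    - type: Ready
      status: "False"
      reason: Invalid
      message: 'filter "foo": drop test 1: invalid regular expression: "(unclosed"'
```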

as well as the impact on log records and volume. I believe error reporting deserves a separate section to make the proposal complete. The reader should be informed on how we report invalid rules/filters (i.e. support and docs will have an easier enablement in this regard).

+1 to adding an error handling section.

4. Do we need to provide default filters from the start?

Not necessarily. The one we've had the most intense demand for is filtering by log level, but this isn't so hard to write by hand as a drop filter:

``` yaml
- field: level
  in: [critical, error, warning]
```
5. Speaking of OpenTelemetry, I believe this proposal deserves a section elaborating on the impact on [OpenTelemetry formatted log records](https://opentelemetry.io/docs/specs/otel/protocol/file-exporter/#examples). Could we miss cases that render OpenTelemetry log records invalid if a user prunes/drops fields from them? The proposed section should provide a first analysis in that regard.

Originally the proposal had such a section, but it proved confusing. The filters will work the same, just with different field names. Additional validation or guidelines for OTEL can be added when we have an OTEL implementation.
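
As a purely hypothetical illustration (the field paths below are guesses at OTEL-style names, not part of this proposal), the same prune filter shape could later be written against OpenTelemetry records:

``` yaml
filters:
- name: otel-prune                        # hypothetical name
  type: prune
  prune:
    notIn: [.body, .resource.attributes]  # hypothetical OTEL-style field paths
```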

6. Can we simplify/condense the `drop` filter matchers into a form like the Prometheus [relabeling-config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config) `regex` matchers? Do we need numerical comparison early on? Supporting numerical comparison opens, IMHO, the desire for more than less/greater, e.g. lessOrEqual/greaterOrEqual, modulus, etc. Can we keep the matchers handling only text?

+1 to dropping numerical comparison for the first cut.

alanconway force-pushed the content-filter-enhancement branch from a5f5dbd to d5fd9af on January 15, 2024 19:56.
@alanconway
Contributor Author

@periklis @jcantrill thanks again for more good comments; the proposal is simpler and better for them.
Please re-review and make sure all your concerns are covered.

alanconway force-pushed the content-filter-enhancement branch from d5fd9af to d29f350 on January 15, 2024 19:59.
alanconway force-pushed the content-filter-enhancement branch 2 times, most recently from 4319cb7 to a6a1feb on January 15, 2024 21:56.
@jcantrill
Contributor

/approve


openshift-ci bot commented Jan 15, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jcantrill

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

The openshift-ci bot added the approved label on Jan 15, 2024.
The following review thread refers to this excerpt from the proposal:

```
- No new security issues.
- Extensions to ClusterLogForwarder will be reviewed as part of normal code review.

### Drawbacks
```

A contributor left a comment

Regexp syntax can depend on the collector implementation, e.g. Ruby is case-sensitive, Rust is case-insensitive by default, etc.
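
One engine-independent way around such differences is an inline flag in the pattern itself; `(?i)` is understood by both Ruby's and Rust's regex engines (the field and pattern below are illustrative only):

``` yaml
- field: .level
  matches: "(?i)(warning|error|critical)"  # case-insensitive regardless of the collector's default
```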

@periklis (Contributor) left a comment

Just small formatting suggestions for the examples.

Comment on lines 92 to 107

``` yaml
spec:
  filters:
  - name: foo
    type: prune
    prune:
      notIn: [.message, .kubernetes]  # Keep only the message and kubernetes fields.
      in: [.kubernetes.flat_labels]   # Prune the kubernetes.flat_labels sub-field.
  pipelines:
  - name: bar
    filterRefs: ["foo"]
```

Suggested change: formatting only.
Comment on lines 112 to 127

``` yaml
spec:
  filters:
  - name: # Provided by the user
    type: drop
    drop:
    - test:
      - field: # JSON path to the field
        # Requires exactly one of the following conditions.
        matches:    # regular expression match
        notMatches: # regular expression does not match
```

Suggested change: formatting only.
Comment on lines 135 to 153

``` yaml
filters:
- name: important
  type: drop
  drop:
  - tests:
    - field: .kubernetes.namespace_name
      notMatches: "very-important"    # Keep everything from this namespace.
    - field: .level                   # Keep important levels
      matches: "warning|error|critical"
```

Suggested change: formatting only.
@periklis
Contributor

3. Considering filter validation, it seems that each filter, and both together, have a serious impact on the ClusterLogForwarder status field, which must inform the user about broken and/or invalid rules as well as the impact on log records and volume. I believe error reporting deserves a separate section to make the proposal complete. The reader should be told how we report invalid rules/filters (support and docs will have an easier enablement in this regard).

@alanconway ☝️ Regarding that, do you intend to provide examples of how the status field will look for error reporting?

``` yaml
- name: important
  type: drop
  drop:
  - tests:
```

@vparfonov (Contributor) commented on Jan 18, 2024

Is "tests" good name here, what about something like: "rules"

@dhellmann
Contributor

#1555 is changing the enhancement template in a way that will cause the header check in the linter job to fail for existing PRs. If this PR is merged within the development period for 4.16 you may override the linter if the only failures are caused by issues with the headers (please make sure the markdown formatting is correct). If this PR is not merged before 4.16 development closes, please update the enhancement to conform to the new template.

alanconway force-pushed the content-filter-enhancement branch 2 times, most recently from 746fa58 to 9084b95 on February 16, 2024 18:35.
alanconway force-pushed the content-filter-enhancement branch from 9084b95 to 3409e86 on February 16, 2024 18:54.

openshift-ci bot commented Feb 16, 2024

@alanconway: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@jcantrill (Contributor) left a comment

/lgtm

The openshift-ci bot added the lgtm label on Feb 16, 2024.
The openshift-merge-bot merged commit e7c9a21 into openshift:master on Feb 16, 2024.
2 checks passed