Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

send_quiet is not working for container operator #35726

Open
newly12 opened this issue Oct 10, 2024 · 7 comments
Open

send_quiet is not working for container operator #35726

newly12 opened this issue Oct 10, 2024 · 7 comments
Labels
bug Something isn't working good first issue Good for newcomers pkg/stanza

Comments

@newly12
Copy link
Contributor

newly12 commented Oct 10, 2024

Component(s)

pkg/stanza

What happened?

Description

sample log(some parts are masked)

2024-10-10T09:16:41.228180936Z stderr F 2024-10-10T09:16:41.228Z        error   reader/reader.go:148    process: %w     {"kind": "receiver", "name": "file_reloader", "data_type": "logs", "kind": "receiver", "name": "filelog/xxx/xxx/driver-registrar", "component": "fileconsumer", "path": "/var/log/pods/xxx", "error": "failed to parse time: failed to get the time from &{2024-10-10 09:16:41.228140889 +0000 UTC m=+7410.805943455 0001-01-01 00:00:00 +0000 UTC 2024-10-10T09:16:31.279621753Z xxxxxx] map[]  [] [] [] DEFAULT }"}

Steps to Reproduce

running v0.109.0 and given following operator config

      - type: container
        id: container
        on_error: send_quiet
        format: crio
        add_metadata_from_filepath: false

Expected Result

no error log to be logged

Actual Result

Collector version

v0.109.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

No response

Log output

No response

Additional context

The issue is found after we upgrade from v0.104.0 to v0.109.0, while checked change history in pkg/stanza, I suspect it is related to d31bc2e#diff-3f65701685094531f52b03700a138417be4cb0038f6acf7fea5896a791cf7f9bL135

@newly12 newly12 added bug Something isn't working needs triage New item requiring triage labels Oct 10, 2024
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@djaglowski
Copy link
Member

cc @ChrsMark

@ChrsMark ChrsMark added the good first issue Good for newcomers label Oct 10, 2024
@ChrsMark
Copy link
Member

ChrsMark commented Oct 11, 2024

I took a look into this. It seems that these errors that happen directly inside Process are not handled with HandleEntryError as it happen in other places like in ProcessWithCallback. Hence they are returned directly upwards to the caller and finally logged by the Reader.

I think we can honor the send_* options by "wrapping" these top level errors with HandleEntryError.

@khushijain21
Copy link
Contributor

Hello @ChrsMark, can you assign this to me?

@ChrsMark
Copy link
Member

Sure @khushijain21! Thank's for taking this!

@ChrsMark
Copy link
Member

ChrsMark commented Oct 30, 2024

After #35834 my comment is not valid.

This issue can happen however in other operators like at

return fmt.Errorf("failed to read dynamic header attribute %s", p.headerAttribute)
. This makes me believe that the interface here is not consistent and we need to revisit this to ensure we are not circling back.

It seems that after d31bc2e#diff-3f65701685094531f52b03700a138417be4cb0038f6acf7fea5896a791cf7f9bL135 it is expected that the errors from the operators are returned up to the reader and hence logged.

The send_quiet option can only handle if the operator itself logs the error as Error or Debug:

t.Logger().Debug("Failed to process entry", zap.Any("error", err), zap.Any("action", t.OnError))

@djaglowski @khushijain21 shall we first agree on the expected behavior before moving forward with #35758?

From what I can put together now, I think we need to agree on the following:

  1. Ensure that handling errors won't result in duplicating logs: duplicate logs are sent on timestamp parsing error #35973
  2. Agree on the scope of the on_error options. Should all errors in operators being handled based the on_error or just those that are parsing errors?
  3. Decide if operators' errors should be returned to the caller (reader) and hence logged there no matter if the entry might have been sent or not. swallowed exception in Operator.Process #33783 make the errors being propagated to the Reader, and this results in this mixed behavior here.
  4. Verify that all operators honor what on_error setting dictates.

I'm leaning towards having the on_error to actually impact the whole operator which means that if *_quiet is provided the errors are only logged and are not returned. I can reserve some time soon to put sth together unless someone else want to work on this earlier.

@djaglowski
Copy link
Member

Apologies for the churn on this. I'm definitely not very tuned into the considerations of how we do or should propagate errors here. @ChrsMark I'm happy to go with your recommendation on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers pkg/stanza
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants