Implement graceful shutdown with pipeline draining #483

Closed
tigrannajaryan opened this issue Jan 3, 2020 · 13 comments

@tigrannajaryan
Member

Currently, any data that is in the pipeline when shutting down is lost. We need to drain the pipeline during shutdown: first shut down receivers, then processors (and implement draining in processors that store data), then exporters.

@joe-elliott
Contributor

I may be misunderstanding the need, but except for the processors this appears to be done:

func (app *Application) shutdownPipelines() {
	// Shutdown order is the reverse of building: first receivers, then flushing pipelines
	// giving senders a chance to send all their data. This may take time, the allowed
	// time should be part of configuration.
	app.logger.Info("Stopping receivers...")
	app.builtReceivers.StopAll()

	// TODO: shutdown processors by calling Shutdown() for each processor in the
	// order they are arranged in the pipeline.

	app.logger.Info("Shutting down exporters...")
	app.exporters.ShutdownAll()
}

@tigrannajaryan
Member Author

except for the processors

Yes, this is the missing part. Partial processor shutdown support is coming in #488.

We still need to implement draining in processors (particularly in batch and queue).
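
For illustration, a minimal Go sketch of what draining could look like in a processor's Shutdown; batchProcessor and its fields below are simplified placeholders, not the collector's actual batch processor:

package main

import (
	"context"
	"fmt"
	"time"
)

// batchProcessor is a hypothetical, simplified stand-in for a batching
// processor: it buffers items and forwards them downstream in batches.
type batchProcessor struct {
	buffer []string
	export func(ctx context.Context, items []string) error
}

// Shutdown drains whatever is still buffered before reporting that the
// processor has stopped, honoring the caller's deadline via ctx.
func (bp *batchProcessor) Shutdown(ctx context.Context) error {
	if len(bp.buffer) == 0 {
		return nil
	}
	if err := bp.export(ctx, bp.buffer); err != nil {
		return fmt.Errorf("failed to drain %d buffered items: %w", len(bp.buffer), err)
	}
	bp.buffer = nil
	return nil
}

func main() {
	bp := &batchProcessor{
		buffer: []string{"span-1", "span-2"},
		export: func(ctx context.Context, items []string) error {
			fmt.Printf("exporting %d items on shutdown\n", len(items))
			return nil
		},
	}
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := bp.Shutdown(ctx); err != nil {
		fmt.Println("shutdown error:", err)
	}
}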

tigrannajaryan added the help wanted label May 13, 2020
flands added this to the GA 1.0 milestone Jun 18, 2020
@jan25

jan25 commented Jun 28, 2020

Hi! Can I work on this?

After a quick look at queued_processor, here is a rough idea for a fix: on shutdown, drain with a deadline of 5s or until the queue is emptied, whichever happens first. We could also add stats on how many spans were flushed and dropped during the shutdown process. WDYT?

I see there is a Stop() in queued_processor; any hints on where that is used?

A similar idea could be applied to batch_processor too, as sketched below.
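
To make the deadline-or-empty idea concrete, here is a rough, self-contained Go sketch; queuedProcessor and its fields below are invented for illustration, not the actual queued_processor code:

package main

import (
	"fmt"
	"time"
)

// queuedProcessor is a hypothetical, simplified model of a queue-backed
// processor: items wait in a channel until a worker exports them.
type queuedProcessor struct {
	queue  chan string
	export func(item string) error
}

// drainOnShutdown flushes queued items until the queue is empty or the
// deadline expires, whichever comes first, and reports flushed/dropped counts.
func (qp *queuedProcessor) drainOnShutdown(deadline time.Duration) (flushed, dropped int) {
	timer := time.NewTimer(deadline)
	defer timer.Stop()
	for {
		select {
		case item, ok := <-qp.queue:
			if !ok {
				return // queue closed and fully drained
			}
			if err := qp.export(item); err != nil {
				dropped++
			} else {
				flushed++
			}
		case <-timer.C:
			// Deadline reached: whatever is still queued is dropped.
			dropped += len(qp.queue)
			return
		default:
			// Queue is empty; nothing left to drain.
			return
		}
	}
}

func main() {
	qp := &queuedProcessor{
		queue:  make(chan string, 10),
		export: func(item string) error { return nil },
	}
	qp.queue <- "span-1"
	qp.queue <- "span-2"
	flushed, dropped := qp.drainOnShutdown(5 * time.Second)
	fmt.Printf("shutdown drained %d items, dropped %d\n", flushed, dropped)
}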

@tigrannajaryan
Member Author

@jan25 do you still want to work on this?

We will need to drain the pipeline by first stopping receivers, then stopping processors in the right order, then stopping exporters. You are right that when stopping we will need to wait for up to a certain maximum time. It needs to be done for all components, not just certain processors.
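
Roughly, the ordering with a maximum wait could look like the following sketch (component, shutdownPipeline, and the noop types here are placeholders, not the actual service code):

package main

import (
	"context"
	"fmt"
	"time"
)

// component is a hypothetical stand-in for anything in the pipeline that can
// be shut down (receiver, processor, or exporter).
type component interface {
	Shutdown(ctx context.Context) error
}

// shutdownPipeline stops components in pipeline order under a single deadline:
// receivers first (no new data), then processors (drain buffers), then
// exporters (flush to the backend).
func shutdownPipeline(receivers, processors, exporters []component, maxWait time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), maxWait)
	defer cancel()

	for _, group := range [][]component{receivers, processors, exporters} {
		for _, c := range group {
			if err := c.Shutdown(ctx); err != nil {
				return fmt.Errorf("shutdown failed: %w", err)
			}
		}
	}
	return nil
}

// noopComponent is a dummy component used only to demonstrate the ordering.
type noopComponent struct{ name string }

func (n noopComponent) Shutdown(ctx context.Context) error {
	fmt.Println("shut down", n.name)
	return nil
}

func main() {
	err := shutdownPipeline(
		[]component{noopComponent{"otlp receiver"}},
		[]component{noopComponent{"batch processor"}},
		[]component{noopComponent{"otlp exporter"}},
		30*time.Second,
	)
	if err != nil {
		fmt.Println(err)
	}
}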

@jan25

jan25 commented Oct 3, 2020

@tigrannajaryan Sure, I'll start the work and open a PR soon

@tigrannajaryan
Member Author

@jan25 I will remove this from the 1.0 milestone, since it is not a must-have, but feel free to work on it. We will accept it when it is ready.

@kamalmarhubi

@tigrannajaryan Is there a suggested workaround to avoid dropping messages during collector upgrades?

@tigrannajaryan
Member Author

There is no known workaround.

@kamalmarhubi

@tigrannajaryan in that case, I feel like this should be added back to the GA milestone? Dropping things on the floor during upgrades is not great...

@tigrannajaryan
Member Author

@kamalmarhubi Help is wanted on this one. If there are contributors who can submit a PR before GA, we will be happy to review it.

@bogdandrutu
Member

I think this is done, correct?

MovieStoreGuy pushed a commit to atlassian-forks/opentelemetry-collector that referenced this issue Nov 11, 2021
hughesjj pushed a commit to hughesjj/opentelemetry-collector that referenced this issue Apr 27, 2023
swiatekm pushed a commit to swiatekm/opentelemetry-collector that referenced this issue Oct 9, 2024
@clintonb

Is graceful shutdown actually implemented/supported? The docs say it is not:

### Graceful Shutdown
Collector does not yet support graceful shutdown but we plan to add it. All components
must be ready to shutdown gracefully via `Shutdown()` function that all component
interfaces require. If components contain any temporary data they need to process
and export it out of the Collector before shutdown is completed. The shutdown process
will have a maximum allowed duration so put a limit on how long your shutdown
operation can take.

@bradmac-ats

Is graceful shutdown actually implemented/supported? The docs say it is not: […]

These docs seem to indicate that it has been implemented: https://pkg.go.dev/go.opentelemetry.io/collector/otelcol#CollectorSettings
