Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce component status reporting #6550

Conversation

tigrannajaryan
Copy link
Member

Summary of changes

  • Add component/status package. This introduces the concepts of component status event and pipeline readiness status.
  • Introduce Host.RegisterStatusListener() to allow components to listen for status changes.
  • Deprecate Host.ReportFatalError() in favour of Host.ReportComponentStatus().
  • Move component ID and Type to a separate component/id package. This is necessary to avoid dependency cycle with the new component/status package. Large part of the diff in this commit is because of this particular change. If this is too large and difficult to review we can split this commit into 2 parts where this particular change is a separate commit.
  • Deprecated component.ID and component.Type in favour of id.ID and id.Type.

TODO after this is merged

  • healthcheck extension must register and listen to component statuses.
  • Replace all ReportFatalError() calls by ReportComponentStatus() calls in core and contrib.

Open Questions

  • Do we want to name component/id package component/componentid instead?
  • The pipelines readiness status is also implemented in the component/status package. We can split the status, componentstatus and pipelinestatus into separate packages but not sure if it warrants creation of a new top-level package just to be able to stay pure in the component package. We can try this if think it is a better approach.
  • Listeners need to be able to tell if all current components are healthy. It is assumed that the listeners need to maintain a map of components and track the status of each component. This works only if we assume that the set of components cannot change during the lifetime of the listener. This assumption is true today but can change in the future if we introduce partial pipeline restarts where only modified/added/removed components are recreated (this will break listener's assumption and the map will become invalid). Should we instead keep track of this entire status map in the Host and broadcast the entire status to the listeners as a whole instead of (or in addition to) individual component events?

@tigrannajaryan
Copy link
Member Author

@open-telemetry/collector-approvers @open-telemetry/collector-contrib-approvers please take a look at the draft and tell me if you like the direction.

@codecov
Copy link

codecov bot commented Nov 15, 2022

Codecov Report

Base: 91.33% // Head: 91.35% // Increases project coverage by +0.02% 🎉

Coverage data is based on head (0de249e) compared to base (375a859).
Patch coverage: 94.68% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6550      +/-   ##
==========================================
+ Coverage   91.33%   91.35%   +0.02%     
==========================================
  Files         242      246       +4     
  Lines       13888    14055     +167     
==========================================
+ Hits        12684    12840     +156     
- Misses        959      968       +9     
- Partials      245      247       +2     
Impacted Files Coverage Δ
component/config.go 40.00% <ø> (ø)
component/extension.go 100.00% <ø> (ø)
service/config.go 100.00% <ø> (ø)
service/extensions/extensions.go 79.79% <50.00%> (-3.19%) ⬇️
service/service.go 65.78% <50.00%> (-3.66%) ⬇️
service/internal/pipelines/pipelines.go 94.17% <87.09%> (-0.95%) ⬇️
component/component.go 100.00% <100.00%> (ø)
component/componenttest/nop_exporter.go 100.00% <100.00%> (ø)
component/componenttest/nop_extension.go 100.00% <100.00%> (ø)
component/componenttest/nop_host.go 100.00% <100.00%> (ø)
... and 24 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Member

@bogdandrutu bogdandrutu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super hard to review, are you ok to only keep for the moment changes in "component/*" in the PR for the moment?

Overall feedback:

  • Direction is fine.
  • Need to better understand if the pipeline status should be also part of this mechanism (maybe postpone for later that part).
  • Do we go on a model where components register for updates, or we define an optional interface that Components can implement and we check for that instead of exposing the API on the Host?
  • In general I want to make components "less" aware about their ID. For that I think service has full control on the Host instance we pass to each component (and knows/can guarantee the right ID is set in the wrapper).

}

// ComponentEventOption applies options to a ComponentEvent.
type ComponentEventOption func(*ComponentEvent) error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

either make the arg of the func private, or use an interface with a private func.

// See the License for the specific language governing permissions and
// limitations under the License.

package status
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading this package, makes me think that we want this in component, because all of the structs start with "component".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename component.StatusEvent the main struct of this package and get rid of it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my initial approach, but it pollutes the component package with many new public symbols. I think it is a bit better this way to isolate status-related stuff in a separate sub-package. No strong opinion, I can merge if we think that's better.

Comment on lines +58 to +61
// Source is the component that is reporting its status.
func (ev *ComponentEvent) Source() Source {
return ev.source
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this, I think we should not make the component that aware of the ID. I think we already "wrap" the host, so we know which component reports the Event to us.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I can hide the ID on the "generating" side. But we still need to keep it on the "listening" side. See my other comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, pass it as separate arg.

}

// ListenerOption applies options to a Listener.
type ListenerOption func(*Listener)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here, right now func(*Listener) is public for no good reason.

// the following events:
// - Events reported by via the ReportComponentStatus function,
// - Notifications about pipeline status changes (ready, not ready).
RegisterStatusListener(options ...status.ListenerOption) StatusListenerUnregisterFunc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like status.Listener is not used anywhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is used in notifications.go.

@tigrannajaryan tigrannajaryan force-pushed the feature/tigran/status_reporting branch from 5285937 to d73c804 Compare November 15, 2022 19:12
@tigrannajaryan
Copy link
Member Author

Super hard to review, are you ok to only keep for the moment changes in "component/*" in the PR for the moment?

Try now, I only kept component/* and service/* for now.

## Summary of changes

- Add component/status package. This introduces the concepts of component status
  event and pipeline readiness status.
- Introduce Host.RegisterStatusListener() to allow components to listen for
  status changes.
- Deprecate Host.ReportFatalError() in favour of Host.ReportComponentStatus().
- Move component ID and Type to a separate component/id package. This is necessary
  to avoid dependency cycle with the new component/status package. Large part of
  the diff in this commit is because of this particular change. If this is too large
  and difficult to review we can split this commit into 2 parts where this particular
  change is a separate commit.
- Deprecated component.ID and component.Type in favour of id.ID and id.Type.

## TODO after this is merged

- healthcheck extension must register and listen to component statuses.
- Replace all ReportFatalError() calls by ReportComponentStatus() calls
  in core and contrib.

## Open Questions

- Do we want to name component/id package component/componentid instead?
- The pipelines readiness status is also implemented in the component/status
  package. We can split the status, componentstatus and pipelinestatus into
  separate packages but not sure if it warrants creation of a new top-level
  package just to be able to stay pure in the component package.
  We can try this if think it is a better approach.
- Listeners need to be able to tell if all current components are healthy.
  It is assumed that the listeners need to maintain a map of components and
  track the status of each component. This works only if we assume that
  the set of components cannot change during the lifetime of the listener.
  This assumption is true today but can change in the future if we introduce
  partial pipeline restarts where only modified/added/removed components
  are recreated (this will break listener's assumption and the map will become
  invalid). Should we instead keep track of this entire status map in the Host
  and broadcast the entire status to the listeners as a whole instead of
  (or in addition to) individual component events?
@tigrannajaryan tigrannajaryan force-pushed the feature/tigran/status_reporting branch from d73c804 to 0de249e Compare November 15, 2022 19:14
@tigrannajaryan
Copy link
Member Author

  • Do we go on a model where components register for updates, or we define an optional interface that Components can implement and we check for that instead of exposing the API on the Host?

This is the current approach we use with PipelineWatcher interface. If prefer this approach then we can do that. An important difference is that with Host some other bit of code can register to listen (not sure why we would need it), while with this approach only components can.

@tigrannajaryan
Copy link
Member Author

  • In general I want to make components "less" aware about their ID. For that I think service has full control on the Host instance we pass to each component (and knows/can guarantee the right ID is set in the wrapper).

Yes, with current approach the source component isn't aware of its ID. The Source field in the event is filled by hostWrapper.

The listener MUST be aware of component IDs in order to keep a full status of all components. The full status is necessary so that the healthcheck can aggregate it into one boolean "Healthy" value. To keep the full status the listener needs a map where the key is source component ID. If we want to hide the component IDs from status reporting mechanism then we have to keep track of the full status in the Host (hostWrapper/service) and only notify listeners about full status and not individual component statuses.

@bogdandrutu
Copy link
Member

The listener MUST be aware of component IDs in order to keep a full status of all components. The full status is necessary so that the healthcheck can aggregate it into one boolean "Healthy" value. To keep the full status the listener needs a map where the key is source component ID. If we want to hide the component IDs from status reporting mechanism then we have to keep track of the full status in the Host (hostWrapper/service) and only notify listeners about full status and not individual component statuses.

If the source of the "Source" is Host then what about passing ID as a separate argument to the "ListenerFunc"? So that we we don't confuse the Component that they need to populate the ID.

tigrannajaryan added a commit to tigrannajaryan/opentelemetry-collector that referenced this pull request Nov 15, 2022
This is an alternate to open-telemetry#6550

## Summary of changes

- Add component status concept. Components can report their status via
  Host.ReportComponentStatus(). Interested extensions can watch status events
  if they implement StatusWatcher interface.
  This is similar to how PipelineWatcher works today.
- Deprecate Host.ReportFatalError() in favour of Host.ReportComponentStatus().

## TODO after this is merged

- healthcheck extension must implement StatusWatcher.
- Replace all ReportFatalError() calls by ReportComponentStatus() calls in core and contrib.

## Open Questions

- StatusWatchers need to be able to tell if all current components are healthy.
  It is assumed that the listeners need to maintain a map of components and track
  the status of each component. This works only if we assume that the set of
  components cannot change during the lifetime of the listener. This assumption
  is true today but can change in the future if we introduce partial pipeline
  restarts where only modified/added/removed components are recreated (this will
  break listener's assumption and the map will become invalid). Should we
  instead keep track of this entire status map in the Host and broadcast the
  entire status to the listeners as a whole instead of (or in addition to)
  individual component events?
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-collector that referenced this pull request Nov 15, 2022
This is an alternate to open-telemetry#6550

## Summary of changes

- Add component status concept. Components can report their status via
  Host.ReportComponentStatus(). Interested extensions can watch status events
  if they implement StatusWatcher interface.
  This is similar to how PipelineWatcher works today.
- Deprecate Host.ReportFatalError() in favour of Host.ReportComponentStatus().

## TODO after this is merged

- healthcheck extension must implement StatusWatcher.
- Replace all ReportFatalError() calls by ReportComponentStatus() calls in core and contrib.

## Open Questions

- StatusWatchers need to be able to tell if all current components are healthy.
  It is assumed that the listeners need to maintain a map of components and track
  the status of each component. This works only if we assume that the set of
  components cannot change during the lifetime of the listener. This assumption
  is true today but can change in the future if we introduce partial pipeline
  restarts where only modified/added/removed components are recreated (this will
  break listener's assumption and the map will become invalid). Should we
  instead keep track of this entire status map in the Host and broadcast the
  entire status to the listeners as a whole instead of (or in addition to)
  individual component events?
@tigrannajaryan
Copy link
Member Author

An alternate approach here: #6560

tigrannajaryan added a commit to tigrannajaryan/opentelemetry-collector that referenced this pull request Nov 16, 2022
This is an alternate to open-telemetry#6550

## Summary of changes

- Add component status concept. Components can report their status via
  Host.ReportComponentStatus(). Interested extensions can watch status events
  if they implement StatusWatcher interface.
  This is similar to how PipelineWatcher works today.
- Deprecate Host.ReportFatalError() in favour of Host.ReportComponentStatus().

## TODO after this is merged

- healthcheck extension must implement StatusWatcher.
- Replace all ReportFatalError() calls by ReportComponentStatus() calls in core and contrib.

## Open Questions

- StatusWatchers need to be able to tell if all current components are healthy.
  It is assumed that the listeners need to maintain a map of components and track
  the status of each component. This works only if we assume that the set of
  components cannot change during the lifetime of the listener. This assumption
  is true today but can change in the future if we introduce partial pipeline
  restarts where only modified/added/removed components are recreated (this will
  break listener's assumption and the map will become invalid). Should we
  instead keep track of this entire status map in the Host and broadcast the
  entire status to the listeners as a whole instead of (or in addition to)
  individual component events?
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-collector that referenced this pull request Nov 16, 2022
This is an alternate to open-telemetry#6550

## Summary of changes

- Add component status concept. Components can report their status via
  Host.ReportComponentStatus(). Interested extensions can watch status events
  if they implement StatusWatcher interface.
  This is similar to how PipelineWatcher works today.
- Deprecate Host.ReportFatalError() in favour of Host.ReportComponentStatus().

## TODO after this is merged

- healthcheck extension must implement StatusWatcher.
- Replace all ReportFatalError() calls by ReportComponentStatus() calls in core and contrib.

## Open Questions

- StatusWatchers need to be able to tell if all current components are healthy.
  It is assumed that the listeners need to maintain a map of components and track
  the status of each component. This works only if we assume that the set of
  components cannot change during the lifetime of the listener. This assumption
  is true today but can change in the future if we introduce partial pipeline
  restarts where only modified/added/removed components are recreated (this will
  break listener's assumption and the map will become invalid). Should we
  instead keep track of this entire status map in the Host and broadcast the
  entire status to the listeners as a whole instead of (or in addition to)
  individual component events?
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-collector that referenced this pull request Nov 16, 2022
This is an alternate to open-telemetry#6550

## Summary of changes

- Add component status concept. Components can report their status via
  Host.ReportComponentStatus(). Interested extensions can watch status events
  if they implement StatusWatcher interface.
  This is similar to how PipelineWatcher works today.
- Deprecate Host.ReportFatalError() in favour of Host.ReportComponentStatus().

## TODO after this is merged

- healthcheck extension must implement StatusWatcher.
- Replace all ReportFatalError() calls by ReportComponentStatus() calls in core and contrib.

## Open Questions

- StatusWatchers need to be able to tell if all current components are healthy.
  It is assumed that the listeners need to maintain a map of components and track
  the status of each component. This works only if we assume that the set of
  components cannot change during the lifetime of the listener. This assumption
  is true today but can change in the future if we introduce partial pipeline
  restarts where only modified/added/removed components are recreated (this will
  break listener's assumption and the map will become invalid). Should we
  instead keep track of this entire status map in the Host and broadcast the
  entire status to the listeners as a whole instead of (or in addition to)
  individual component events?
@tigrannajaryan
Copy link
Member Author

Closing in favour of #6560

tigrannajaryan added a commit to tigrannajaryan/opentelemetry-collector that referenced this pull request Nov 21, 2022
This is an alternate to open-telemetry#6550

## Summary of changes

- Add component status concept. Components can report their status via
  Host.ReportComponentStatus(). Interested extensions can watch status events
  if they implement StatusWatcher interface.
  This is similar to how PipelineWatcher works today.
- Deprecate Host.ReportFatalError() in favour of Host.ReportComponentStatus().

## TODO after this is merged

- healthcheck extension must implement StatusWatcher.
- Replace all ReportFatalError() calls by ReportComponentStatus() calls in core and contrib.

## Open Questions

- StatusWatchers need to be able to tell if all current components are healthy.
  It is assumed that the listeners need to maintain a map of components and track
  the status of each component. This works only if we assume that the set of
  components cannot change during the lifetime of the listener. This assumption
  is true today but can change in the future if we introduce partial pipeline
  restarts where only modified/added/removed components are recreated (this will
  break listener's assumption and the map will become invalid). Should we
  instead keep track of this entire status map in the Host and broadcast the
  entire status to the listeners as a whole instead of (or in addition to)
  individual component events?
mwear pushed a commit to mwear/opentelemetry-collector that referenced this pull request Aug 2, 2023
This is an alternate to open-telemetry#6550

- Add component status concept. Components can report their status via
  Host.ReportComponentStatus(). Interested extensions can watch status events
  if they implement StatusWatcher interface.
  This is similar to how PipelineWatcher works today.
- Deprecate Host.ReportFatalError() in favour of Host.ReportComponentStatus().

- healthcheck extension must implement StatusWatcher.
- Replace all ReportFatalError() calls by ReportComponentStatus() calls in core and contrib.

- StatusWatchers need to be able to tell if all current components are healthy.
  It is assumed that the listeners need to maintain a map of components and track
  the status of each component. This works only if we assume that the set of
  components cannot change during the lifetime of the listener. This assumption
  is true today but can change in the future if we introduce partial pipeline
  restarts where only modified/added/removed components are recreated (this will
  break listener's assumption and the map will become invalid). Should we
  instead keep track of this entire status map in the Host and broadcast the
  entire status to the listeners as a whole instead of (or in addition to)
  individual component events?
mwear pushed a commit to mwear/opentelemetry-collector that referenced this pull request Aug 23, 2023
This is an alternate to open-telemetry#6550

- Add component status concept. Components can report their status via
  Host.ReportComponentStatus(). Interested extensions can watch status events
  if they implement StatusWatcher interface.
  This is similar to how PipelineWatcher works today.
- Deprecate Host.ReportFatalError() in favour of Host.ReportComponentStatus().

- healthcheck extension must implement StatusWatcher.
- Replace all ReportFatalError() calls by ReportComponentStatus() calls in core and contrib.

- StatusWatchers need to be able to tell if all current components are healthy.
  It is assumed that the listeners need to maintain a map of components and track
  the status of each component. This works only if we assume that the set of
  components cannot change during the lifetime of the listener. This assumption
  is true today but can change in the future if we introduce partial pipeline
  restarts where only modified/added/removed components are recreated (this will
  break listener's assumption and the map will become invalid). Should we
  instead keep track of this entire status map in the Host and broadcast the
  entire status to the listeners as a whole instead of (or in addition to)
  individual component events?
mwear pushed a commit to mwear/opentelemetry-collector that referenced this pull request Sep 10, 2023
This is an alternate to open-telemetry#6550

- Add component status concept. Components can report their status via
  Host.ReportComponentStatus(). Interested extensions can watch status events
  if they implement StatusWatcher interface.
  This is similar to how PipelineWatcher works today.
- Deprecate Host.ReportFatalError() in favour of Host.ReportComponentStatus().

- healthcheck extension must implement StatusWatcher.
- Replace all ReportFatalError() calls by ReportComponentStatus() calls in core and contrib.

- StatusWatchers need to be able to tell if all current components are healthy.
  It is assumed that the listeners need to maintain a map of components and track
  the status of each component. This works only if we assume that the set of
  components cannot change during the lifetime of the listener. This assumption
  is true today but can change in the future if we introduce partial pipeline
  restarts where only modified/added/removed components are recreated (this will
  break listener's assumption and the map will become invalid). Should we
  instead keep track of this entire status map in the Host and broadcast the
  entire status to the listeners as a whole instead of (or in addition to)
  individual component events?
mwear pushed a commit to mwear/opentelemetry-collector that referenced this pull request Sep 26, 2023
This is an alternate to open-telemetry#6550

- Add component status concept. Components can report their status via
  Host.ReportComponentStatus(). Interested extensions can watch status events
  if they implement StatusWatcher interface.
  This is similar to how PipelineWatcher works today.
- Deprecate Host.ReportFatalError() in favour of Host.ReportComponentStatus().

- healthcheck extension must implement StatusWatcher.
- Replace all ReportFatalError() calls by ReportComponentStatus() calls in core and contrib.

- StatusWatchers need to be able to tell if all current components are healthy.
  It is assumed that the listeners need to maintain a map of components and track
  the status of each component. This works only if we assume that the set of
  components cannot change during the lifetime of the listener. This assumption
  is true today but can change in the future if we introduce partial pipeline
  restarts where only modified/added/removed components are recreated (this will
  break listener's assumption and the map will become invalid). Should we
  instead keep track of this entire status map in the Host and broadcast the
  entire status to the listeners as a whole instead of (or in addition to)
  individual component events?
mwear pushed a commit to mwear/opentelemetry-collector that referenced this pull request Sep 29, 2023
This is an alternate to open-telemetry#6550

- Add component status concept. Components can report their status via
  Host.ReportComponentStatus(). Interested extensions can watch status events
  if they implement StatusWatcher interface.
  This is similar to how PipelineWatcher works today.
- Deprecate Host.ReportFatalError() in favour of Host.ReportComponentStatus().

- healthcheck extension must implement StatusWatcher.
- Replace all ReportFatalError() calls by ReportComponentStatus() calls in core and contrib.

- StatusWatchers need to be able to tell if all current components are healthy.
  It is assumed that the listeners need to maintain a map of components and track
  the status of each component. This works only if we assume that the set of
  components cannot change during the lifetime of the listener. This assumption
  is true today but can change in the future if we introduce partial pipeline
  restarts where only modified/added/removed components are recreated (this will
  break listener's assumption and the map will become invalid). Should we
  instead keep track of this entire status map in the Host and broadcast the
  entire status to the listeners as a whole instead of (or in addition to)
  individual component events?
mwear pushed a commit to mwear/opentelemetry-collector that referenced this pull request Oct 6, 2023
This is an alternate to open-telemetry#6550

- Add component status concept. Components can report their status via
  Host.ReportComponentStatus(). Interested extensions can watch status events
  if they implement StatusWatcher interface.
  This is similar to how PipelineWatcher works today.
- Deprecate Host.ReportFatalError() in favour of Host.ReportComponentStatus().

- healthcheck extension must implement StatusWatcher.
- Replace all ReportFatalError() calls by ReportComponentStatus() calls in core and contrib.

- StatusWatchers need to be able to tell if all current components are healthy.
  It is assumed that the listeners need to maintain a map of components and track
  the status of each component. This works only if we assume that the set of
  components cannot change during the lifetime of the listener. This assumption
  is true today but can change in the future if we introduce partial pipeline
  restarts where only modified/added/removed components are recreated (this will
  break listener's assumption and the map will become invalid). Should we
  instead keep track of this entire status map in the Host and broadcast the
  entire status to the listeners as a whole instead of (or in addition to)
  individual component events?
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants