Introduce component status reporting #6550

tigrannajaryan · 2022-11-15T17:58:47Z

Summary of changes

Add component/status package. This introduces the concepts of component status event and pipeline readiness status.
Introduce Host.RegisterStatusListener() to allow components to listen for status changes.
Deprecate Host.ReportFatalError() in favour of Host.ReportComponentStatus().
Move component ID and Type to a separate component/id package. This is necessary to avoid dependency cycle with the new component/status package. Large part of the diff in this commit is because of this particular change. If this is too large and difficult to review we can split this commit into 2 parts where this particular change is a separate commit.
Deprecated component.ID and component.Type in favour of id.ID and id.Type.

TODO after this is merged

healthcheck extension must register and listen to component statuses.
Replace all ReportFatalError() calls by ReportComponentStatus() calls in core and contrib.

Open Questions

Do we want to name component/id package component/componentid instead?
The pipelines readiness status is also implemented in the component/status package. We can split the status, componentstatus and pipelinestatus into separate packages but not sure if it warrants creation of a new top-level package just to be able to stay pure in the component package. We can try this if think it is a better approach.
Listeners need to be able to tell if all current components are healthy. It is assumed that the listeners need to maintain a map of components and track the status of each component. This works only if we assume that the set of components cannot change during the lifetime of the listener. This assumption is true today but can change in the future if we introduce partial pipeline restarts where only modified/added/removed components are recreated (this will break listener's assumption and the map will become invalid). Should we instead keep track of this entire status map in the Host and broadcast the entire status to the listeners as a whole instead of (or in addition to) individual component events?

tigrannajaryan · 2022-11-15T17:59:27Z

@open-telemetry/collector-approvers @open-telemetry/collector-contrib-approvers please take a look at the draft and tell me if you like the direction.

codecov · 2022-11-15T18:11:11Z

Codecov Report

Base: 91.33% // Head: 91.35% // Increases project coverage by +0.02% 🎉

Coverage data is based on head (0de249e) compared to base (375a859).
Patch coverage: 94.68% of modified lines in pull request are covered.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #6550      +/-   ##
==========================================
+ Coverage   91.33%   91.35%   +0.02%     
==========================================
  Files         242      246       +4     
  Lines       13888    14055     +167     
==========================================
+ Hits        12684    12840     +156     
- Misses        959      968       +9     
- Partials      245      247       +2

Impacted Files	Coverage Δ
component/config.go	`40.00% <ø> (ø)`
component/extension.go	`100.00% <ø> (ø)`
service/config.go	`100.00% <ø> (ø)`
service/extensions/extensions.go	`79.79% <50.00%> (-3.19%)`	⬇️
service/service.go	`65.78% <50.00%> (-3.66%)`	⬇️
service/internal/pipelines/pipelines.go	`94.17% <87.09%> (-0.95%)`	⬇️
component/component.go	`100.00% <100.00%> (ø)`
component/componenttest/nop_exporter.go	`100.00% <100.00%> (ø)`
component/componenttest/nop_extension.go	`100.00% <100.00%> (ø)`
component/componenttest/nop_host.go	`100.00% <100.00%> (ø)`
... and 24 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

bogdandrutu

Super hard to review, are you ok to only keep for the moment changes in "component/*" in the PR for the moment?

Overall feedback:

Direction is fine.
Need to better understand if the pipeline status should be also part of this mechanism (maybe postpone for later that part).
Do we go on a model where components register for updates, or we define an optional interface that Components can implement and we check for that instead of exposing the API on the Host?
In general I want to make components "less" aware about their ID. For that I think service has full control on the Host instance we pass to each component (and knows/can guarantee the right ID is set in the wrapper).

bogdandrutu · 2022-11-15T18:07:21Z

component/status/status.go

+}
+
+// ComponentEventOption applies options to a ComponentEvent.
+type ComponentEventOption func(*ComponentEvent) error


either make the arg of the func private, or use an interface with a private func.

bogdandrutu · 2022-11-15T18:08:03Z

component/status/status.go

+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package status


Reading this package, makes me think that we want this in component, because all of the structs start with "component".

I would rename component.StatusEvent the main struct of this package and get rid of it.

That was my initial approach, but it pollutes the component package with many new public symbols. I think it is a bit better this way to isolate status-related stuff in a separate sub-package. No strong opinion, I can merge if we think that's better.

bogdandrutu · 2022-11-15T18:09:32Z

component/status/status.go

+// Source is the component that is reporting its status.
+func (ev *ComponentEvent) Source() Source {
+	return ev.source
+}


Not sure about this, I think we should not make the component that aware of the ID. I think we already "wrap" the host, so we know which component reports the Event to us.

Yes, I can hide the ID on the "generating" side. But we still need to keep it on the "listening" side. See my other comment.

Yes, pass it as separate arg.

bogdandrutu · 2022-11-15T18:31:03Z

component/status/status_listener.go

+}
+
+// ListenerOption applies options to a Listener.
+type ListenerOption func(*Listener)


Same comment here, right now func(*Listener) is public for no good reason.

bogdandrutu · 2022-11-15T18:33:15Z

component/host.go

+	// the following events:
+	// - Events reported by via the ReportComponentStatus function,
+	// - Notifications about pipeline status changes (ready, not ready).
+	RegisterStatusListener(options ...status.ListenerOption) StatusListenerUnregisterFunc


Looks like status.Listener is not used anywhere.

It is used in notifications.go.

tigrannajaryan · 2022-11-15T19:13:18Z

Super hard to review, are you ok to only keep for the moment changes in "component/*" in the PR for the moment?

Try now, I only kept component/* and service/* for now.

## Summary of changes - Add component/status package. This introduces the concepts of component status event and pipeline readiness status. - Introduce Host.RegisterStatusListener() to allow components to listen for status changes. - Deprecate Host.ReportFatalError() in favour of Host.ReportComponentStatus(). - Move component ID and Type to a separate component/id package. This is necessary to avoid dependency cycle with the new component/status package. Large part of the diff in this commit is because of this particular change. If this is too large and difficult to review we can split this commit into 2 parts where this particular change is a separate commit. - Deprecated component.ID and component.Type in favour of id.ID and id.Type. ## TODO after this is merged - healthcheck extension must register and listen to component statuses. - Replace all ReportFatalError() calls by ReportComponentStatus() calls in core and contrib. ## Open Questions - Do we want to name component/id package component/componentid instead? - The pipelines readiness status is also implemented in the component/status package. We can split the status, componentstatus and pipelinestatus into separate packages but not sure if it warrants creation of a new top-level package just to be able to stay pure in the component package. We can try this if think it is a better approach. - Listeners need to be able to tell if all current components are healthy. It is assumed that the listeners need to maintain a map of components and track the status of each component. This works only if we assume that the set of components cannot change during the lifetime of the listener. This assumption is true today but can change in the future if we introduce partial pipeline restarts where only modified/added/removed components are recreated (this will break listener's assumption and the map will become invalid). Should we instead keep track of this entire status map in the Host and broadcast the entire status to the listeners as a whole instead of (or in addition to) individual component events?

tigrannajaryan · 2022-11-15T19:17:07Z

Do we go on a model where components register for updates, or we define an optional interface that Components can implement and we check for that instead of exposing the API on the Host?

This is the current approach we use with PipelineWatcher interface. If prefer this approach then we can do that. An important difference is that with Host some other bit of code can register to listen (not sure why we would need it), while with this approach only components can.

tigrannajaryan · 2022-11-15T19:22:15Z

In general I want to make components "less" aware about their ID. For that I think service has full control on the Host instance we pass to each component (and knows/can guarantee the right ID is set in the wrapper).

Yes, with current approach the source component isn't aware of its ID. The Source field in the event is filled by hostWrapper.

The listener MUST be aware of component IDs in order to keep a full status of all components. The full status is necessary so that the healthcheck can aggregate it into one boolean "Healthy" value. To keep the full status the listener needs a map where the key is source component ID. If we want to hide the component IDs from status reporting mechanism then we have to keep track of the full status in the Host (hostWrapper/service) and only notify listeners about full status and not individual component statuses.

bogdandrutu · 2022-11-15T19:34:09Z

The listener MUST be aware of component IDs in order to keep a full status of all components. The full status is necessary so that the healthcheck can aggregate it into one boolean "Healthy" value. To keep the full status the listener needs a map where the key is source component ID. If we want to hide the component IDs from status reporting mechanism then we have to keep track of the full status in the Host (hostWrapper/service) and only notify listeners about full status and not individual component statuses.

If the source of the "Source" is Host then what about passing ID as a separate argument to the "ListenerFunc"? So that we we don't confuse the Component that they need to populate the ID.

This is an alternate to open-telemetry#6550 ## Summary of changes - Add component status concept. Components can report their status via Host.ReportComponentStatus(). Interested extensions can watch status events if they implement StatusWatcher interface. This is similar to how PipelineWatcher works today. - Deprecate Host.ReportFatalError() in favour of Host.ReportComponentStatus(). ## TODO after this is merged - healthcheck extension must implement StatusWatcher. - Replace all ReportFatalError() calls by ReportComponentStatus() calls in core and contrib. ## Open Questions - StatusWatchers need to be able to tell if all current components are healthy. It is assumed that the listeners need to maintain a map of components and track the status of each component. This works only if we assume that the set of components cannot change during the lifetime of the listener. This assumption is true today but can change in the future if we introduce partial pipeline restarts where only modified/added/removed components are recreated (this will break listener's assumption and the map will become invalid). Should we instead keep track of this entire status map in the Host and broadcast the entire status to the listeners as a whole instead of (or in addition to) individual component events?

tigrannajaryan · 2022-11-15T22:53:46Z

An alternate approach here: #6560

This is an alternate to open-telemetry#6550 ## Summary of changes - Add component status concept. Components can report their status via Host.ReportComponentStatus(). Interested extensions can watch status events if they implement StatusWatcher interface. This is similar to how PipelineWatcher works today. - Deprecate Host.ReportFatalError() in favour of Host.ReportComponentStatus(). ## TODO after this is merged - healthcheck extension must implement StatusWatcher. - Replace all ReportFatalError() calls by ReportComponentStatus() calls in core and contrib. ## Open Questions - StatusWatchers need to be able to tell if all current components are healthy. It is assumed that the listeners need to maintain a map of components and track the status of each component. This works only if we assume that the set of components cannot change during the lifetime of the listener. This assumption is true today but can change in the future if we introduce partial pipeline restarts where only modified/added/removed components are recreated (this will break listener's assumption and the map will become invalid). Should we instead keep track of this entire status map in the Host and broadcast the entire status to the listeners as a whole instead of (or in addition to) individual component events?

tigrannajaryan · 2022-11-16T17:45:00Z

Closing in favour of #6560

This is an alternate to open-telemetry#6550 ## Summary of changes - Add component status concept. Components can report their status via Host.ReportComponentStatus(). Interested extensions can watch status events if they implement StatusWatcher interface. This is similar to how PipelineWatcher works today. - Deprecate Host.ReportFatalError() in favour of Host.ReportComponentStatus(). ## TODO after this is merged - healthcheck extension must implement StatusWatcher. - Replace all ReportFatalError() calls by ReportComponentStatus() calls in core and contrib. ## Open Questions - StatusWatchers need to be able to tell if all current components are healthy. It is assumed that the listeners need to maintain a map of components and track the status of each component. This works only if we assume that the set of components cannot change during the lifetime of the listener. This assumption is true today but can change in the future if we introduce partial pipeline restarts where only modified/added/removed components are recreated (this will break listener's assumption and the map will become invalid). Should we instead keep track of this entire status map in the Host and broadcast the entire status to the listeners as a whole instead of (or in addition to) individual component events?

This is an alternate to open-telemetry#6550 - Add component status concept. Components can report their status via Host.ReportComponentStatus(). Interested extensions can watch status events if they implement StatusWatcher interface. This is similar to how PipelineWatcher works today. - Deprecate Host.ReportFatalError() in favour of Host.ReportComponentStatus(). - healthcheck extension must implement StatusWatcher. - Replace all ReportFatalError() calls by ReportComponentStatus() calls in core and contrib. - StatusWatchers need to be able to tell if all current components are healthy. It is assumed that the listeners need to maintain a map of components and track the status of each component. This works only if we assume that the set of components cannot change during the lifetime of the listener. This assumption is true today but can change in the future if we introduce partial pipeline restarts where only modified/added/removed components are recreated (this will break listener's assumption and the map will become invalid). Should we instead keep track of this entire status map in the Host and broadcast the entire status to the listeners as a whole instead of (or in addition to) individual component events?

bogdandrutu reviewed Nov 15, 2022

View reviewed changes

tigrannajaryan force-pushed the feature/tigran/status_reporting branch from 5285937 to d73c804 Compare November 15, 2022 19:12

tigrannajaryan force-pushed the feature/tigran/status_reporting branch from d73c804 to 0de249e Compare November 15, 2022 19:14

tigrannajaryan mentioned this pull request Nov 15, 2022

Introduce component status reporting #6560

Closed

tigrannajaryan closed this Nov 16, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce component status reporting #6550

Introduce component status reporting #6550

tigrannajaryan commented Nov 15, 2022

tigrannajaryan commented Nov 15, 2022

codecov bot commented Nov 15, 2022 •

edited

Loading

bogdandrutu left a comment

bogdandrutu Nov 15, 2022

bogdandrutu Nov 15, 2022

bogdandrutu Nov 15, 2022

tigrannajaryan Nov 15, 2022

bogdandrutu Nov 15, 2022

tigrannajaryan Nov 15, 2022

bogdandrutu Nov 15, 2022

bogdandrutu Nov 15, 2022

bogdandrutu Nov 15, 2022

tigrannajaryan Nov 15, 2022

tigrannajaryan commented Nov 15, 2022

tigrannajaryan commented Nov 15, 2022

tigrannajaryan commented Nov 15, 2022

bogdandrutu commented Nov 15, 2022

tigrannajaryan commented Nov 15, 2022

tigrannajaryan commented Nov 16, 2022

Introduce component status reporting #6550

Introduce component status reporting #6550

Conversation

tigrannajaryan commented Nov 15, 2022

Summary of changes

TODO after this is merged

Open Questions

tigrannajaryan commented Nov 15, 2022

codecov bot commented Nov 15, 2022 • edited Loading

Codecov Report

bogdandrutu left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tigrannajaryan commented Nov 15, 2022

tigrannajaryan commented Nov 15, 2022

tigrannajaryan commented Nov 15, 2022

bogdandrutu commented Nov 15, 2022

tigrannajaryan commented Nov 15, 2022

tigrannajaryan commented Nov 16, 2022

codecov bot commented Nov 15, 2022 •

edited

Loading