Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a scheduler to the node api to manage fairness among inputs #728

Merged
merged 14 commits into from
Dec 5, 2024

Conversation

haixuanTao
Copy link
Collaborator

This PR fixes the issue that dora node are not fair between inputs when frequency of both inputs are different and the processing time between input is high. What end up happening is that one input might be overwhelmingly called as it's frequency is higher or lower depending on the queue_size.

This PR fixes this issue by adding a scheduler that is always going to check that the next input is the one that has been waiting the longest within the queue making fairness between inputs.

This PR is a follow up PR to #724 that is rewriting the queue within nodes instead of the daemon.

@phil-opp
Copy link
Collaborator

phil-opp commented Dec 4, 2024

I don't think reordering events is a good idea. Events are not units of work, but rather things that happened. So I think we should try to report the events that happened as accurately as possible, including the order they happened in. The order can be very important in some use cases.

For example, consider an error output that is sent when an error occurs. For the receiver it might be important to know at which point in the event stream the error occured, e.g. because it wants to treat some subsequent events differently. One example could be a camera output that only sends black images after a 'camera disconnected' error. The receiver might want to ignore those camera events after the error event. If we reorder events, that's no longer possible.

@haixuanTao
Copy link
Collaborator Author

For example, consider an error output that is sent when an error occurs. For the receiver it might be important to know at which point in the event stream the error occured, e.g. because it wants to treat some subsequent events differently. One example could be a camera output that only sends black images after a 'camera disconnected' error. The receiver might want to ignore those camera events after the error event. If we reorder events, that's no longer possible.

To take you're example, if one of the input is spamming event at a very high frequency with a very long queue size, the node will never receive the error event as it will first have to deal with the long queue of spammed event. Making it react to a failure event very slowly in comparison to the moment it received the error message.

There is no reordering of messages from each data_id only a prioritization of which input is going to be taken out.

@phil-opp
Copy link
Collaborator

phil-opp commented Dec 4, 2024

the node will never receive the error event

It will receive the error event, it just receives the events that happened earlier before it.

To take you're example, if one of the input is spamming event at a very high frequency with a very long queue size

If the node is not interested in all the queue items, why not set a smaller queue size?

Also, if the event really occurs at such a high frequency that the queue is filled considerably then the queue will also overflow quickly. In that case, we remove old events, thereby moving the error event further to front of the queue.

There is no reordering of messages from each data_id only a prioritization of which input is going to be taken out.

Yeah, but it recorders the events of different data_ids with respect to each other. So nodes can no longer rely that the events are returned in chronological order.

@haixuanTao
Copy link
Collaborator Author

haixuanTao commented Dec 4, 2024

It will receive the error event, it just receives the events that happened earlier before it.

The thing is that they can be a considerable time between the error event and the moment it reacts to it, which can be very dangerous.

This also True for stop event.

If the node is not interested in all the queue items, why not set a smaller queue size?

So basically if there is two event with different frequency say.

  • Event A with 100Hz with queue size 1
  • Event B with 10Hz with queue size 1

And the compute time is always about 10Hz <-> 100ms, what is going to happen is that you will invalidate a lot of event A before reaching to event B which is going to be older than any latest event A, and so you will always return event B.

This is why we need fairness between input.

Yeah, but it recorders the events of different data_ids with respect to each other. So nodes can no longer rely that the events are returned in chronological order.

I think that the issue is that people want to have the latest data of each topic not a chronological order of all topic. This is the priority.

@haixuanTao haixuanTao force-pushed the add-scheduler-to-node-api branch from d5d647c to 9fbb7e3 Compare December 5, 2024 12:35
@haixuanTao haixuanTao force-pushed the add-scheduler-to-node-api branch from 9fbb7e3 to 1cc508c Compare December 5, 2024 12:45
Base automatically changed from reduce-event-stream-timeout to move-queue-management-into-node December 5, 2024 12:46
@haixuanTao haixuanTao merged commit 4f3c6e4 into move-queue-management-into-node Dec 5, 2024
48 of 50 checks passed
@haixuanTao haixuanTao deleted the add-scheduler-to-node-api branch December 5, 2024 13:40
@phil-opp
Copy link
Collaborator

This PR contains a lot of commits that have nothing to do with the main changes. Some commits even seem to be duplicates of commits that were merged to main earlier.
In the future, please merge each PR separately into main. Otherwise our history and changelog become too chaotic to follow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants