[Proposal] Publish an event when long running operation complete #5479

xluo-aws · 2022-12-07T13:48:27Z

Is your feature request related to a problem? Please describe.
Some index operations, such as reindex, split, and shrink, can take tens of minutes or even hours. We want to send notifications (configured in the ISM dashboard plugin) to users when these operations complete, regardless of whether the operation was submitted from the ISM dashboard plugin or from the command line.

Describe the solution you'd like
We brainstormed a few options. The preferred one is to enhance the OpenSearch core logic to publish an event when an operation completes. We can then create a listener in the ISM plugin that listens for the event and sends the notification. We checked the existing events that plugins listen to; ClusterChangedEvent is one (not sure if there are others) that is published when we split or shrink an index. However, this event does not carry the information required to send a notification, for example, who submitted the operation request. Another drawback is that reindex does not trigger ClusterChangedEvent, so it is not a general solution.
Another possible solution is to publish a new event when a long-running operation is triggered. The listener in the plugin then creates a scheduled job that checks the operation status every x minutes and sends the notification once the operation completes. The extension point could be to extend RestToXContentListener for RestResizeHandler and RestReindexAction to publish the event, or to extend TransportResizeAction/TransportReindexAction to publish it. This is similar to the second alternative below but has less impact because it only affects a few long-running operations.
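The "publish on submit, then poll" flow described above can be sketched as follows. This is only an illustration in Python, not plugin code: `OperationSubmitted`, `CompletionPoller`, `is_complete`, and `notify` are hypothetical names, and a real listener would run `tick` from a scheduler every x minutes.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class OperationSubmitted:
    """Hypothetical event published when a long-running operation is triggered."""
    operation: str     # e.g. "split", "shrink", "reindex"
    target: str        # index (or task) whose status is checked
    submitted_by: str  # who should receive the notification


@dataclass
class CompletionPoller:
    """Hypothetical listener: remembers submitted operations and polls them."""
    is_complete: Callable[[OperationSubmitted], bool]  # assumed status check
    notify: Callable[[OperationSubmitted], None]       # assumed notification hook
    pending: List[OperationSubmitted] = field(default_factory=list)

    def on_event(self, event: OperationSubmitted) -> None:
        # Called when the submit-time event is received.
        self.pending.append(event)

    def tick(self) -> None:
        # One polling pass: notify for finished operations, keep the rest.
        still_running = []
        for event in self.pending:
            if self.is_complete(event):
                self.notify(event)
            else:
                still_running.append(event)
        self.pending = still_running
```

Because the event carries the submitter, the listener has the information that the existing cluster-change event lacks.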

Describe alternatives you've considered
1. Create a wrapper API in the ISM plugin. It calls the existing index operation API first, then creates a scheduled job that checks the operation status every x minutes and sends the notification once the operation completes. This requires users to switch to the new wrapper API.
2. Create an ActionFilter in the ISM plugin that filters all requests and creates a scheduled job if the request is a long-running operation. The major concern is the performance impact. However, ISM already has an ActionFilter that intercepts all requests, so we assume this approach has already passed performance review and is not an entirely new performance risk. We can run some performance tests if this becomes a candidate solution.
3. For reindex, we can leverage IndexingOperationListener to monitor the .tasks index. Reindex writes to this index upon completion, so we can then send the notification. For shrink and split, we can leverage ClusterChangedEvent to find out which index was created and whether it was created by a resize. If it was, we compare its shard count with the source index's shard count to determine whether it is a split or a shrink, then wait for the active shards to be ready (the same logic used to tell that a create-index operation is done) and send the notification. All code changes are in the ISM plugin.
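The shard comparison in the third alternative can be illustrated with a small sketch. It assumes the caller has already determined whether the new index was created by a resize and, if so, the primary shard count of its source; `classify_new_index` is a hypothetical helper, and treating equal shard counts as a clone is an assumption added here for completeness.

```python
from typing import Optional


def classify_new_index(source_shards: Optional[int], target_shards: int) -> Optional[str]:
    """Classify a newly created index by comparing primary shard counts.

    source_shards is None when the index was not created by a resize.
    """
    if source_shards is None:
        return None  # plain index creation, nothing to classify
    if target_shards > source_shards:
        return "split"
    if target_shards < source_shards:
        return "shrink"
    return "clone"  # assumption: same shard count means a clone
```

The notification itself would still wait until the new index's active shards are ready, as described above.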

xluo-aws · 2022-12-08T02:36:50Z

This ticket is related to opensearch-project/index-management-dashboards-plugin#284.

dblock · 2022-12-09T20:25:58Z

I like a generic solution in which any extension can subscribe to events, and any action can publish an event which would propagate across the cluster when there's someone listening. Events should be durable/come with certain delivery guarantees as well.
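As a rough illustration of the subscribe/publish shape suggested here, the sketch below shows an in-process event bus in Python. `EventBus` and the topic name are hypothetical, and the sketch deliberately does not model the harder parts of the suggestion: propagation across the cluster, durability, and delivery guarantees.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class EventBus:
    """Hypothetical in-process bus: any action publishes, any extension subscribes."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: Any) -> int:
        # Deliver only when someone is listening; return the number of deliveries.
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)
```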

Hailong-am · 2022-12-16T08:55:08Z

> For reindex, we can create a scheduled job in the ISM plugin to monitor the .tasks index; if it's a reindex task, we'll create another scheduled job to check the task status every x minutes and send out a notification once it's completed.

We don't need to create a monitor job: the task result is written to the .tasks index when the task completes. In that case, all we need to do is parse the task result to see whether it contains any errors or failures, and then send the notification accordingly.
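Parsing the task result could look roughly like the sketch below. The document shape is an assumption: it supposes that a stored task result has either a top-level `error` object or a `response` object whose `failures` list is non-empty when individual documents failed. `summarize_task_result` is a hypothetical helper.

```python
def summarize_task_result(doc: dict) -> str:
    """Reduce a stored task result to an outcome used to pick the notification."""
    if "error" in doc:
        return "failed"  # the task itself ended with an error
    failures = doc.get("response", {}).get("failures", [])
    if failures:
        return "completed_with_failures"  # task finished, some items failed
    return "succeeded"
```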

xluo-aws · 2022-12-19T02:10:45Z

> > For reindex, we can create a scheduled job in the ISM plugin to monitor the .tasks index; if it's a reindex task, we'll create another scheduled job to check the task status every x minutes and send out a notification once it's completed.

> We don't need to create a monitor job: the task result is written to the .tasks index when the task completes. In that case, all we need to do is parse the task result to see whether it contains any errors or failures, and then send the notification accordingly.

Thanks for pointing it out, I just updated the description.

nknize · 2023-01-18T18:19:07Z

@xluo-aws have you looked into ResourceWatcher and what might be missing to achieve the objective? A ResourceWatcher can be registered through ResourceWatcherService#add which will notify the registered Watcher instance through AbstractResourceWatcher#checkAndNotify at a given Frequency interval (which can be user defined).

xluo-aws · 2023-01-20T01:38:14Z

Nick, thanks for the suggestion. We were not aware of ResourceWatcher until now, but after a quick look at the code it seems we can leverage it to keep checking the operation status until the operation completes. This makes publishing an event at index operation submit time more convenient. We will do more research and provide an update.

gaobinlong · 2023-02-08T04:12:43Z

We have a new idea about this issue. Similar to reindex, we can first make the other long-running operations, such as shrink/split/clone, trackable by the _tasks API, and then monitor the .tasks index: when a long-running operation completes or fails, we send a notification to the user. We think this may be a generic solution, and it has other benefits as well. I've created another issue about making some long-running operations trackable by the _tasks API; @dblock @nknize could you please take a look at it: #6228?

Hailong-am · 2023-02-08T07:08:21Z

> We have a new idea about this issue. Similar to reindex, we can first make the other long-running operations, such as shrink/split/clone, trackable by the _tasks API, and then monitor the .tasks index: when a long-running operation completes or fails, we send a notification to the user. We think this may be a generic solution, and it has other benefits as well. I've created another issue about making some long-running operations trackable by the _tasks API; @dblock @nknize could you please take a look at it: #6228?

Based on this assumption, we could have an IndexingOperationListener watch the .tasks index. A new document written to this index means that a task has completed, so we can parse the document, extract the action, and decide whether a notification is needed for that action.

Using an IndexingOperationListener is a lightweight and clean solution compared to using the JobScheduler plugin or ResourceWatcher to keep monitoring the status of the long-running operation. There are also some limitations: the task result is persisted to the .tasks index only when the task completes, and until then the task information is held in memory, so when a node restarts that information is lost and there is no way to track the task's execution anymore.
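The decision made by such a listener can be sketched in Python as below. The names are hypothetical, the document shape (`task.action`) is assumed, and the two action strings are assumptions about how reindex and resize requests would appear in task results; the sketch only shows the filtering step, not the real listener interface.

```python
from typing import Callable

# Assumed action names for the operations that should trigger a notification.
NOTIFY_ACTIONS = {"indices:data/write/reindex", "indices:admin/resize"}


def on_task_document_indexed(index_name: str, doc: dict,
                             notify: Callable[[str, dict], None]) -> bool:
    """Handle one indexed document; return True if a notification was sent."""
    if index_name != ".tasks":
        return False  # only task results are of interest
    action = doc.get("task", {}).get("action")
    if action not in NOTIFY_ACTIONS:
        return False  # completed task, but not a tracked long-running operation
    notify(action, doc)
    return True
```

Because the hook fires on the write itself, no polling job is needed, which is what makes this approach lighter than the scheduler-based options.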

xluo-aws · 2023-02-20T02:23:06Z

Closing this one. Our final solution is to turn long-running operations into tasks, so that we can check the task status to find out whether a long-running operation has completed. The change will be made in release 2.7. The tracking tickets are:
opensearch-project/index-management-dashboards-plugin#615 and opensearch-project/index-management-dashboards-plugin#624

xluo-aws added the enhancement and untriaged labels Dec 7, 2022

xuezhou25 added the discuss label Dec 9, 2022

minalsha added extensions and removed untriaged labels Jan 9, 2023

zhichao-aws mentioned this issue Jan 17, 2023

ActionFilter performance test opensearch-project/index-management-dashboards-plugin#575

Closed

xluo-aws mentioned this issue Jan 20, 2023

Research notification on index operation completion approach opensearch-project/index-management-dashboards-plugin#585

Closed

xluo-aws mentioned this issue Feb 3, 2023

Notification integration with index operation feature proposal opensearch-project/index-management-dashboards-plugin#598

Closed

gaobinlong mentioned this issue Feb 8, 2023

Make some long running operations execute asynchronously and can be tracked by _tasks API #6228

Closed

Hailong-am mentioned this issue Feb 9, 2023

[FEATURE] Add user name into threadContext header opensearch-project/security#2432

Closed

xluo-aws closed this as completed Feb 20, 2023
