Problem statement
A KNN index creates graphs during refresh/flush of the index. Graph creation is a very expensive operation, and how long it takes depends on parameters such as the refresh interval (i.e. the size of the translog).
Users have no visibility into how long graph creation takes. Exposing this would help them tune the relevant parameters.
Refresh/flush is not only triggered by background activity; it can also be triggered during other operations such as bulk indexing and recovery. Graph creation can therefore increase the overall latency of those operations as well. These metrics would also help us triage issues such as long-running bulk requests or recoveries.
What solution would you like?
We should expose overall latency metrics for graph creation. We still need to decide how best to expose them: as cumulative metrics, or as individual metrics for each graph creation.
We could expose them at the index or shard level, since graph creation latency can vary with the data or configuration of each index/shard.
We should also add latency metrics for other KNN operations, not just graph creation, to give better visibility into KNN operations overall.
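To make the cumulative-versus-individual and per-index-versus-per-shard options more concrete, here is a minimal sketch in Python. It is not the plugin's code or API: `GraphCreationMetrics`, `LatencyStats`, and `timed` are hypothetical names used only for illustration. It keeps cumulative counters (count, total, max) per shard and rolls them up per index on demand.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class LatencyStats:
    """Cumulative latency counters (hypothetical shape, for illustration)."""
    count: int = 0
    total_ms: float = 0.0
    max_ms: float = 0.0

    def record(self, elapsed_ms: float) -> None:
        self.count += 1
        self.total_ms += elapsed_ms
        self.max_ms = max(self.max_ms, elapsed_ms)

    @property
    def avg_ms(self) -> float:
        return self.total_ms / self.count if self.count else 0.0


class GraphCreationMetrics:
    """Tracks graph creation latency keyed by (index, shard)."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so the class can be tested deterministically.
        self._clock = clock
        self._per_shard = {}

    @contextmanager
    def timed(self, index: str, shard: int):
        """Time one graph creation and add it to that shard's counters."""
        start = self._clock()
        try:
            yield
        finally:
            elapsed_ms = (self._clock() - start) * 1000.0
            key = (index, shard)
            self._per_shard.setdefault(key, LatencyStats()).record(elapsed_ms)

    def shard_stats(self, index: str, shard: int) -> LatencyStats:
        return self._per_shard.get((index, shard), LatencyStats())

    def index_stats(self, index: str) -> LatencyStats:
        """Roll the per-shard counters up to index level."""
        rolled = LatencyStats()
        for (name, _shard), stats in self._per_shard.items():
            if name == index:
                rolled.count += stats.count
                rolled.total_ms += stats.total_ms
                rolled.max_ms = max(rolled.max_ms, stats.max_ms)
        return rolled
```

A caller would wrap each graph build in `with metrics.timed("my-index", 0): ...`. Keeping the counters at shard level and deriving the index view from them means both granularities are available without recording anything twice; individual per-graph timings would need an extra log or histogram on top of this.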
Apart from graph creation, which other operations need latency metrics? Could you please provide some details?
Graphs are created per segment of a shard, and where we should put the metrics depends on the use case we want to solve. Could you please add some details about the exact customer need?