[DOC] Update remote store stats api documentation with latest segments and translog fields #4904

Closed
2 of 4 tasks
ashking94 opened this issue Aug 28, 2023 · 6 comments · Fixed by #4995 or #5107
Assignees
Labels
3 - Done Issue is done/complete v2.10.0
Milestone

Comments

@ashking94
Member

ashking94 commented Aug 28, 2023

What do you want to do?

  • Request a change to existing documentation
  • Add new documentation
  • Report a technical problem with the documentation
  • Other

Tell us about your request. Provide a summary of the request and all versions that are affected.
Corresponding to -

What other resources are available? Provide links to related issues, POCs, steps for testing, etc.

@ashking94
Member Author

@hdhalter
Contributor

@Naarcha-AWS - Is there something more to do with this issue for 2.10?

@hdhalter
Contributor

@Naarcha-AWS - Is this ok to close?

@Naarcha-AWS
Collaborator

This is okay to close, unless @ashking94, you have any additional information I should add to the page you linked.

@ashking94
Member Author

Remote Store Stats API [Proposed Doc change]

Use the Remote Store Stats API to monitor shard-level remote store performance. The metrics are relevant only for indexes that are backed by a remote store.

For aggregated output at the index, node, or cluster level, use the Index Stats, Nodes Stats, or Cluster Stats API, respectively.

Path and HTTP methods

GET _remotestore/stats/<index_name>
GET _remotestore/stats/<index_name>/<shard_id>

Path parameters

The following table lists the available path parameters. All path parameters are optional.

| Parameter | Type | Description |
| --- | --- | --- |
| index_name | String | The index name or index pattern. |
| shard_id | String | The shard ID. |

Remote store stats for an index

Use the following API to get remote store statistics for all shards of an index.

Example request

GET _remotestore/stats/<index_name>

Example response

{
    "_shards": {
        "total": 4,
        "successful": 4,
        "failed": 0
    },
    "indices": {
        "remote-index": {
            "shards": {
                "0": [{
                        "routing": {
                            "state": "STARTED",
                            "primary": true,
                            "node": "q1VxWZnCTICrfRc2bRW3nw"
                        },
                        "segment": {
                            "download": {},
                            "upload": {
                                "local_refresh_timestamp_in_millis": 1694171634102,
                                "remote_refresh_timestamp_in_millis": 1694171634102,
                                "refresh_time_lag_in_millis": 0,
                                "refresh_lag": 0,
                                "bytes_lag": 0,
                                "backpressure_rejection_count": 0,
                                "consecutive_failure_count": 0,
                                "total_uploads": {
                                    "started": 5,
                                    "succeeded": 5,
                                    "failed": 0
                                },
                                "total_upload_size": {
                                    "started_bytes": 15342,
                                    "succeeded_bytes": 15342,
                                    "failed_bytes": 0
                                },
                                "remote_refresh_size_in_bytes": {
                                    "last_successful": 0,
                                    "moving_avg": 3068.4
                                },
                                "upload_speed_in_bytes_per_sec": {
                                    "moving_avg": 99988.2
                                },
                                "remote_refresh_latency_in_millis": {
                                    "moving_avg": 44.0
                                }
                            }
                        },
                        "translog": {
                            "upload": {
                                "last_successful_upload_timestamp": 1694171633644,
                                "total_uploads": {
                                    "started": 6,
                                    "failed": 0,
                                    "succeeded": 6
                                },
                                "total_upload_size": {
                                    "started_bytes": 1932,
                                    "failed_bytes": 0,
                                    "succeeded_bytes": 1932
                                },
                                "total_upload_time_in_millis": 21478,
                                "upload_size_in_bytes": {
                                    "moving_avg": 322.0
                                },
                                "upload_speed_in_bytes_per_sec": {
                                    "moving_avg": 2073.8333333333335
                                },
                                "upload_time_in_millis": {
                                    "moving_avg": 3579.6666666666665
                                }
                            },
                            "download": {}
                        }
                    },
                    {
                        "routing": {
                            "state": "STARTED",
                            "primary": false,
                            "node": "EZuen5Y5Sv-eDCLwh9gv-Q"
                        },
                        "segment": {
                            "download": {
                                "last_sync_timestamp": 1694171634148,
                                "total_download_size": {
                                    "started_bytes": 15112,
                                    "succeeded_bytes": 15112,
                                    "failed_bytes": 0
                                },
                                "download_size_in_bytes": {
                                    "last_successful": 2910,
                                    "moving_avg": 1259.3333333333333
                                },
                                "download_speed_in_bytes_per_sec": {
                                    "moving_avg": 382387.3333333333
                                }
                            },
                            "upload": {}
                        },
                        "translog": {
                            "upload": {},
                            "download": {}
                        }
                    }
                ],
                "1": [{
                        "routing": {
                            "state": "STARTED",
                            "primary": false,
                            "node": "q1VxWZnCTICrfRc2bRW3nw"
                        },
                        "segment": {
                            "download": {
                                "last_sync_timestamp": 1694171633181,
                                "total_download_size": {
                                    "started_bytes": 18978,
                                    "succeeded_bytes": 18978,
                                    "failed_bytes": 0
                                },
                                "download_size_in_bytes": {
                                    "last_successful": 325,
                                    "moving_avg": 1265.2
                                },
                                "download_speed_in_bytes_per_sec": {
                                    "moving_avg": 456047.6666666667
                                }
                            },
                            "upload": {}
                        },
                        "translog": {
                            "upload": {},
                            "download": {}
                        }
                    },
                    {
                        "routing": {
                            "state": "STARTED",
                            "primary": true,
                            "node": "EZuen5Y5Sv-eDCLwh9gv-Q"
                        },
                        "segment": {
                            "download": {},
                            "upload": {
                                "local_refresh_timestamp_in_millis": 1694171633122,
                                "remote_refresh_timestamp_in_millis": 1694171633122,
                                "refresh_time_lag_in_millis": 0,
                                "refresh_lag": 0,
                                "bytes_lag": 0,
                                "backpressure_rejection_count": 0,
                                "consecutive_failure_count": 0,
                                "total_uploads": {
                                    "started": 6,
                                    "succeeded": 6,
                                    "failed": 0
                                },
                                "total_upload_size": {
                                    "started_bytes": 19208,
                                    "succeeded_bytes": 19208,
                                    "failed_bytes": 0
                                },
                                "remote_refresh_size_in_bytes": {
                                    "last_successful": 0,
                                    "moving_avg": 3201.3333333333335
                                },
                                "upload_speed_in_bytes_per_sec": {
                                    "moving_avg": 109612.0
                                },
                                "remote_refresh_latency_in_millis": {
                                    "moving_avg": 25.333333333333332
                                }
                            }
                        },
                        "translog": {
                            "upload": {
                                "last_successful_upload_timestamp": 1694171633106,
                                "total_uploads": {
                                    "started": 7,
                                    "failed": 0,
                                    "succeeded": 7
                                },
                                "total_upload_size": {
                                    "started_bytes": 2405,
                                    "failed_bytes": 0,
                                    "succeeded_bytes": 2405
                                },
                                "total_upload_time_in_millis": 27748,
                                "upload_size_in_bytes": {
                                    "moving_avg": 343.57142857142856
                                },
                                "upload_speed_in_bytes_per_sec": {
                                    "moving_avg": 1445.857142857143
                                },
                                "upload_time_in_millis": {
                                    "moving_avg": 3964.0
                                }
                            },
                            "download": {}
                        }
                    }
                ]
            }
        }
    }
}

Response fields

Each shard copy in the response contains three objects:

  • routing : Contains information related to the shard’s routing
  • segment : Contains stats related to segment transfers to and from the remote store
  • translog : Contains stats related to translog transfers to and from the remote store

Should go inside an info section:

Because of the way OpenSearch functions, the visible metrics might differ across shard types.

For instance, segment download stats are empty for a primary shard copy because primary shard copies do not need to download any segments from the remote store. Conversely, replica shard copies have an empty upload section because replicas do not upload any segments to the remote store.

Translog stats are populated only for primary shard copies because only primary shards participate in translog uploads and in recovery from translogs downloaded from the remote store.
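To make the primary/replica difference concrete, here is a minimal Python sketch that walks a response and reports which sections are populated for each shard copy. The function name `populated_sections` and the trimmed-down `sample` dictionary are illustrative assumptions; only the field names are taken from the example response above.

```python
def populated_sections(response):
    """Yield (index, shard_id, role, sections) for every shard copy."""
    for index_name, index_stats in response["indices"].items():
        for shard_id, copies in index_stats["shards"].items():
            for shard_copy in copies:
                role = "primary" if shard_copy["routing"]["primary"] else "replica"
                sections = sorted(
                    f"{group}.{direction}"
                    for group in ("segment", "translog")
                    for direction, stats in shard_copy[group].items()
                    if stats  # an empty object means no stats for this copy
                )
                yield index_name, shard_id, role, sections


# Trimmed-down version of the example response (most fields omitted).
sample = {
    "indices": {"remote-index": {"shards": {"0": [
        {
            "routing": {"state": "STARTED", "primary": True,
                        "node": "q1VxWZnCTICrfRc2bRW3nw"},
            "segment": {"download": {}, "upload": {"refresh_lag": 0}},
            "translog": {"upload": {"total_upload_time_in_millis": 21478},
                         "download": {}},
        },
        {
            "routing": {"state": "STARTED", "primary": False,
                        "node": "EZuen5Y5Sv-eDCLwh9gv-Q"},
            "segment": {"download": {"last_sync_timestamp": 1694171634148},
                        "upload": {}},
            "translog": {"upload": {}, "download": {}},
        },
    ]}}}
}

for row in populated_sections(sample):
    print(row)
```

For the sample, the primary copy reports `segment.upload` and `translog.upload`, and the replica copy reports only `segment.download`.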

The routing object contains the following fields:

| Field | Description |
| --- | --- |
| state | The routing state of the shard. Possible values are UNASSIGNED, INITIALIZING, STARTED, and RELOCATING. |
| primary | Indicates whether the shard copy is a primary shard. |
| node | The name of the node to which the shard is assigned. |

The segment.upload object contains the following fields:

| Field | Description |
| --- | --- |
| local_refresh_timestamp_in_millis | The last successful local refresh timestamp (in milliseconds). |
| remote_refresh_timestamp_in_millis | The last successful remote refresh timestamp (in milliseconds). |
| refresh_time_lag_in_millis | The time (in milliseconds) by which the remote refresh is behind the local refresh. |
| refresh_lag | The number of refreshes by which the remote store is lagging behind the local store. |
| bytes_lag | The number of bytes by which the remote store is lagging behind the local store. |
| backpressure_rejection_count | The total number of write rejections made because of remote store backpressure. |
| consecutive_failure_count | The number of consecutive remote refresh failures since the last success. |
| total_uploads.started | The total number of attempted segment uploads (remote refreshes) to the remote store. |
| total_uploads.succeeded | The total number of successful segment uploads (remote refreshes) to the remote store. |
| total_uploads.failed | The total number of failed segment uploads (remote refreshes) to the remote store. |
| total_upload_size.started_bytes | The total number of bytes in all attempted segment uploads to the remote store. |
| total_upload_size.succeeded_bytes | The total number of bytes in all successful segment uploads to the remote store. |
| total_upload_size.failed_bytes | The total number of bytes in all failed segment uploads to the remote store. |
| remote_refresh_size_in_bytes.last_successful | The size of data (in bytes) uploaded in the last successful refresh. |
| remote_refresh_size_in_bytes.moving_avg | The average size of data (in bytes) uploaded in the last N refreshes. N is defined in remote_store.moving_average_window_size. For details, see Remote segment backpressure. |
| upload_speed_in_bytes_per_sec.moving_avg | The average speed of remote store segment uploads (in bytes per second) for the last N uploads. N is defined in remote_store.moving_average_window_size. For details, see Remote segment backpressure. |
| remote_refresh_latency_in_millis.moving_avg | The average time (in milliseconds) taken by a single remote refresh during the last N remote refreshes. N is defined in remote_store.moving_average_window_size. For details, see Remote segment backpressure. |
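As an illustration of how the lag and failure fields might be consumed, the following Python sketch turns one `segment.upload` object into a list of warnings. The function name `segment_upload_issues` and both threshold parameters are hypothetical and are not OpenSearch settings or defaults; choose values that suit your own workload.

```python
def segment_upload_issues(upload, max_refresh_lag=1, max_time_lag_ms=5000):
    """Return human-readable warnings for one segment.upload object.

    The thresholds are illustrative assumptions, not OpenSearch defaults.
    """
    issues = []
    if upload.get("refresh_lag", 0) > max_refresh_lag:
        issues.append(f"remote store is {upload['refresh_lag']} refreshes behind")
    if upload.get("refresh_time_lag_in_millis", 0) > max_time_lag_ms:
        issues.append(
            f"remote refresh lags by {upload['refresh_time_lag_in_millis']} ms"
        )
    if upload.get("consecutive_failure_count", 0) > 0:
        issues.append(
            f"{upload['consecutive_failure_count']} consecutive upload failures"
        )
    if upload.get("backpressure_rejection_count", 0) > 0:
        issues.append(
            f"{upload['backpressure_rejection_count']} writes rejected by backpressure"
        )
    return issues


# Values copied from the primary shard copy in the example response.
healthy = {
    "refresh_time_lag_in_millis": 0,
    "refresh_lag": 0,
    "bytes_lag": 0,
    "backpressure_rejection_count": 0,
    "consecutive_failure_count": 0,
}
print(segment_upload_issues(healthy))
```

An empty `upload` object, as returned for replica shard copies, produces no warnings because every field falls back to zero.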

The segment.download object contains the following fields:

| Field | Description |
| --- | --- |
| last_sync_timestamp | The timestamp (in epoch milliseconds) of the last successful segment file download from the remote store. |
| total_download_size.started_bytes | The total number of bytes of segment files that were attempted to be downloaded from the remote store. |
| total_download_size.succeeded_bytes | The total number of bytes of segment files successfully downloaded from the remote store. |
| total_download_size.failed_bytes | The total number of bytes of segment files that failed to download from the remote store. |
| download_size_in_bytes.last_successful | The size (in bytes) of the last segment file successfully downloaded from the remote store. |
| download_size_in_bytes.moving_avg | The average size of segment data (in bytes) downloaded in the last 20 downloads. |
| download_speed_in_bytes_per_sec.moving_avg | The average download speed (in bytes per second) for the last 20 downloads. |

The translog.upload object contains the following fields:

| Field | Description |
| --- | --- |
| last_successful_upload_timestamp | The timestamp (in epoch milliseconds) of the last successful translog file upload to the remote store. |
| total_uploads.started | The total number of attempted translog upload syncs to the remote store. |
| total_uploads.failed | The total number of failed translog upload syncs to the remote store. |
| total_uploads.succeeded | The total number of successful translog upload syncs to the remote store. |
| total_upload_size.started_bytes | The total number of bytes of translog files that were attempted to be uploaded to the remote store. |
| total_upload_size.succeeded_bytes | The total number of bytes of translog files successfully uploaded to the remote store. |
| total_upload_size.failed_bytes | The total number of bytes of translog files that failed to upload to the remote store. |
| total_upload_time_in_millis | The total time (in milliseconds) spent on translog uploads to the remote store. |
| upload_size_in_bytes.moving_avg | The average size of translog data (in bytes) uploaded in the last N uploads. N is defined in remote_store.moving_average_window_size. |
| upload_speed_in_bytes_per_sec.moving_avg | The average speed of remote store translog uploads (in bytes per second) for the last N uploads. N is defined in remote_store.moving_average_window_size. |
| upload_time_in_millis.moving_avg | The average time taken by a single translog upload (in milliseconds) for the last N uploads. N is defined in remote_store.moving_average_window_size. |
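The cumulative counters can be combined into simple derived values. The following Python sketch computes a success ratio and an overall mean upload time from one `translog.upload` object. The function name `translog_upload_summary` and the two derived metrics are illustrative and are not part of the API response.

```python
def translog_upload_summary(upload):
    """Derive a success ratio and a mean upload time from translog.upload."""
    started = upload["total_uploads"]["started"]
    succeeded = upload["total_uploads"]["succeeded"]
    return {
        # Share of attempted upload syncs that succeeded (None if none started).
        "success_ratio": succeeded / started if started else None,
        # Overall mean time per successful upload, computed from the totals.
        "mean_upload_time_in_millis": (
            upload["total_upload_time_in_millis"] / succeeded if succeeded else None
        ),
    }


# Values copied from shard 0's primary copy in the example response.
translog_upload = {
    "total_uploads": {"started": 6, "failed": 0, "succeeded": 6},
    "total_upload_time_in_millis": 21478,
    "upload_time_in_millis": {"moving_avg": 3579.6666666666665},
}
print(translog_upload_summary(translog_upload))
```

For this sample, the overall mean matches `upload_time_in_millis.moving_avg`, presumably because fewer uploads have occurred than the moving average window holds. Once more than N uploads have occurred, the two values would be expected to diverge.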

The translog.download object contains the following fields:

| Field | Description |
| --- | --- |
| last_successful_download_timestamp | The timestamp (in epoch milliseconds) of the last successful translog file download from the remote store. |
| total_downloads.succeeded | The total number of successful translog download syncs from the remote store. |
| total_download_size.succeeded_bytes | The total number of bytes of translog files successfully downloaded from the remote store. |
| total_download_time_in_millis | The total time (in milliseconds) spent on translog downloads from the remote store. |
| download_size_in_bytes.moving_avg | The average size of translog data (in bytes) downloaded in the last N downloads. N is defined in remote_store.moving_average_window_size. |
| download_speed_in_bytes_per_sec.moving_avg | The average speed of remote store translog downloads (in bytes per second) for the last N downloads. N is defined in remote_store.moving_average_window_size. |
| download_time_in_millis.moving_avg | The average time taken by a single translog download (in milliseconds) for the last N downloads. N is defined in remote_store.moving_average_window_size. |

Remote store stats for a single shard

Use the following API to get remote store statistics for a single shard.

Example request

GET _remotestore/stats/<index_name>/<shard_id>

Example response

{
    "_shards": {
        "total": 2,
        "successful": 2,
        "failed": 0
    },
    "indices": {
        "remote-index": {
            "shards": {
                "0": [
                    {
                        "routing": {
                            "state": "STARTED",
                            "primary": true,
                            "node": "q1VxWZnCTICrfRc2bRW3nw"
                        },
                        "segment": {
                            "download": {},
                            "upload": {
                                "local_refresh_timestamp_in_millis": 1694171634102,
                                "remote_refresh_timestamp_in_millis": 1694171634102,
                                "refresh_time_lag_in_millis": 0,
                                "refresh_lag": 0,
                                "bytes_lag": 0,
                                "backpressure_rejection_count": 0,
                                "consecutive_failure_count": 0,
                                "total_uploads": {
                                    "started": 5,
                                    "succeeded": 5,
                                    "failed": 0
                                },
                                "total_upload_size": {
                                    "started_bytes": 15342,
                                    "succeeded_bytes": 15342,
                                    "failed_bytes": 0
                                },
                                "remote_refresh_size_in_bytes": {
                                    "last_successful": 0,
                                    "moving_avg": 3068.4
                                },
                                "upload_speed_in_bytes_per_sec": {
                                    "moving_avg": 99988.2
                                },
                                "remote_refresh_latency_in_millis": {
                                    "moving_avg": 44.0
                                }
                            }
                        },
                        "translog": {
                            "upload": {
                                "last_successful_upload_timestamp": 1694171633644,
                                "total_uploads": {
                                    "started": 6,
                                    "failed": 0,
                                    "succeeded": 6
                                },
                                "total_upload_size": {
                                    "started_bytes": 1932,
                                    "failed_bytes": 0,
                                    "succeeded_bytes": 1932
                                },
                                "total_upload_time_in_millis": 21478,
                                "upload_size_in_bytes": {
                                    "moving_avg": 322.0
                                },
                                "upload_speed_in_bytes_per_sec": {
                                    "moving_avg": 2073.8333333333335
                                },
                                "upload_time_in_millis": {
                                    "moving_avg": 3579.6666666666665
                                }
                            },
                            "download": {}
                        }
                    },
                    {
                        "routing": {
                            "state": "STARTED",
                            "primary": false,
                            "node": "EZuen5Y5Sv-eDCLwh9gv-Q"
                        },
                        "segment": {
                            "download": {
                                "last_sync_timestamp": 1694171634148,
                                "total_download_size": {
                                    "started_bytes": 15112,
                                    "succeeded_bytes": 15112,
                                    "failed_bytes": 0
                                },
                                "download_size_in_bytes": {
                                    "last_successful": 2910,
                                    "moving_avg": 1259.3333333333333
                                },
                                "download_speed_in_bytes_per_sec": {
                                    "moving_avg": 382387.3333333333
                                }
                            },
                            "upload": {}
                        },
                        "translog": {
                            "upload": {},
                            "download": {}
                        }
                    }
                ]
            }
        }
    }
}

Remote store stats for local shards

Set the local query parameter to true to fetch only the shards present on the node that serves the request:

GET _remotestore/stats/<index_name>?local=true
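The three request forms shown on this page (per index, per shard, and local only) can be assembled with a small helper. The following Python sketch uses only the standard library. The function name `remote_store_stats_path` is hypothetical, and the sketch only builds the path; sending the request and handling authentication are left out.

```python
from urllib.parse import quote, urlencode


def remote_store_stats_path(index_name, shard_id=None, local=False):
    """Build the request path for the Remote Store Stats API."""
    # Keep '*' and ',' unescaped so index patterns pass through unchanged.
    path = "_remotestore/stats/" + quote(index_name, safe="*,")
    if shard_id is not None:
        path += "/" + quote(str(shard_id), safe="")
    if local:
        path += "?" + urlencode({"local": "true"})
    return path


print(remote_store_stats_path("remote-index"))
print(remote_store_stats_path("remote-index", shard_id=0))
print(remote_store_stats_path("remote-index", local=True))
```

The helper requires `index_name` because every path documented here includes it.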

@ashking94
Member Author

ashking94 commented Sep 25, 2023

@Naarcha-AWS I can attempt to do this by tomorrow IST hours if you do not get the time.

ashking94 added a commit to ashking94/documentation-website that referenced this issue Sep 26, 2023
Naarcha-AWS added a commit that referenced this issue Sep 28, 2023
* Update remote store stats api documentation 

This closes #4904

Signed-off-by: Ashish <[email protected]>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <[email protected]>

* Update remote-store-stats-api.md

* Apply suggestions from code review

Co-authored-by: Chris Moore <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

---------

Signed-off-by: Ashish <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
Co-authored-by: Naarcha-AWS <[email protected]>
Co-authored-by: Chris Moore <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
opensearch-trigger-bot bot pushed a commit that referenced this issue Sep 28, 2023
(cherry picked from commit 9242b88)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Naarcha-AWS added a commit that referenced this issue Sep 28, 2023
(cherry picked from commit 9242b88)

Signed-off-by: Ashish <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Naarcha-AWS <[email protected]>
Co-authored-by: Chris Moore <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
vagimeli pushed a commit that referenced this issue Oct 13, 2023
Signed-off-by: Melissa Vagi <[email protected]>
harshavamsi pushed a commit to harshavamsi/documentation-website that referenced this issue Oct 31, 2023
vagimeli pushed a commit that referenced this issue Dec 21, 2023