Executing an SLM policy that is already taking a snapshot times out (504 error code) #45594
Pinging @elastic/es-core-features
Can you describe in more detail how you reproduced this? When I run through the same steps, the second execute call returns the snapshot name. That snapshot then fails (as expected) with a warning in the logs and is recorded as a failed snapshot in the SLM policy.
Repo/policy I'm using:
@dakrone The total size of my indices is approximately 22mb, and I throttled my repository to 1kb per second. Here is the sequence of my requests:

# Repo configuration
# GET /_snapshot/slow-repo
{
"slow-repo" : {
"type" : "fs",
"settings" : {
"location" : "test",
"max_snapshot_bytes_per_sec" : "1kb"
}
}
}
# Policy configuration
# GET /_slm/policy/slow-test
{
"slow-test" : {
"version" : 1,
"modified_date_millis" : 1565885533675,
"policy" : {
"name" : "slow-test",
"schedule" : "0 0 0 ? * 7",
"repository" : "slow-repo",
"config" : { }
},
"next_execution_millis" : 1566000000000
}
}
# Executing policy the first time
# PUT /_slm/policy/slow-test/_execute
{
"snapshot_name" : "slow-test-_wl2wwbpseudgpqlbosnnq"
}
# Checking policy information after executing - in progress information is listed
# GET /_slm/policy/slow-test
{
"slow-test" : {
"version" : 1,
"modified_date_millis" : 1565885533675,
"policy" : {
"name" : "slow-test",
"schedule" : "0 0 0 ? * 7",
"repository" : "slow-repo",
"config" : { }
},
"last_success" : {
"snapshot_name" : "slow-test-_wl2wwbpseudgpqlbosnnq",
"time" : 1565885694939
},
"next_execution_millis" : 1566000000000,
"in_progress" : {
"name" : "slow-test-_wl2wwbpseudgpqlbosnnq",
"uuid" : "9kEheisWQ5i_4IlT0AamMg",
"state" : "STARTED",
"start_time_millis" : 1565885694668
}
}
}
# Executing policy a second time - it times out
# PUT /_slm/policy/slow-test/_execute
{
"statusCode": 504,
"error": "Gateway Time-out",
"message": "Client request timeout"
}

The failure from the second execution does not show up in the policy information or in the ES logs until I delete the snapshot that is currently in progress.
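To make the reproduction above easier to rerun, here is a minimal Python sketch that replays the same steps over HTTP with only the standard library. It assumes a local, unsecured test cluster at `http://localhost:9200` whose `path.repo` setting allows the `test` location; `ES_URL`, `throttle_seconds`, `build_requests`, `send` and `reproduce` are hypothetical helper names, not part of any Elasticsearch client. The endpoints, HTTP methods and request bodies mirror the ones shown in this issue (including `PUT` for `_execute`), so they may differ on other Elasticsearch versions.

```python
import json
import urllib.request

# Assumption: a local, unsecured test cluster whose path.repo setting
# allows the "test" location used by the fs repository below.
ES_URL = "http://localhost:9200"


def throttle_seconds(total_bytes: int, bytes_per_sec: int) -> float:
    """Rough estimate of how long a throttled snapshot stays in progress."""
    return total_bytes / bytes_per_sec


def build_requests():
    """The reproduction steps from this issue as (method, path, body) tuples."""
    repo = {
        "type": "fs",
        "settings": {"location": "test", "max_snapshot_bytes_per_sec": "1kb"},
    }
    policy = {
        "name": "slow-test",
        "schedule": "0 0 0 ? * 7",
        "repository": "slow-repo",
        "config": {},
    }
    return [
        ("PUT", "/_snapshot/slow-repo", repo),
        ("PUT", "/_slm/policy/slow-test", policy),
        # First execution starts the slow, throttled snapshot.
        ("PUT", "/_slm/policy/slow-test/_execute", None),
        # Second execution is the call that hung in this issue.
        ("PUT", "/_slm/policy/slow-test/_execute", None),
    ]


def send(method, path, body, timeout):
    """Send one JSON request and decode the JSON response."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        ES_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def reproduce(timeout: float = 30.0):
    """Run every step in order, reporting a timeout instead of raising."""
    for method, path, body in build_requests():
        try:
            print(method, path, "->", send(method, path, body, timeout))
        except OSError as exc:  # socket timeouts and URLError are both OSErrors
            print(method, path, "-> no response:", exc)
```

Call `reproduce()` only against a disposable test cluster. The client-side `timeout` makes the hanging second `_execute` call visible as "no response" instead of a proxy-level 504. As a sanity check on why the first snapshot stays `STARTED`, `throttle_seconds(22 * 1024 * 1024, 1024)` gives 22528.0 seconds, roughly 6.3 hours. This assumes the approximate "22mb" is 22 MiB; Elasticsearch byte-size settings such as `1kb` use binary units (1024 bytes).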
@dakrone I think I might know what's going on here.
* Executing SLM policies on the snapshot thread will block until a snapshot finishes if the pool is completely busy executing that snapshot
* Fixes elastic#45594
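The commit message above describes a thread-pool starvation problem: if the execute-policy work is queued on the same bounded pool that is fully occupied by the running snapshot, it cannot start until that snapshot finishes. The following Python sketch models this with `concurrent.futures`. It is an analogy, not Elasticsearch code. A one-thread `snapshot_pool` stands in for a completely busy snapshot pool, `generic_pool` is a hypothetical alternative pool, and `run_scenario`, `slow_snapshot` and `execute_policy` are illustrative names. Dispatching on a different pool is shown only as one way to avoid the blocking; it is not necessarily what #45727 does.

```python
import concurrent.futures as cf
import threading


def run_scenario(execute_on_snapshot_pool: bool, timeout: float = 0.5):
    """Return the execute result, or None if it did not finish within `timeout`."""
    snapshot_pool = cf.ThreadPoolExecutor(max_workers=1)  # fully busy below
    generic_pool = cf.ThreadPoolExecutor(max_workers=2)
    release_snapshot = threading.Event()

    def slow_snapshot():
        # Occupies the only snapshot thread until released, like a
        # snapshot throttled to 1kb/s.
        release_snapshot.wait()
        return "snapshot done"

    def execute_policy():
        # Stand-in for the execute handler: it only needs to try starting a
        # snapshot, which should fail fast while another one is running.
        return "execute returned"

    running = snapshot_pool.submit(slow_snapshot)
    pool = snapshot_pool if execute_on_snapshot_pool else generic_pool
    pending = pool.submit(execute_policy)
    try:
        result = pending.result(timeout=timeout)
    except cf.TimeoutError:
        result = None  # the caller gave up, much like the 504 from the proxy
    finally:
        release_snapshot.set()  # let the "snapshot" finish so the pools can shut down
        running.result()
        snapshot_pool.shutdown()
        generic_pool.shutdown()
    return result


print(run_scenario(execute_on_snapshot_pool=True))   # None
print(run_scenario(execute_on_snapshot_pool=False))  # execute returned
```

In the first scenario, `execute_policy` waits in the snapshot pool's queue behind `slow_snapshot`. It only runs after the snapshot is released, which matches the report that the second execution's failure appeared only once the in-progress snapshot was deleted.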
@jen-huang I opened #45727 with a suggested fix. If you still have the reproducer setup ready for this, feel free to try it out :)