Backport #25076 to 2.7 #25118

Closed
gwossum opened this issue Jun 28, 2024 · 0 comments
gwossum commented Jun 28, 2024

Backport #25076 (retention service hang fix for main-2.x) to 2.7

gwossum added the area/storage, area/2.x (OSS 2.0 related issues and PRs), and team/edge labels Jun 28, 2024
gwossum self-assigned this Jun 28, 2024
gwossum added a commit that referenced this issue Jun 28, 2024
Fix issue that can cause the retention service to hang waiting on a
`Shard.Close` call. When this occurs, no other shards will be deleted
by the retention service. This is usually noticed as an increase in
disk usage because old shards are not cleaned up.

The fix adds two new methods to `Store`: `SetShardNewReadersBlocked`
and `InUse`. `InUse` reports whether a shard has active readers; the
retention service polls it to skip over in-use shards so that the
service does not hang. `SetShardNewReadersBlocked` determines if
new read access may be granted to a shard. This is required to prevent
race conditions between the `InUse` check and the deletion of a shard.

If the retention service skips over a shard because it is in-use, the
shard will be checked again the next time the retention service is run.
It can be deleted on subsequent checks if it is no longer in-use. If
a shard is stuck in-use, the retention service will not be able to
delete it, which can be observed in the logs for manual
intervention. Other shards can still be deleted by the retention service
even if a shard is stuck with readers.

This is a port of ad68ec8 from master-1.x to main-2.x.

closes: #25118
(cherry picked from commit b4bd607)
(cherry picked from commit cb8cfe3)
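
The skip-if-in-use behavior described in the commit message can be sketched roughly as follows. This is a toy model, not InfluxDB's code: `toyShard`, `acquire`, `release`, `setNewReadersBlocked`, `inUse`, `sweep`, and `errReadersBlocked` are all hypothetical names, and the order of operations (block new readers, check for active readers, then delete or unblock) is an assumption based on the description above rather than the actual implementation.

```go
package main

import (
	"errors"
	"sync"
)

// errReadersBlocked is a hypothetical error for refused read access.
var errReadersBlocked = errors.New("shard: new readers blocked")

// toyShard is an illustrative stand-in for a shard; it only tracks
// how many readers are active and whether new readers are allowed.
type toyShard struct {
	mu      sync.Mutex
	readers int
	blocked bool
}

// acquire grants read access unless new readers are blocked.
func (s *toyShard) acquire() error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.blocked {
		return errReadersBlocked
	}
	s.readers++
	return nil
}

// release ends one reader's access.
func (s *toyShard) release() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.readers > 0 {
		s.readers--
	}
}

// setNewReadersBlocked controls whether acquire may succeed.
func (s *toyShard) setNewReadersBlocked(blocked bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.blocked = blocked
}

// inUse reports whether any reader is currently active.
func (s *toyShard) inUse() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.readers > 0
}

// sweep tries to delete each expired shard without ever waiting on
// readers. Blocking new readers first means the inUse result cannot be
// invalidated by a reader that arrives between the check and the delete.
func sweep(shards map[uint64]*toyShard, expired []uint64) (deleted, skipped []uint64) {
	for _, id := range expired {
		sh, ok := shards[id]
		if !ok {
			continue
		}
		sh.setNewReadersBlocked(true)
		if sh.inUse() {
			// Skip for now and let readers back in; retry on the next run.
			sh.setNewReadersBlocked(false)
			skipped = append(skipped, id)
			continue
		}
		delete(shards, id)
		deleted = append(deleted, id)
	}
	return deleted, skipped
}
```

In this sketch a shard with an active reader is reported in `skipped` and left in the map, so a later call to `sweep` can delete it once its readers have released it, while the other expired shards are deleted in the same pass.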
gwossum added a commit that referenced this issue Jul 1, 2024