Fix dropping database under write load #4580

Merged
merged 9 commits into from
Oct 26, 2015

Conversation

jwilder
Contributor

@jwilder jwilder commented Oct 26, 2015

Fixes a number of issues that occur when dropping a database under write load.

Initially, the changes were to prevent panics that occurred when trying to create a new WAL segment in a DB dir that had been removed. Once those were fixed, a full DB deadlock occurred because WAL locks were not released when errors were returned. Once that was fixed, shards were getting re-created on writes because we deleted the DB state before updating the meta-store, which caused the PointsWriter to fall into the case where it creates a shard that does not exist locally. This also uncovered that shard references were not fully removed from the tsdb.Store when a database was dropped, which caused spurious errors in the logs from the long-running periodic maintenance task goroutines. Finally, the WAL was not closed when a shard was closed, which also led to spurious errors in the logs.
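
To illustrate the lock-release problem described above, here is a minimal sketch of the usual Go pattern for it: deferring the unlock so that error returns cannot leave the mutex held. The `segmentLog` type, its fields, and `errClosed` are hypothetical stand-ins for illustration only; they are not the actual WAL types or the code changed in this PR.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// segmentLog is a hypothetical stand-in for a WAL partition.
// It is not the InfluxDB type.
type segmentLog struct {
	mu     sync.Mutex
	closed bool
	data   [][]byte
}

// errClosed is a hypothetical error returned for writes after Close.
var errClosed = errors.New("log closed")

// Write appends b to the log. The deferred Unlock runs on every return
// path, including the early error return, so a failed write cannot
// leave mu held.
func (l *segmentLog) Write(b []byte) error {
	l.mu.Lock()
	defer l.mu.Unlock()

	if l.closed {
		return errClosed
	}
	l.data = append(l.data, b)
	return nil
}

// Close marks the log closed so that later writes fail fast.
func (l *segmentLog) Close() {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.closed = true
}

func main() {
	l := &segmentLog{}
	fmt.Println(l.Write([]byte("a")))
	l.Close()
	fmt.Println(l.Write([]byte("b")))
	// This second Close would block forever if the failed Write
	// had returned without releasing the lock.
	l.Close()
}
```

Without the `defer`, an early `return errClosed` placed before a manual `Unlock` would leave the mutex held, and every later caller would block on it.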

Fixes #4538

If a drop database is executed while writes are in flight, a panic
could occur because the WAL would fail to write to DB dirs that
had been removed.

Partial fix for #4538

If an error occurred in this code path, the locks would not be released.

If a database is dropped, the WAL maintenance goroutines could still
kick in and fail because the DB dirs are gone.

The shards map still held a reference to a shard that was dropped,
which caused the periodic maintenance task to report errors continuously.

Prevents a race where shards are recreated after a database is dropped.

When a database is dropped, removing old segments returns an error
because the files are already gone. Using RemoveAll handles this
case more gracefully.
@pauldix
Member

pauldix commented Oct 26, 2015

+1

jwilder added a commit that referenced this pull request Oct 26, 2015
Fix dropping database under write load
@jwilder jwilder merged commit 68c2b6e into master Oct 26, 2015
@jwilder jwilder deleted the jw-4538 branch October 26, 2015 20:03
Successfully merging this pull request may close these issues.

[0.9.5 nightly-ff997a7] Dropping database under a write load causes panics
2 participants