feat(store/v2): parallel writes in storage sqlite backends #18320

Merged: 6 commits merged into main from feat/store_sqlite_pw on Nov 13, 2023

Conversation

@cool-develope (Contributor) commented Oct 31, 2023

Description

Closes: #XXXX


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and please add links to any relevant follow-up issues.

I have...

  • included the correct type prefix in the PR title
  • added ! to the type prefix if API or client breaking change
  • targeted the correct branch (see PR Targeting)
  • provided a link to the relevant issue or specification
  • followed the guidelines for building modules
  • included the necessary unit and integration tests
  • added a changelog entry to CHANGELOG.md
  • included comments for documenting Go code
  • updated the relevant documentation or specification
  • reviewed "Files changed" and left comments if necessary
  • run make lint and make test
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed ! in the type prefix if API or client breaking change
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic
  • reviewed API design and naming
  • reviewed documentation is accurate
  • reviewed tests and test coverage
  • manually tested (if applicable)

Summary by CodeRabbit

  • New Features
    • Switched to a new SQLite driver for improved database operations.
  • Tests
    • Added TestParallelWrites to verify concurrent write capability.
    • Introduced TestParallelWriteAndPruning to test concurrent write and pruning functionality.

@cool-develope cool-develope requested a review from a team as a code owner October 31, 2023 20:05

coderabbitai bot commented Oct 31, 2023

Walkthrough

The codebase has been updated to switch from the "modernc.org/sqlite" package to the "github.com/mattn/go-sqlite3" package for SQLite database operations. This change is reflected in the import statements and driver names. Additionally, two new test functions have been introduced to verify the concurrent write and pruning capabilities of the database.

Changes

  • store/storage/sqlite/db.go: The SQLite driver has been switched from "modernc.org/sqlite" to "github.com/mattn/go-sqlite3", and the database connection string has been updated accordingly (a sketch follows this list).
  • store/storage/sqlite/db_test.go: Two new test functions, TestParallelWrites and TestParallelWriteAndPruning, verify the concurrent write and pruning capabilities of the database.
  • store/storage/sqlite/iterator.go: The import of the "modernc.org/sqlite" package has been removed.
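As the walkthrough notes, the driver switch also changes the connection string. Below is a minimal sketch of what opening the database with the CGO driver could look like, assuming a shared-cache, WAL-enabled DSN; the exact DSN, parameters, and database file name used in the PR are not shown in this conversation, and github.com/mattn/go-sqlite3 is the canonical import path.

package sqlite

import (
	"database/sql"
	"fmt"
	"path/filepath"

	_ "github.com/mattn/go-sqlite3" // registers the "sqlite3" database/sql driver (CGO)
)

// openDB is an illustrative helper, not the PR's New constructor.
func openDB(dataDir string) (*sql.DB, error) {
	// cache=shared and _journal_mode=WAL are assumptions based on the review
	// discussion about concurrent writers; "ss.db" is a hypothetical file name.
	dsn := fmt.Sprintf(
		"file:%s?cache=shared&_journal_mode=WAL&_busy_timeout=5000",
		filepath.Join(dataDir, "ss.db"),
	)
	db, err := sql.Open("sqlite3", dsn)
	if err != nil {
		return nil, fmt.Errorf("failed to open sqlite DB: %w", err)
	}
	return db, nil
}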

Tips

Chat with CodeRabbit Bot (@coderabbitai)

  • If you reply to a review comment from CodeRabbit, the bot will automatically respond.
  • To engage with CodeRabbit bot directly around the specific lines of code in the PR, mention @coderabbitai in your review comment
  • Note: Review comments are made on code diffs or files, not on the PR overview.
  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Note: For conversation with the bot, please use the review comments on code diffs or files.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

@github-prbot github-prbot requested review from a team, tac0turtle and likhita-809 and removed request for a team October 31, 2023 20:05

@cool-develope your pull request is missing a changelog!

coderabbitai bot left a comment

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between f03b396 and 6886584.
Files ignored due to filter (2)
  • store/go.mod
  • store/go.sum
Files selected for processing (3)
  • store/storage/sqlite/db.go (1 hunks)
  • store/storage/sqlite/db_test.go (2 hunks)
  • store/storage/sqlite/iterator.go (1 hunks)
Files skipped from review due to trivial changes (1)
  • store/storage/sqlite/iterator.go
Additional comments: 2
store/storage/sqlite/db_test.go (1)
  • 2-10: The import statements are well organized and only necessary packages are imported. Good use of the sync package for managing goroutines.
store/storage/sqlite/db.go (1)
  • 8-21: The switch from "modernc.org/sqlite" to "github.com/mattn/go-sqlite3" and the change in the connection string to enable shared cache mode and write-ahead logging (WAL) are good for improving concurrency and performance. However, ensure that the new driver is compatible with the rest of your codebase and that the WAL mode is suitable for your use case. WAL mode can increase disk I/O and may not be suitable if your application has many read transactions compared to write transactions.

New(dataDir string):
Ensure that the dataDir path exists and is writable before trying to open the database file. This can prevent runtime errors.

Close():
Ensure that all database connections are closed before setting db.storage to nil. This can prevent potential data races.

GetLatestVersion(), SetLatestVersion(version uint64), Has(storeKey string, version uint64, key []byte), Get(storeKey string, targetVersion uint64, key []byte), ApplyChangeset(version uint64, cs *store.Changeset), Prune(version uint64), Iterator(storeKey string, version uint64, start, end []byte), ReverseIterator(storeKey string, version uint64, start, end []byte):
These methods are well-written and handle errors properly. However, consider adding context to the error messages to make them more informative.

PrintRowsDebug():
This method is useful for debugging, but it should not be included in production code. Consider using a logging library that can be configured to print debug messages based on the environment or log level.
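A minimal sketch of the kind of pre-open check and error wrapping suggested above; the helper name and error messages are hypothetical, not taken from the PR, and it assumes the standard os and fmt packages are imported.

// ensureWritableDir is an illustrative helper, not part of the PR.
func ensureWritableDir(dataDir string) error {
	// Create the directory if it does not exist yet.
	if err := os.MkdirAll(dataDir, 0o755); err != nil {
		return fmt.Errorf("failed to create data directory %q: %w", dataDir, err)
	}
	// Probe writability by creating and removing a temporary file.
	f, err := os.CreateTemp(dataDir, ".write-check-*")
	if err != nil {
		return fmt.Errorf("data directory %q is not writable: %w", dataDir, err)
	}
	name := f.Name()
	_ = f.Close()
	return os.Remove(name)
}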

Comment on lines 93 to 121
func TestParallelWrites(t *testing.T) {
	db, err := New(t.TempDir())
	require.NoError(t, err)
	defer db.Close()

	wg := sync.WaitGroup{}

	// start 10 goroutines that write to the database
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			t.Log("start time", i, time.Now())
			defer wg.Done()
			cs := new(store.Changeset)
			for j := 0; j < 100; j++ {
				key := fmt.Sprintf("key-%d-%03d", i, j)
				val := fmt.Sprintf("val-%d-%03d", i, j)

				cs.AddKVPair(store.KVPair{StoreKey: storeKey1, Key: []byte(key), Value: []byte(val)})
			}

			require.NoError(t, db.ApplyChangeset(uint64(i+1), cs))
			t.Log("end time", i, time.Now())
		}(i)
	}

	wg.Wait()
}
coderabbitai bot commented Oct 31, 2023

The TestParallelWrites function is well implemented. It correctly uses a WaitGroup to ensure all goroutines finish before the test function returns. However, it's important to note that the test does not verify the correctness of the concurrent writes. It would be beneficial to add a verification step after the wg.Wait() call to ensure that all expected key-value pairs are present in the database.

Collaborator commented:

@cool-develope indeed, for a super rigorous test, you can just run over the keys/values written and ensure we've got 0-N
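A rough sketch of the verification step both comments suggest, reusing the loop bounds from the test above; reading each key at the version it was written in is an assumption about the intended check, and this code is not part of the PR as shown here.

	// Hypothetical post-wg.Wait() verification loop.
	for i := 0; i < 10; i++ {
		for j := 0; j < 100; j++ {
			key := fmt.Sprintf("key-%d-%03d", i, j)
			val, err := db.Get(storeKey1, uint64(i+1), []byte(key))
			require.NoError(t, err)
			require.Equal(t, []byte(fmt.Sprintf("val-%d-%03d", i, j)), val)
		}
	}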

coderabbitai bot commented:

Looks like humans are talking. I'll be quiet for now. If you need me again, just tag @coderabbitai in a new comment.

Comment on lines 123 to 164
func TestParallelWriteAndPruning(t *testing.T) {
	db, err := New(t.TempDir())
	require.NoError(t, err)
	defer db.Close()

	latestVersion := 100

	wg := sync.WaitGroup{}
	wg.Add(2)
	// start a goroutine that write to the database
	go func() {
		defer wg.Done()
		for i := 0; i < latestVersion; i++ {
			cs := new(store.Changeset)
			for j := 0; j < 100; j++ {
				key := fmt.Sprintf("key-%d-%03d", i, j)
				val := fmt.Sprintf("val-%d-%03d", i, j)

				cs.AddKVPair(store.KVPair{StoreKey: storeKey1, Key: []byte(key), Value: []byte(val)})
			}

			require.NoError(t, db.ApplyChangeset(uint64(i+1), cs))
		}
	}()
	// start a goroutine that prunes the database
	go func() {
		defer wg.Done()
		for i := 10; i < latestVersion; i += 5 {
			for {
				v, err := db.GetLatestVersion()
				require.NoError(t, err)
				if v > uint64(i) {
					t.Log("pruning version", v-1)
					require.NoError(t, db.Prune(v-1))
					break
				}
			}
		}
	}()

	wg.Wait()
}
coderabbitai bot commented:

The TestParallelWriteAndPruning function is also well implemented. It correctly uses a WaitGroup to ensure all goroutines finish before the test function returns. However, similar to TestParallelWrites, this test does not verify the correctness of the concurrent writes and pruning. It would be beneficial to add a verification step after the wg.Wait() call to ensure that all expected key-value pairs are present in the database and that the pruning was successful.

@alexanderbez left a comment

I'm OK with using the CGO variant if it's significantly faster.

@odeke-em left a comment

Thank you for this change @cool-develope! I've added some feedback to ensure that you can coordinate concurrency. Also, if possible, please ensure that these tests run with -race.
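For reference, a typical invocation with the race detector enabled; the package path is assumed from the files changed in this PR, and -race requires a CGO-capable toolchain, which the mattn driver already needs.

go test -race -run 'TestParallelWrites|TestParallelWriteAndPruning' ./store/storage/sqlite/...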

// start 10 goroutines that write to the database
for i := 0; i < 10; i++ {
	wg.Add(1)
	go func(i int) {
Collaborator commented:

If this code is super fast, it could all run sequentially and we only have 10 goroutines. What we can do, however, to ensure that all the goroutines write at the exact same time is to create a waiter semaphore channel that makes all the goroutines wait until we are ready for them to trigger the concurrent writes.

triggerStartCh := make(chan bool)
for i := 0; i < n; i++ {
	wg.Add(1)
	go func(i int) {
		<-triggerStartCh // Wait for the semaphore before starting the work.
		defer wg.Done()
		...
	}(i)
}

// We can now trigger the start as all the goroutines are waiting to start.
close(triggerStartCh)

cool-develope (author) commented:

Thanks, good to know!

coderabbitai bot left a comment

Review Status

Actionable comments generated: 1

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between 6886584 and e61c9fd.
Files ignored due to filter (2)
  • store/go.mod
  • store/go.sum
Files selected for processing (1)
  • store/storage/sqlite/db_test.go (2 hunks)
Additional comments: 1
store/storage/sqlite/db_test.go (1)
  • 93-141: The TestParallelWrites function looks good. It correctly uses a sync.WaitGroup and a channel to synchronize the start of goroutines. It also correctly checks the written data after all goroutines have finished.

Comment on lines +143 to +204
func TestParallelWriteAndPruning(t *testing.T) {
	db, err := New(t.TempDir())
	require.NoError(t, err)
	defer db.Close()

	latestVersion := 100
	kvCount := 100
	prunePeriod := 5

	wg := sync.WaitGroup{}
	triggerStartCh := make(chan bool)

	// start a goroutine that write to the database
	wg.Add(1)
	go func() {
		<-triggerStartCh
		defer wg.Done()
		for i := 0; i < latestVersion; i++ {
			cs := new(store.Changeset)
			for j := 0; j < kvCount; j++ {
				key := fmt.Sprintf("key-%d-%03d", i, j)
				val := fmt.Sprintf("val-%d-%03d", i, j)

				cs.AddKVPair(store.KVPair{StoreKey: storeKey1, Key: []byte(key), Value: []byte(val)})
			}

			require.NoError(t, db.ApplyChangeset(uint64(i+1), cs))
		}
	}()
	// start a goroutine that prunes the database
	wg.Add(1)
	go func() {
		<-triggerStartCh
		defer wg.Done()
		for i := 10; i < latestVersion; i += prunePeriod {
			for {
				v, err := db.GetLatestVersion()
				require.NoError(t, err)
				if v > uint64(i) {
					t.Log("pruning version", v-1)
					require.NoError(t, db.Prune(v-1))
					break
				}
			}
		}
	}()

	// start the goroutines
	close(triggerStartCh)
	wg.Wait()

	// check if the data is pruned
	version := uint64(latestVersion - prunePeriod)
	val, err := db.Get(storeKey1, version, []byte(fmt.Sprintf("key-%d-%03d", version-1, 0)))
	require.NoError(t, err)
	require.Nil(t, val)

	version = uint64(latestVersion)
	val, err = db.Get(storeKey1, version, []byte(fmt.Sprintf("key-%d-%03d", version-1, 0)))
	require.NoError(t, err)
	require.Equal(t, []byte(fmt.Sprintf("val-%d-%03d", version-1, 0)), val)
}
coderabbitai bot commented:

The TestParallelWriteAndPruning function also correctly uses a sync.WaitGroup and a channel to synchronize the start of the goroutines, and it exercises writing and pruning concurrently. However, the pruning goroutine busy-waits: db.GetLatestVersion() is called in a tight loop without any delay, which could lead to high CPU usage if the latest version is not updated quickly enough by the writing goroutine. Consider adding a small delay or a backoff mechanism in the loop to avoid this issue.

+ import "time"
...
for i := 10; i < latestVersion; i += prunePeriod {
	for {
		v, err := db.GetLatestVersion()
		require.NoError(t, err)
		if v > uint64(i) {
			t.Log("pruning version", v-1)
			require.NoError(t, db.Prune(v-1))
			break
		}
+		time.Sleep(10 * time.Millisecond) // add a small delay
	}
}

Committable suggestion

[!IMPORTANT]
Carefully review the code before committing. Make sure it correctly replaces the highlighted code and has no missing lines or indentation issues.

Suggested change
import "time"
func TestParallelWriteAndPruning(t *testing.T) {
db, err := New(t.TempDir())
require.NoError(t, err)
defer db.Close()
latestVersion := 100
kvCount := 100
prunePeriod := 5
wg := sync.WaitGroup{}
triggerStartCh := make(chan bool)
// start a goroutine that write to the database
wg.Add(1)
go func() {
<-triggerStartCh
defer wg.Done()
for i := 0; i < latestVersion; i++ {
cs := new(store.Changeset)
for j := 0; j < kvCount; j++ {
key := fmt.Sprintf("key-%d-%03d", i, j)
val := fmt.Sprintf("val-%d-%03d", i, j)
cs.AddKVPair(store.KVPair{StoreKey: storeKey1, Key: []byte(key), Value: []byte(val)})
}
require.NoError(t, db.ApplyChangeset(uint64(i+1), cs))
}
}()
// start a goroutine that prunes the database
wg.Add(1)
go func() {
<-triggerStartCh
defer wg.Done()
for i := 10; i < latestVersion; i += prunePeriod {
for {
v, err := db.GetLatestVersion()
require.NoError(t, err)
if v > uint64(i) {
t.Log("pruning version", v-1)
require.NoError(t, db.Prune(v-1))
break
}
time.Sleep(10 * time.Millisecond) // add a small delay
}
}
}()
// start the goroutines
close(triggerStartCh)
wg.Wait()
// check if the data is pruned
version := uint64(latestVersion - prunePeriod)
val, err := db.Get(storeKey1, version, []byte(fmt.Sprintf("key-%d-%03d", version-1, 0)))
require.NoError(t, err)
require.Nil(t, val)
version = uint64(latestVersion)
val, err = db.Get(storeKey1, version, []byte(fmt.Sprintf("key-%d-%03d", version-1, 0)))
require.NoError(t, err)
require.Equal(t, []byte(fmt.Sprintf("val-%d-%03d", version-1, 0)), val)
}
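The comment above also mentions a backoff mechanism as an alternative to a fixed delay. A minimal sketch of how the polling loop could back off, purely illustrative and not part of the committable suggestion:

	// Hypothetical exponential backoff for the polling loop, capped at 100ms.
	delay := time.Millisecond
	for {
		v, err := db.GetLatestVersion()
		require.NoError(t, err)
		if v > uint64(i) {
			t.Log("pruning version", v-1)
			require.NoError(t, db.Prune(v-1))
			break
		}
		time.Sleep(delay)
		if delay < 100*time.Millisecond {
			delay *= 2
		}
	}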

coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between e61c9fd and 363dd76.
Files selected for processing (1)
  • store/pruning/manager_test.go (2 hunks)
Files not reviewed due to errors (1)
  • store/pruning/manager_test.go (Error: Server error. Please try again later.)

@odeke-em left a comment

LGTM, thank you @cool-develope!

@alexanderbez

@cool-develope let's get the conflicting file resolved and get this merged 🙏

@alexanderbez alexanderbez added this pull request to the merge queue Nov 13, 2023
Merged via the queue into main with commit 912390d Nov 13, 2023
@alexanderbez alexanderbez deleted the feat/store_sqlite_pw branch November 13, 2023 12:56