Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Performance] Profile database queries #10588

Closed
Tracked by #10558
osmaczko opened this issue May 9, 2023 · 13 comments
Closed
Tracked by #10558

[Performance] Profile database queries #10588

osmaczko opened this issue May 9, 2023 · 13 comments
Assignees
Milestone

Comments

@osmaczko
Copy link
Contributor

osmaczko commented May 9, 2023

Description

Log all query times at the database layer (inside vendor/..)

part of: #10558

@osmaczko
Copy link
Contributor Author

osmaczko commented May 9, 2023

Code used for profiling: #10588 (comment)

@osmaczko
Copy link
Contributor Author

osmaczko commented May 9, 2023

Scenario: Reactivating an application after an extended period of inactivity, transitioning between multiple channels, and allowing each to load completely. This process is executed using a shared development account, which results in a substantial amount of data for each channel.

CSV profiling data: database-perf-1683634745.csv, the separator is µ.

@osmaczko
Copy link
Contributor Author

osmaczko commented May 9, 2023

Profiling results:

  1. All queries sorted by duration: (query, args, duration in nanosecs, duration in ms)

image

  1. Unique queries sorted by count: (query, count)

image

  1. Unique queries sorted by total duration: (query, duration in nanosecs, duration in ms)

image

@jrainville
Copy link
Member

Is the first line "mark all as read"? If so, 7 seconds for that is way too long. It's probably missing an index.

@cammellos
Copy link
Contributor

Is the first line "mark all as read"? If so, 7 seconds for that is way too long. It's probably missing an index.

we checked, it's covered by an index to some extent (local_chat_id, seen), it might be that it has to update many messages, that would happen if you have a lot of unread messages

Do discord messages count as seen/unseen? (we should make them seen if they start as unseen, unseen messages are more expensive to handle)

@jrainville
Copy link
Member

Do discord messages count as seen/unseen? (we should make them seen if they start as unseen, unseen messages are more expensive to handle)

Good question. I agree that they should start as seen. Any idea @0x-r4bbit ?

@osmaczko
Copy link
Contributor Author

osmaczko commented May 9, 2023

Do discord messages count as seen/unseen? (we should make them seen if they start as unseen, unseen messages are more expensive to handle)

Good question. I agree that they should start as seen. Any idea @0x-r4bbit ?

I will check that directly in fresh shared development account database in a sec.

@osmaczko
Copy link
Contributor Author

osmaczko commented May 9, 2023

I will check that directly in fresh shared development account database in a sec.

SELECT COUNT(*) FROM user_messages WHERE seen=0 AND discord_message_id!="" 

46223

Yeah, it seems discord messages are not marked as seen when importing.

@jrainville
Copy link
Member

Good find! That is an easy win for performance straight after an import.

@iurimatias iurimatias added this to the 0.13 milestone May 10, 2023
@osmaczko
Copy link
Contributor Author

I did a queries profiling session post #10600.

The queries total time seems alright now.

image

The first three entries are migrations. The next two entries are suspicious, but combined, they only took 4 seconds while the app ran for approximately 10 minutes, so that's negligible. Despite this, the app was still extremely laggy. I suspect we are doing something incorrect on the client side, as database queries don't block as much as we anticipated. I need to profile the app using Callgrind to see where it spends most of its time.

@caybro
Copy link
Member

caybro commented May 10, 2023

Despite this, the app was still extremely laggy. I suspect we are doing something incorrect on the client side, as database queries don't block as much as we anticipated. I need to profile the app using Callgrind to see where it spends most of its time

Well, I used hotspot and it's quite obvious where we spend most of the time, in the sha1_compress function (cf #10558 (comment)). I'm curious to see whether Callgrind provides different data

@osmaczko
Copy link
Contributor Author

Updated code for profiling that considers multiple open connections:

  • added csvFileMutex
  • fixed logger.Complete to handle all scenarios when rows are completed
diff --git a/vendor/github.com/mutecomm/go-sqlcipher/sqlite3.go b/vendor/github.com/mutecomm/go-sqlcipher/sqlite3.go
index da501aa72..b8067aa05 100644
--- a/vendor/github.com/mutecomm/go-sqlcipher/sqlite3.go
+++ b/vendor/github.com/mutecomm/go-sqlcipher/sqlite3.go
@@ -82,9 +82,11 @@ import (
 	"fmt"
 	"io"
 	"net/url"
+	"os"
 	"runtime"
 	"strconv"
 	"strings"
+	"sync"
 	"time"
 	"unsafe"
 
@@ -109,6 +111,34 @@ var SQLiteTimestampFormats = []string{
 	"2006-01-02",
 }
 
+var csvFile *os.File
+var csvFileMutex sync.Mutex
+var once sync.Once
+var err error
+
+type PerfLogger struct {
+	query string
+	args  []namedValue
+	start time.Time
+}
+
+func (pf *PerfLogger) Complete() {
+	duration := time.Since(pf.start)
+	fmt.Println("--- PerfLogger ---", pf.query, pf.args, duration.Nanoseconds())
+
+	query := strings.ReplaceAll(pf.query, "\n", " ")
+	args := strings.ReplaceAll(fmt.Sprint(pf.args), "\n", " ")
+
+	line := fmt.Sprintf("%sµ%sµ%d\n", query, args, duration.Nanoseconds())
+	csvFileMutex.Lock()
+	defer csvFileMutex.Unlock()
+	_, err = csvFile.Write([]byte(line))
+	if err != nil {
+		fmt.Println("--- PerfLogger ---", "could not write to csv", err)
+		return
+	}
+}
+
 func init() {
 	sql.Register("sqlite3", &SQLiteDriver{})
 }
@@ -143,6 +173,7 @@ type SQLiteStmt struct {
 	c      *SQLiteConn
 	s      *C.sqlite3_stmt
 	t      string
+	_query string
 	closed bool
 	cls    bool
 }
@@ -336,6 +367,14 @@ func errorString(err Error) string {
 //   _pragma_cipher_page_size=XXX
 //     Set the PRAGMA cipher_page_size to adjust the page size.
 func (d *SQLiteDriver) Open(dsn string) (driver.Conn, error) {
+	once.Do(func() {
+		csvFile, err = os.Create("database-perf-" + fmt.Sprint(time.Now().Unix()) + ".csv")
+		if err != nil {
+			fmt.Println("--- PerfLogger ---", "Can't create database perf csv:", err)
+		}
+		runtime.SetFinalizer(csvFile, (*os.File).Close)
+	})
+
 	if C.sqlite3_threadsafe() == 0 {
 		return nil, errors.New("sqlite library was not compiled for thread-safe operation")
 	}
@@ -510,7 +549,7 @@ func (c *SQLiteConn) prepare(ctx context.Context, query string) (driver.Stmt, er
 	if tail != nil && *tail != '\000' {
 		t = strings.TrimSpace(C.GoString(tail))
 	}
-	ss := &SQLiteStmt{c: c, s: s, t: t}
+	ss := &SQLiteStmt{c: c, s: s, t: t, _query: query}
 	runtime.SetFinalizer(ss, (*SQLiteStmt).Close)
 	return ss, nil
 }
@@ -609,6 +648,12 @@ func (s *SQLiteStmt) Query(args []driver.Value) (driver.Rows, error) {
 }
 
 func (s *SQLiteStmt) query(ctx context.Context, args []namedValue) (driver.Rows, error) {
+	logger := PerfLogger{
+		query: s._query,
+		args:  args,
+		start: time.Now(),
+	}
+
 	if err := s.bind(args); err != nil {
 		return nil, err
 	}
@@ -623,6 +668,8 @@ func (s *SQLiteStmt) query(ctx context.Context, args []namedValue) (driver.Rows,
 	}
 
 	go func(db *C.sqlite3) {
+		defer logger.Complete()
+
 		select {
 		case <-ctx.Done():
 			select {
@@ -661,6 +708,13 @@ func (s *SQLiteStmt) Exec(args []driver.Value) (driver.Result, error) {
 }
 
 func (s *SQLiteStmt) exec(ctx context.Context, args []namedValue) (driver.Result, error) {
+	logger := PerfLogger{
+		query: s._query,
+		args:  args,
+		start: time.Now(),
+	}
+	defer logger.Complete()
+
 	if err := s.bind(args); err != nil {
 		C.sqlite3_reset(s.s)
 		C.sqlite3_clear_bindings(s.s)

@osmaczko
Copy link
Contributor Author

Results of profiling with enabled multiple connections and fixed logger:

ColumnA: query
ColumnC: total time in ms
ColumnD: quires count

green ones are the migrations and can be ignored

image

@osmaczko osmaczko changed the title Profile database queries [Performance] Profile database queries May 23, 2023
@github-project-automation github-project-automation bot moved this from In Progress to Done in Status Desktop/Mobile Board May 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

No branches or pull requests

5 participants