
Events: Migrating from v1 to v2 is slow #11056

Closed
Stebalien opened this issue Jul 6, 2023 · 7 comments · Fixed by #11064
@Stebalien (Member)

We should:

  1. Figure out what is slow. We might just be missing some indices, or maybe we need smaller transactions? (A transaction-batching sketch follows this list.)
  2. Parallelize.
  3. Consider doing this migration in the background? This will be kind of annoying but there's no reason it can't be done.
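
For point 1, one way to experiment with transaction sizing is to batch the per-height work into explicit transactions and compare batch sizes. A minimal sketch, not the actual Lotus migration code; `migrateHeight` is a hypothetical stand-in:

```go
// Hypothetical sketch: batch per-height migration work into explicit
// transactions so different batch sizes (1, 100, 1000 heights per commit)
// can be compared. Not the actual Lotus code.
package eventmigration

import "database/sql"

func migrateInBatches(db *sql.DB, heights []int64, batchSize int,
	migrateHeight func(*sql.Tx, int64) error) error {
	for start := 0; start < len(heights); start += batchSize {
		end := start + batchSize
		if end > len(heights) {
			end = len(heights)
		}
		tx, err := db.Begin()
		if err != nil {
			return err
		}
		for _, h := range heights[start:end] {
			if err := migrateHeight(tx, h); err != nil {
				tx.Rollback() // discard the partial batch
				return err
			}
		}
		if err := tx.Commit(); err != nil {
			return err
		}
	}
	return nil
}
```
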
@arajasek (Contributor) commented Jul 10, 2023

My guess is we're spending most of our time walking up the chain and computing TSK CIDs for heights with no entries in the DB.

Proposal:

  • Identify the heights that actually have entries in the events DB
  • chainStore.GetTipsetByHeight() those heights
  • if the tsk doesn't match, stmtDeleteOffChainEvent them
  • if it does match, select-and-delete? (rough sketch after this list)
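
Roughly, that flow could look like the sketch below. Everything here is a stand-in: the real chainStore.GetTipsetByHeight and stmtDeleteOffChainEvent have different signatures, and the event table's column names (height, tipset_key_cid) are assumptions about the schema. The heights slice would come from the "heights that actually have entries" query.

```go
// Hypothetical sketch of the proposal above; names and schema are assumptions.
package eventmigration

import (
	"context"
	"database/sql"
)

// canonicalTSKCID stands in for "chainStore.GetTipsetByHeight() those heights,
// then take the tipset key CID"; the real API looks different.
type canonicalTSKCID func(ctx context.Context, height int64) ([]byte, error)

func migrateByHeight(ctx context.Context, db *sql.DB, heights []int64, tskAt canonicalTSKCID) error {
	for _, h := range heights {
		want, err := tskAt(ctx, h)
		if err != nil {
			return err
		}
		// Drop events whose tipset key CID doesn't match the canonical chain
		// (the stmtDeleteOffChainEvent step).
		if _, err := db.ExecContext(ctx,
			"DELETE FROM event WHERE height = ? AND tipset_key_cid != ?", h, want); err != nil {
			return err
		}
		// The matching rows would then go through the select-and-rewrite step
		// of the v1 -> v2 migration (elided here).
	}
	return nil
}
```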

@Stebalien (Member, Author)

> My guess is we're spending most of our time walking up the chain and computing TSK CIDs for heights with no entries in the DB.

Not a second per tipset, unless that's an exaggeration.

> Proposal:

That's going to repeatedly walk the chain, right? Although I guess the cache should help. But I'm pretty sure we have events at most heights.

@arajasek (Contributor) commented Jul 10, 2023

@Stebalien

> > My guess is we're spending most of our time walking up the chain and computing TSK CIDs for heights with no entries in the DB.

> Not a second per tipset, unless that's an exaggeration.

I ran some quick diagnostics; it was pretty consistently 0.5-1 seconds per tipset, and the vast majority of those tipsets had no events. My guess is that it's because we run 2 statements on every epoch (the off-chain delete and the select), even though most are no-ops.

> > Proposal:

> That's going to repeatedly walk the chain, right? Although I guess the cache should help. But I'm pretty sure we have events at most heights.

I was proposing doing that using the database itself, not the chain: writing some sort of SELECT heights FROM event THAT ACTUALLY HAVE ENTRIES WE WANT TO MIGRATE / DELETE, or whatever that is in database speak.
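
In database speak, that first step could be a single DISTINCT scan over the event table, along these lines (a sketch; the table and column names are assumptions, not copied from the actual schema):

```go
// Hypothetical sketch: "heights that actually have entries we want to
// migrate/delete" as a single DISTINCT query. Schema names are assumptions.
package eventmigration

import (
	"context"
	"database/sql"
)

func heightsWithEvents(ctx context.Context, db *sql.DB) ([]int64, error) {
	rows, err := db.QueryContext(ctx,
		"SELECT DISTINCT height FROM event ORDER BY height DESC")
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var heights []int64
	for rows.Next() {
		var h int64
		if err := rows.Scan(&h); err != nil {
			return nil, err
		}
		heights = append(heights, h)
	}
	return heights, rows.Err()
}
```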

@Stebalien (Member, Author)

Ah, I see. So one query to find the relevant epochs? Yeah, that should work. Also, it really sounds like we don't have the right indices. These statements should take milliseconds.
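
One quick way to check the indices hypothesis is to ask SQLite for its query plan and look for "SCAN event" (a full table scan) versus "SEARCH event USING INDEX ...". A sketch, using a stand-in statement rather than the real delete/select from filter/index.go:

```go
// Hypothetical sketch: print SQLite's query plan for a statement, e.g.
//   explainPlan(ctx, db, "SELECT * FROM event WHERE height = 3023717")
// "SCAN event" in the output means a full table scan; an index lookup shows
// up as "SEARCH event USING INDEX ...".
package eventmigration

import (
	"context"
	"database/sql"
	"fmt"
)

func explainPlan(ctx context.Context, db *sql.DB, stmt string) error {
	rows, err := db.QueryContext(ctx, "EXPLAIN QUERY PLAN "+stmt)
	if err != nil {
		return err
	}
	defer rows.Close()

	cols, err := rows.Columns()
	if err != nil {
		return err
	}
	vals := make([]any, len(cols))
	ptrs := make([]any, len(cols))
	for i := range vals {
		ptrs[i] = &vals[i]
	}
	for rows.Next() {
		if err := rows.Scan(ptrs...); err != nil {
			return err
		}
		// The last column holds the human-readable plan detail.
		fmt.Printf("%s\n", vals[len(vals)-1])
	}
	return rows.Err()
}
```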

@arajasek (Contributor)

Here's a commit that adds some logging, as well as a shed command that starts the migration, for easy reproduction.

Command: lotus-shed eth migrate-events

Just eyeballing some results, most tipsets are taking ~1s, with ~500ms devoted to the off-chain delete statement and another ~500ms to the select statement:

2023-07-10T17:28:14.302-0400	INFO	filter	filter/index.go:180	delete time: 533.835943ms
2023-07-10T17:28:14.774-0400	INFO	filter	filter/index.go:185	select time: 471.039961ms
2023-07-10T17:28:14.774-0400	INFO	filter	filter/index.go:166	Migrating height 3023717

2023-07-10T17:28:15.322-0400	INFO	filter	filter/index.go:180	delete time: 548.036118ms
2023-07-10T17:28:15.781-0400	INFO	filter	filter/index.go:185	select time: 459.305833ms
2023-07-10T17:28:15.781-0400	INFO	filter	filter/index.go:166	Migrating height 3023716

2023-07-10T17:28:16.300-0400	INFO	filter	filter/index.go:180	delete time: 518.672855ms
2023-07-10T17:28:16.757-0400	INFO	filter	filter/index.go:185	select time: 456.932488ms
2023-07-10T17:28:16.757-0400	INFO	filter	filter/index.go:166	Migrating height 3023715

^CReceived interrupt, shutting down... Press CTRL+C again to force shutdown
2023-07-10T17:28:17.278-0400	INFO	filter	filter/index.go:180	delete time: 520.756515ms
2023-07-10T17:28:17.743-0400	INFO	filter	filter/index.go:185	select time: 464.845068ms
2023-07-10T17:28:17.744-0400	INFO	filter	filter/index.go:166	Migrating height 3023714

2023-07-10T17:28:18.280-0400	INFO	filter	filter/index.go:180	delete time: 536.074573ms
2023-07-10T17:28:18.755-0400	INFO	filter	filter/index.go:185	select time: 474.758714ms
2023-07-10T17:28:18.755-0400	INFO	filter	filter/index.go:166	Migrating height 3023713

2023-07-10T17:28:19.311-0400	INFO	filter	filter/index.go:180	delete time: 556.234014ms
2023-07-10T17:28:19.780-0400	INFO	filter	filter/index.go:185	select time: 468.380992ms

@fridrik01 (Contributor)

The issue was slow SQL statements performed during the migration. I added some temporary indices for the purpose of the migration in #11064, and that brought the total migration down to 55 seconds on an events.db going back to the FEVM launch. That should be fast enough.
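
The shape of that fix, roughly: create indices that the migration statements can use, run the migration, then drop them so the steady-state database layout is unchanged. The index definitions below are assumptions about the schema, not the exact DDL from #11064:

```go
// Hypothetical sketch of temporary migration-only indices; not the exact DDL
// from #11064.
package eventmigration

import (
	"context"
	"database/sql"
)

func withMigrationIndices(ctx context.Context, db *sql.DB, migrate func(context.Context) error) error {
	create := []string{
		"CREATE INDEX IF NOT EXISTS tmp_event_height ON event (height)",
		"CREATE INDEX IF NOT EXISTS tmp_event_tipset_key_cid ON event (tipset_key_cid)",
	}
	drop := []string{
		"DROP INDEX IF EXISTS tmp_event_height",
		"DROP INDEX IF EXISTS tmp_event_tipset_key_cid",
	}
	for _, ddl := range create {
		if _, err := db.ExecContext(ctx, ddl); err != nil {
			return err
		}
	}
	// Drop the temporary indices afterwards so the steady-state database
	// layout is unchanged (best effort).
	defer func() {
		for _, ddl := range drop {
			db.ExecContext(ctx, ddl)
		}
	}()
	return migrate(ctx)
}
```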

@TippyFlitsUK (Contributor)

I have just tested this on my production miner and I am fully synced with EnableEthRPC enabled!!
Took slightly longer for me in production but still only a couple of minutes!! 👍
I will keep monitoring but so far, everything is looking very solid!! 🚀

Daemon:  1.23.4-dev+mainnet+git.0d7621be5+api1.5.0
Local: lotus version 1.23.4-dev+mainnet+git.0d7621be5
2023-07-11T12:33:31.918+0100	INFO	filter	filter/index.go:304	upgrading event index from version 1 to version 2
2023-07-11T12:36:12.157+0100	INFO	filter	filter/index.go:262	Successfully migrated events to version 2 in 2m40.239250597s

lotus-daemon (copy).log
