
[BCFR-899] MaxLogsKept implementation #14574

Merged: 15 commits merged into develop on Oct 16, 2024

Conversation

@reductionista (Contributor) commented Sep 26, 2024

BCFR-899

Motivation

Presently, LogPoller supports only time-based retention, via the Retention field in the filters passed to RegisterFilter. The MaxLogsKept field was added earlier in anticipation of also supporting count-based retention. One example of a case where time-based retention is risky is the Transmit event in the OCR Contract Transmitter: no matter how long the retention period is set to, there's a chance the node will be down for longer than that and miss logs when it comes back up. This would make a bad situation even worse, because the transmit event would never be picked up at all.

Solution

This implements the MaxLogsKept feature in LogPoller. When specified, this field tells LogPoller it's okay to prune logs matching a filter if there are at least MaxLogsKept more recent matching logs in the database.
In the example above, this avoids storing any more logs than needed while always keeping the latest transmit event available; older transmit events are no longer relevant once there is a more recent one.
In general, this should be just as useful for anything accessed only via ChainReader's GetLatestValue() method rather than QueryKey().

A log may be pruned either because it's too old (time-based) or because too many newer matching logs have been saved (count-based). It need not satisfy both the Retention and MaxLogsKept criteria in order to be pruned.
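
For illustration, here is a minimal sketch of how a product could register such a filter, assuming the logpoller.Filter fields shown below; the event signature and contract address are placeholders, and the actual ContractTransmitter wiring differs:

```go
package example

import (
	"context"

	"github.com/ethereum/go-ethereum/common"

	"github.com/smartcontractkit/chainlink/v2/core/chains/evm/logpoller"
)

// registerTransmitFilter sketches a count-based filter registration: only the
// most recent Transmit log ever needs to survive pruning.
func registerTransmitFilter(ctx context.Context, lp logpoller.LogPoller, transmitEventSig common.Hash, contractAddr common.Address) error {
	return lp.RegisterFilter(ctx, logpoller.Filter{
		Name:        "OCR ContractTransmitter Transmit", // placeholder filter name
		EventSigs:   []common.Hash{transmitEventSig},    // placeholder event signature
		Addresses:   []common.Address{contractAddr},     // placeholder contract address
		Retention:   0,                                  // no time-based expiry
		MaxLogsKept: 1,                                  // prune anything older than the newest match
	})
}
```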

Testing

This was tested by changing the MaxLogsKept setting on the ContractTransmitter filter passed to LogPoller from 0 to 1, and running the CCIP load tests.

Without paging, this query is one of the most CPU-intensive. Similar to the DeleteExpiredLogs query, it must go through every row of the logs table unless LogPrunePageSize is set to non-zero. But it's slower than DeleteExpiredLogs because, on top of that, it also has to group, sort, and count every log in each group in order to figure out how many there are and which ones are the excess logs eligible for deletion. Also similar to DeleteExpiredLogs, the final step is to merge the results for logs matching multiple filters, making sure that no log is dropped unless ALL of its matching filters agree that it falls beyond their own MaxLogsKept threshold.
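
To make that grouping-and-merging step concrete, here is a much-simplified sketch of how such a query could be structured in Postgres. This is not the actual SelectExcessLogs query from orm.go: the table and column names are assumptions, and the real query also matches on topics and pages through the table by id.

```go
package example

// selectExcessLogsSketch is illustrative only. It ranks each log within every
// matching filter (newest first) and then keeps a log unless ALL of its matching
// filters already have at least max_logs_kept newer logs.
const selectExcessLogsSketch = `
WITH ranked AS (
    SELECT l.id,
           f.max_logs_kept,
           ROW_NUMBER() OVER (
               PARTITION BY f.name, l.address, l.event_sig
               ORDER BY l.block_number DESC, l.log_index DESC
           ) AS rn
    FROM evm.logs l
    JOIN evm.log_poller_filters f
      ON f.evm_chain_id = l.evm_chain_id
     AND f.address = l.address
     AND f.event = l.event_sig
    WHERE f.max_logs_kept > 0
)
SELECT id
FROM ranked
GROUP BY id
-- excess only if EVERY matching filter agrees this log is past its threshold
HAVING bool_and(rn > max_logs_kept)`
```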

Without paging, the SelectExcessLogs query grew in median duration linearly as the number of logs in the table increased. After a few hours it was taking longer than the Insert queries (and any other queries) and started causing timeouts to occur, not just for itself but for other queries as well. It eventually got to a point where the p90 & p99 charts were continuously in the 4-5s range, generating many critical errors as well as a lot of backlogged queries waiting on connections.

With a paging size of 4000, the query durations grew linearly at first with the size of the table, and then leveled off slightly above the insert query durations. It still resulted in a couple of fairly large bursts of critical errors (query timeouts) during the heaviest CPU usage.

A paging size of 1000-2000 worked much better. There were far fewer timeouts, and only during a brief window of time. The p99 durations were noticeably lower than the insert query durations, aside from some very brief but high spikes which we believe are due to an unrelated bug in not releasing connections quickly enough (caused by SQL logging); that bug has been fixed in the Chainlink repo but hasn't been backported to CCIP yet.
Aside from these spikes, the charts look pretty healthy, so we should retest once that bug is backported but before the MaxLogsKept feature is enabled.
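
For reference, the paging behavior amounts to a delete-in-batches loop; this is a sketch under the assumption of a caller-supplied delete function that removes at most pageSize excess logs per call and reports how many rows it removed (the real ORM method names differ):

```go
package example

import "context"

// pruneExcessLogs repeatedly deletes excess logs in pages of pageSize rows.
// deletePage is a hypothetical stand-in for the real ORM call; it deletes at
// most limit rows and returns the number actually removed. pageSize == 0 means
// no paging: a single unbounded pass.
func pruneExcessLogs(ctx context.Context, pageSize int64, deletePage func(ctx context.Context, limit int64) (int64, error)) (int64, error) {
	var total int64
	for {
		n, err := deletePage(ctx, pageSize)
		if err != nil {
			return total, err
		}
		total += n
		if pageSize == 0 || n < pageSize {
			// a short page (or an unbounded pass) means nothing is left to prune
			return total, nil
		}
	}
}
```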

Another scenario tested was the performance of the query when all filters have MaxLogsKept=0. This was tested both with and without the code that skips the query unless there is at least one filter with MaxLogsKept > 0. Skipping the query took it off the chart entirely, so the rest of the chart looked the same as before this PR. Without skipping the query, it was slightly faster than the MaxLogsKept=1 case, but not a very significant reduction. Based on this, it was decided that it does make sense to disable the pruning until this feature starts being used by at least one filter.
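
The skip described above comes down to a cheap guard before issuing the query; a sketch, assuming the registered filters are available as a map keyed by name (the real field names and locking in log_poller.go may differ):

```go
package example

import "github.com/smartcontractkit/chainlink/v2/core/chains/evm/logpoller"

// anyFilterUsesMaxLogsKept reports whether the expensive excess-logs pruning
// query needs to run at all: it can be skipped entirely unless at least one
// registered filter actually uses count-based retention.
func anyFilterUsesMaxLogsKept(filters map[string]logpoller.Filter) bool {
	for _, f := range filters {
		if f.MaxLogsKept > 0 {
			return true
		}
	}
	return false
}
```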

With MaxLogsKept=1:
[Screenshots: query duration charts, 2024-10-15]

With MaxLogsKept=0 (disabled):
[Screenshots: query duration charts, 2024-10-15]

For both of these tests, the pruning of unmatched logs (also an expensive query) was increased in frequency from every 20 ticks to every 4 ticks, to see what things look like under the most severe conditions with both prune operations happening at once (at every 20 ticks, the test usually completes before the first one happens). On the MaxLogsKept=1 charts, SelectUnmatchedLogs query durations show up as cyan and SelectExcessLogs query durations as bright yellow. On the MaxLogsKept=0 charts, SelectUnmatchedLogs is pale yellow (and SelectExcessLogs does not run, as desired). LogPrunePageSize was set to 1000 for both.

@reductionista force-pushed the BCFR-899-max-logs-kept branch 2 times, most recently from 6e1376d to 184422e on September 27, 2024 07:32
@reductionista marked this pull request as ready for review September 27, 2024 07:33
@reductionista requested review from a team as code owners September 27, 2024 07:33
@reductionista requested review from EasterTheBunny and removed request for a team September 27, 2024 07:33
@reductionista changed the base branch from develop to BCFR-900-log-poller-id-columns September 27, 2024 07:34
@reductionista requested review from a team as code owners September 27, 2024 07:34
@reductionista changed the title from [BCF-899] MaxLogsKept implementation to [BCFR-899] MaxLogsKept implementation on Sep 27, 2024
@dhaidashenko (Collaborator) left a comment:

Impressive query 🤯!
Left a couple of nits
Review threads (resolved): core/chains/evm/logpoller/orm.go, core/chains/evm/logpoller/log_poller.go
@reductionista force-pushed the BCFR-899-max-logs-kept branch 9 times, most recently from 17c1357 to e3491b8 on October 1, 2024 02:41
Base automatically changed from BCFR-900-log-poller-id-columns to develop October 1, 2024 06:16
@reductionista force-pushed the BCFR-899-max-logs-kept branch 2 times, most recently from 06ecc5d to 2225dac on October 1, 2024 22:08
Further review threads (resolved): core/chains/evm/logpoller/log_poller.go, core/chains/evm/logpoller/orm.go
@reductionista added this pull request to the merge queue Oct 16, 2024
github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 16, 2024
@reductionista added this pull request to the merge queue Oct 16, 2024
Merged via the queue into develop with commit accbf0f Oct 16, 2024
128 of 129 checks passed
@reductionista deleted the BCFR-899-max-logs-kept branch October 16, 2024 16:06