NONEVM-747 register unregister filters #938
base: develop
Conversation
pkg/solana/logpoller/log_poller.go
Outdated
	}

	filtersByEventSig[filter.EventSig] = append(filtersByEventSig[filter.EventSig], filter)
	lp.filtersToBackfill = append(lp.filtersToBackfill, filter)
Ah, I was thinking we could just call this directly and wait for it to complete before returning. Now that you've added this filtersToBackfill queue, I think maybe it is something that will be useful.
But let me share some of my thought process, curious to hear any thoughts you may have to add to it...
On evm there is an annoying pattern that most products follow of calling RegisterFilter and then immediately calling Replay afterwards and waiting for that to return. So I was hoping that in the future we can just absorb Replay into RegisterFilter, so that they only have to call a single function, but once it returns they can rely on the data being there and the plugin can start doing whatever it needs to.
For example, CCIP says they require 8 hours of log history for the plugin to work properly. So it seems like they're going to need some way of knowing that the 8 hours has been populated before they start doing their thing? I guess I should ask them directly about what the preferred behavior is here. But even if they do want to wait, I guess it's possible some products wouldn't want to wait--so perhaps it's better that you've built it in a flexible way that could handle both (assuming we add some async method that can wrap this, blocking until the filter is done backfilling).
On evm, there is a concern about having multiple go routines writing to the logs table at the same time. On Solana, I think this should be less of an issue since the db schema is set up to separate logs associated with each filter from one another. (If a log matches multiple filters, then multiple copies of it will be saved.) So backfilling for one filter will not interfere with any backfills going on for other filters simultaneously. There still may be a problem though, which I hadn't thought of until just now... since this would open the door for the latest block to be different for different filters. 🤔
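For what it's worth, a minimal sketch of what such a blocking wrapper could look like. RegisterFilterAndWait, the exact RegisterFilter signature, and backfillDone are all hypothetical names for this example; the PR does not define them.

package logpoller

import (
	"context"
	"time"
)

// RegisterFilterAndWait is a hypothetical blocking wrapper: it registers the
// filter, then polls until the background backfill for it has completed.
// backfillDone is a made-up completion check used only for illustration.
func (lp *LogPoller) RegisterFilterAndWait(ctx context.Context, filter Filter) error {
	if err := lp.RegisterFilter(ctx, filter); err != nil {
		return err
	}
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for {
		if lp.backfillDone(filter.Name) {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}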
(If a log matches multiple filters, then multiple copies of it will be saved.)
That reminds me, we'll need to include an extra deduping step in the ORM when logs are fetched
My motivation to implement backfill using a queue:
- If I recall correctly, you've mentioned that a job update triggered from the UI may lead to a RegisterFilter call. Thus, I've assumed that Register/Unregister should be as fast as possible, and additional processing should be done in the background.
- I also assumed that the main loop's log filtration and extraction of indexed fields may be time-consuming, so we cannot lock filters for the whole iteration; we should only lock when filters are accessed. That forced me to look for an alternative way to sync the start of a backfill and the main loop's processing of a recently added filter.
I guess we could simplify the queue by replacing it with go startBackFill(from, latestProcessedBlock). But in that case we won't be able to use Filters as a separate type and will have to embed them into logpoller.
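To make the shape of the queue concrete, here is a rough sketch of the pattern being described. filtersMu, processBackfillQueue, and startFilterBackfill are invented names; filtersByName and filtersToBackfill mirror the fields shown in the diff, but the real PR code differs.

package logpoller

import "context"

// RegisterFilter only updates in-memory state under the lock, so it returns
// quickly; the actual backfill is picked up later by the run loop.
func (lp *LogPoller) RegisterFilter(ctx context.Context, filter Filter) error {
	lp.filtersMu.Lock()
	defer lp.filtersMu.Unlock()
	lp.filtersByName[filter.Name] = filter
	lp.filtersToBackfill = append(lp.filtersToBackfill, filter)
	return nil
}

// processBackfillQueue would be called from the main run loop once
// latestProcessedBlock is known; it drains the queue and backfills each newly
// registered filter without holding the lock during the backfill itself.
func (lp *LogPoller) processBackfillQueue(ctx context.Context, latestProcessedBlock int64) {
	lp.filtersMu.Lock()
	queued := lp.filtersToBackfill
	lp.filtersToBackfill = nil
	lp.filtersMu.Unlock()

	for _, filter := range queued {
		lp.startFilterBackfill(ctx, filter, latestProcessedBlock)
	}
}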
For example, CCIP says they require 8 hours of log history for the plugin to work properly. So it seems like they're going to need some way of knowing that the 8 hours has been populated before they start doing their thing?
I was under the impression that CCIP is more flexible and can handle eventual consistency. Thus, we only need to guarantee that all logs will eventually end up in the DB. I.e., if CCIP observes sender msgs in reverse order:
- [4,3]
- [4,3,2,1]
- [4,3,2,1,0]
It will handle it gracefully and wait for call 3 to start processing.
Oh, now I recall the main argument for the backfill queue.
As blocks are processed in effectively random order and the Backfill operation itself is async, we do not have a "latest processed block" as in evm. Thus backfill should be done on every restart too. Having a backfill queue allows us to do it in one place.
Also, when RegisterFilter is called we might not have the latestProcessedBlock to start the backfill.
I'm fine with keeping it as a queue, but I don't follow (and/or agree with?) either the premise or the conclusion of this statement:
As blocks are processed in effectively random order and Backfill operation itself is async we do not have "latest processed block" as in evm.
Perhaps by "processed" you mean something else, but I thought we had all agreed that we must guarantee they are always saved in order of (block_number, log_index). In part to make sure that there is a well-defined last processed block, but mostly to avoid introducing gaps in the logs. You pointed out that Awbrey's ingestor fetches them in random order from the rpc, which I agreed was a big problem. So we discussed the problem with Awbrey, and we agreed that they must be re-ordered somewhere. But he made the argument that it makes more sense to do the reordering in the callback function passed to the ingestor rather than in his PR (the ingestor itself). The callback function will be part of the PR I'm starting next and will include the re-ordering, so we can ensure this guarantee.
The design of LogPoller in the design doc is based around the assumption that we always have a well-defined "latest processed block". The only difference there from evm is that, because we don't have a blocks table, we will get this by fetching the latest block number from the block_number column of the logs table instead of the block_number column of the blocks table. It's true that if the node crashes and the last few blocks had no logs in them, we may end up re-requesting a few empty blocks. But we shouldn't have to do anything special for that in terms of the way we're fetching logs.
A side digression on the term "backfill"... which I keep thinking we might want to avoid using on Solana due to confusion:
On evm, the distinction between a backfill and a non-backfill fetch of logs is whether we fetch logs from a range of block numbers or from a single block by block hash. On Solana, we don't have to worry about re-orgs, so there is no reason for the second method of fetching logs: we always fetch them by block range rather than by individual blocks.
However, we do have a choice between iterating through the specific contract addresses we're tracking and only fetching the logs for each of those, or fetching all logs and then post-filtering on contract address. Under normal operations, the decision of whether to filter by contract address or to just get everything will be based on a combination of how many blocks have been mined since the last time LogPoller ran and how many filters are registered. If there are a lot of filters and only a few new blocks have been added to the chain since the last time we polled, it makes more sense to grab everything in a single rpc call and then filter. If, on the other hand, there are only a few filters registered but we're thousands of blocks behind because the node has been down for a while, it makes more sense to first ask the rpc servers which blocks actually contain transactions associated with the addresses we're interested in, and then only request the specific blocks and/or transactions we need instead of fetching every tx of every block.
RegisterFilter should be the only case where we know ahead of time which of those methods will be more optimal to use: since it's only for a single filter, we should always ask first about that address rather than getting all of the blocks... regardless of how far back the starting block is or how many filters have been registered.
For now, based on the way Awbrey named the methods I'm assuming when we say "backfill" on Solana we mean the case where we're fetching log history for a single filter rather than for all filters at once. But it might be there is a better term for that, so we don't confuse it with the use of "backfill" in evm which I think means something pretty different.
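Purely as an illustration of that trade-off, a sketch of the heuristic. The thresholds and names below are invented for the example; neither the PR nor the design doc defines them.

package logpoller

// Invented tuning knobs for this example only.
const (
	smallGapBlocks = 100 // "only a few new blocks since the last poll"
	fewFilters     = 10  // "only a few filters registered"
)

// shouldFetchAllBlocks sketches the heuristic above: a small gap since the
// last poll (or lots of filters) favours one "get everything" RPC call plus
// local post-filtering; a large gap with few filters favours asking the RPC
// which blocks involve the tracked addresses and fetching only those.
func shouldFetchAllBlocks(blocksBehind, registeredFilters int) bool {
	if blocksBehind > smallGapBlocks && registeredFilters <= fewFilters {
		return false // far behind with few filters: fetch by address
	}
	return true // small gap and/or many filters: fetch everything and filter locally
}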
Oh, I think the reason I decided to clarify what I think we mean by "backfill" when we're using that term on Solana is because I got a little confused by this statement:
Backfill operation itself is async
By that, I'm guessing what you mean is that RegisterFilter is async... in that it can be called at any time by any thread, while the main LogPoller loop is doing its thing. The backfill itself we could have chosen to make synchronous or async; the way I see it, you've chosen to make it async by having RegisterFilter return immediately rather than waiting for it to complete. Yes?
I agree with your point about not wanting to lock the filters for a long time. There might have been some other workaround, but your way of handling it sounds reasonable to me. I'm not sure about the UI issue; I think most products are already doing something similar on evm (calling RegisterFilter and then Replay immediately after, which blocks until everything is backfilled). AFAIK this has worked like that for a while and hasn't caused any problems. But you could be right that we should be extra careful with Solana, since these operations could potentially be more time-consuming than on evm.
I was not sure whether we had actually decided to do the ordering, so I tried to tailor the work with filters to the current Loader implementation. But it works for the "ordered" version too.
As we both agree on keeping the queue, let's move the discussion to Slack or a sync meeting. It would be nice to discuss how the ordering will work.
func (o *DSORM) DeleteFilters(ctx context.Context, filters []Filter) error {
	for _, filter := range filters {
		err := o.DeleteFilter(ctx, filter.ID)
We might need to create a transaction for this and pass it down to DeleteFilter.
What happens if some filters are deleted properly, but then partway through this for loop the network connection to the db momentarily goes out and returns an error? It would return from DeleteFilters with an error at that point, with some filters deleted and some still there.
(Alternatively, we could do a single DELETE query with a list of all the filter IDs to delete.)
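A minimal sketch of that single-query alternative, assuming a Postgres driver, an o.db handle on the ORM, an int64 filter ID, and a made-up table name; the real schema and field names may differ.

package logpoller

import (
	"context"

	"github.com/lib/pq"
)

// DeleteFilters removes all of the given filters in one statement, so either
// every row is deleted or none are. Table and column names are illustrative.
func (o *DSORM) DeleteFilters(ctx context.Context, filters []Filter) error {
	if len(filters) == 0 {
		return nil
	}
	ids := make([]int64, 0, len(filters))
	for _, f := range filters {
		ids = append(ids, f.ID)
	}
	_, err := o.db.ExecContext(ctx,
		`DELETE FROM filters WHERE id = ANY($1)`,
		pq.Array(ids))
	return err
}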
Even if the db fails when only part of the filters have been deleted, it's safe for us to retry, since we are using IDs. We'll add all filters scheduled for removal back into the queue and try again.
I hesitate to wrap it in a transaction because, if it times out due to a large number of filters being removed, we'd get stuck repeatedly trying to delete filters and rolling back.
Alright, I guess that's good enough.
pkg/solana/logpoller/log_poller.go
Outdated
filtersByName     map[string]Filter
filtersByAddress  map[PublicKey]map[EventSignature][]Filter
filtersToBackfill []Filter
filtersToDelete   []Filter // populated on start from db and pruned on first iteration of run
This may not matter since I don't expect the number of filters to be registered to be too huge, but I notice there are at least 2 different copies of the same Filter struct that get saved in memory, and possibly 3 or 4 in some cases.
We could reduce the memory footprint by making these slices of *Filter instead of slices of Filter, so there is only one copy around. (See elsewhere for a different suggestion about making the slices into maps; this could apply to either.)
(Probably not worth worrying about for now.)
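For reference, the pointer-based variant being suggested would look roughly like the sketch below. The field names are copied from the diff above; the enclosing struct name is invented, and this is only the suggestion, not what the PR currently does.

package logpoller

// filterStore is a hypothetical container for the fields from the diff.
// Each Filter would live in memory exactly once; every map and slice holds
// a pointer to that single copy.
type filterStore struct {
	filtersByName     map[string]*Filter
	filtersByAddress  map[PublicKey]map[EventSignature][]*Filter
	filtersToBackfill []*Filter
	filtersToDelete   []*Filter
}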
IMHO, using pointers decreases readability, as it gives the impression that the filter might be modified.
Unless we expect the IDL to be a large string, I'd prefer to use a struct instead of a pointer.
Quality Gate failed (failed conditions).
UnregisterFilter calls caused by cascade removal of matching logs. Depends on: