Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplest MVP transaction filtering #398

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

CapCap
Copy link
Contributor

@CapCap CapCap commented Jun 11, 2024

an MVP version of the transaction filtering crate.

Performance

On a random 3mb proto of 1000txns during taptos, running the "everything" filter (looped) gives:

BENCH: Took 13.225167ms for 1000000 transactions (13ns each)

Random 10mb proto from April 25th with "everything" filter:

BENCH: Took 21.815375ms for 1000000 transactions (21ns each)

Graffio 100mb proto ("everything" filter, but with or):

BENCH: Took 31.559333ms for 1000000 transactions (31ns each)

More advanced feature complete version here: #392

@CapCap CapCap requested a review from a team June 11, 2024 19:12
Copy link
Collaborator

@rtso rtso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, just a small comment about the not

@bowenyang007
Copy link
Collaborator

How do we plan to share this crate with aptos-core?

@bowenyang007 bowenyang007 requested a review from banool June 11, 2024 19:57
@CapCap
Copy link
Contributor Author

CapCap commented Jun 11, 2024

@bowenyang007 sorry I don't understand the question

@bowenyang007
Copy link
Collaborator

Data service needs to use this right? I'm curious if we're going to copy over the code or upload it to crates.io.

A few additional questions:

  • Can we add a readme? Particularly I'm not sure from the PR what exactly is being filtered or how it's meant to be used.
  • How are the proto fixtures supposed to work? How do we upgrade the proto?
  • Can we have comments?

@CapCap
Copy link
Contributor Author

CapCap commented Jun 11, 2024

@bowenyang007 I figured we'd do it the way we do everything else, i.e;

transaction-filter = { git = "https://github.com/aptos-labs/aptos-indexer-processors.git", rev = "4541add3fd29826ec57f22658ca286d2d6134b93" }

I can move stuff to the readme and add an example, sure

@CapCap CapCap force-pushed the simplest_transaction_filter branch 2 times, most recently from a28c57c to 13a63d8 Compare June 13, 2024 05:17
@CapCap CapCap requested a review from a team June 13, 2024 05:34
Copy link
Collaborator

@banool banool left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of small things, mostly around how the builders work in certain cases. Overall structure still looks great, molto benne.

rust/Cargo.toml Show resolved Hide resolved
rust/processor/src/grpc_stream.rs Show resolved Hide resolved
rust/transaction-filter/Cargo.toml Show resolved Hide resolved
rust/transaction-filter/README.md Show resolved Hide resolved
rust/transaction-filter/README.md Show resolved Hide resolved
Comment on lines +45 to +49
if !self.data.is_allowed(&item.data) {
return false;
}

true
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, could be collapsed to just returning self.data.is_allowed(&item.data) directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could! I left it as-is to make the eventual merge of advanced filtering a bit easier haha, but I can change this too

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind so much, explicit is good also.

rust/transaction-filter/src/filters/event.rs Show resolved Hide resolved
#[derive(Clone, Debug, Default, Deserialize, PartialEq, Serialize)]
#[serde(deny_unknown_fields)]
#[derive(derive_builder::Builder)]
#[builder(setter(into, strip_option), default)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if default is a footgun in this case, it might be best to have people explicitly set (even to None) all the fields for this one? I can see people forgetting to set address and only setting module for example. Not too opinionated, up to you.

Same comment for the entry function filter and the like.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIR the issue is builder requires default unfortunately.

I'm not an expert in the library though and there are a lot of config options; the biggest thing I wish we could do is call the validation function when calling "build", and have that return our error type- maybe there is, maybe I can do it without a lot of boiler plate, I can take a look

Comment on lines +99 to +100
if !(self.address.is_allowed(&module.address)
&& self.function.is_allowed(&module.name))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% sure I'm understanding this one right. In this block, address or function could be None. I believe None by default resolves to false for is_allowed based on the trait below. Which I don't think is what we want here, if the user hasn't set a value for address then we shouldn't evaluate it as a filter?

Is there a unit test that explicitly checks this case where only one of address or function is set that can put my mind at ease hahah.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I get it, is_allowed if false if the value is None but true if the filter is None, perfect. In which case, I suppose if self.address.is_some() || self.function.is_some() is just an optimization, we don't actually need it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly- we don't explicitly need it, it's just an optimization to save on some ops if we don't need it

#[derive(Clone, Debug, Default, Deserialize, PartialEq, Serialize)]
#[serde(deny_unknown_fields)]
#[derive(derive_builder::Builder)]
#[builder(setter(strip_option), default)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figure this shouldn't be default, because there is only one field so you must set it otherwise it'll always be invalid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's required for the builder derive to allow partial completions- otherwise if you only set one field, and even though the overall builder is valid, it yells at you for fields being unset

I think there is some improvements we can make with it, I tried not to get too deep into it to avoid this PRs continued growth hahaha

banool added a commit to aptos-labs/aptos-core that referenced this pull request Jun 14, 2024
There are two types of transaction filtering we will support in the future:
1. Per stream configuration: The downstream declares what txns they want to receive.
2. Global configuration: At the data service level we refuse to include full txns for all streams.

This PR implements the second of these, using @CapCap's work here:
aptos-labs/aptos-indexer-processors#398.

Rather than not sending txns at all if they match the blocklist filters, we just omit the writesets and events. Not sending the txns entirely would cause issues with processors, which today assume that they will receive all txns.
banool added a commit to aptos-labs/aptos-core that referenced this pull request Jun 14, 2024
There are two types of transaction filtering we will support in the future:
1. Per stream configuration: The downstream declares what txns they want to receive.
2. Global configuration: At the data service level we refuse to include full txns for all streams.

This PR implements the second of these, using @CapCap's work here:
aptos-labs/aptos-indexer-processors#398.

Rather than not sending txns at all if they match the blocklist filters, we just omit the writesets and events. Not sending the txns entirely would cause issues with processors, which today assume that they will receive all txns.
banool added a commit to aptos-labs/aptos-core that referenced this pull request Jun 14, 2024
There are two types of transaction filtering we will support in the future:
1. Per stream configuration: The downstream declares what txns they want to receive.
2. Global configuration: At the data service level we refuse to include full txns for all streams.

This PR implements the second of these, using @CapCap's work here:
aptos-labs/aptos-indexer-processors#398.

Rather than not sending txns at all if they match the blocklist filters, we just omit the writesets and events. Not sending the txns entirely would cause issues with processors, which today assume that they will receive all txns.
@CapCap CapCap force-pushed the simplest_transaction_filter branch 6 times, most recently from f1d3f28 to 8f62517 Compare June 14, 2024 23:32
banool added a commit to aptos-labs/aptos-core that referenced this pull request Jun 19, 2024
There are two types of transaction filtering we will support in the future:
1. Per stream configuration: The downstream declares what txns they want to receive.
2. Global configuration: At the data service level we refuse to include full txns for all streams.

This PR implements the second of these, using @CapCap's work here:
aptos-labs/aptos-indexer-processors#398.

Rather than not sending txns at all if they match the blocklist filters, we just omit the writesets and events. Not sending the txns entirely would cause issues with processors, which today assume that they will receive all txns.
banool added a commit to aptos-labs/aptos-core that referenced this pull request Jun 19, 2024
There are two types of transaction filtering we will support in the future:
1. Per stream configuration: The downstream declares what txns they want to receive.
2. Global configuration: At the data service level we refuse to include full txns for all streams.

This PR implements the second of these, using @CapCap's work here:
aptos-labs/aptos-indexer-processors#398.

Rather than not sending txns at all if they match the blocklist filters, we just omit the writesets and events. Not sending the txns entirely would cause issues with processors, which today assume that they will receive all txns.
banool added a commit to aptos-labs/aptos-core that referenced this pull request Jun 19, 2024
There are two types of transaction filtering we will support in the future:
1. Per stream configuration: The downstream declares what txns they want to receive.
2. Global configuration: At the data service level we refuse to include full txns for all streams.

This PR implements the second of these, using @CapCap's work here:
aptos-labs/aptos-indexer-processors#398.

Rather than not sending txns at all if they match the blocklist filters, we just omit the writesets and events. Not sending the txns entirely would cause issues with processors, which today assume that they will receive all txns.
banool added a commit to aptos-labs/aptos-core that referenced this pull request Jun 21, 2024
There are two types of transaction filtering we will support in the future:
1. Per stream configuration: The downstream declares what txns they want to receive.
2. Global configuration: At the data service level we refuse to include full txns for all streams.

This PR implements the second of these, using @CapCap's work here:
aptos-labs/aptos-indexer-processors#398.

Rather than not sending txns at all if they match the blocklist filters, we just omit the writesets and events. Not sending the txns entirely would cause issues with processors, which today assume that they will receive all txns.
@banool banool force-pushed the simplest_transaction_filter branch from 8f62517 to 020a761 Compare June 21, 2024 12:06
@banool banool force-pushed the simplest_transaction_filter branch from 020a761 to 5fc070b Compare June 21, 2024 12:12
banool added a commit to aptos-labs/aptos-core that referenced this pull request Jun 21, 2024
There are two types of transaction filtering we will support in the future:
1. Per stream configuration: The downstream declares what txns they want to receive.
2. Global configuration: At the data service level we refuse to include full txns for all streams.

This PR implements the second of these, using @CapCap's work here:
aptos-labs/aptos-indexer-processors#398.

Rather than not sending txns at all if they match the blocklist filters, we just omit the writesets and events. Not sending the txns entirely would cause issues with processors, which today assume that they will receive all txns.
banool added a commit to aptos-labs/aptos-core that referenced this pull request Jun 21, 2024
There are two types of transaction filtering we will support in the future:
1. Per stream configuration: The downstream declares what txns they want to receive.
2. Global configuration: At the data service level we refuse to include full txns for all streams.

This PR implements the second of these, using @CapCap's work here:
aptos-labs/aptos-indexer-processors#398.

Rather than not sending txns at all if they match the blocklist filters, we just omit the writesets and events. Not sending the txns entirely would cause issues with processors, which today assume that they will receive all txns.
rtso pushed a commit to aptos-labs/aptos-core that referenced this pull request Jun 24, 2024
There are two types of transaction filtering we will support in the future:
1. Per stream configuration: The downstream declares what txns they want to receive.
2. Global configuration: At the data service level we refuse to include full txns for all streams.

This PR implements the second of these, using @CapCap's work here:
aptos-labs/aptos-indexer-processors#398.

Rather than not sending txns at all if they match the blocklist filters, we just omit the writesets and events. Not sending the txns entirely would cause issues with processors, which today assume that they will receive all txns.
rtso pushed a commit to aptos-labs/aptos-core that referenced this pull request Jun 24, 2024
There are two types of transaction filtering we will support in the future:
1. Per stream configuration: The downstream declares what txns they want to receive.
2. Global configuration: At the data service level we refuse to include full txns for all streams.

This PR implements the second of these, using @CapCap's work here:
aptos-labs/aptos-indexer-processors#398.

Rather than not sending txns at all if they match the blocklist filters, we just omit the writesets and events. Not sending the txns entirely would cause issues with processors, which today assume that they will receive all txns.
rtso pushed a commit to aptos-labs/aptos-core that referenced this pull request Jun 24, 2024
There are two types of transaction filtering we will support in the future:
1. Per stream configuration: The downstream declares what txns they want to receive.
2. Global configuration: At the data service level we refuse to include full txns for all streams.

This PR implements the second of these, using @CapCap's work here:
aptos-labs/aptos-indexer-processors#398.

Rather than not sending txns at all if they match the blocklist filters, we just omit the writesets and events. Not sending the txns entirely would cause issues with processors, which today assume that they will receive all txns.
rtso added a commit to aptos-labs/aptos-core that referenced this pull request Jun 24, 2024
* [GRPC] Enable data service ZSTD and update crate that uses old tonic (#13621)

* replace println with tracing

* [GRPC] Simple Transaction Filtering

* Improve transaction filter comments, exports, README, fix lz4 in tests

* [Data Service] Implement simple upstream transaction filtering (#13699)

There are two types of transaction filtering we will support in the future:
1. Per stream configuration: The downstream declares what txns they want to receive.
2. Global configuration: At the data service level we refuse to include full txns for all streams.

This PR implements the second of these, using @CapCap's work here:
aptos-labs/aptos-indexer-processors#398.

Rather than not sending txns at all if they match the blocklist filters, we just omit the writesets and events. Not sending the txns entirely would cause issues with processors, which today assume that they will receive all txns.

---------

Co-authored-by: Max Kaplan <[email protected]>
Co-authored-by: yuunlimm <[email protected]>
Co-authored-by: CapCap <[email protected]>
Co-authored-by: Daniel Porteous <[email protected]>
rtso pushed a commit to aptos-labs/aptos-core that referenced this pull request Jun 24, 2024
There are two types of transaction filtering we will support in the future:
1. Per stream configuration: The downstream declares what txns they want to receive.
2. Global configuration: At the data service level we refuse to include full txns for all streams.

This PR implements the second of these, using @CapCap's work here:
aptos-labs/aptos-indexer-processors#398.

Rather than not sending txns at all if they match the blocklist filters, we just omit the writesets and events. Not sending the txns entirely would cause issues with processors, which today assume that they will receive all txns.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants