
Move DB upserts to its own batch of tasks via channel #288

Open

wants to merge 11 commits into main

Conversation

@CapCap (Contributor) commented Feb 21, 2024

# Worker Flow

  • The application starts by initializing a Worker instance with the necessary configurations such as the processor
    configuration, database connection string, GRPC data service address, and other parameters.

  • The Worker instance then runs migrations on the database to ensure the schema is up to date.

  • The Worker fetches the chain ID from the GRPC service and verifies it against the database.

  • The Worker then starts a fetcher task that continuously fetches transactions from the GRPC stream and writes them into a channel.
    The number of transactions fetched in each batch is determined by the pb_channel_txn_chunk_size parameter.

  • Concurrently, the Worker also starts multiple processor tasks that consume transactions from the channel
    and process them in parallel.
    The number of processor tasks is determined by the number_concurrent_processing_tasks parameter.
    TODO: is this right? The size of the channel is determined by the PB_FETCH_QUEUE_SIZE parameter.

  • Each processor task uses a specific Processor instance to process the transactions.
    The type of Processor used depends on the configuration provided when initializing the Worker.
    Each Processor type corresponds to a different way of processing transactions.

  • After processing the transactions, the processor tasks send the results to a gap detector.
    The gap detector checks for any gaps in the processed transactions and panics if it finds any.
    The maximum batch size for gap detection is determined by the gap_detection_batch_size parameter.

  • The processed transactions are also sent to the DbWriter instance associated with the Processor, via a channel.
    The DbWriter is responsible for writing the processed transactions to the database ("executing" them).
    The size of the channel is determined by the query_executor_channel_size parameter.
    The number of concurrent DB writer tasks is determined by the number_concurrent_db_writer_tasks parameter.

  • The DbWriter sends the transactions to be written to the database in chunks.
    It uses an AsyncSender to send QueryGenerator instances to a DB writer task.
    Each QueryGenerator contains a table name and a DbExecutable instance, which represents the transactions to be written to the database.
    The chunk size for sending queries to the database is determined by the per_table_chunk_sizes parameter;
    this parameter specifies the maximum number of rows to be inserted in a single query. It is a map from table name to chunk size.

  • The DB writer task executes the queries represented by the DbExecutable instances.
    If an error occurs during execution, it logs the error and continues with the next query.

  • This process continues in a loop: the fetcher task fetches transactions,
    the processor tasks process them, the DB writer tasks write them to the database,
    and the gap detector panics if it finds a large gap in the processed transactions
    (a minimal sketch of this pipeline follows the list).
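
As a rough illustration of this pipeline, here is a minimal, self-contained sketch of the channel topology described above. It is not this PR's code: it uses the `async_channel` crate as a stand-in MPMC channel, and the `Transaction`/`QueryGenerator` types, channel sizes, and task bodies are placeholder assumptions.

```rust
use std::sync::Arc;

// Placeholder stand-ins for the real processor types.
#[derive(Debug)]
struct Transaction {
    version: u64,
}

#[derive(Debug)]
struct QueryGenerator {
    table_name: &'static str,
}

#[tokio::main]
async fn main() {
    let number_concurrent_processing_tasks = 4;
    let number_concurrent_db_writer_tasks = 4;

    // Fetcher -> processor channel (size ~ PB_FETCH_QUEUE_SIZE).
    let (txn_tx, txn_rx) = async_channel::bounded::<Arc<Vec<Transaction>>>(100);
    // Processor -> DB writer channel (size ~ query_executor_channel_size).
    let (query_tx, query_rx) = async_channel::bounded::<QueryGenerator>(500);

    // Fetcher task: stream chunks of transactions into the channel.
    tokio::spawn(async move {
        for start in (0..1_000u64).step_by(100) {
            let chunk: Vec<Transaction> =
                (start..start + 100).map(|version| Transaction { version }).collect();
            txn_tx.send(Arc::new(chunk)).await.unwrap();
        }
        // txn_tx is dropped here, which closes the fetcher channel.
    });

    // Processor tasks: consume chunks in parallel, emit queries for the writers.
    for _ in 0..number_concurrent_processing_tasks {
        let (txn_rx, query_tx) = (txn_rx.clone(), query_tx.clone());
        tokio::spawn(async move {
            while let Ok(chunk) = txn_rx.recv().await {
                // ...process the chunk, report to the gap detector, then:
                let _ = chunk.len();
                query_tx.send(QueryGenerator { table_name: "transactions" }).await.unwrap();
            }
        });
    }
    drop(query_tx); // only the processor tasks hold senders now

    // DB writer tasks: "execute" the queued queries against the database.
    let writers: Vec<_> = (0..number_concurrent_db_writer_tasks)
        .map(|_| {
            let query_rx = query_rx.clone();
            tokio::spawn(async move {
                while let Ok(q) = query_rx.recv().await {
                    println!("would insert a chunk into {}", q.table_name);
                }
            })
        })
        .collect();
    futures::future::join_all(writers).await;
}
```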

# Architecture Diagram

 ┌──────────────┐
 │ GRPC Service │
 └─────┬────────┘
       │Stream
 ┌─────▼────────┐ Transaction ┌───────────┐
 │ Fetcher Task ├────Chunk ───► Processor │
 └─────┬────────┘   Channel   │ Tasks     │
       │ Channel              └────┬──────┘
 ┌─────▼────────┐                  │ DbWriter
 │ Gap Detector │                  │ Channel
 └──────────────┘             ┌────▼────────┐
                              │ DB Executor │
                              └────┬────────┘
                                   │
                              ┌────▼─────┐
                              │ Database │
                              └──────────┘

Also adds the ability to tune insert sizes per table, and skip writes to tables entirely 😄
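
For illustration, the per-table tuning could look roughly like this on the config side. The field and method names below are assumptions loosely based on the parameters named above (`per_table_chunk_sizes`), not the PR's exact structs, and the skip-list representation is hypothetical.

```rust
use std::collections::HashMap;

// Hypothetical shape of the per-table write tuning described above.
pub struct DbWriterConfig {
    /// Max rows per insert query, keyed by table name (per_table_chunk_sizes).
    pub per_table_chunk_sizes: HashMap<String, usize>,
    /// Tables whose writes are skipped entirely (hypothetical field).
    pub tables_to_skip: Vec<String>,
    /// Fallback chunk size when a table has no explicit entry.
    pub default_chunk_size: usize,
}

impl DbWriterConfig {
    pub fn chunk_size(&self, table_name: &str) -> usize {
        self.per_table_chunk_sizes
            .get(table_name)
            .copied()
            .unwrap_or(self.default_chunk_size)
    }

    pub fn should_skip(&self, table_name: &str) -> bool {
        self.tables_to_skip.iter().any(|t| t == table_name)
    }
}
```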

@just-in-chang just-in-chang self-requested a review February 22, 2024 19:35
@CapCap CapCap changed the base branch from main to arc_in_processors February 29, 2024 00:54
@CapCap CapCap force-pushed the processor_channel_to_db_writers branch 2 times, most recently from 4aba938 to e38eebd Compare February 29, 2024 01:02
@CapCap CapCap requested a review from rtso February 29, 2024 04:31

let mut res = self.execute_query(conn.clone()).await;

// TODO: HAVE BETTER RETRY LOGIC HERE?
Contributor:

Possible exponential backoff here like we talked about 😁
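
A rough sketch of what that exponential backoff could look like, written against generic placeholders rather than the PR's `execute_query`/`DbExecutable` types (the retry cap, base delay, and names are all illustrative):

```rust
use std::time::Duration;

// Retry an async operation with exponential backoff; illustrative only.
async fn execute_with_backoff<F, Fut, T, E>(mut op: F, max_retries: usize) -> Result<T, E>
where
    F: FnMut() -> Fut,
    Fut: std::future::Future<Output = Result<T, E>>,
{
    let mut delay = Duration::from_millis(100);
    let mut attempt = 0;
    loop {
        match op().await {
            Ok(v) => return Ok(v),
            // Out of retries: surface the last error.
            Err(e) if attempt >= max_retries => return Err(e),
            Err(_) => {
                tokio::time::sleep(delay).await;
                // Double the delay each attempt, capped at 5s; jitter could be added.
                delay = (delay * 2).min(Duration::from_secs(5));
                attempt += 1;
            },
        }
    }
}
```

The call site above would then become something like `execute_with_backoff(|| self.execute_query(conn.clone()), self.max_retries()).await`, assuming those methods exist on the surrounding type.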

rust/processor/src/db_writer.rs (outdated; resolved)
rust/processor/src/db_writer.rs (outdated; resolved)
@@ -161,19 +163,19 @@ impl ProcessorTrait for NftMetadataProcessor {
.await?;
}

-let db_insertion_duration_in_secs = db_insertion_start.elapsed().as_secs_f64();
+let db_channel_insertion_duration_in_secs = db_insertion_start.elapsed().as_secs_f64();
Contributor:

I'm just using the duration it takes to send all the PubSub messages for this metric since I'm not writing to Postgres at all. Not sure if we should just set this to 0 so it doesn't affect the distribution? Not sure if it matters at all but I can make that change haha

rust/processor/src/processors/stake_processor.rs (outdated; resolved)
@CapCap CapCap force-pushed the arc_in_processors branch 5 times, most recently from d459a44 to 5726900 Compare March 6, 2024 19:50
Base automatically changed from arc_in_processors to main March 6, 2024 20:23
@CapCap CapCap force-pushed the processor_channel_to_db_writers branch 5 times, most recently from 83cbe81 to 56e03ca Compare March 7, 2024 22:03
@ying-w ying-w marked this pull request as ready for review March 8, 2024 02:21
rust/processor/src/config.rs (resolved)
rust/processor/src/db_writer.rs (resolved)
rust/processor/src/db_writer.rs (outdated; resolved)
@CapCap CapCap force-pushed the processor_channel_to_db_writers branch from 7fd40ee to 35c9ef2 Compare March 9, 2024 00:35
CapCap added 3 commits March 11, 2024 12:38
Remove transactions and make migrations not async

lint

fmt

clippy

diesel async TLS sslrootcert

lint

lint

support optional tls

format

more log

parallel inserts

lint

bigger pool

pool size 200

try bigger buffer

try fixed 100 insert size

use ahash + update rust

smaller batches, bigger pool

increase pool size to 800

small refac for readability

increase buffer to 150

try batch size 20

back to 100 buffer

refactor grpc into separate file

lint

try 40mb buffers

insert of 10 again

ARC instead of cloning txns

lint

avoid another clone

try size 50

try 100

try 65

Change threading model for higher parallelism and throughput (#249)

Co-authored-by: jillxuu <[email protected]>

clean

cleanup

try 200 connections

coin processor spawn blocking

sleep well

ARC and consistent parallelism

database parallelism undo

no CDB compat

Use gap detector

Don't panic in gaps

TEMP CHANGE FOR LOAD TEST

send in chunks

gap detector bigger

parallel writes to db

try chunks of 40

5k gap

fix channel length

post load test cleanup

temporary execute in chunks

cleanup and comments

Add config for table chunk size

cleanup

progress

more progress

more trying

temp pause

migrating over

using traits

lint

lint and clean

lint
@CapCap CapCap force-pushed the processor_channel_to_db_writers branch from 35c9ef2 to 4a52342 Compare March 11, 2024 19:45

// A holder struct for processors db writing so we don't need to keep adding new params
#[derive(Clone)]
pub struct DbWriter {
Contributor:

I feel this should be called DbQueryGenerator, QueryGenerator should be called Query, and launch_db_writer_task should be DbWriter.

Contributor Author:

I added DataWithQuery; let me know if you think the current name is more or less clear?

Contributor:

Still feel the QueryGenerator is a weird name, especially when it doesn't have a method called generate.


pub fn launch_db_writer_task(
    query_receiver: AsyncReceiver<QueryGenerator>,
    processor_name: &'static str,
Contributor:

would it be better to pass this per query?

Contributor Author:

I want to move this to a trait or OOP or something instead of all this passing

Contributor:

Sure. I just want to decouple the writer from the processor (especially since this is only needed for logging/metrics), to allow us to potentially run multiple processors together in one process.

let tasks = (0..num_tasks)
    .map(|_| launch_db_writer_task(query_receiver.clone(), processor_name, conn.clone()))
    .collect::<Vec<_>>();
futures::future::try_join_all(tasks)
Contributor:

you probably want a separate runtime for this

Contributor Author:

agreed, I probably do; will do those changes in a follow-up PR
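
A sketch of that "separate runtime" idea for the follow-up: run the DB writer tasks on their own Tokio runtime on a dedicated thread, so slow inserts cannot starve the processing runtime. The wiring below is illustrative and reuses the names from the snippet above; it is not the follow-up PR.

```rust
use std::thread;

// Build a dedicated runtime for DB writers on its own OS thread (sketch).
fn spawn_db_writer_runtime(num_tasks: usize) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        let runtime = tokio::runtime::Builder::new_multi_thread()
            .worker_threads(num_tasks)
            .thread_name("db-writer")
            .enable_all()
            .build()
            .expect("failed to build DB writer runtime");
        runtime.block_on(async move {
            // Inside here, launch the writer tasks exactly as in the snippet above:
            // let tasks = (0..num_tasks)
            //     .map(|_| launch_db_writer_task(query_receiver.clone(), processor_name, conn.clone()))
            //     .collect::<Vec<_>>();
            // futures::future::try_join_all(tasks).await.unwrap();
        });
    })
}
```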

@CapCap CapCap force-pushed the processor_channel_to_db_writers branch from aa91bac to fa1a28d Compare March 14, 2024 22:00
@CapCap CapCap force-pushed the processor_channel_to_db_writers branch from fa1a28d to cbd47d6 Compare March 14, 2024 22:23
}
}

/*
Contributor:

nit: remove this?


pub db_executable: Box<dyn DbExecutable>,
}

pub fn launch_db_writer_task(
Contributor:

nit: don't need pub?

Contributor Author:

I will always pub everything, because the number of times something being pub that shouldn't be has hurt me across my career is 0, whereas the number of times I've had to fork things because things aren't public is somewhere around 1000 :-P

Contributor:

my experience was the other way around 🤔 I heavily rely on visibility to reason about the design when reading other people's code.

Contributor Author:

ex: diesel doesn't expose table names; we'd need a proc macro, or to fork, or PR it, etc. Easily avoidable mess 🙃

}
}

pub fn diesel_error_to_metric_str(error: &Error) -> &'static str {
Contributor:

nit: don't need pub?


// A holder struct for processors db writing so we don't need to keep adding new params
#[derive(Clone)]
pub struct DbWriter {
Contributor:

nit: can we use pub(crate) when possible?

Contributor Author:

why?

)
.await;
query_res.expect("Error executing query");
drop(query_generator);
Contributor:

do we need to explicitly drop here?
should we drop it in a background thread?

Contributor Author:

I explicitly drop so that I can be guaranteed it holds across this unsafe block


}
}

pub fn chunk_size<Item: field_count::FieldCount>(&self, table_name: &str) -> usize {
Contributor:

not related to this PR, but one day we probably also need to have a limit on byte size in addition to # of rows.
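
For what that could look like, here is an illustrative chunker that respects both a row cap and an estimated byte budget; `estimated_size` is a hypothetical per-row sizing hook, not something the PR defines:

```rust
// Split items into (start, end) ranges bounded by both row count and bytes.
fn chunk_by_rows_and_bytes<T>(
    items: &[T],
    max_rows: usize,
    max_bytes: usize,
    estimated_size: impl Fn(&T) -> usize,
) -> Vec<(usize, usize)> {
    let mut chunks = Vec::new();
    let (mut start, mut bytes) = (0usize, 0usize);
    for (i, item) in items.iter().enumerate() {
        let size = estimated_size(item);
        // Close the current chunk if adding this row would exceed either budget.
        if i > start && (i - start >= max_rows || bytes + size > max_bytes) {
            chunks.push((start, i));
            start = i;
            bytes = 0;
        }
        bytes += size;
    }
    if start < items.len() {
        chunks.push((start, items.len()));
    }
    chunks
}
```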

let chunk_size = self.chunk_size::<Item>(table_name);
let chunks = get_chunks(items_to_insert.len(), chunk_size);
for (start_ind, end_ind) in chunks {
    let items = items_to_insert[start_ind..end_ind].to_vec();
Contributor:

is it a copy here?

Contributor Author:

it is :-( I think there's likely a way to avoid it (and I didn't want pop or whatever) but I did not do it yet
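
One possible way to avoid the per-chunk copy, sketched against hypothetical types rather than the PR's: share the full `Vec` behind an `Arc` and let each queued insert carry an index range into it. Whether this actually helps depends on whether the insert API can take a slice instead of an owned `Vec`.

```rust
use std::sync::Arc;

// Each queued insert borrows a range of the shared Vec instead of copying it.
struct ChunkedInsert<T> {
    items: Arc<Vec<T>>,
    range: std::ops::Range<usize>,
}

impl<T> ChunkedInsert<T> {
    fn rows(&self) -> &[T] {
        &self.items[self.range.clone()]
    }
}

fn make_chunks<T>(items: Vec<T>, chunk_size: usize) -> Vec<ChunkedInsert<T>> {
    let len = items.len();
    let items = Arc::new(items);
    (0..len)
        .step_by(chunk_size.max(1)) // step_by panics on a step of 0
        .map(|start| ChunkedInsert {
            items: Arc::clone(&items),
            range: start..(start + chunk_size).min(len),
        })
        .collect()
}
```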

@@ -10,7 +10,7 @@ const RUNTIME_WORKER_MULTIPLIER: usize = 2;

fn main() -> Result<()> {
let num_cpus = num_cpus::get();
-let worker_threads = (num_cpus * RUNTIME_WORKER_MULTIPLIER).max(16);
+let worker_threads = num_cpus * RUNTIME_WORKER_MULTIPLIER;
Contributor:

not related to this PR, but why do we need a multiplier here?

Contributor Author:

hyperthreading? IDK aptos-core had this I think

Contributor:

num_cpus::get() actually considers hyperthreading.

Comment on lines +66 to +69
// TODO: make this config/constant?
fn max_retries(&self) -> usize {
    2
}
Contributor:

We have #321 to make num retries and timeout into config params for the 1.10 release, could follow something similar?

Contributor Author (@CapCap, Mar 18, 2024):

yes, I want to wait until we can pipe configs through to processors etc. more easily

Contributor:

Not related to this PR, but in the future we should probably have a system that allows each processor to have a "context" that can be accessed/passed around from wherever... The context would just contain processor-specific stuff like Arcs of channel txs/rxs, in-memory cache, config, etc.

Been thinking of ways to do this, but I think the best is just to have a trait in the SDK 🤣

Contributor Author:

100000%%%%
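
Purely speculative, but a minimal version of that "context" idea could be a trait in the SDK along these lines (none of these names exist in the PR; the sketch only shows the shape):

```rust
use std::sync::Arc;

// Hypothetical bundle of per-processor state: config, channels, caches, etc.
pub trait ProcessorContext: Send + Sync {
    fn processor_name(&self) -> &'static str;
    fn chunk_size_for(&self, table_name: &str) -> usize;
    // Could also expose channel senders/receivers, an in-memory cache, etc.
}

// A processor would then hold one shared handle instead of many parameters.
pub struct ExampleProcessor {
    pub context: Arc<dyn ProcessorContext>,
}
```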
