Internal tracing for prover and verifier (internal tracing PR 2 of 3) #12874
Conversation
Force-pushed from 7446d68 to e138ef3
This looks great! Left a few nits, but also: do you have any statistics around the additional overhead of adding these new log messages? I'm hoping that it's minimal, but if it isn't, we should call out that overhead explicitly in the CLI flag's / command's description.
src/lib/logger/impl.mli (outdated)

```diff
   | Error
   | Faulty_peer
   | Fatal
+  | Internal
```
Nit: it might be cleaner to call this log level `Checkpoint` or similar; `Internal` feels somewhat non-specific.
Original name was "spy" (because it was short, and what it did was to expose what happened internally so that an external consumer could process it). Then it was switched to "internal" based on feedback here: #12703 (comment)
Ideally I would have used "trace", but it is already taken. "checkpoint" works too (not 100% ideal, because some of the output, like metadata, is not specifically checkpoints).
Just let me know and I will rename it.
@psteckler what's your preference between `Checkpoint` and `Internal`?
```diff
@@ -309,6 +320,16 @@ let run ~logger ~trust_system ~verifier ~transition_reader
           , time_received ) ;
         return ()
     | Error error ->
+        Internal_tracing.with_state_hash state_hash
+        @@ fun () ->
+        [%log internal] "Failure"
```
Should this log message be more specific? `Failure` seems very broad.
Check the `~metadata` argument, the reason for the failure is attached there.
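For reference, a minimal sketch of what that call site could look like with the reason attached (the `reason` key and the string encoding are my own illustration here, not necessarily the PR's exact code):

```ocaml
(* Minimal sketch, not the PR's actual code: attach the failure reason via
   ~metadata so a consumer can recover it, while the checkpoint message
   itself stays "Failure". Assumes the context of the diff above. *)
Internal_tracing.with_state_hash state_hash
@@ fun () ->
[%log internal] "Failure"
  ~metadata:[ ("reason", `String (Error.to_string_hum error)) ]
```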
In the GCP stackdriver view, this will show up as the message `Failure`, and you have to manually expand to get the metadata. If there's a more descriptive message that could go here instead, that would help us interpret the tracing messages while looking at the raw logs.
I see. Let me think about it for a bit, but I think something like `Failure:<arbitrary output>` (with `:` or `\t` or some specific separator I can use to ignore the remaining text in the consumer) would do. Would that be good enough? (I don't know what it looks like in stackdriver.)
@mrmr1993 btw, just to make sure we are on the same page: the files produced by the internal tracing don't look like the JSON logs produced by the node; they just contain a sequence of events and control commands that look like this:
{"current_block":"3NL3n4UnfDEmbx2hap27Z3z7PU1D4pktAd9qmj2PvnCUAhnNT4rQ"}
["Produce_state_transition_proof",1679321390.6394472]
{"metadata":{"transactions_count":0}}
{"current_block":"3NLqqdkAQnbNorXKRFuWC9fz45jvdF1XUiwtseWzRbCB8vaxUpkS"}
{"block_metadata":{"blockchain_length":"2"}}
["External_block_received",1679321408.8380573]
["Initial_validation",1679321408.8467133]
["Verify_blockchain_snarks",1679321408.8499875]
{"metadata":{"count":1}}
{"current_block":"3NL3n4UnfDEmbx2hap27Z3z7PU1D4pktAd9qmj2PvnCUAhnNT4rQ"}
["Produce_chain_transition_proof",1679321409.389129]
["Produce_validated_transition",1679321409.3892083]
["Build_breadcrumb",1679321409.3894844]
{"block_metadata":{"coinbase_receiver":"B62qnzLUoQaPPM33X6JpVJUMq36QSibEgZep4U2idqFwxj3GXK8drXF","creator":"B62qnzLUoQaPPM33X6JpVJUMq36QSibEgZep4U2idqFwxj3GXK8drXF","global_slot":"3","previous_state_hash":"3NLSc2Y4MkCgPHGbfeuNX23Hpk5ZPs9LHgegf28DcMWWVKCxz9wo","slot":"3","winner":"B62qogjiboJ1tsGm42AtnWuxiL6yX3qDb2CGFVtTgD9iEjxGSEQ3uzm"}}
["Validate_staged_ledger_diff",1679321409.3895411]
["Prediff",1679321409.3908107]
["Verify_commands",1679321409.390832]
{"metadata":{"count":0}}
["Verify_commands_done",1679321415.2565856]
["Apply_diff",1679321415.2566586]
["Update_coinbase_stack",1679321415.2567432]
{"metadata":{"coinbases":1,"commands_count":0,"max_throughput":128,"proofs_waiting":0,"spots_available":128,"transactions":1,"works":0}}
["Update_ledger_and_get_statements",1679321415.2567675]
{"metadata":{"partition":"single"}}
["Update_ledger_and_get_statements_done",1679321415.2621703]
["Update_coinbase_stack_done",1679321415.2622075]
{"metadata":{"data_len":1,"is_new_stack":true,"transactions_len":1}}
["Check_for_sufficient_snark_work",1679321415.2622213]
{"metadata":{"free_space":128,"required_pairs":0,"slots":1,"work_count":0}}
["Check_zero_fee_excess",1679321415.262237]
["Fill_work_and_enqueue_transactions",1679321415.262251]
{"metadata":{"emitted_proof":false,"merge_jobs_created":0,"scan_state_added_works":0,"total_proofs":0}}
["Update_pending_coinbase_collection",1679321415.262411]
I mention this because I don't know what format stackdriver expects. Obviously, if needed, another log processor+transport can be registered that outputs logs in the same format as the rest of the logs.
In any case, if it is useful I can add extra text after the failure (something like a comment, so it looks like `["Failure // Validation callback expired", 1679321415.43431]`).
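To make the consumer side of that concrete, here is a rough sketch (assuming the `yojson` library; the type and function names are illustrative, not taken from the actual consumer) of how each trace line could be classified, including stripping such a `//` comment:

```ocaml
(* Sketch of a consumer-side classifier for the raw trace format shown
   above: each line is a standalone JSON value, either a two-element
   array for a checkpoint or an object for a control/metadata entry. *)
type entry =
  | Checkpoint of string * float             (* checkpoint name, timestamp *)
  | Control of (string * Yojson.Safe.t) list (* current_block, metadata, ... *)

(* Drop a trailing " // ..." comment, as proposed for failure reasons. *)
let strip_comment name =
  let sep = " // " in
  let len = String.length name and slen = String.length sep in
  let rec find i =
    if i + slen > len then name
    else if String.equal (String.sub name i slen) sep then String.sub name 0 i
    else find (i + 1)
  in
  find 0

let parse_line line =
  match Yojson.Safe.from_string line with
  | `List [ `String name; `Float ts ] -> Checkpoint (strip_comment name, ts)
  | `Assoc fields -> Control fields
  | _ -> failwith ("unrecognized trace line: " ^ line)
```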
```diff
@@ -329,6 +350,11 @@ let run ~logger ~trust_system ~verifier ~transition_reader
             |> Protocol_state.hashes )
             .state_hash
         in
+        Internal_tracing.with_state_hash state_hash
+        @@ fun () ->
+        [%log internal] "Failure"
```
Likewise here
Same as above, it is attached in `~metadata`.
@mrmr1993 thanks for the review!
I don't have any, but if you point me to a good way to compare I can do some measurements. I guess I would need an archive node to collect some blocks, and then try to replay them? Note that enabling/disabling the tracing only affects I/O; otherwise the logging functions are still called, and their arguments are still constructed. But in general, I don't think it should have a noticeable impact. Compared to regular logging, the output is quite a bit more compact (the examples in the gist are expanded and not what the node generates). Also, there is no tracing done inside loops, which is where even small overheads could add up and become meaningful. Btw, please note that the core was implemented in PR #12703 (which is where the first 3 commits come from); this one adds the extra parts required for tracing the verifier and prover.
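In case it helps, a quick way to put a rough bound on the per-checkpoint cost without a full replay setup could be a micro-benchmark like the following (pure sketch; the thunk below is a stand-in for whatever the real checkpoint call does, not the PR's code):

```ocaml
(* Micro-benchmark sketch: mean per-call cost of a thunk over n iterations.
   Requires the unix library for Unix.gettimeofday. *)
let mean_per_call_us ~n f =
  let t0 = Unix.gettimeofday () in
  for _ = 1 to n do
    f ()
  done ;
  (Unix.gettimeofday () -. t0) /. float_of_int n *. 1e6

let () =
  let n = 1_000_000 in
  (* Stand-in thunk: builds a checkpoint-like payload but does no I/O,
     mimicking the disabled-tracing case described above. Replace it with
     the real checkpoint call to measure the enabled case. *)
  let checkpoint () =
    ignore (Sys.opaque_identity ("Benchmark", Unix.gettimeofday ()))
  in
  Printf.printf "mean per call: %.3f us over %d calls\n"
    (mean_per_call_us ~n checkpoint) n
```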
Force-pushed from 2356c55 to b41e77c
Rebased over
Force-pushed from b41e77c to d55a134
!ci-build-me
Force-pushed from d55a134 to 4fc99aa
!ci-build-me
Just pushed a fix for the test that was failing (was missing an update to the …)
!ci-build-me
!ci-build-me
Force-pushed from f025b49 to d153e5f
!ci-build-me
Force-pushed from e073761 to 03383d3
!ci-build-me
!ci-build-me
!ci-build-me
Dynamic enable/disable of internal tracing for prover and verifier
…erifier and prover
Needed to be able to properly reconstruct traces containing checkpoints of multiple concurrent calls.
Force-pushed from 5b0e430 to 448cd7d
!ci-build-me
!ci-build-me
!ci-build-me
!approved-for-mainnet
In addition to the tracing output generated by the first PR, two new output files are added here: one for the verifier process used by the transition frontier (other verifier processes are not currently traced), and another for the prover process used by the block producer. If the node is launched with internal tracing enabled, those two subprocesses will be launched with internal tracing enabled too. When the mina client is used to toggle internal tracing, an RPC call will be made to those subprocesses to toggle internal tracing as well (see the sketch below).
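A sketch of that toggle fan-out (illustrative only: `Internal_tracing.set_enabled`, `Prover.set_internal_tracing`, and `Verifier.set_internal_tracing` are assumed names standing in for the daemon flag and the subprocess RPCs, not the PR's actual API):

```ocaml
open Async

(* Sketch: when internal tracing is toggled on the daemon, mirror the new
   setting in the traced subprocesses over their RPC connections. *)
let toggle_internal_tracing ~prover ~verifier ~enabled =
  (* Flip the daemon-local flag first... *)
  Internal_tracing.set_enabled enabled ;
  (* ...then forward the toggle to the prover and verifier processes. *)
  let%bind () = Prover.set_internal_tracing prover ~enabled in
  Verifier.set_internal_tracing verifier ~enabled
```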
Explain your changes:
This PR implements internal tracing for the prover and verifier. Builds on top of #12703
Explain how you tested your changes:
By consuming the traces with https://github.com/openmina/internal-trace-consumer and running it on our cluster.
Example of a structured trace generated by the trace consumer:
https://gist.github.com/tizoc/aab4d1dcf9d6b1d6857b0ac8ce094f03
Checklist: