Internal tracing implementation (internal tracing PR 1 of 3) #12703
604f9d3
to
c2cd190
Compare
Question: would it be a problem to update yojson to version 2.0.2? That would let me avoid a bunch of small allocations when handling the logged events by reusing a permanent buffer when rendering the JSON strings from the Yojson values. The
--- /home/bruno/projects/mina-protocol/mina/opam.export 2023-02-21 18:48:29.462164153 -0300
+++ opam.export 2023-02-24 09:56:21.981966163 -0300
@@ -1,4 +1,5 @@
opam-version: "2.0"
+compiler: ["ocaml-base-compiler.4.14.0"]
roots: [
"alcotest.1.1.0"
"angstrom.0.15.0"
@@ -12,16 +13,16 @@
"cohttp-async.5.0.0"
"core_extended.v0.14.0"
"extlib.1.7.8"
- "graphql-async.0.13.0"
+ "graphql-async.0.14.0"
"graphql-cohttp.0.13.0"
- "graphql-lwt.0.13.0"
+ "graphql-lwt.0.14.0"
"graphql_ppx.1.2.2"
"js_of_ocaml.4.0.0"
"js_of_ocaml-ppx.4.0.0"
"js_of_ocaml-toplevel.4.0.0"
"lmdb.1.0"
"menhir.20210419"
- "merlin.4.5-414"
+ "merlin.4.7-414"
"ocaml-base-compiler.4.14.0"
"ocamlformat.0.20.1"
"ocamlgraph.1.8.8"
@@ -38,7 +39,7 @@
"rpc_parallel.v0.14.0"
"sexp_diff_kernel.v0.14.0"
"utop.2.9.1"
- "yojson.1.7.0"
+ "yojson.2.0.2"
]
installed: [
"alcotest.1.1.0"
@@ -102,7 +103,7 @@
"ctypes-foreign.0.4.0"
"digestif.0.9.0"
"domain-name.0.3.0"
- "dot-merlin-reader.4.2"
+ "dot-merlin-reader.4.6"
"dune.3.3.1"
"dune-build-info.3.1.1"
"dune-configurator.2.8.2"
@@ -115,10 +116,10 @@
"fix.20201120"
"fmt.0.8.6"
"fpath.0.7.3"
- "graphql.0.13.0"
- "graphql-async.0.13.0"
+ "graphql.0.14.0"
+ "graphql-async.0.14.0"
"graphql-cohttp.0.13.0"
- "graphql-lwt.0.13.0"
+ "graphql-lwt.0.14.0"
"graphql_parser.0.12.2"
"graphql_ppx.1.2.2"
"incremental.v0.14.0"
@@ -144,8 +145,9 @@
"menhir.20210419"
"menhirLib.20210419"
"menhirSdk.20210419"
- "merlin.4.5-414"
+ "merlin.4.7-414"
"merlin-extend.0.6.1"
+ "merlin-lib.4.7-414"
"mew.0.1.0"
"mew_vi.0.5.0"
"minicli.5.0.2"
@@ -253,7 +255,7 @@
"uuseg.13.0.0"
"uutf.1.0.2"
"variantslib.v0.14.0"
- "yojson.1.7.0"
+ "yojson.2.0.2"
"zarith.1.7"
"zarith_stubs_js.v0.14.1"
"zed.3.1.0" |
3a8f0b3
to
cfe1559
Compare
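The buffer-reuse idea from the yojson question above can be sketched with the standard library alone. This is only an illustration under stated assumptions: the `json` type is a small stand-in for Yojson values, and `shared_buf`, `add_escaped`, `write` and `render` are hypothetical names, not part of Yojson or of this PR. String escaping is deliberately incomplete.

```ocaml
(* Hypothetical stand-in for a Yojson-like value type (stdlib only). *)
type json =
  [ `Null
  | `Bool of bool
  | `Int of int
  | `String of string
  | `List of json list
  | `Assoc of (string * json) list ]

(* One long-lived buffer shared by every render call, so its internal
   storage is allocated once instead of once per logged event. *)
let shared_buf = Buffer.create 4096

(* Minimal escaping: quotes, backslashes and newlines only. *)
let add_escaped buf s =
  Buffer.add_char buf '"' ;
  String.iter
    (function
      | '"' -> Buffer.add_string buf "\\\""
      | '\\' -> Buffer.add_string buf "\\\\"
      | '\n' -> Buffer.add_string buf "\\n"
      | c -> Buffer.add_char buf c )
    s ;
  Buffer.add_char buf '"'

let rec write buf (j : json) =
  match j with
  | `Null -> Buffer.add_string buf "null"
  | `Bool b -> Buffer.add_string buf (string_of_bool b)
  | `Int i -> Buffer.add_string buf (string_of_int i)
  | `String s -> add_escaped buf s
  | `List l ->
      Buffer.add_char buf '[' ;
      List.iteri
        (fun i x ->
          if i > 0 then Buffer.add_char buf ',' ;
          write buf x )
        l ;
      Buffer.add_char buf ']'
  | `Assoc kvs ->
      Buffer.add_char buf '{' ;
      List.iteri
        (fun i (k, v) ->
          if i > 0 then Buffer.add_char buf ',' ;
          add_escaped buf k ;
          Buffer.add_char buf ':' ;
          write buf v )
        kvs ;
      Buffer.add_char buf '}'

(* Buffer.clear keeps the underlying storage, so repeated calls reuse it.
   Buffer.contents still allocates the final string. *)
let render (j : json) : string =
  Buffer.clear shared_buf ;
  write shared_buf j ;
  Buffer.contents shared_buf
```

If the rendered text goes straight to a channel, `Buffer.output_buffer` would avoid the final string copy as well.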
I still need to revise the recorded checkpoints, and maybe add more documentation. The API could change a little, but I think it is ready for review as-is. Removing draft status. |
f65863f
to
b773585
Compare
There's an existing logging mechanism; is this additional mechanism needed? You could add a |
(EDIT: see next comment, I may have misunderstood what you were suggesting) Hello @psteckler, that is a good question. That approach was considered when switching to the current implementation, and the separate output channel is not strictly needed. The one exception is the part that attaches the current block information to the async execution context and detects when the async scheduler switches to processing a different block, so that checkpoints can be associated with the correct block trace. Still, the current implementation has some advantages:
|
Thinking a bit more about this, I may have misunderstood what you meant, @psteckler.
I got confused by the above because that level exists already, and at some point I considered just adding more
But what you meant is that I could add a new separate level for these internal traces, correct? Then, by registering a new log consumer with a custom processor and transport, I could do the same thing I am doing now. I would still control the format, output target, etc., and could also toggle internal tracing dynamically without affecting regular logging. I think that would solve many of the issues I mentioned above, but a big one that is not solved is the requirement for having a logger handle in scope (but maybe there is a solution for that). |
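A rough sketch of the "separate level plus dedicated consumer" idea described above, written against a hypothetical miniature logger. None of these names (`level`, `consumer`, `register`, `log`) mirror Mina's actual `Logger` module; the global consumer list is an assumption made to keep the example short, and it does not address the logger-handle-in-scope problem.

```ocaml
(* Hypothetical miniature logger; names do not mirror Mina's Logger module. *)
type level = Spy | Debug | Info | Error

type message =
  { level : level; text : string; metadata : (string * string) list }

type consumer =
  { accepts : level -> bool (* which levels this consumer handles *)
  ; process : message -> string option (* processor: format or drop *)
  ; transport : string -> unit (* where the formatted output goes *)
  ; mutable enabled : bool (* runtime toggle *)
  }

let consumers : consumer list ref = ref []

let register c = consumers := c :: !consumers

(* Dispatch a message to every enabled consumer that accepts its level. *)
let log ?(metadata = []) level text =
  let msg = { level; text; metadata } in
  List.iter
    (fun c ->
      if c.enabled && c.accepts level then
        match c.process msg with
        | Some line -> c.transport line
        | None -> () )
    !consumers

(* Two in-memory sinks standing in for separate output targets. *)
let trace_out = Buffer.create 256
let regular_out = Buffer.create 256

(* Consumer that only sees the tracing level and uses its own format. *)
let tracing =
  { accepts = (fun l -> l = Spy)
  ; process = (fun m -> Some (Printf.sprintf "checkpoint:%s" m.text))
  ; transport = (fun s -> Buffer.add_string trace_out (s ^ "\n"))
  ; enabled = true
  }

(* Consumer for everything except the tracing level. *)
let regular =
  { accepts = (fun l -> l <> Spy)
  ; process = (fun m -> Some m.text)
  ; transport = (fun s -> Buffer.add_string regular_out (s ^ "\n"))
  ; enabled = true
  }

let () =
  register tracing ;
  register regular ;
  log Spy "Generate_next_state" ;
  log Info "node started" ;
  (* Turning tracing off leaves regular logging untouched. *)
  tracing.enabled <- false ;
  log Spy "Generate_next_state_done" ;
  log Info "still logging"
```

Because the tracing consumer has its own `accepts`, `process` and `transport`, its format and output target can change independently of the regular log.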
I'd forgotten about the existing Propagating the |
I did a quick test by adding a dummy
I will investigate your suggested approach more, and if I don't find any serious blocker (I think it should work fine), I will commit an alternative version based on that. |
b773585
to
c8fc3d2
Compare
@psteckler hello. I just pushed a new version that uses the logging system as you suggested. A new checkpoint now looks like this:

[%log spy] "Generate_next_state" ;
let%bind next_state_opt =
  generate_next_state ~constraint_constants ~scheduled_time
    ~block_data ~previous_protocol_state ~time_controller
    ~staged_ledger:(Breadcrumb.staged_ledger crumb)
    ~transactions ~get_completed_work ~logger ~log_block_creation
    ~winner_pk:winner_pubkey ~block_reward_threshold
in
[%log spy] "Generate_next_state_done" ;

and special commands look like this:

[%log spy] "@block_metadata"
  ~metadata:
    [ ( "blockchain_length"
      , Mina_numbers.Length.to_yojson
        @@ Mina_block.blockchain_length
        @@ Breadcrumb.block breadcrumb )
    ] ;

The |
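One way a consumer of these messages might tell plain checkpoints from `@`-prefixed special commands is sketched below. The `@` prefix and the `block_metadata` name come from the snippets above; everything else (the `event` and `block_trace` types, `classify`, `apply`, string-valued metadata, and ignoring unknown commands) is an assumption made for illustration, not the PR's actual processing code.

```ocaml
(* Hypothetical event model for messages logged at the tracing level. *)
type event =
  | Checkpoint of string * float (* name, timestamp *)
  | Command of string * (string * string) list (* name without '@', metadata *)

(* Messages starting with '@' are treated as commands, the rest as
   checkpoints. *)
let classify ~time ~metadata msg =
  if String.length msg > 0 && msg.[0] = '@' then
    Command (String.sub msg 1 (String.length msg - 1), metadata)
  else Checkpoint (msg, time)

(* Per-block accumulator (hypothetical). *)
type block_trace =
  { mutable checkpoints : (string * float) list (* newest first *)
  ; mutable metadata : (string * string) list
  }

let apply trace = function
  | Checkpoint (name, t) -> trace.checkpoints <- (name, t) :: trace.checkpoints
  | Command ("block_metadata", kvs) -> trace.metadata <- kvs @ trace.metadata
  | Command (_, _) -> () (* unknown commands are ignored in this sketch *)

let example = { checkpoints = []; metadata = [] }

let () =
  apply example (classify ~time:1.0 ~metadata:[] "Generate_next_state") ;
  apply example
    (classify ~time:2.5
       ~metadata:[ ("blockchain_length", "42") ]
       "@block_metadata" ) ;
  apply example (classify ~time:3.0 ~metadata:[] "Generate_next_state_done")
```

Keeping commands out of the checkpoint list means the timing data stays a flat sequence, while metadata attaches to the block trace as a whole.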
Btw, a few things may change (probably not much) and I still have to document the new interface, but it would be very useful to get your opinion on this version. Once everything is ready and if we decide to go with this version I will re-organize the commits so that they follow the original structure:
|
@tizoc Thanks, I prefer this style. There is still the one comment I had made re Would |
Great! I am going to settle with this approach then. I will reorganize the commits once I am done verifying the checkpoints and making sure I am not missing anything important that the external trace processing tool needs.
Sure, I will fix that in the final pass (I was avoiding any changes to the original code unless required to trace new checkpoints). (EDIT: just fixed it now instead)
Yes, that works too. |
b778ea1
to
5784a0b
Compare
@deepthiskumar pushed. |
Oops, pushed to another PR's branch, pushed here now |
!ci-build-me |
The docker image issue is fixed now, but I now see new failures that weren't there before. |
a8cf9db
to
333323e
Compare
333323e
to
abbd24d
Compare
Hmm, I got an invite to the organization, but it seems I still cannot trigger the CI run myself. |
@tizoc you need to be a public member. You should be able to change that in your settings |
@deepthiskumar thank you!! Just tested it here and it worked now #12874 (comment) |
!approved-for-mainnet |
This PR implements support for internal tracing. It is still a WIP but is already close to the final version.
Note that this is a new implementation, different from the one we have been deploying to our cluster, but it records enough information to allow for the reproduction of the same features provided by the original implementation.
This version is much simpler: it only records the flat, low-level trace without trying to make any sense of it. Interpretation is left to an external consumer (unlike the version we have been using so far, which does this inside the node and exposes traces through new graphql queries). This external consumer doesn't need to belong to this repository, and can change without requiring changes to the node itself (except when new traces or metadata are required).
The interface and format are documented in https://github.com/openmina/mina/blob/feature/internal-tracing-develop/src/lib/internal_tracing/internal_tracing.mli
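To illustrate the split between a flat trace recorded by the node and interpretation done by an external consumer, here is a minimal sketch of one thing such a consumer could compute: the time spent between consecutive checkpoints. The tuple representation and the `durations` function are assumptions for illustration; the real format is the one documented in the `.mli` linked above.

```ocaml
(* Flat trace as assumed here: (checkpoint name, timestamp in seconds),
   in the order the node recorded them. *)
let durations (trace : (string * float) list) : (string * float) list =
  let rec go acc = function
    | (name, t0) :: ((_, t1) :: _ as rest) ->
        (* A checkpoint lasts until the next one is recorded. *)
        go ((name, t1 -. t0) :: acc) rest
    | [ _ ] | [] ->
        (* The last checkpoint has no successor, so it gets no duration. *)
        List.rev acc
  in
  go [] trace
```

Since this logic lives outside the node, it can be changed or replaced without touching how checkpoints are recorded.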
Impact:
Scope:
Right now there are 3 commits separated to ease reviewing.
I will keep adding commits on top, but eventually I will merge them back into the corresponding one of those 3 commits.
Pending:
Proof tracing. @binier is working on this; it needs to be adapted to this new version of internal tracing (shouldn't be an issue). May be submitted as a separate PR later? (left for later)
-internal-tracing flag enables it).
develop)
develop when compared to release/2.0.0)
Record any other events that may be useful (like status changes: bootstrap, synced, etc.) (left for later)
Unit/integration tests? Not sure how to best approach this, but ideally during the test suite traces would be collected and checked to ensure that they remain correct after changes made to the node. (left for later)
Checklist: