Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal] Add dumprecords command #745

Merged
merged 3 commits into from
Jan 25, 2024

Conversation

JanKrivanek
Copy link
Collaborator

@JanKrivanek JanKrivanek commented Jan 17, 2024

Fixes #741

Context

#741 brought a nice idea. I needed something similar for my investigation - just counts, not sizes - so I stitched something for both.

Samples

Just summary on OrchardCore (runs for about 0.7s on my machine):

>binlogtool dumprecords OrchardCore.binlog --include-total --include-rollup --exclude-details

Total records: 1069320

RecordType, Count, Size[bytes]
==============================
Message, 568034, 38.46
NameValueList, 38359, 60.16
BuildStarted, 1, 12.00
ProjectEvaluationStarted, 417, 35.90
ProjectImported, 64414, 55.11
AssemblyLoad, 22, 55.59
PropertyReassignment, 23492, 41.90
ProjectEvaluationFinished, 417, 1912.71
ProjectStarted, 14300, 56.33
TargetStarted, 49962, 38.80
TaskStarted, 23227, 36.90
TaskFinished, 23227, 34.15
TargetFinished, 49962, 52.04
TaskParameter, 145155, 98.90
EnvironmentVariableRead, 792, 37.90
TargetSkipped, 53035, 58.53
ProjectFinished, 14300, 31.30
TaskCommandLine, 198, 40.51
Error, 5, 46.00
BuildFinished, 1, 14.00

Details for binlog from new console:

>binlogtool dumprecords

RecordType, Size[bytes]
=======================
Message, 50
Message, 50
NameValueList, 9
BuildStarted, 12
Message, 46
Message, 46
Message, 46
Message, 46
Message, 46
ProjectEvaluationStarted, 34
AssemblyLoad, 54
AssemblyLoad, 54
AssemblyLoad, 54
AssemblyLoad, 54
Message, 41
ProjectImported, 45
ProjectImported, 49
Message, 39
...

Limitations

  • Summaries won't run on broken binlogs (just the details list) - feels as acceptable limitation (broken binlogs should be less frequent and just a list of records untill the last complete should then suffice)
  • The reported sizes for the new format do not include the record type and the offset itself - so there are 2 bytes per each record not reported (but I'd say that's how we want it)

Breaking changes

  • BinLogReader.WrapperStream was removed. It was public, but hopefully nobody was using it. If yes - they can easily switch to TransparentReadStream.

Perf

Summary for OrchardCore runs ~0.178 seconds with the new binlogs (offsets included), ~1.967 seconds with old binlogs. Both after warmup runs.
On full details dumping the difference is not so visible, as majority of the time is spend flushing the records (it's ~56 vs ~58 seconds)

@KirillOsenkov
Copy link
Owner

Does it support the full list of all records (not the summary)? Useful for diffing. Even just the types of records would be valuable so that you can diff two .txt files and see what is the first record where they start to differ.

Would be nice to support formats before 18 too. We can compute records lengths like ReadRecords does, see my draft in #741

@JanKrivanek
Copy link
Collaborator Author

@KirillOsenkov - All comments sounds sound and easily accomodable - I'll work on this in a few days.

As for:

Does it support the full list of all records (not the summary)? Useful for diffing. Even just the types of records would be valuable > so that you can diff two .txt files and see what is the first record where they start to differ.

The ungrouped list is the default mode. I added the optional groupping as that's what I needed for my other investigation.
Though - do you expect the sizes to differ a bit? Would you want to have an option to dump just the types, without sizes?

@KirillOsenkov
Copy link
Owner

When I have to investigate a corrupt binlog, I usually have the good and the bad binlogs. So I dump a flat list of records to see which record is the first broken one. Even the record number and kind would be valuable, but if we can have the size of each record it'd be better (for completeness). If the size of the corrupted record changes, would be nice to know it.

We could deduce the record size from the start positions of records in the stream (this is how we show the progress bar when reading the binlog). But of course in the new format we get the record sizes for free.

@JanKrivanek JanKrivanek marked this pull request as ready for review January 23, 2024 18:12
@KirillOsenkov KirillOsenkov merged commit 10c5464 into KirillOsenkov:main Jan 25, 2024
1 check passed
@KirillOsenkov
Copy link
Owner

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add binlogtool dumprecords
2 participants