
Avoid formatting in the binary output #942

Closed
larytet opened this issue Dec 9, 2018 · 3 comments

Comments

@larytet

larytet commented Dec 9, 2018

We do not have to output constant format strings at all in the binary format. There are two possibilities:

  • Collect the constant (format) strings at compile time, build a table of all format strings, and use a constexpr hash of the file name and source line as the key into the table. When decoding the binary stream, look up the format strings in the table or in the source code itself. Example: https://github.com/larytet/emcpp/blob/master/src/Log.h#L98
  • Cache the analysed format string using the address of the string as the key. The idea is that the linker places constant strings in one place, a read-only section of the executable (typically .rodata, or .text on some targets). If the address does not belong to that section, a hash table keyed by a hash of the format string will do. Example (Go): https://github.com/larytet/binlog
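A minimal C++17 sketch of the compile-time key from the first approach, assuming 64-bit FNV-1a as the constexpr hash (the issue does not name a hash function). `fnv1a`, `site_key`, `LOG_SITE_KEY`, `FormatEntry` and `find_format` are hypothetical names; in practice the table would be generated by a build step that scans the sources, not written by hand.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <type_traits>

// 64-bit FNV-1a (assumed hash); constexpr, so the compiler can evaluate it.
constexpr std::uint64_t fnv1a(std::string_view s,
                              std::uint64_t h = 14695981039346656037ull) {
    for (char c : s) {
        h ^= static_cast<unsigned char>(c);
        h *= 1099511628211ull;
    }
    return h;
}

// Key for one call site: hash of the file name with the 4 bytes of the line mixed in.
constexpr std::uint64_t site_key(std::string_view file, std::uint32_t line) {
    std::uint64_t h = fnv1a(file);
    for (int i = 0; i < 4; ++i) {
        h ^= (line >> (8 * i)) & 0xFFu;
        h *= 1099511628211ull;
    }
    return h;
}

// Hypothetical macro: using the key as a template argument forces compile-time evaluation.
#define LOG_SITE_KEY() \
    (std::integral_constant<std::uint64_t, site_key(__FILE__, __LINE__)>::value)

// One entry of the (hypothetical) generated table; only the decoder needs the text.
struct FormatEntry {
    std::uint64_t key;
    std::string_view format;
};

// Look a key up in the table (a linear scan keeps the sketch short).
template <std::size_t N>
constexpr std::string_view find_format(const FormatEntry (&table)[N], std::uint64_t key) {
    for (const auto& e : table)
        if (e.key == key) return e.format;
    return {};  // unknown key
}
```

At run time the application would emit only `LOG_SITE_KEY()` followed by the arguments; the table with the format strings can live entirely in the decoding tool.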

In the first approach the application does not analyse the format strings, and the format strings do not need to be part of the executable at all (the read-only sections of the ELF get smaller).

In the second approach the application analyses every format string once and caches all the information required to output to the binary stream.
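The second approach could look roughly like the sketch below, assuming printf-style format strings and treating "analysis" as counting the conversions. `FormatInfo`, `analyse` and `FormatCache` are hypothetical names. Checking whether an address really lies in a read-only section is platform-specific and is not shown, so `get_dynamic` simply keys strings of unknown origin by their content.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>

// What we learn from a format string once and reuse on every call.
struct FormatInfo {
    std::size_t arg_count = 0;  // number of % conversions, "%%" excluded
};

// Parse the format string once: count printf-style conversions.
inline FormatInfo analyse(std::string_view fmt) {
    FormatInfo info;
    for (std::size_t i = 0; i + 1 < fmt.size(); ++i) {
        if (fmt[i] != '%') continue;
        if (fmt[i + 1] == '%') { ++i; continue; }  // "%%" is a literal percent sign
        ++info.arg_count;
    }
    return info;
}

class FormatCache {
public:
    // Fast path: the format string is a literal, so its address identifies it.
    const FormatInfo& get(const char* fmt) {
        auto it = by_address_.find(fmt);
        if (it == by_address_.end()) {
            ++misses_;  // analysed only on first use
            it = by_address_.emplace(fmt, analyse(fmt)).first;
        }
        return it->second;
    }

    // Fallback for strings of unknown origin: key by the text (hashed by std::hash).
    const FormatInfo& get_dynamic(std::string_view fmt) {
        auto it = by_content_.find(std::string(fmt));
        if (it == by_content_.end())
            it = by_content_.emplace(std::string(fmt), analyse(fmt)).first;
        return it->second;
    }

    std::size_t misses() const { return misses_; }

private:
    std::unordered_map<const char*, FormatInfo> by_address_;
    std::unordered_map<std::string, FormatInfo> by_content_;
    std::size_t misses_ = 0;
};
```

The address lookup avoids hashing the text on every call, which is the point of keying by address; it is only safe for strings whose storage never changes.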

In both approaches the application outputs only the variadic arguments.
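Outputting only the variadic arguments amounts to appending their raw bytes to a buffer, with no text conversion. A minimal sketch, assuming native byte order and supporting only arithmetic types and C strings; `put` and `write_args` are hypothetical helpers.

```cpp
#include <cstring>
#include <type_traits>
#include <vector>

// C strings: copy the characters plus the terminating '\0'.
inline void put(std::vector<unsigned char>& out, const char* s) {
    out.insert(out.end(), s, s + std::strlen(s) + 1);
}

// Numbers: copy the raw bytes in native byte order.
template <typename T, std::enable_if_t<std::is_arithmetic_v<T>, int> = 0>
void put(std::vector<unsigned char>& out, T v) {
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &v, sizeof(T));
    out.insert(out.end(), bytes, bytes + sizeof(T));
}

// Serialize the variadic arguments in order; the format string is never touched.
template <typename... Args>
void write_args(std::vector<unsigned char>& out, const Args&... args) {
    (put(out, args), ...);
}
```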

Both approaches restrict the application to immutable format strings and, for the best performance, to constant strings. The upside is a 2x to 4x performance gain and a smaller log.

@gabime
Owner

gabime commented Dec 9, 2018

We do not have to output constant format strings at all in the binary format

Not sure I understand what the binary format is.
What is the use case? Sending log data to a different process?

@larytet
Author

larytet commented Dec 9, 2018

Example

In a perfect world I would like to be able to do something like:

```cpp
// Wished-for API: spdlog does not provide binary_logger
auto my_logger = spdlog::binary_logger("binary_logger", "logs/basic-log.bin");
// Outputs a 64-bit hash of the format string "Hello: %s"
// and the null-terminated string "world" to the file "logs/basic-log.bin"
my_logger->info("Hello: %s", "world");
```
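Outside spdlog, the record described in these comments can be produced in a few lines. This sketch assumes FNV-1a for the 64-bit hash and an in-memory `std::string` buffer in place of the file; `BinaryLogger` is a hypothetical class, not part of spdlog, and it hashes the format string at run time where a real implementation would do it once, at compile time.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <string_view>

// 64-bit FNV-1a (assumed hash; the example only asks for a 64-bit hash).
constexpr std::uint64_t fnv1a(std::string_view s) {
    std::uint64_t h = 14695981039346656037ull;
    for (char c : s) {
        h ^= static_cast<unsigned char>(c);
        h *= 1099511628211ull;
    }
    return h;
}

// Hypothetical binary logger: appends records to a buffer that a real
// implementation would flush to "logs/basic-log.bin".
class BinaryLogger {
public:
    explicit BinaryLogger(std::string& buffer) : buf_(buffer) {}

    // Record: 8-byte hash of the format string (native byte order),
    // then every argument as a null-terminated string. No formatting happens.
    template <typename... Strs>
    void info(std::string_view fmt, const Strs&... args) {
        const std::uint64_t h = fnv1a(fmt);
        buf_.append(reinterpret_cast<const char*>(&h), sizeof h);
        (append_cstr(args), ...);
    }

private:
    void append_cstr(const char* s) { buf_.append(s, std::strlen(s) + 1); }

    std::string& buf_;
};
```

For `info("Hello: %s", "world")` the record is 14 bytes: 8 for the hash and 6 for "world" plus its terminator.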

My assumptions

  • I need logs for human consumption from time to time (rarely)
  • I want to keep logging on at all times, I want to log as much as possible
  • Machine processing does not require human readable format

Objectives

When logging binary data (instead of outputting human-readable lines) I have two objectives:

  • Less log data, smaller log files, less I/O load
  • Faster processing, lower performance impact when generating the logs.

The idea

In binary logging mode I log only the variadic part of the logger arguments, i.e. I output "encoded" data. I can "decode" the log data on demand, given the binary log itself and some additional input generated at compile time. Think of it as data serialization applied to logging.
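Decoding on demand could be sketched as follows, assuming the record layout from the example above (an 8-byte format hash followed by null-terminated string arguments) and a hash-to-format-string table standing in for the input generated at compile time. `decode` is a hypothetical function that handles only `%s` and `%%`.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <string_view>
#include <unordered_map>

// Decode one record: [8-byte format hash][arg0 '\0'][arg1 '\0']...
inline std::string decode(std::string_view rec,
                          const std::unordered_map<std::uint64_t, std::string>& table) {
    if (rec.size() < 8) throw std::runtime_error("truncated record");
    std::uint64_t h;
    std::memcpy(&h, rec.data(), sizeof h);
    auto it = table.find(h);
    if (it == table.end()) throw std::runtime_error("unknown format hash");

    const std::string& fmt = it->second;
    std::size_t pos = 8;  // read position inside the record
    std::string out;
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        if (fmt[i] != '%' || i + 1 == fmt.size()) { out += fmt[i]; continue; }
        const char spec = fmt[++i];
        if (spec == '%') { out += '%'; continue; }
        if (spec != 's') throw std::runtime_error("unsupported conversion");
        const std::size_t end = rec.find('\0', pos);  // end of this string argument
        if (end == std::string_view::npos) throw std::runtime_error("unterminated string");
        out.append(rec.data() + pos, end - pos);
        pos = end + 1;
    }
    return out;
}
```

Because formatting happens here, in the reader, its cost is paid only when a human actually looks at the log.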

If human beings need to consume binary logs completely transparently, I can add a user-space filesystem that decodes log files on the fly. When a user types tail -f mylog, the filesystem decodes the output as needed and presents nicely formatted, human-readable lines.

The approach I suggest only looks revolutionary: it is used quite often in systems with tight response-latency requirements. I am surprised that I cannot find an open-source project that targets this use case.

@gabime
Owner

gabime commented Dec 9, 2018

I am surprised that I can not find open source which targets this use case.

Take a look at nanolog. It does what you need.

gabime closed this as completed Dec 10, 2018
bachittle pushed a commit to bachittle/spdlog that referenced this issue Dec 22, 2022