Performance/reliability considerations for TLSRPT internal storage #2
Comments
Regarding: Performance and reliability considerations
Preliminary tests have shown that commit operations are expensive in SQLite. As we do not need transactional safety for every single record, the expected load should not pose a problem. We envision two tuneable parameters on the daemon side:
These configuration parameters will not be exclusive but will act in combination.
In my comments below I assume batches with up to 100 updates, and an MTA sending 1500 updates/s. I suppose that the simulation involved a loop around blocking database update calls. In the TLSRPT design, the MTA sends datagrams to the TLSRPT receiver, so that the MTA will not be blocked by the flow control that is part of a connection-oriented protocol. Perhaps the TLSRPT receiver implementation can use distinct threads for flushing the database and for receiving updates from the MTA, so that the receiver won't miss too many updates during the database flush every 1/15th second? Unlike an update-generating loop that blocks while the database flushes buffers, the MTA's updates will arrive stochastically in time. If a single-threaded receiver can handle 1500 updates/s in a blocking flow, then I expect that it will start to miss updates above 500/s with a stochastic flow. What happens in the real world will depend on kernel buffer capacity.
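Not part of the design, just an illustration of the kernel-buffer point: if the receiver listens on a datagram socket, asking for a larger SO_RCVBUF gives the kernel more room to absorb bursts while the receiver is busy flushing to disk. The AF_UNIX socket type, path, and buffer size below are assumptions, not the actual TLSRPT receiver code.

```c
/* Sketch only: the socket type, path, and buffer size are illustrative.
 * A larger SO_RCVBUF lets the kernel queue more datagrams while the
 * receiver is blocked in a database flush (the kernel may clamp the value). */
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

int open_receiver_socket(const char *path)
{
    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    /* Request a larger kernel receive buffer. */
    int rcvbuf = 1024 * 1024;
    (void) setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf));

    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
    unlink(path);
    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```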
I wrote the following in November 2023:
The above idea based on atomic appends will not work for a multi-writer implementation (multiple writers per file). On many BSD-based systems, PIPE_BUF is the required minimum of 512 bytes. That is already smaller than typical status updates observed with an actual implementation, and a design should be able to handle updates that have 3x the typical size. Gzip compression would reduce the size of an update to 60%, and would not solve the problem. The idea can still work for a single-writer implementation (i.e., one writer per file), because that does not need atomic appends.
The current design is a single-writer implementation that buffers in RAM and writes to disk when a configurable number of datagrams (1000) has arrived or, in times of low load, after a configurable interval (5 seconds) has passed.
However, the readout, especially of the domain list, is indeed challenging, but will be solved by switching the database each day. Writing only happens for today's data, while reading only happens for yesterday's data. That way the reader and the writer do not conflict at all.
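A minimal sketch of how these two ideas could fit together, purely as an illustration: datagrams are buffered in RAM and flushed into today's per-day database once 1000 have accumulated or 5 seconds have passed, while the reporter reads yesterday's database. The filename pattern, the buffer type, and flush_to_db() are hypothetical placeholders, not the actual implementation.

```c
/* Illustration only: buffer datagrams in RAM, flush to today's database
 * after 1000 datagrams or 5 seconds, and let the reporter read yesterday's
 * database so reader and writer never touch the same file. */
#include <stdio.h>
#include <time.h>

#define FLUSH_MAX_DATAGRAMS 1000
#define FLUSH_MAX_INTERVAL  5               /* seconds */

struct buffer {
    size_t count;                           /* datagrams buffered in RAM */
    time_t last_flush;                      /* time of the previous flush */
};

extern void flush_to_db(struct buffer *buf, const char *dbname);  /* hypothetical */

/* Format "tlsrpt-YYYYMMDD.sqlite": offset_days 0 = today (writer),
 * -1 = yesterday (reporter). The pattern is illustrative only. */
static void db_name_for_day(char *name, size_t len, int offset_days)
{
    time_t t = time(NULL) + (time_t) offset_days * 86400;
    struct tm tm;
    gmtime_r(&t, &tm);
    strftime(name, len, "tlsrpt-%Y%m%d.sqlite", &tm);
}

/* Called after each received datagram and from a periodic timer. */
void maybe_flush(struct buffer *buf)
{
    time_t now = time(NULL);

    if (buf->count >= FLUSH_MAX_DATAGRAMS
        || (buf->count > 0 && now - buf->last_flush >= FLUSH_MAX_INTERVAL)) {
        char dbname[64];
        db_name_for_day(dbname, sizeof(dbname), 0);     /* today's database */
        flush_to_db(buf, dbname);
        buf->count = 0;
        buf->last_flush = now;
    }
}
```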
The newest commit not only changed the library build to GNU Autotools but also added a program "bench" in tools/benchmark with several command-line parameters. The "bench" tool first tries to measure the maximum rate of datagrams with a blocking socket. Then, in an endless loop, a number of background threads specified with the --threads option are run at varying rates, starting at 10% of the maximum rate and increasing in 10% increments up to 90% before restarting at 10%. The --burstwait parameter specifies the seconds to wait between burst loads. This gives an impression of what peak loads can be handled at what average loads. The --newsock parameter can be used to switch to reusing the existing connection and sockets.
This note is based on "TLSRPT for MTAs" Version 0.01. I summarize my understanding of the global architecture, present ball-park performance numbers, and make suggestions for the internal storage.
Over-all architecture
Client-side library. Each call reports the status of one TLS session (success, or one of the specified modes). The library is written in C and may be called from MTAs written in C or any language that can call into C (examples: C++, Go, Java, and many dynamically-compiled languages).
TLSRPT receiver. This receives one report from a client library over some IPC channel and updates an internal log. There may be one TLSRPT receiver per MTA instance, or a shared receiver for a group of MTA instances. But see performance/reliability considerations below.
Storage layer. This persists status information until it is needed to generate a TLSRPT report.
TLSRPT reporter. This generates a daily aggregate report on behalf of one or more MTA instances, and submits the result according to a policy published by a mail sending domain.
Performance and reliability considerations
A high-performance MTA such as Postfix manages multiple concurrent SMTP connections (up to 100 by default). Each SMTP protocol engine and associated TLS engine are managed by one SMTP client process. Updates through the TLSRPT client library will therefore be made concurrently.
Depending on destinations and configuration, one can expect that a typical Postfix MTA will max out at ~300 outbound connections/second. This was ~300 in 2012 when TLS was not as universal as it is now (STARTTLS adds ~three TCP round trip times), and when computers and networks were a bit slower (but not by a lot). See Viktor Dukhovni's post in https://groups.google.com/g/mailing.postfix.users/c/pPcRJFJmdeA
The C client library does not guarantee that a status update will reach a TLSRPT receiver. A status that cannot be sent will be dropped without blocking progress in an MTA. It is therefore OK if the persistence layer cannot accept every status update; however, it should not lose updates under foreseeable loads.
The design considers using SQLite for storage. By default the SQLite update latency is measured in hundreds of milliseconds, i.e. 10 updates/second where a single Postfix instance needs up to ~300 updates/second. Part of this latency is caused by SQLite invoking fsync() for every update. These fsync() calls would not just slow down SQLite, but they would also hurt MTA performance, especially when a message has multiple SMTP destinations. Postfix is careful to call fsync() only once during the entire lifetime of a message; I had to convince Linux distributions to NOT fsync() the maillog file after every record, because their syslogd daemon was consuming more resources than all Postfix processes combined.
The SQLite update latency can be reduced by 'batching' database updates in a write-ahead log (for example: PRAGMA journal_mode = WAL; PRAGMA wal_autocheckpoint = 0; PRAGMA synchronous = NORMAL;), but now you need to periodically flush the write-ahead log, or turn on wal_autocheckpoint. For examples, see https://stackoverflow.com/questions/21590824/sqlite-updating-one-record-is-very-relatively-slow
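For illustration, here is how those PRAGMAs could be issued through the SQLite C API, with an explicit periodic checkpoint instead of wal_autocheckpoint. This is a sketch, not the daemon's actual code, and error handling is abbreviated.

```c
/* Sketch: apply the WAL-related PRAGMAs mentioned above through the
 * SQLite C API, and flush the write-ahead log explicitly from time to
 * time instead of relying on wal_autocheckpoint. */
#include <sqlite3.h>
#include <stdio.h>

int setup_wal(sqlite3 *db)
{
    const char *pragmas =
        "PRAGMA journal_mode = WAL;"
        "PRAGMA wal_autocheckpoint = 0;"
        "PRAGMA synchronous = NORMAL;";
    char *errmsg = NULL;

    if (sqlite3_exec(db, pragmas, NULL, NULL, &errmsg) != SQLITE_OK) {
        fprintf(stderr, "pragma setup failed: %s\n", errmsg);
        sqlite3_free(errmsg);
        return -1;
    }
    return 0;
}

/* Call this periodically, e.g. after each batch of updates. */
void checkpoint_wal(sqlite3 *db)
{
    (void) sqlite3_wal_checkpoint_v2(db, NULL, SQLITE_CHECKPOINT_PASSIVE,
                                     NULL, NULL);
}
```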
Observations and suggestions
I am not convinced that batching SQLite updates will be sufficient to handle foreseeable status update rates from even a single Postfix MTA instance.
To handle foreseeable update rates, perhaps TLSRPT internal storage can be implemented as a collection of sequential append-only files with names that correspond to the reporting time window.
As long as a write to a (local) file is smaller than PIPE_BUF bytes (see below), the POSIX spec guarantees that the write is atomic. Combined with O_APPEND (see below), this guarantees that write-append operations will be serialized. My expectation is that the size of a status update will be well under the minimum PIPE_BUF value.
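A minimal sketch of that append path, under the assumptions above: the record is written with O_APPEND and rejected if it exceeds PIPE_BUF, so the atomicity argument applies. The file name, record format, and function name are illustrative only.

```c
/* Sketch: append one status record to a per-time-window file, relying on
 * O_APPEND plus writes of at most PIPE_BUF bytes so that concurrent
 * appends are serialized as described above. */
#include <errno.h>
#include <fcntl.h>
#include <limits.h>        /* PIPE_BUF */
#include <string.h>
#include <unistd.h>

int append_record(const char *path, const char *record, size_t len)
{
    if (len > PIPE_BUF) {            /* would lose the atomicity guarantee */
        errno = EMSGSIZE;
        return -1;
    }

    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (fd < 0)
        return -1;

    ssize_t n = write(fd, record, len);
    int save = errno;
    (void) close(fd);
    errno = save;
    return (n == (ssize_t) len) ? 0 : -1;
}
```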
Background
Semantics of O_APPEND and atomic writes <= PIPE_BUF. https://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html
PIPE_BUF (_POSIX_PIPE_BUF) is not smaller than 512 bytes. https://pubs.opengroup.org/onlinepubs/7908799/xsh/limits.h.html