Performance analysis #63

Glandos · 2018-07-25T10:30:40Z

When working with fail2ban, I noticed that some operations were really slow while reading from journal instead of text log files. This is not really noticeable with the default daemon operation of fail2ban, but more with its test tool called fail2ban-regex.

The main idea is that currently, all field's values are converted, which is completely uneeded for a lot of use cases. Moreover, the use of a ChainMap seems useless, but it is extremely costly, especially when no custom converters are provided.

I've implemented a test tool to monitor some possible improvements. Here are the raw results:

On the X axis, you can see patch sets, grouped by loads, that are further described. On the Y axis, these are number of cycles, as reported by perf (from the Linux kernel). The tests were performed with two sets:

One journal from openSuse Tumbleweed desktop: Intel Core i5-3320M, 138371 messages with 2928331 fields.
One journal from my Debian/testing server: Intel Core2 Duo E8400. I lost the figures, but it was roughly the same.

The chart above is from the Tumbleweed desktop, but the chart made using Core2 has strictly the same ratio. Only the numbers are a bit higher.

Loads

They are operations to be done with one journal entry. There are currently 4 of them:

empty: do nothing with the entry. This is a corner case.
digest: do some simple maths. It extracts two fields, _SYSTEMD_UNIT and MESSAGE.
fail2ban: extracted from fail2ban code, to format an entry in order to be parsed by fail2ban core logic.
read_all_fields: this reads all fields and values, and just count the fields. This is a corner case.

Patch sets

init_reader_nochainmap (aka nochainmap)

Currently, the Reader class uses a ChainMap to merge default converters and user provided ones. This is only done if Python version is greather or equal than 3.3. I've noticed that the time spent into the get() method is really high. It seems that ChainMap is good for merging either big dictionaries (where a call to update() is costly), or dynamic dictionaries (where an update into one dictonary is reflected into the ChainMap instance). It seems that Reader is not in any of these 2 cases, so this patche completely removes the use of ChainMap, copies the default converters dict, and updates it with custom converters if needed.

This is by far the easiest, less intrusive, and most efficient patch in all cases.

convert_entry_lazy

This is the first step of a convert-on-demand step. Instead of converting all field values in _convert_entry, we just return a MutableMapping with all already fetched values as bytes, and convert values only on request. Converted values are not recorded, since we assume the caller will not try to fetch the same field twice.

get_next() is untouched, so special fields (__CURSOR, __MONOTONIC_TIMESTAMP, __REALTIME_TIMESTAMP) are already fetched, but not converted.

This is a good approach, but there is nearly always a better way to do it, except for the read_all_fields corner case.

get_next_lazy (aka next_lazy)

This is an evolution of convert_entry_lazy, that modifies get_next() itself. By default, there is no call to _get_all(), and only an instance of Mapping is returned. Special fields are not even computed. When requested, a field is then fetched using _get() and then _convert_value(). Result is not recorded for the same reason.

To be able to iterate on this Mapping, the first call to __iter__ or __len__ will fetch all fields using _get_all(), add special field names, and record the result. A __bool__ method is also provided to avoid costly call to __len__() just to test if the current entry is valid (it is).

This is the best approach for cases where there is a small amount of fields to read, as in empty and digest loads.

get_next_lazy_prefetched (aka next_lazy_prefetched)

This is a mix between convert_entry_lazy and get_next_lazy, to show the effects of just calling _get_all(), without creating special entries. Values are only converted on request, and special fields are fetched and converted on request.

This is usually a better approach than convert_entry_lazy, except in read_all_fields, which I can't explain. As we can see in the digest load, the difference between this and get_next_lazy is minimal, so we can assume that calling _get_all() has a negligible impact, except for the empty corner case.

convert_value_nolist (immature)

This patch is not visible in the chart.

The goal is to remove the isinstance() test when converting values, since the vast majority of values are not list. The impact is not negligible, but I found some logs where values returned by _get_all are lists. The journal format has no such list, but field name can be duplicated, so _get_all gather them into lists. In my tests, I found 3 messages with such cases, using journal from openSuse Tumbleweed, with SYSLOG_IDENTIFIER field. On Debian/testing, I was not able to found this case.

Analysis

Disable ChainMap usage in all cases: this is always a win. I will open a merge request for this soon.

Use convert-on-demand: In nearly all cases, it is more interesting to call _get_all() without converting anything.

Import caveat for get_next_lazy: on request, it uses calls to _get(). If a field is duplicated, _get only returns the first value, while _get_all will gather all values in a list. This is notified in systemd/systemd#9696.

As a result of this, I would like to write a patch for get_next() that does the same job as get_next_lazy_prefetched by default, with an optional boolean (defaulted to True) for calling or not get_all. When set to False, it will be the same as get_next_lazy, with the important caveat mentioned before. But for small loads, it will be faster.

Conclusion

I would like to know if you have any thoughts about this:

is the cycles a good metric?
have I missed something about some edge cases in journal entries?
[…]

You can use run_all.sh in my own tool, that will write a report_aggregate.csv which is easy to turn into chart using LibreOffice or any other charting tool of your choice.

The text was updated successfully, but these errors were encountered:

See systemd#63

keszybz · 2020-11-12T18:21:32Z

I merged the patch to remove ChainMap. For the other stuff: if you have something worth fixing, please submit a patch.

Glandos added a commit to Glandos/python-systemd that referenced this issue Jul 26, 2018

Remove ChainMap

1ec9a5c

See systemd#63

Glandos mentioned this issue Jul 26, 2018

Remove ChainMap #64

Closed

keszybz closed this as completed in 2166b57 Nov 12, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance analysis #63

Performance analysis #63

Glandos commented Jul 25, 2018

keszybz commented Nov 12, 2020

Performance analysis #63

Performance analysis #63

Comments

Glandos commented Jul 25, 2018

Loads

Patch sets

init_reader_nochainmap (aka nochainmap)

convert_entry_lazy

get_next_lazy (aka next_lazy)

get_next_lazy_prefetched (aka next_lazy_prefetched)

convert_value_nolist (immature)

Analysis

Conclusion

keszybz commented Nov 12, 2020