Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance analysis #63

Closed
Glandos opened this issue Jul 25, 2018 · 1 comment
Closed

Performance analysis #63

Glandos opened this issue Jul 25, 2018 · 1 comment

Comments

@Glandos
Copy link
Contributor

Glandos commented Jul 25, 2018

When working with fail2ban, I noticed that some operations were really slow while reading from journal instead of text log files. This is not really noticeable with the default daemon operation of fail2ban, but more with its test tool called fail2ban-regex.

The main idea is that currently, all field's values are converted, which is completely uneeded for a lot of use cases. Moreover, the use of a ChainMap seems useless, but it is extremely costly, especially when no custom converters are provided.

I've implemented a test tool to monitor some possible improvements. Here are the raw results:
report_all

On the X axis, you can see patch sets, grouped by loads, that are further described. On the Y axis, these are number of cycles, as reported by perf (from the Linux kernel). The tests were performed with two sets:

  • One journal from openSuse Tumbleweed desktop: Intel Core i5-3320M, 138371 messages with 2928331 fields.
  • One journal from my Debian/testing server: Intel Core2 Duo E8400. I lost the figures, but it was roughly the same.

The chart above is from the Tumbleweed desktop, but the chart made using Core2 has strictly the same ratio. Only the numbers are a bit higher.

Loads

They are operations to be done with one journal entry. There are currently 4 of them:

  • empty: do nothing with the entry. This is a corner case.
  • digest: do some simple maths. It extracts two fields, _SYSTEMD_UNIT and MESSAGE.
  • fail2ban: extracted from fail2ban code, to format an entry in order to be parsed by fail2ban core logic.
  • read_all_fields: this reads all fields and values, and just count the fields. This is a corner case.

Patch sets

init_reader_nochainmap (aka nochainmap)

Currently, the Reader class uses a ChainMap to merge default converters and user provided ones. This is only done if Python version is greather or equal than 3.3. I've noticed that the time spent into the get() method is really high. It seems that ChainMap is good for merging either big dictionaries (where a call to update() is costly), or dynamic dictionaries (where an update into one dictonary is reflected into the ChainMap instance). It seems that Reader is not in any of these 2 cases, so this patche completely removes the use of ChainMap, copies the default converters dict, and updates it with custom converters if needed.

This is by far the easiest, less intrusive, and most efficient patch in all cases.

convert_entry_lazy

This is the first step of a convert-on-demand step. Instead of converting all field values in _convert_entry, we just return a MutableMapping with all already fetched values as bytes, and convert values only on request. Converted values are not recorded, since we assume the caller will not try to fetch the same field twice.

get_next() is untouched, so special fields (__CURSOR, __MONOTONIC_TIMESTAMP, __REALTIME_TIMESTAMP) are already fetched, but not converted.

This is a good approach, but there is nearly always a better way to do it, except for the read_all_fields corner case.

get_next_lazy (aka next_lazy)

This is an evolution of convert_entry_lazy, that modifies get_next() itself. By default, there is no call to _get_all(), and only an instance of Mapping is returned. Special fields are not even computed. When requested, a field is then fetched using _get() and then _convert_value(). Result is not recorded for the same reason.

To be able to iterate on this Mapping, the first call to __iter__ or __len__ will fetch all fields using _get_all(), add special field names, and record the result. A __bool__ method is also provided to avoid costly call to __len__() just to test if the current entry is valid (it is).

This is the best approach for cases where there is a small amount of fields to read, as in empty and digest loads.

get_next_lazy_prefetched (aka next_lazy_prefetched)

This is a mix between convert_entry_lazy and get_next_lazy, to show the effects of just calling _get_all(), without creating special entries. Values are only converted on request, and special fields are fetched and converted on request.

This is usually a better approach than convert_entry_lazy, except in read_all_fields, which I can't explain. As we can see in the digest load, the difference between this and get_next_lazy is minimal, so we can assume that calling _get_all() has a negligible impact, except for the empty corner case.

convert_value_nolist (immature)

This patch is not visible in the chart.

The goal is to remove the isinstance() test when converting values, since the vast majority of values are not list. The impact is not negligible, but I found some logs where values returned by _get_all are lists. The journal format has no such list, but field name can be duplicated, so _get_all gather them into lists. In my tests, I found 3 messages with such cases, using journal from openSuse Tumbleweed, with SYSLOG_IDENTIFIER field. On Debian/testing, I was not able to found this case.

Analysis

Disable ChainMap usage in all cases: this is always a win. I will open a merge request for this soon.

Use convert-on-demand: In nearly all cases, it is more interesting to call _get_all() without converting anything.

Import caveat for get_next_lazy: on request, it uses calls to _get(). If a field is duplicated, _get only returns the first value, while _get_all will gather all values in a list. This is notified in systemd/systemd#9696.

As a result of this, I would like to write a patch for get_next() that does the same job as get_next_lazy_prefetched by default, with an optional boolean (defaulted to True) for calling or not get_all. When set to False, it will be the same as get_next_lazy, with the important caveat mentioned before. But for small loads, it will be faster.

Conclusion

I would like to know if you have any thoughts about this:

  • is the cycles a good metric?
  • have I missed something about some edge cases in journal entries?
  • […]

You can use run_all.sh in my own tool, that will write a report_aggregate.csv which is easy to turn into chart using LibreOffice or any other charting tool of your choice.

Glandos added a commit to Glandos/python-systemd that referenced this issue Jul 26, 2018
@Glandos Glandos mentioned this issue Jul 26, 2018
@keszybz
Copy link
Member

keszybz commented Nov 12, 2020

I merged the patch to remove ChainMap. For the other stuff: if you have something worth fixing, please submit a patch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants