You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When working with fail2ban, I noticed that some operations were really slow while reading from journal instead of text log files. This is not really noticeable with the default daemon operation of fail2ban, but more with its test tool called fail2ban-regex.
The main idea is that currently, all field's values are converted, which is completely uneeded for a lot of use cases. Moreover, the use of a ChainMap seems useless, but it is extremely costly, especially when no custom converters are provided.
I've implemented a test tool to monitor some possible improvements. Here are the raw results:
On the X axis, you can see patch sets, grouped by loads, that are further described. On the Y axis, these are number of cycles, as reported by perf (from the Linux kernel). The tests were performed with two sets:
One journal from openSuse Tumbleweed desktop: Intel Core i5-3320M, 138371 messages with 2928331 fields.
One journal from my Debian/testing server: Intel Core2 Duo E8400. I lost the figures, but it was roughly the same.
The chart above is from the Tumbleweed desktop, but the chart made using Core2 has strictly the same ratio. Only the numbers are a bit higher.
Loads
They are operations to be done with one journal entry. There are currently 4 of them:
empty: do nothing with the entry. This is a corner case.
digest: do some simple maths. It extracts two fields, _SYSTEMD_UNIT and MESSAGE.
fail2ban: extracted from fail2ban code, to format an entry in order to be parsed by fail2ban core logic.
read_all_fields: this reads all fields and values, and just count the fields. This is a corner case.
Patch sets
init_reader_nochainmap (aka nochainmap)
Currently, the Reader class uses a ChainMap to merge default converters and user provided ones. This is only done if Python version is greather or equal than 3.3. I've noticed that the time spent into the get() method is really high. It seems that ChainMap is good for merging either big dictionaries (where a call to update() is costly), or dynamic dictionaries (where an update into one dictonary is reflected into the ChainMap instance). It seems that Reader is not in any of these 2 cases, so this patche completely removes the use of ChainMap, copies the default converters dict, and updates it with custom converters if needed.
This is by far the easiest, less intrusive, and most efficient patch in all cases.
convert_entry_lazy
This is the first step of a convert-on-demand step. Instead of converting all field values in _convert_entry, we just return a MutableMapping with all already fetched values as bytes, and convert values only on request. Converted values are not recorded, since we assume the caller will not try to fetch the same field twice.
get_next() is untouched, so special fields (__CURSOR, __MONOTONIC_TIMESTAMP, __REALTIME_TIMESTAMP) are already fetched, but not converted.
This is a good approach, but there is nearly always a better way to do it, except for the read_all_fields corner case.
get_next_lazy (aka next_lazy)
This is an evolution of convert_entry_lazy, that modifies get_next() itself. By default, there is no call to _get_all(), and only an instance of Mapping is returned. Special fields are not even computed. When requested, a field is then fetched using _get() and then _convert_value(). Result is not recorded for the same reason.
To be able to iterate on this Mapping, the first call to __iter__ or __len__ will fetch all fields using _get_all(), add special field names, and record the result. A __bool__ method is also provided to avoid costly call to __len__() just to test if the current entry is valid (it is).
This is the best approach for cases where there is a small amount of fields to read, as in empty and digest loads.
This is a mix between convert_entry_lazy and get_next_lazy, to show the effects of just calling _get_all(), without creating special entries. Values are only converted on request, and special fields are fetched and converted on request.
This is usually a better approach than convert_entry_lazy, except in read_all_fields, which I can't explain. As we can see in the digest load, the difference between this and get_next_lazy is minimal, so we can assume that calling _get_all() has a negligible impact, except for the empty corner case.
convert_value_nolist (immature)
This patch is not visible in the chart.
The goal is to remove the isinstance() test when converting values, since the vast majority of values are not list. The impact is not negligible, but I found some logs where values returned by _get_allare lists. The journal format has no such list, but field name can be duplicated, so _get_all gather them into lists. In my tests, I found 3 messages with such cases, using journal from openSuse Tumbleweed, with SYSLOG_IDENTIFIER field. On Debian/testing, I was not able to found this case.
Analysis
Disable ChainMap usage in all cases: this is always a win. I will open a merge request for this soon.
Use convert-on-demand: In nearly all cases, it is more interesting to call _get_all() without converting anything.
Import caveat for get_next_lazy: on request, it uses calls to _get(). If a field is duplicated, _get only returns the first value, while _get_all will gather all values in a list. This is notified in systemd/systemd#9696.
As a result of this, I would like to write a patch for get_next() that does the same job as get_next_lazy_prefetched by default, with an optional boolean (defaulted to True) for calling or not get_all. When set to False, it will be the same as get_next_lazy, with the important caveat mentioned before. But for small loads, it will be faster.
Conclusion
I would like to know if you have any thoughts about this:
is the cycles a good metric?
have I missed something about some edge cases in journal entries?
[…]
You can use run_all.sh in my own tool, that will write a report_aggregate.csv which is easy to turn into chart using LibreOffice or any other charting tool of your choice.
The text was updated successfully, but these errors were encountered:
Glandos
added a commit
to Glandos/python-systemd
that referenced
this issue
Jul 26, 2018
When working with fail2ban, I noticed that some operations were really slow while reading from journal instead of text log files. This is not really noticeable with the default daemon operation of
fail2ban
, but more with its test tool calledfail2ban-regex
.The main idea is that currently, all field's values are converted, which is completely uneeded for a lot of use cases. Moreover, the use of a
ChainMap
seems useless, but it is extremely costly, especially when no custom converters are provided.I've implemented a test tool to monitor some possible improvements. Here are the raw results:
On the X axis, you can see patch sets, grouped by loads, that are further described. On the Y axis, these are number of cycles, as reported by
perf
(from the Linux kernel). The tests were performed with two sets:The chart above is from the Tumbleweed desktop, but the chart made using Core2 has strictly the same ratio. Only the numbers are a bit higher.
Loads
They are operations to be done with one journal entry. There are currently 4 of them:
_SYSTEMD_UNIT
andMESSAGE
.Patch sets
init_reader_nochainmap (aka nochainmap)
Currently, the
Reader
class uses aChainMap
to merge default converters and user provided ones. This is only done if Python version is greather or equal than 3.3. I've noticed that the time spent into theget()
method is really high. It seems thatChainMap
is good for merging either big dictionaries (where a call toupdate()
is costly), or dynamic dictionaries (where an update into one dictonary is reflected into theChainMap
instance). It seems thatReader
is not in any of these 2 cases, so this patche completely removes the use ofChainMap
, copies the default converters dict, and updates it with custom converters if needed.This is by far the easiest, less intrusive, and most efficient patch in all cases.
convert_entry_lazy
This is the first step of a convert-on-demand step. Instead of converting all field values in
_convert_entry
, we just return aMutableMapping
with all already fetched values as bytes, and convert values only on request. Converted values are not recorded, since we assume the caller will not try to fetch the same field twice.get_next()
is untouched, so special fields (__CURSOR
,__MONOTONIC_TIMESTAMP
,__REALTIME_TIMESTAMP
) are already fetched, but not converted.This is a good approach, but there is nearly always a better way to do it, except for the read_all_fields corner case.
get_next_lazy (aka next_lazy)
This is an evolution of convert_entry_lazy, that modifies
get_next()
itself. By default, there is no call to_get_all()
, and only an instance ofMapping
is returned. Special fields are not even computed. When requested, a field is then fetched using_get()
and then_convert_value()
. Result is not recorded for the same reason.To be able to iterate on this
Mapping
, the first call to__iter__
or__len__
will fetch all fields using_get_all()
, add special field names, and record the result. A__bool__
method is also provided to avoid costly call to__len__()
just to test if the current entry is valid (it is).This is the best approach for cases where there is a small amount of fields to read, as in empty and digest loads.
get_next_lazy_prefetched (aka next_lazy_prefetched)
This is a mix between convert_entry_lazy and get_next_lazy, to show the effects of just calling
_get_all()
, without creating special entries. Values are only converted on request, and special fields are fetched and converted on request.This is usually a better approach than convert_entry_lazy, except in read_all_fields, which I can't explain. As we can see in the digest load, the difference between this and get_next_lazy is minimal, so we can assume that calling
_get_all()
has a negligible impact, except for the empty corner case.convert_value_nolist (immature)
This patch is not visible in the chart.
The goal is to remove the
isinstance()
test when converting values, since the vast majority of values are not list. The impact is not negligible, but I found some logs where values returned by_get_all
are lists. The journal format has no such list, but field name can be duplicated, so_get_all
gather them into lists. In my tests, I found 3 messages with such cases, using journal from openSuse Tumbleweed, withSYSLOG_IDENTIFIER
field. On Debian/testing, I was not able to found this case.Analysis
Disable
ChainMap
usage in all cases: this is always a win. I will open a merge request for this soon.Use convert-on-demand: In nearly all cases, it is more interesting to call
_get_all()
without converting anything.Import caveat for
get_next_lazy
: on request, it uses calls to_get()
. If a field is duplicated,_get
only returns the first value, while_get_all
will gather all values in a list. This is notified in systemd/systemd#9696.As a result of this, I would like to write a patch for
get_next()
that does the same job as get_next_lazy_prefetched by default, with an optional boolean (defaulted to True) for calling or notget_all
. When set toFalse
, it will be the same as get_next_lazy, with the important caveat mentioned before. But for small loads, it will be faster.Conclusion
I would like to know if you have any thoughts about this:
You can use
run_all.sh
in my own tool, that will write areport_aggregate.csv
which is easy to turn into chart using LibreOffice or any other charting tool of your choice.The text was updated successfully, but these errors were encountered: