Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-30962: Add caching to Logger.isEnabledFor() #2752

Merged
merged 7 commits into from
Jul 30, 2017
Merged

Conversation

avylove
Copy link
Contributor

@avylove avylove commented Jul 18, 2017

This is a simple and brute-force caching method, but cuts the time required to log non-enabled levels in half.

Another approach would be to move the cache dictionary to the manager. This would speed up cache clearing, but take additional lookups during normal operation.

import timeit

count = 10000000

set_up = """
import logging
level1 = logging.getLogger('1')
level2 = logging.getLogger('1.2')
level3 = logging.getLogger('1.2.3')
"""

set_up_with_caching = """
import logging_with_caching as logging
level1 = logging.getLogger('1')
level2 = logging.getLogger('1.2')
level3 = logging.getLogger('1.2.3')
"""

statement = "level3.debug('Debug message')"

orig = timeit.timeit(statement, setup=set_up, number=count)
caching = timeit.timeit(statement, setup=set_up_with_caching, number=count)

print('Non-caching:  ', orig)
print('With Caching: ', caching)
print('Time decrease: %.2f%%' % ((orig - caching) / orig * 100))
Non-caching:   8.451851543039083
With Caching:  4.039903722004965
Time decrease: 52.20%

https://bugs.python.org/issue30962

@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

Thanks again to your contribution and we look forward to looking at it!

@avylove avylove changed the title Add caching to Logger.isEnabledFor() bpo-30962: Add caching to Logger.isEnabledFor() Jul 18, 2017
@vstinner vstinner added the performance Performance or resource usage label Jul 18, 2017
Called when level changes are made
"""

_acquireLock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that a lock is useful here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it needed to keep another thread from changing loggerDict or the cache dictionaries?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ok, right, I first understood that the lock was supposed to protect the cache :-) I don't think that the cache deserves a lock.

self._cache[level] = level >= self.getEffectiveLevel()
_releaseLock()

return self._cache[level]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a risk of race condition if first you test if the key exists and then sets the key.

Why not using: "try: return self._cache[level] except KeyError: ..."?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The real race condition is if clear_cache() is called in-between. But, yes, your proposed idiom would fix it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Fixed that to use try/except

self._cache[level] = level >= self.getEffectiveLevel()
_releaseLock()

return self._cache[level]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a still a race condition here if _clear_cache() is called just after _releaseLock(). You will need to store the value in a local variable, e.g:

_acquireLock()
is_enabled = self._cache[level] = ...
_releaseLock()
return is_enabled

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Updated

@pitrou pitrou requested a review from vsajip July 19, 2017 17:02
@pitrou
Copy link
Member

pitrou commented Jul 19, 2017

I've requested a review from Vinay, who is the official author and maintainer of the logging module.

Copy link
Member

@vsajip vsajip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs tests which will confirm that the caching works correctly. Nothing too complicated, just a test which works on both root and non-root loggers, sets levels and gets/asserts the effective levels, sets levels again and confirms effective levels are as expected, clears the cache manually and confirms the levels again etc.

@avylove
Copy link
Contributor Author

avylove commented Jul 28, 2017

@vsajip Test added. Let me know if this is what you're looking for.

Copy link
Member

@vsajip vsajip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Detail is in comments against the code changes.

class CachingTest(BaseTest):
def test_caching(self):
root = self.root_logger
level1 = logging.getLogger("abc")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are loggers, not levels, and the names should reflect that. I understand you're talking about level in the logger hierarchy, but still, the use of level for the loggers is potentially confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Will fix

self.assertEqual(level2._cache, {}) # Cache is empty

self.assertTrue(level2.isEnabledFor(logging.ERROR))
self.assertEqual(level2._cache, {logging.ERROR: True}) # Cache is populated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What values would the logger level cache dict have? Either empty or a level pointing to True? Could there be multiple keys in this dict?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keys are the levels passes to isEnabledFor(), values are always True or False. I can lookup more levels to get a more well-rounded test.

self.assertFalse(level2.isEnabledFor(logging.ERROR))
self.assertTrue(root.isEnabledFor(logging.ERROR))
self.assertEqual(root._cache, {logging.ERROR: True}) # Root cache is populated

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should do level2.setLevel(logging.NOTSET). and see that the isEnabledFor() for the two loggers and the root logger are as expected.

Also, it makes sense to add some tests for getEffectiveLevel() to ensure that the values are as expected without the presence of caching.

return self._cache[level]
except KeyError:
_acquireLock()
if self.manager.disable >= level:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it not make more sense to cache the result of getEffectiveLevel(), rather than isEnabledFor()? From what I can see, the time saving is really from cutting out the hierarchy walk in the former method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is where most of the time savings is, but it's still faster (Something like 5%) than checking self.manager.disable every time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does have me thinking, in the LoggerAdapter class, there's currently no caching. LoggerAdapter.isEnabledFor() reimplements the logic from Logger.isEnabledFor(). If we point to Logger.isEnabledFor() instead of reimplementing, we get caching. I don't see a functional difference. Is there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vsajip The only thing I see is LoggerAdapter.getEffectiveLevel() could be overwritten and that would change the behavior of LoggerAdapter.isEnabledFor(). That would not be the case if LoggerAdapter.isEnabledFor() pointed to Logger.isEnabledFor(). So if we want caching on adapters (and we probably do), the caching logic should be duplicated on the adapter. The alternative is to give up some performance and move caching to getEffectiveLevel(). Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a functional difference. Is there?

The manager.disable check is [duplicated] there because it would potentially allow skipping the hierarchy walk of getEffectiveLevel().

The alternative is to give up some performance and move caching to getEffectiveLevel()

Although it slows down isEnabledFor() because of still needing the manager.disable check, I think it would be better to cache getEffectiveLevel(). It's easier to reason about. The manager.disable level shouldn't be conflated with the effective levels of individual loggers, IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand why the manager.disable check is in Logger.isEnabledFor(). What I was trying to say is having LoggerAdapter.isEnabledFor() reimplement Logger.isEnabledFor() only makes sense if someone is going to subclass LoggerAdapter and overwrite LoggerAdapter.getEffectiveLevel().

I wasn't associating the caching with the levels, but rather the isEnabledFor() call. I guess another benefit of moving it would be it also speeds up calls to getEffectiveLevel(). I'm not sure how often that's used outside of this module, but I'm sure it is.

@@ -4406,6 +4406,29 @@ def test_basic(self):
self.assertTrue(found, msg=msg)


class CachingTest(BaseTest):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could add this to an existing test case class rather than adding a separate class just for this one method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll move it to the LoggerTest class

@avylove
Copy link
Contributor Author

avylove commented Jul 28, 2017

@vsajip I was wrong about how much slower it would be. I ran some benchmarks to compare putting the caching in isEnabledFor() vs getEffectiveLevel() and I think it would be best to leave it where it is.

Here's how I implemented caching in getEffectiveLevel(). I retained the dictionary so the check and fetch can be a single operation to avoid race conditions.

def getEffectiveLevel(self):
    try:
        return self._cache['_elevel']
    except KeyError:
        _acquireLock()
        logger = self
        effective_level = NOTSET
        while logger:
            if logger.level:
                effective_level = logger.level
                break
            logger = logger.parent
        self._cache['_elevel'] = effective_level
        _releaseLock()

        return effective_level

I expanded my original benchmarking code to test at all the loggers in the hierarchy, not just the third level

import timeit

count = 10000000

set_up = """
root = logging.getLogger()
logger1 = logging.getLogger('1')
logger2 = logging.getLogger('1.2')
logger3 = logging.getLogger('1.2.3')
"""

for logger in ('root', 'logger1', 'logger2', 'logger3'):
    statement = "%s.debug('Debug message')" % logger

    orig = timeit.timeit(statement, number=count,
                         setup='import logging_orig as logging\n' + set_up)
    ief = timeit.timeit(statement, number=count,
                         setup='import logging_cache_ief as logging\n' + set_up)
    gel = timeit.timeit(statement, number=count,
                         setup='import logging_cache_gel as logging\n' + set_up)

    print('--- %s ---' % logger)
    print('Non-caching:  ', orig)
    print('With Caching (isEnabledFor): ', ief)
    print('With Caching (getEffectiveLevel): ', gel)
    print('Time decrease (isEnabledFor): %.2f%%' % ((orig - ief) / orig * 100))
    print('Time decrease (getEffectiveLevel): %.2f%%' % ((orig - gel) / orig * 100))

From the results you can see that as the logger hierarchy grows, the difference becomes less apparent, but caching at isEnabledFor() is much faster. I think this can be attributed to the significant reduction in operations (lookups and comparisons).

--- root ---
Non-caching:   5.672386677993927
With Caching (isEnabledFor):  3.272040506999474
With Caching (getEffectiveLevel):  5.023688525019679
Time decrease (isEnabledFor): 42.32%
Time decrease (getEffectiveLevel): 11.44%
--- logger1 ---
Non-caching:   6.022408013988752
With Caching (isEnabledFor):  3.288163874996826
With Caching (getEffectiveLevel):  5.099931188975461
Time decrease (isEnabledFor): 45.40%
Time decrease (getEffectiveLevel): 15.32%
--- logger2 ---
Non-caching:   7.034587770991493
With Caching (isEnabledFor):  3.271738503011875
With Caching (getEffectiveLevel):  5.257907326973509
Time decrease (isEnabledFor): 53.49%
Time decrease (getEffectiveLevel): 25.26%
--- logger3 ---
Non-caching:   7.899996431020554
With Caching (isEnabledFor):  3.420868766028434
With Caching (getEffectiveLevel):  5.191981957992539
Time decrease (isEnabledFor): 56.70%
Time decrease (getEffectiveLevel): 34.28%

@vsajip
Copy link
Member

vsajip commented Jul 30, 2017

Good stuff. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants