Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Support thread-local keys for logger #991

Closed
Dunedan opened this issue Feb 2, 2022 · 9 comments · Fixed by #5141
Closed

RFC: Support thread-local keys for logger #991

Dunedan opened this issue Feb 2, 2022 · 9 comments · Fixed by #5141
Labels
feature-request feature request good first issue Good for newcomers help wanted Could use a second pair of eyes/hands logger RFC

Comments

@Dunedan
Copy link
Contributor

Dunedan commented Feb 2, 2022

Key information

  • RFC PR:
  • Related issue(s), if known:
  • Area: Logger
  • Meet tenets: Yes
  • Approved by:
  • Reviewed by:

Summary

Provide the ability to append thread-local keys to a logger which only get included in log records logged in the thread they got added to.

Motivation

As AWS Lambda provides multiple CPU cores it makes sense for certain workloads to distribute work across multiple threads. Currently it's not possible to append keys to a logger on a per-thread basis. That however would be very helpful for logging thread-specific information.

structlog, another popular Python logger offering structured logging, supports such a feature under the term "thread-local contexts".

Proposal

In addition to manage globally scoped keys, Logger needs an interface to manage thread-local keys. This interface must consist of the ability to append keys, remove keys and clear all thread-local keys of the current thread. The ability to clear all keys of the current thread is helpful when using thread pools where threads are being reused, but shouldn't retain log keys of previous usages.

I can imagine two ways how this could work (but I'd be happy with another way as well): Additional methods for Logger for managing thread-local keys or the ability to get a "thread-aware" child logger which can then be used to configure thread-local keys.

Proposal 1: Additional methods

Using additional methods for the existing Logger class would require to somehow indicate that they relate to the current thread. So names like append_thread_local_keys/remove_thread_local_keys/clear_thread_local_keys could be used. Here is an example how that could look like:

import concurrent.futures

from aws_lambda_powertools import Logger

logger = Logger(service="payment")


def process_payments(payment_id):
    logger.clear_thread_local_keys()
    logger.append_thread_local_keys(payment_id=payment_id)
    logger.info("Start processing payment")
    ...
    logger.info("End processing payment")


with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(process_payments, payment_id) for payment_id in range(10)]
    for future in concurrent.futures.as_completed(futures):
        future.result()

Proposal 2: Child logger

Offering the ability to get a thread-local child logger from an existing logger would offer usage like that:

import concurrent.futures

from aws_lambda_powertools import Logger

logger = Logger(service="payment")


def process_payments(payment_id):
    child_logger = logger.get_thread_local_logger()
    child_logger.append_keys(payment_id=payment_id)
    child_logger.info("Start processing payment")
    ...
    child_logger.info("End processing payment")


with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(process_payments, payment_id) for payment_id in range(10)]
    for future in concurrent.futures.as_completed(futures):
        future.result()

For performance reasons when re-using threads it might be worth to consider to not create a new child logger for every call to Logger.get_thread_local_logger(), but instead to just to the clear keys of a possibly already existing child logger for the same thread.

One drawback of this proposal might be that it'd be cumbersome to use when the logic running in a thread is more complex and spans multiple functions, as the logger instance would need to be passed between these functions.

If this feature should be available in other runtimes (e.g. Java, Typescript), how would this look like to ensure consistency?

IMO it shouldn't be a problem to port that to the other runtimes in the same way as well.

User Experience

How would customers use it?

Without the need for thread-local keys, customers would us AWS Lambda Powertools' Logger the same way as they do right now.
With the need for thread-local keys, customers would call the respective methods to enhance records logged in a thread with additional useful information. A code example for that is shown above.

Any configuration or corner cases you'd expect?

There might be different expectations regarding the behavior when globally scoped keys get appended while a thread is running and its logger configuration has been already done. Will such global keys be included in records logged by the thread immediately or only after the logger configuration for the thread got re-initialized? I personally would prefer global keys to get included immediately into records logged by threads as well.

Demonstration of before and after on how the experience will be better

Before:

import concurrent.futures

from aws_lambda_powertools import Logger

logger = Logger(service="payment")


def process_payments(payment_id):
    logger.info("Start processing payment", extra={"payment_id": payment_id})
    ...
    logger.info("End processing payment", extra={"payment_id": payment_id})


with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(process_payments, payment_id) for payment_id in range(10)]
    for future in concurrent.futures.as_completed(futures):
        future.result()

After:

import concurrent.futures

from aws_lambda_powertools import Logger

logger = Logger(service="payment")


def process_payments(payment_id):
    logger.append_thread_local_keys(payment_id=payment_id)
    logger.info("Start processing payment")
    ...
    logger.info("End processing payment")


with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(process_payments, payment_id) for payment_id in range(10)]
    for future in concurrent.futures.as_completed(futures):
        future.result()

Drawbacks

The only drawbacks I can imagine right now are a more complex Logger interface and a possible performance impact when logging records, depending on the resulting implementation.

Rationale and alternatives

  • What other designs have been considered? Why not them?

The only other option I did consider was manually creating one logger per thread, but that feels cumbersome, because aside from the thread-local keys, it would need to be configured identical to the logger for the main thread.

  • What is the impact of not doing this?

Customers who wish to log thread-local keys have no straight-forward way available to do so and either have to employ workarounds like using the extra keyword to add the desired keys for each record or to use another logger.

Unresolved questions

@Dunedan Dunedan added RFC triage Pending triage from maintainers labels Feb 2, 2022
@heitorlessa
Copy link
Contributor

Wow, what a great write up @Dunedan !! Thank you! I really like the first proposal - this makes it explicit whoever is reading the code, it doesn't incur into an additional unbounded logger, etc.

I have a few questions:

  • Would you be able to contribute so we fast track this?
    • I can help with anything you might need e.g. docs, review, additional code, prioritize it for the next release, etc.
  • For thread safety mutations, are you thinking of using thread local storage or something else?
  • Should additional keys added outside a thread also be logged within a thread?
    • It's the same logger and formatter. For example, logger.append_keys(order_id=1234) runs after thread started, once a log statement (logger.info("something") happens within the thread it'll naturally have order_id=1234
  • Do we need to modify log statements method to use a different log dictionary state? e.g. logger.info, logger.warning
  • Shall we include thread_id by default as part of the log attributes too?

Internals

For Logger, we keep state within the Formatter under self.log_format. Append/Remove keys are simply mutating that dict. A child logger doesn't have a formatter registered so it propagates to the parent allowing our bi-directional UX, hence some of the questions.

When a log statement occurs, Python logging calls our custom format so we can extract log keys, remove null keys, get latest X-Ray ID, etc..

Super looking forward to this.

@heitorlessa heitorlessa added area/logger and removed triage Pending triage from maintainers labels Feb 3, 2022
@Dunedan
Copy link
Contributor Author

Dunedan commented Feb 7, 2022

  • Would you be able to contribute so we fast track this?

Sorry, I'm afraid don't have enough time for that right now. I'd be happy to provide feedback though.

  • For thread safety mutations, are you thinking of using thread local storage or something else?

Thread local storage sounds good. That's also how structlog handles it: https://github.com/hynek/structlog/blob/30736d643c64308d3d00953c22a8f3d1e4e39f1f/src/structlog/threadlocal.py#L176

  • Should additional keys added outside a thread also be logged within a thread?

    • It's the same logger and formatter. For example, logger.append_keys(order_id=1234) runs after thread started, once a log statement (logger.info("something") happens within the thread it'll naturally have order_id=1234

For me that'd fit my expectation: Whenever I add something to the global context, it'll appear in log records afterwards, no matter if the record was logged by the main thread or any other thread.

  • Shall we include thread_id by default as part of the log attributes too?

Usually I'd say yes please, but there is one argument against adding additional keys by default to the logs: CloudWatch ingestion cost. We're already thinking twice about what to add to CloudWatch logs, simply because it's already one of the biggest line items on our AWS bill.

@heitorlessa heitorlessa added the good first issue Good for newcomers label Mar 7, 2022
@heitorlessa heitorlessa added the feature-request feature request label Jun 13, 2022
@heitorlessa heitorlessa added the help wanted Could use a second pair of eyes/hands label Aug 1, 2022
@heitorlessa heitorlessa added the triage Pending triage from maintainers label Aug 22, 2022
@heitorlessa
Copy link
Contributor

Update: haven't forgotten this, but will likely to tackle only next year due to re:Invent and PTOs. We've recently added support for kwargs in all log statements that should make this problem slightly easier: https://awslabs.github.io/aws-lambda-powertools-python/2.2.0/core/logger/#ephemeral-metadata

from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.typing import LambdaContext

logger = Logger()


def handler(event: dict, context: LambdaContext) -> str:
    logger.info("Collecting payment", request_id="1123")

    return "hello world"

@heitorlessa heitorlessa removed the triage Pending triage from maintainers label Nov 9, 2022
@shanab
Copy link
Contributor

shanab commented Jan 30, 2023

Hi @heitorlessa,

I'd be happy to work on the implementation of this issue if you haven't already started.

@heitorlessa
Copy link
Contributor

Hi @heitorlessa,

I'd be happy to work on the implementation of this issue if you haven't already started.

I somehow missed this important message :/ Absolutely!! I haven't started and I'd love help here. ContextVars would highly likely be the solution here - we couldn't do that before as we supported 3.6.

thank you @shanab !!

@SimonBFrank
Copy link
Contributor

Hey @heitorlessa, I want to contribute to this RFC. I got the implementation working locally and the PR should be ready in the next day or two. I just need to update some test cases and documentation. I will post the PR once it is ready.

@leandrodamascena
Copy link
Contributor

Hi everyone! @SimonBFrank is working on PR #5141 to implement thread-safe methods for the Logger utility and we are having some important discussions there. I will paste the link to the last comment here and would love some feedback there.

Last comment: #5141 (comment)

@github-project-automation github-project-automation bot moved this from Backlog to Coming soon in Powertools for AWS Lambda (Python) Oct 22, 2024
Copy link
Contributor

⚠️COMMENT VISIBILITY WARNING⚠️

This issue is now closed. Please be mindful that future comments are hard for our team to see.

If you need more assistance, please either tag a team member or open a new issue that references this one.

If you wish to keep having a conversation with other community members under this issue feel free to do so.

@github-actions github-actions bot added the pending-release Fix or implementation already in dev waiting to be released label Oct 22, 2024
Copy link
Contributor

This is now released under 3.2.0 version!

@github-actions github-actions bot removed the pending-release Fix or implementation already in dev waiting to be released label Oct 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request feature request good first issue Good for newcomers help wanted Could use a second pair of eyes/hands logger RFC
Projects
Status: Coming soon
Development

Successfully merging a pull request may close this issue.

5 participants