Bug Fix: Defensively copy context entities #340

Merged: 3 commits, Jun 27, 2022

@tyler-dodge tyler-dodge commented Jun 9, 2022

Description

Before this change, all concurrent async tasks shared the same instance of the entities list. With this change, each task gets its own copy of the list.

This matters because the recorder modifies the entities list in place, so concurrent subtasks end up looking at the wrong item in the entities list when deciding the parent subsegment:

    def get_trace_entity(self):
        """
        Return the current trace entity(segment/subsegment). If there is none,
        it behaves based on pre-defined ``context_missing`` strategy.
        If the SDK is disabled, returns a DummySegment
        """
        if not getattr(self._local, 'entities', None):
            if not global_sdk_config.sdk_enabled():
                return DummySegment()
            return self.handle_context_missing()
        return self._local.entities[-1]
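
To make the failure mode concrete outside the SDK, here is a minimal sketch. `begin_subsegment` and the string entity names are hypothetical stand-ins, not SDK APIs; the only behavior mirrored from `get_trace_entity` is that the parent is read from `entities[-1]`.

```python
def begin_subsegment(entities, name):
    """Hypothetical helper: take the top of the stack as parent, then push."""
    parent = entities[-1]
    entities.append(name)
    return parent


# Both "tasks" hold a reference to the same list object, not a copy.
shared_entities = ["segment"]
task_a_entities = shared_entities
task_b_entities = shared_entities

parent_a = begin_subsegment(task_a_entities, "subseg-1")
parent_b = begin_subsegment(task_b_entities, "subseg-2")

print(parent_a)  # segment   (correct)
print(parent_b)  # subseg-1  (wrong: a sibling, not the segment)
```

Because the second push sees the first sibling on top of the shared stack, every subsegment after the first is attached to the wrong parent.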

You can see this in the added unit test. Before this change, the unit test printed the following:

Subseg 1 ID: 5b1c1cf70db2e808
Subseg 2 ID: 4a138f09cd3c6813
Subseg 3 ID: f97c02073584be8a
Subseg 4 ID: efac5645672e1cd6
Subseg 5 ID: 6874f083c496d388
Subseg 6 ID: f447815d60c734d6
Subseg 7 ID: edf7d60ea3d6875e
Subseg 8 ID: d2a1980e83a569f4
Subseg 9 ID: 6a0f1059ca4a91b6
Subseg 10 ID: 75421c2a04106214

Parent Segment ID: a9037c7ccc92e0c8 # Correct!
Subseg parent ID: a9037c7ccc92e0c8 # Correct!
Subseg parent ID: 4a138f09cd3c6813 # WRONG (all the ones below are wrong)
Subseg parent ID: f97c02073584be8a
Subseg parent ID: efac5645672e1cd6
Subseg parent ID: 6874f083c496d388
Subseg parent ID: f447815d60c734d6
Subseg parent ID: edf7d60ea3d6875e
Subseg parent ID: d2a1980e83a569f4
Subseg parent ID: 6a0f1059ca4a91b6
Subseg parent ID: 75421c2a04106214

With this change, all the subsegments have the correct parent ID.
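
The effect of copying can be sketched with plain asyncio. This is an illustration under stated assumptions, not the SDK's implementation: `begin_subsegment`, `child`, `run`, and the `copy_per_task` flag are hypothetical names, and the entities are plain strings.

```python
import asyncio


def begin_subsegment(entities, name):
    """Hypothetical helper: take the top of the stack as parent, then push."""
    parent = entities[-1]
    entities.append(name)
    return parent


async def child(entities, name, parents):
    parents[name] = begin_subsegment(entities, name)
    await asyncio.sleep(0)  # yield control so sibling tasks interleave
    entities.pop()


async def run(copy_per_task):
    entities = ["segment"]
    parents = {}
    await asyncio.gather(*(
        # list(entities) is a shallow, defensive copy of the stack
        child(list(entities) if copy_per_task else entities,
              f"subseg-{i}", parents)
        for i in range(3)
    ))
    return parents


shared = asyncio.run(run(copy_per_task=False))
copied = asyncio.run(run(copy_per_task=True))

print(shared)  # siblings chained to each other
print(copied)  # every subsegment parented to "segment"
```

A shallow copy is enough in this sketch because only the list itself is mutated; the entities it holds are still shared between tasks.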

(enowell) You can also visualize this in X-Ray from my tests:

Before (image): subsegments are children of each other even though they are concurrent.

After (image): subsegments have the correct parent ID.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

I think this may fix the following issues:

Fixes: #310
Fixes: #164

@tyler-dodge tyler-dodge requested a review from a team as a code owner June 9, 2022 16:32
@tyler-dodge tyler-dodge force-pushed the master branch 2 times, most recently from 4f9b78d to d070b98 Compare June 13, 2022 23:30
@NathanielRN
Contributor

Thanks @tyler-dodge! Just curious, do you have a screenshot of this failing or the bug you observed? I think this change makes sense but I think it would be useful to see it breaking. (Or maybe unit tests that show it works if that’s easier to add?).

Before this change, concurrent async tasks would all share the same
instance of the entities list. This change makes it so they each get
their own copy of the list. This matters because the recorder modifies
the list in place, which makes it so concurrent subtasks have the
wrong parent subsegment.
@tyler-dodge
Contributor Author

I added a unit test for it in the latest force push, and I verified that the test failed prior to this change.

Co-authored-by: Nathaniel Ruiz Nowell <[email protected]>
Contributor

@NathanielRN NathanielRN left a comment

Thanks so much for your help Tyler! This is a great bug fix and I look forward to seeing it being deployed 🙂

I added some suggestions for documentation to help us in the future, would you mind taking a look? Thanks!

tests/test_async_recorder.py (outdated)
aws_xray_sdk/core/async_context.py (outdated)
@tyler-dodge
Contributor Author

The modifications you added all look great to me.
