Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(sdk-trace-base): Export processed spans while exporter failed #4287

Merged
merged 3 commits into from
Jan 24, 2024

Conversation

Zirak
Copy link
Contributor

@Zirak Zirak commented Nov 10, 2023

Which problem is this PR solving?

While the exporter deals with a batch of spans, new spans may come in and wait to be exported. As previously implemented, a successful export would notice these waiting spans, triggering a renewed timer check, but not so for an unsuccessful export.

The result was that, prior to this commit, a failing export may end up in a situation where no further spans will be exported. This is due to the behaviour of _addToBuffer when the queue is full: Imagine an export which fails after a long timeout (because of, for instance, network troubles). While the connection waits to be timed out, the span queue fills up. Once completely full, no new calls to recheck the timer will be done. On its own, this behaviour is fine. When combined with the patched bug, this leads to a rather confusing case where the exporter never tries exporting.

Short description of the changes

This PR makes sure these spans are acknowledged and handled, even when exporting fails.

@pichlermarc you've mentioned working on refactoring the exporter; here's an interesting edge case.

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

  • Unit test
  • Functional testing against a real-world setup, before and after the patch:
    • Start the instrumented application once with a working collector, see that everything flows
    • Start the application again, this time without a collector listening on the configured port; wait for for a few failed exports, then run the collector again, seeing whether spans arrive

Checklist:

  • Followed the style guidelines of this project
  • Unit tests have been added
  • Documentation has been updated

@Zirak Zirak requested a review from a team November 10, 2023 21:58
@@ -221,7 +221,7 @@ export abstract class BatchSpanProcessorBase<T extends BufferConfig>
const flush = () => {
this._isExporting = true;
this._flushOneBatch()
.then(() => {
.finally(() => {
this._isExporting = false;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both the finally and catch set _isExporting. Initially I removed the assignment from the catch, but that gave me slight nervous jitters

While the exporter deals with a batch of spans, new spans may come in and wait
to be exported. As previously implemented, a successful export would notice
these waiting spans, triggering a renewed timer check, but not so for an
unsuccessful export.

The result was that, prior to this commit, a failing export may end up in a
situation where no further spans will be exported. This is due to the behaviour
of `_addToBuffer` when the queue is full: Imagine an export which fails after a
long timeout (because of, for instance, network troubles). While the connection
waits to be timed out, the span queue fills up. Once completely full, no new
calls to recheck the timer will be done. On its own, this behaviour is
fine. When combined with the patched bug, this leads to a rather confusing case
where the exporter never tries exporting.
@Zirak Zirak force-pushed the export-queue-filled branch from e6d8d3e to 2f1d7a5 Compare November 10, 2023 22:03
Copy link

codecov bot commented Nov 14, 2023

Codecov Report

Merging #4287 (95a7829) into main (0f6518d) will increase coverage by 0.01%.
The diff coverage is 100.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4287      +/-   ##
==========================================
+ Coverage   92.20%   92.21%   +0.01%     
==========================================
  Files         336      336              
  Lines        9515     9515              
  Branches     2017     2017              
==========================================
+ Hits         8773     8774       +1     
+ Misses        742      741       -1     
Files Coverage Δ
...dk-trace-base/src/export/BatchSpanProcessorBase.ts 94.21% <100.00%> (+0.82%) ⬆️

Copy link

This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days.

@github-actions github-actions bot added the stale label Jan 22, 2024
Copy link
Member

@pichlermarc pichlermarc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, the PR must've gotten lost in my inbox.
Looks good, thanks for taking care of this.

@pichlermarc pichlermarc added bug Something isn't working pkg:sdk-trace-base and removed stale labels Jan 24, 2024
@pichlermarc pichlermarc merged commit 0635ab1 into open-telemetry:main Jan 24, 2024
20 checks passed
@Zirak Zirak deleted the export-queue-filled branch January 24, 2024 21:51
Zirak added a commit to Zirak/opentelemetry-js that referenced this pull request Sep 14, 2024
…en-telemetry#4287)

* fix(sdk-trace-base): Export processed spans while exporter failed

While the exporter deals with a batch of spans, new spans may come in and wait
to be exported. As previously implemented, a successful export would notice
these waiting spans, triggering a renewed timer check, but not so for an
unsuccessful export.

The result was that, prior to this commit, a failing export may end up in a
situation where no further spans will be exported. This is due to the behaviour
of `_addToBuffer` when the queue is full: Imagine an export which fails after a
long timeout (because of, for instance, network troubles). While the connection
waits to be timed out, the span queue fills up. Once completely full, no new
calls to recheck the timer will be done. On its own, this behaviour is
fine. When combined with the patched bug, this leads to a rather confusing case
where the exporter never tries exporting.

* fix(changelog): add entry

---------

Co-authored-by: Marc Pichler <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working pkg:sdk-trace-base
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants