Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: allow Resource to be created with some attributes resolving asynchronously #2933

Closed
wants to merge 1 commit into from

Conversation

aabmass
Copy link
Member

@aabmass aabmass commented Apr 27, 2022

Which problem is this PR solving?

Fixes #2912

Short description of the changes

  • In @opentelemetry/sdk-node
    • Adds an optional Promise<ResourceAttributes> to the Resource constructor which can asynchronously resolve some attributes. When this promise resolves, the attributes will be merged with the "synchronous" attributes.
    • Deprecated detectResources() in favor of a new method detectResourcesSync() which defers promises into the resource to be resolved later
    • Update Detector.detect() signature to allow it to be synchronous and updated documentation that the synchronous variant should be used
  • In @opentelemetry/sdk-node, updated NodeSDK.start() and its dependencies to now be synchronous (doesn't return a promise). This is a breaking change, but that package is experimental.
  • Updated BatchSpanProcessorBase and SimpleSpanProcessor to await the resource promise before calling exporters, if the resource has asynchronous attributes AND they have not already resolved. I.e. if there is no promise in the Resource (the case for existing detectors and code), the behavior is unchanged.

There may be some test breakages in contrib because of the nature of asynchronous tests on fire-and-forget span.end(). Users shouldn't see any behavior change.

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected) I don't think so?

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • Existing tests pass with no modification
  • Added new tests to Resource.test.ts
  • Added detect-resources.test.ts

Checklist:

  • Followed the style guidelines of this project
  • Unit tests have been added
  • Documentation has been updated

@codecov
Copy link

codecov bot commented Apr 27, 2022

Codecov Report

Merging #2933 (8435693) into main (3cc40d7) will decrease coverage by 1.32%.
The diff coverage is 96.55%.

❗ Current head 8435693 differs from pull request most recent head eb6446f. Consider uploading reports for the commit eb6446f to get more accurate results

@@            Coverage Diff             @@
##             main    #2933      +/-   ##
==========================================
- Coverage   92.64%   91.32%   -1.33%     
==========================================
  Files         187       68     -119     
  Lines        6169     1878    -4291     
  Branches     1303      399     -904     
==========================================
- Hits         5715     1715    -4000     
+ Misses        454      163     -291     
Impacted Files Coverage Δ
packages/opentelemetry-resources/src/Resource.ts 97.50% <96.55%> (-2.50%) ⬇️
packages/opentelemetry-resources/karma.worker.js 0.00% <0.00%> (-100.00%) ⬇️
...ckages/opentelemetry-sdk-trace-web/karma.worker.js 0.00% <0.00%> (-100.00%) ⬇️
...s/opentelemetry-core/src/platform/node/sdk-info.ts 0.00% <0.00%> (-100.00%) ⬇️
...opentelemetry-core/src/platform/node/globalThis.ts 0.00% <0.00%> (-100.00%) ⬇️
...pentelemetry-core/src/platform/node/performance.ts 0.00% <0.00%> (-100.00%) ⬇️
...emetry-api-metrics/src/platform/node/globalThis.ts 0.00% <0.00%> (-100.00%) ⬇️
...lemetry-resources/src/detectors/ProcessDetector.ts 31.81% <0.00%> (-68.19%) ⬇️
...perimental/packages/otlp-exporter-base/src/util.ts 79.41% <0.00%> (-17.65%) ⬇️
...ckages/opentelemetry-exporter-zipkin/src/zipkin.ts 84.48% <0.00%> (-15.52%) ⬇️
... and 120 more

@blumamir
Copy link
Member

I really like this proposal - to propagate the async task down to resource attribute consumers.
This is a backward-compatible (I think) and very clean way to address the issue.

The promise means that every time we need to access this resource attributes, the caller will have to await to get the full attributes values.
I wonder if that is a constraint that is easy to fulfill.

For example, in aspecto's distro we read the resource attributes in a span processor and use the values to apply filtering rules on span attributes. This is done in the onEnd invocation which is a sync function of course.
I guess we could wrap it with some mechanism that aggregates the processor spans until the resource is resolved, but that is not trivial.

Another possible issue is that we cannot have any guarantee on how long it will take for the promise to resolve. Until it resolves, spans might be piling up in the exporter export invocations, all waiting for the resource to be resolved so they can be sent on the wire.

Really interesting approach, looking forward to discussing this more.

@aabmass aabmass force-pushed the sync-resource-detectors branch 3 times, most recently from fd6d202 to 3ae3618 Compare April 28, 2022 05:03
@aabmass aabmass changed the title feat: Add synchronous resource detector and move async detection into promise feat: allow Resource to be created with some attributes resolving asynchronously Apr 28, 2022
@Flarna
Copy link
Member

Flarna commented Apr 28, 2022

According to spec resource is immutable. I think delaying the read until first use in e.g. exporter sounds to me like it's still spec conform.
I don't think the await can be avoided by any wrapping/API we add in common code. A consumer awaits on first use and may store the result for future use (maybe already serialized/compressed/... to avoid to redo this on every export).

@aabmass aabmass force-pushed the sync-resource-detectors branch from 3ae3618 to 2f60366 Compare May 3, 2022 23:31
@rauno56
Copy link
Member

rauno56 commented May 11, 2022

I wonder if top level await + esm preload would be able to achieve this as well? @aabmass / @Flarna , do you know?

EDIT

Played around with it a bit. For ESM, adding an loader module would work and is able to asyncronously wait for stuff to resolve. The thing is, it's only for ESM.

For CJS, there's no TLA, so we still need a solution for today.

@dyladan
Copy link
Member

dyladan commented May 11, 2022

Played around with it a bit. For ESM, adding an loader module would work and is able to asyncronously wait for stuff to resolve. The thing is, it's only for ESM.

That's quite cool. Is the limitation of only a single loader module removed yet?

@Flarna
Copy link
Member

Flarna commented May 12, 2022

I think top Level await can be used but as far as I know it works only in ECMAScript module mode not for commonJs. I requires node.js 14.8 as far as I know. Not sure about browsers.
Once we are in module mode we have to use loaders which are still experimental.

The limitation to have only single loader is gone on node master (see here), backport to 18.x has not yet happened as some other breaking changes in loaders are planned and target is to combine all breaking changes in a single 18.x release.

@aabmass aabmass force-pushed the sync-resource-detectors branch 4 times, most recently from 01ba5d5 to 944e392 Compare June 17, 2022 20:32
@aabmass aabmass marked this pull request as ready for review June 17, 2022 21:42
@aabmass aabmass requested a review from a team June 17, 2022 21:42
@aabmass aabmass force-pushed the sync-resource-detectors branch 4 times, most recently from 46ecbba to f3c9913 Compare June 18, 2022 03:31
@aabmass aabmass force-pushed the sync-resource-detectors branch from f3c9913 to eb6446f Compare June 18, 2022 13:47
Copy link
Member Author

@aabmass aabmass left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is ready for reviews. I am planning to add more tests in a follow up PR to keep the size manageable

Comment on lines +175 to +176
// Avoid scheduling a promise to make the behavior more predictable and easier to test
if (pendingResources.length === 0) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this check, I had to make extensive changes to the tests because the fire-and-forget nature of L179

Copy link
Member

@dyladan dyladan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess now span processors are expected to handle these async promises? We already have some users which have implemented custom span processors. I guess this is considered breaking for them? What does the spec say about breaking changes for sdk interface implementers?

@@ -130,9 +130,9 @@ export class NodeSDK {
/**
* Once the SDK has been configured, call this method to construct SDK components and register them with the OpenTelemetry API.
*/
public async start(): Promise<void> {
public start(): void {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉


// SpanAttributes from resource overwrite attributes from other resource.
const mergedAttributes = Object.assign(
{},
this.attributes,
other.attributes
);
return new Resource(mergedAttributes);

let mergedAsyncAttributesPromise: Promise<ResourceAttributes> | undefined;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth checking if async promises are resolved and adding them to the sync attributes? Probably not very likely since resources will generally all be constructed almost at the same time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking to leave it like since we already have a few special cases already. Not a strong opinion tho

let mergedAsyncAttributesPromise: Promise<ResourceAttributes> | undefined;
if (this.asyncAttributesPromise && other.asyncAttributesPromise) {
mergedAsyncAttributesPromise = Promise.all([
this.asyncAttributesPromise.catch(() => ({})),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no error handling of any kind?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these extra catch handlers needed at all? There is already one add in constructor.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The catch handler in the constructor is not part of the this.asyncAttributesPromise (there is a branch in the promise tree). The idea is that waitForAsyncAttributes() will reject if the promise rejected so a user can handle it still.

          this.asyncAttributesPromise
             (adds the async attributes to attributes)
              /                    |             \
             /                     |              \  
      .catch handler               |           .waitForAsyncAttributes()  
          for logging              |
                                   |
                    .catch() here to to avoid
                         Promise.all() from rejecting

Open to suggestions on how to simplify this. I would use Promise.allSettled() but it's not available

*/
export interface Detector {
detect(config?: ResourceDetectionConfig): Promise<Resource>;
detect(config?: ResourceDetectionConfig): Promise<Resource> | Resource;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@open-telemetry/javascript-maintainers Can this be considered a breaking change? Users shouldn't be calling resource detectors directly but if they are I can easily see some code like detect().then(something) breaking (TypeError: detect(...).then is not a function).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a breaking change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any ideas on alternatives?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* @returns true if no async attributes promise was provided or if the promise has resolved
* and been merged together with the sync attributes.
*/
asyncAttributesHaveResolved(): boolean {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe use a getter here?

@@ -25,6 +26,10 @@ import { defaultServiceName } from './platform';
*/
export class Resource {
static readonly EMPTY = new Resource({});
private _attributes: ResourceAttributes;
private asyncAttributesPromise: Promise<ResourceAttributes> | undefined;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private asyncAttributesPromise: Promise<ResourceAttributes> | undefined;
private _asyncAttributesPromise: Promise<ResourceAttributes> | undefined;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to do this, but curious why since we have private. Is it for people using vanilla JS?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a convention in many TS projects including this one.

  • When looking at the generated JS code it is clear what is private
  • When accessing with obj["propname"] syntax it is possible to access private properties without TS complaining

Maybe other reasons but those are the ones on top of my mind

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

personally I don't care about using _ or not. But in this project it's used (hopefully) everywhere and I think we should stay consistent.

let mergedAsyncAttributesPromise: Promise<ResourceAttributes> | undefined;
if (this.asyncAttributesPromise && other.asyncAttributesPromise) {
mergedAsyncAttributesPromise = Promise.all([
this.asyncAttributesPromise.catch(() => ({})),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these extra catch handlers needed at all? There is already one add in constructor.

const resourceOrPromise = d.detect(internalConfig);
let resource: Resource;
if (resourceOrPromise instanceof Promise) {
diag.info('Resource detector %s should return a Resource directly instead of a promise.', d.constructor.name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might result in quite some log spam. The detectors API allows to return a Promise<Resource> and it's done at a lot places so logging here as info seems to be not correct to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack on the spam. Would we like to move all existing resource detectors to return the non promise form? I.e. deprecate the Promise<Resource> variant?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally yes, but it needs to be considered carefully. We probably can't remove the async detectors in many cases and would be breaking. #3055 for details

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all detectors could be moved to sync return a Resource object which includes the optional async part.
But I think we should define a deprecation cycle for such changes which should include docs, deprecation logs (maybe to be enabled via some OTEL_LOG_DEPRECATION env var but that's a separate discussion).
As of now returning Promise<Resource> is ok and most/all detectors do this so we should not yet log like this.


it('should return true for asyncAttributesHaveResolved() if no promise provided', () => {
assert.ok(new Resource({'foo': 'bar'}).asyncAttributesHaveResolved());
assert.ok(Resource.empty().asyncAttributesHaveResolved());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is already on line 107, maybe remove one of them

const detector: Detector = {
async detect() {
return new Resource(
{ 'sync': 'fromsync', 'async': 'fromasync' },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't matter for the test but the 'async': 'fromasync' entry is actually sync here.,

@@ -166,6 +168,17 @@ export abstract class BatchSpanProcessorBase<T extends BufferConfig> implements
}
}
);

const pendingResources = spans.map(span => span.resource)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really needed to iterate all spans on every flush? Actually all spans should refer to the same Resource instance therefore checking the first span should be fine. This would reduce the number of promises created and avoid the need of a Promise.all below.
Once this resource of this span is resolved it should be not needed anymore to check again.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember there was some proposal to allow Resource to change for browser sessions, not sure if that is still a concern. Otherwise I'm happy to just check the first one. Any thoughts @dyladan ?

if (pendingResources.length === 0) {
doExport();
} else {
Promise.all(pendingResources.map(resource => resource.waitForAsyncAttributes()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If resource detection is slower then _exportTimeoutMillis we may end up in exporting spans even the scheduled export has already timed out.
Usually parallel exports are not done by BatchSpanProcessor.

if (span.resource.asyncAttributesHaveResolved()) {
doExport();
} else {
span.resource.waitForAsyncAttributes()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SimpleSpanProcessor and async exporting is not the best match as the main intention of SimpleSpanProcessor is to export immediately. But I have no idea how to solve that...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by async exporting here? It's already fire and forget on the export, since on_end() can't block

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The simple span processor is meant to be as simple as possible and as fast as possible. The fact that everything happens synchronously means that in an ephemeral environment like lambda export is fully synchronous until the exporter itself makes it async (or doesn't in case of e.g. log exporter)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my understanding the main usecase for SimpleSpanProcessor is local development and using a console exporter (or similar) as the overhead in production to do a http request per span would be crazy.
This usecase is somewhat "broken" as you may not see your span on console after span.end().

Unfortunately I have no good solution for this... One could say for local development it's also possible to us a non async resource setup and problem is solved.

@Flarna
Copy link
Member

Flarna commented Jun 22, 2022

I guess now span processors are expected to handle these async promises? We already have some users which have implemented custom span processors. I guess this is considered breaking for them? What does the spec say about breaking changes for sdk interface implementers?

I would consider this also as a breaking change so this would be semver major.

@dyladan
Copy link
Member

dyladan commented Jun 22, 2022

This PR actually introduces another problem from the specification:

OnEnd(Span)

OnEnd is called after a span is ended (i.e., the end timestamp is already set). This method MUST be called synchronously within the Span.End() API, therefore it should not block or throw an exception.

https://github.com/open-telemetry/opentelemetry-specification/blob/9abbdd39d0b35f635f833f072013431da419894e/specification/trace/sdk.md#onendspan

@dyladan
Copy link
Member

dyladan commented Jun 29, 2022

This PR actually introduces another problem from the specification:

OnEnd(Span)

OnEnd is called after a span is ended (i.e., the end timestamp is already set). This method MUST be called synchronously within the Span.End() API, therefore it should not block or throw an exception.

open-telemetry/opentelemetry-specification@9abbdd3/specification/trace/sdk.md#onendspan

I actually realized this isn't a problem. If it was, then the batch span processor would be not compliant.

@github-actions
Copy link

This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days.

@github-actions github-actions bot added the stale label Aug 29, 2022
@github-actions
Copy link

This PR was closed because it has been stale for 14 days with no activity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add or change resource detection to be synchronous
5 participants