
monitor-opentelemetry-exporter: Unhandled rejection on network issues or 500 from AI /v2/track endpoint #12851

Closed
1 of 6 tasks
jmealo opened this issue Dec 10, 2020 · 12 comments
Labels
  • Client: This issue points to a problem in the data-plane of the library.
  • customer-reported: Issues that are reported by GitHub users external to the Azure organization.
  • Monitor: Monitor, Monitor Ingestion, Monitor Query
  • needs-team-attention: Workflow: This issue needs attention from the Azure service team or SDK team.
  • question: The issue doesn't require a change to the product in order to be resolved. Most issues start as that.
  • Service Attention: Workflow: This issue is the responsibility of the Azure service team.

Comments

@jmealo

jmealo commented Dec 10, 2020

  • Package Name:
    @azure/monitor-opentelemetry-exporter
  • Package Version:
    1.0.0-preview.6
  • Operating system:
    Linux
  • nodejs
    • version:
  • browser
    • name/version:
  • typescript
    • version:
  • Is the bug related to documentation in

Describe the bug
Errors during the transmission or persistence of spans crash the process under observation with an unhandled rejection, rather than retrying the request or surfacing an error the application can handle.

To Reproduce
Steps to reproduce the behavior:

  1. Unplug/sever the network connection or block DNS/HTTP traffic, then wait for the next export to occur -- it will throw an unhandled rejection with a value of 2.
    1a. Get a 500 internal server error from the /v2/track AI ingestion endpoint -- it will throw an unhandled rejection with a value of 2 (see the sketch below for simulating this case locally).
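To simulate the 500 case locally, here is a minimal sketch (TypeScript/Node). It assumes the exporter honors an IngestionEndpoint override in its connection string, e.g. InstrumentationKey=<key>;IngestionEndpoint=http://localhost:8585: run a fake ingestion endpoint that always answers 500 and point the exporter at it.

```ts
// Sketch of a fake Application Insights ingestion endpoint that always fails.
// Point the exporter at it via a connection string such as:
//   InstrumentationKey=<any-guid>;IngestionEndpoint=http://localhost:8585
// (connection-string support and exact behavior depend on the exporter version).
import * as http from "http";

const server = http.createServer((_req, res) => {
  // Mimic a /v2/track failure: every request gets a 500.
  res.writeHead(500, { "Content-Type": "application/json" });
  res.end(JSON.stringify({ itemsReceived: 1, itemsAccepted: 0, errors: [] }));
});

server.listen(8585, () => {
  console.log("Fake ingestion endpoint listening on http://localhost:8585");
});
```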

Expected behavior
All azure-sdk-for-js packages should avoid throwing unhandled rejections/exceptions, and should either handle internal errors internally or provide error-handling mechanisms for end users. At the very least, a usable stack trace would be nice.
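As a stopgap (not a fix), a process-level handler can at least keep unhandled rejections from crashing the app and make the failure visible; a minimal sketch:

```ts
// Stopgap only: log rejections that nothing awaited instead of letting them
// crash the process. This does not add retries or fix the exporter itself.
process.on("unhandledRejection", (reason, promise) => {
  console.error("Unhandled rejection (possibly from telemetry export):", reason, promise);
});
```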

Additional context
There are unreleased fixes committed by @markwolff on GitHub that have not been published to NPM.

@ghost ghost added the needs-triage, customer-reported, and question labels Dec 10, 2020
@jmealo
Author

jmealo commented Dec 10, 2020

Ideally, something could be added to CI to test all azure-sdk-for-js modules: #12609

@jmealo
Author

jmealo commented Dec 10, 2020

#12856 I built the @microsoft/opentelemetry-exporter-azure-monitor package from the default branch after cloning, and was able to use rush and the included npm scripts to build and pack a .tgz which, upon installing, had a different public interface than the previously published module.

@xirzec
Member

xirzec commented Dec 11, 2020

#12856 I built the @microsoft/opentelemetry-exporter-azure-monitor package from the default branch after cloning, and was able to use rush and the included npm scripts to build and pack a .tgz which, upon installing, had a different public interface than the previously published module.

Is my understanding correct that you are saying you are unable to verify the fix because of the change in available APIs?

@jmealo
Author

jmealo commented Dec 11, 2020 via email

@jmealo
Author

jmealo commented Dec 11, 2020

@xirzec: I pinged Matthew McCleary in the Teams chat to ask for you and the azure-monitor team to be invited. He's out of office until the 16th.

We're going to be refocusing our efforts on the OpenTelemetry exporter because the cross-team development effort between the AI and Monitor teams seems to be in disarray (or is suffering from holiday vacation time that overlaps with our January shipping schedule). FWIW, not much chatting is going on there; I think everyone has moved on because this is unusable in its current state.

@ramya-rao-a ramya-rao-a added the Client and Monitor labels Dec 11, 2020
@ghost ghost removed the needs-triage label Dec 11, 2020
@ramya-rao-a ramya-rao-a added the Service Attention label Dec 11, 2020
@ghost ghost added the needs-team-attention label Dec 11, 2020
@ghost

ghost commented Dec 11, 2020

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @sameergMS, @dadunl.


@xirzec
Member

xirzec commented Dec 12, 2020

I reviewed the changes in #12563 and made a small test app to try out @azure/opentelemetry-exporter-azure-monitor version "^1.0.0-alpha.20201211.1".

My test app just kept generating random spans while it ran, and when I disconnected from WiFi I saw some of the expected errors, e.g.

Envelopes could not be exported and are not retriable. Error message: request to https://dc.services.visualstudio.com/v2/track failed, reason: getaddrinfo ENOTFOUND dc.services.visualstudio.com

And while it does seem like it should handle ENOTFOUND as retriable, at least it didn't bring down the app, and after reconnecting to WiFi, events started sending successfully once again.

So I think this issue is resolved by #12563, though I do understand that #12856 is blocking you until it is resolved.
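For reference, a rough sketch of a similar test app. Package names and APIs vary across exporter and OpenTelemetry JS versions (the exporter has shipped as both @azure/opentelemetry-exporter-azure-monitor and @azure/monitor-opentelemetry-exporter); this sketch assumes the latter name, the pre-2.0 sdk-trace-base surface, and a placeholder connection string.

```ts
import { BasicTracerProvider, BatchSpanProcessor } from "@opentelemetry/sdk-trace-base";
import { AzureMonitorTraceExporter } from "@azure/monitor-opentelemetry-exporter";

// Placeholder connection string; substitute a real instrumentation key.
const exporter = new AzureMonitorTraceExporter({
  connectionString: "InstrumentationKey=00000000-0000-0000-0000-000000000000",
});

// Batch spans so the exporter sends periodically in the background.
const provider = new BasicTracerProvider();
provider.addSpanProcessor(new BatchSpanProcessor(exporter));

const tracer = provider.getTracer("exporter-repro");

// Keep emitting spans; disconnect the network mid-run to observe how export failures surface.
setInterval(() => {
  const span = tracer.startSpan(`random-span-${Math.random().toString(36).slice(2)}`);
  span.end();
}, 250);
```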

@hectorhdzg
Member

This issue will be resolved in the next release of the exporter.

Created a task to add retry logic for network issues: #12904
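For illustration only, here is a generic retry-with-backoff wrapper of the kind #12904 asks for; this is not the exporter's actual implementation, and the function and parameter names are made up.

```ts
// Hypothetical sketch: retry a transient-failure-prone send with exponential backoff.
async function withRetries<T>(
  send: () => Promise<T>,
  maxAttempts = 4,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await send();
    } catch (err) {
      lastError = err;
      // Exponential backoff with a little jitter between attempts.
      const delayMs = baseDelayMs * 2 ** attempt + Math.random() * 100;
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```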

@hectorhdzg
Member

The fix is available in the latest release of the package:
https://www.npmjs.com/package/@azure/opentelemetry-exporter-azure-monitor

@jmealo
Author

jmealo commented Jan 28, 2021

@hectorhdzg: @xirzec: @ramya-rao-a:

I’m sorry that this isn’t the correct channel; however, I don’t have permission to open support tickets as I’m an outside contractor. I had to switch away from OpenTelemetry after earlier versions caused a production outage. Now we’re out of preview and have spent several months trying to reproduce a small fraction of what we had with OpenTelemetry using Application Insights. I’m running into an issue where the AI ingestion endpoint completely disregards certain properties I send, depending on device type and SDK version.

There is a conflict between App Center continuous backup and Application Insights. It appears the only way to send device/location/user data is with continuous export, because the ingestion endpoint doesn't process any of the well-known Azure-defined properties that we send depending on the device type and SDK version specified. None of this is documented for customers. I fear that the solution involves no fewer than two product teams, and we have already dedicated hundreds of hours of engineering time to working around backend issues on your side that support fails to address.

Support met with SMEs, and the solution provided was to send less data and use eslint, which I found offensive and inappropriate. We need an SRE or engineering resource, and we need Case #12012282500654 (which the manager on the project had to open for me) to be escalated, please.

Matt McCleary said he would escalate the ticket for us a few months ago, but I no longer have access to message him on the OpenTelemetry Preview Teams chat. Thanks for your time/help.

@jmealo
Author

jmealo commented Jan 28, 2021

@Dawgfan ^

@ramya-rao-a
Contributor

I’m sorry that this isn’t the correct channel,

Hey @jmealo,

Assuming the problem is with the @azure/monitor-opentelemetry-exporter package, which has since been renamed to @azure/opentelemetry-exporter-azure-monitor (with the most recent update happening last week), this GitHub repository is definitely the right place to open issues. Here, the engineers working on the package will respond to you directly. Since folks from the Azure Monitor service are also involved here, they can help with the question about the AI ingestion endpoint. We would advise going through Azure Support if you need someone to look at your Azure resources and query the data for investigative purposes, but otherwise this repository is the best place to log issues about the packages.

Can you consider logging a GitHub issue here with the details of the problem so that we have the history and the context to help you?

openapi-sdkautomation bot pushed a commit to AzureSDKAutomation/azure-sdk-for-js that referenced this issue Feb 7, 2021
@github-actions github-actions bot locked and limited conversation to collaborators Apr 12, 2023