
monitor-opentelemetry-exporter: Unhandled rejection on network issues or 500 from AI /v2/track endpoint #12851

Closed
1 of 6 tasks
jmealo opened this issue Dec 10, 2020 · 12 comments
Labels
  • Client: This issue points to a problem in the data-plane of the library.
  • customer-reported: Issues that are reported by GitHub users external to the Azure organization.
  • Monitor: Monitor, Monitor Ingestion, Monitor Query
  • needs-team-attention: Workflow: This issue needs attention from the Azure service team or SDK team.
  • question: The issue doesn't require a change to the product in order to be resolved. Most issues start as that.
  • Service Attention: Workflow: This issue is the responsibility of the Azure service team.

Comments

@jmealo

jmealo commented Dec 10, 2020

  • Package Name:
    @azure/monitor-opentelemetry-exporter
  • Package Version:
    1.0.0-preview.6
  • Operating system:
    Linux
  • nodejs
    • version:
  • browser
    • name/version:
  • typescript
    • version:
  • Is the bug related to documentation in

Describe the bug
Errors during the transmission or persistence of spans crash the process under observation with an unhandled rejection, rather than retrying the request or surfacing an error the application can handle.

To Reproduce
Steps to reproduce the behavior:

  1. Unplug/sever the network connection or block DNS/HTTP traffic, then wait for the next export to occur -- it will throw an unhandled rejection with a value of 2.
    1a. Get a 500 internal server error from the /v2/track AI ingestion endpoint -- it will throw an unhandled rejection with a value of 2 (see the sketch below for simulating this case locally).
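To simulate the 500 case locally, here is a minimal sketch (TypeScript/Node). It assumes the exporter honors an IngestionEndpoint override in its connection string, e.g. InstrumentationKey=<key>;IngestionEndpoint=http://localhost:8585: run a fake ingestion endpoint that always answers 500 and point the exporter at it.

```ts
// Sketch of a fake Application Insights ingestion endpoint that always fails.
// Point the exporter at it via a connection string such as:
//   InstrumentationKey=<any-guid>;IngestionEndpoint=http://localhost:8585
// (connection-string support and exact behavior depend on the exporter version).
import * as http from "http";

const server = http.createServer((_req, res) => {
  // Mimic a /v2/track failure: every request gets a 500.
  res.writeHead(500, { "Content-Type": "application/json" });
  res.end(JSON.stringify({ itemsReceived: 1, itemsAccepted: 0, errors: [] }));
});

server.listen(8585, () => {
  console.log("Fake ingestion endpoint listening on http://localhost:8585");
});
```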

Expected behavior
All azure-sdk-for-js packages should avoid throwing unhandled rejections/exceptions, and should either handle internal errors internally or provide error-handling mechanisms for end users. At the very least, a usable stack trace would be nice.
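As a stopgap (not a fix), a process-level handler can at least keep unhandled rejections from crashing the app and make the failure visible; a minimal sketch:

```ts
// Stopgap only: log rejections that nothing awaited instead of letting them
// crash the process. This does not add retries or fix the exporter itself.
process.on("unhandledRejection", (reason, promise) => {
  console.error("Unhandled rejection (possibly from telemetry export):", reason, promise);
});
```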

Additional context
There are unreleased fixes committed by @markwolff on GitHub that have not been published to NPM.

@ghost ghost added the needs-triage, customer-reported, and question labels Dec 10, 2020
@jmealo
Author

jmealo commented Dec 10, 2020

Ideally, something could be added to CI to test all azure-sdk-for-js modules: #12609

@jmealo
Author

jmealo commented Dec 10, 2020

#12856 I built the @microsoft/opentelemetry-exporter-azure-monitor package from the default branch after cloning, and was able to use rush and the included npm scripts to build and pack a .tgz which, upon installing, had a different public interface than the previously published module.

@xirzec
Member

xirzec commented Dec 11, 2020

#12856 I built the @microsoft/opentelemetry-exporter-azure-monitor package from the default branch after cloning, and was able to use rush and the included npm scripts to build and pack a .tgz which, upon installing, had a different public interface than the previously published module.

Is my understanding correct that you are saying you are unable to verify the fix because of the change in available APIs?

@jmealo
Author

jmealo commented Dec 11, 2020 via email

@jmealo
Author

jmealo commented Dec 11, 2020

@xirzec: I pinged Matthew McCleary in the Teams chat to ask for you and the azure-monitor team to be invited. He's out of office until the 16th.

We're going to be refocusing our efforts on the OpenTelemetry exporter because the cross-team development effort between the AI and Monitor teams seems to be in disarray (or is suffering from holiday vacation time that overlaps with our January shipping schedule). FWIW, not much chatting is going on there; I think everyone has moved on because this is unusable in its current state.

@ramya-rao-a ramya-rao-a added the Client and Monitor labels Dec 11, 2020
@ghost ghost removed the needs-triage label Dec 11, 2020
@ramya-rao-a ramya-rao-a added the Service Attention label Dec 11, 2020
@ghost ghost added the needs-team-attention label Dec 11, 2020
@ghost

ghost commented Dec 11, 2020

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @sameergMS, @dadunl.


@xirzec
Member

xirzec commented Dec 12, 2020

I reviewed the changes in #12563 and made a small test app to try out @azure/opentelemetry-exporter-azure-monitor version "^1.0.0-alpha.20201211.1".

My test app just kept generating random spans while it ran, and when I disconnected from WiFi I saw some of the expected errors, e.g.

Envelopes could not be exported and are not retriable. Error message: request to https://dc.services.visualstudio.com/v2/track failed, reason: getaddrinfo ENOTFOUND dc.services.visualstudio.com

And while it does seem like it should handle ENOTFOUND as retriable, at least it didn't bring down the app, and after reconnecting to WiFi, events started sending successfully once again.

So I think this issue is resolved by #12563, though I do understand that #12856 is blocking you until it is resolved.
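For reference, a rough sketch of a similar test app. Package names and APIs vary across exporter and OpenTelemetry JS versions (the exporter has shipped as both @azure/opentelemetry-exporter-azure-monitor and @azure/monitor-opentelemetry-exporter); this sketch assumes the latter name, the pre-2.0 sdk-trace-base surface, and a placeholder connection string.

```ts
import { BasicTracerProvider, BatchSpanProcessor } from "@opentelemetry/sdk-trace-base";
import { AzureMonitorTraceExporter } from "@azure/monitor-opentelemetry-exporter";

// Placeholder connection string; substitute a real instrumentation key.
const exporter = new AzureMonitorTraceExporter({
  connectionString: "InstrumentationKey=00000000-0000-0000-0000-000000000000",
});

// Batch spans so the exporter sends periodically in the background.
const provider = new BasicTracerProvider();
provider.addSpanProcessor(new BatchSpanProcessor(exporter));

const tracer = provider.getTracer("exporter-repro");

// Keep emitting spans; disconnect the network mid-run to observe how export failures surface.
setInterval(() => {
  const span = tracer.startSpan(`random-span-${Math.random().toString(36).slice(2)}`);
  span.end();
}, 250);
```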

@hectorhdzg
Member

This issue will be resolved in the next release of the exporter.

Created a task to add retry logic for network issues: #12904
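For illustration only, here is a generic retry-with-backoff wrapper of the kind #12904 asks for; this is not the exporter's actual implementation, and the function and parameter names are made up.

```ts
// Hypothetical sketch: retry a transient-failure-prone send with exponential backoff.
async function withRetries<T>(
  send: () => Promise<T>,
  maxAttempts = 4,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await send();
    } catch (err) {
      lastError = err;
      // Exponential backoff with a little jitter between attempts.
      const delayMs = baseDelayMs * 2 ** attempt + Math.random() * 100;
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```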

@hectorhdzg
Member

The fix is available in the latest release of the package:
https://www.npmjs.com/package/@azure/opentelemetry-exporter-azure-monitor

@jmealo
Author

jmealo commented Jan 28, 2021

@hectorhdzg: @xirzec: @ramya-rao-a:

I’m sorry that this isn’t the correct channel; however, I don’t have permission to open support tickets as I’m an outside contractor. I had to switch away from OpenTelemetry after earlier versions caused a production outage. Now we’re out of preview and have spent several months trying to reproduce a small fraction of what we had with OpenTelemetry using Application Insights. I’m running into an issue where the AI ingestion endpoint completely disregards certain properties I send, depending on device type and SDK version.

There is a conflict between App Center continuous backup and Application Insights. It appears the only way to send device/location/user data is with continuous export, because the ingestion endpoint doesn't process any of the well-known Azure-defined properties that we send depending on the device type and SDK version specified. None of this is documented for customers. I fear that the solution involves no fewer than two product teams, and we have already dedicated hundreds of hours of engineering time to working around backend issues on your side that support fails to address.

Support met with SMEs, and the solution provided was to send less data and use eslint, which I found offensive and inappropriate. We need an SRE or engineering resource, and we need Case #12012282500654 (which the manager on the project had to open for me) to be escalated, please.

Matt McCleary said he would escalate the ticket for us a few months ago, but I no longer have access to message him on the OpenTelemetry Preview Teams chat. Thanks for your time/help.

@jmealo
Author

jmealo commented Jan 28, 2021

@Dawgfan ^

@ramya-rao-a
Contributor

I’m sorry that this isn’t the correct channel,

Hey @jmealo,

Assuming the problem is with the @azure/monitor-opentelemetry-exporter package, which has since been renamed to @azure/opentelemetry-exporter-azure-monitor (with the most recent update happening last week), this GitHub repository is definitely the right place to open issues. Here, the engineers working on the package will respond to you directly. Since folks from the Azure Monitor service are also involved here, they can help with the question about the AI ingestion endpoint. We would advise going through Azure Support if you need someone to look at your Azure resources and query the data for investigative purposes, but otherwise this repository is the best place to log issues about the packages.

Can you consider logging a GitHub issue here with the details of the problem so that we have the history and the context to help you?

openapi-sdkautomation bot pushed a commit to AzureSDKAutomation/azure-sdk-for-js that referenced this issue Feb 7, 2021
@github-actions github-actions bot locked and limited conversation to collaborators Apr 12, 2023