Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Errors that are expected by way of expected_status_codes should not impact Apdex #2594

Closed
fallwith opened this issue Apr 24, 2024 · 6 comments · Fixed by #2619
Closed

Errors that are expected by way of expected_status_codes should not impact Apdex #2594

fallwith opened this issue Apr 24, 2024 · 6 comments · Fixed by #2619
Labels
2 Story Point Estimate bug GTSE Issue associated with a previously logged support escalation

Comments

@fallwith
Copy link
Contributor

We have reports that errors with status codes matching specified expected_status_codes values are correctly appearing in the Errors Inbox as "expected", but are negatively impacting Apdex.

This relevant specification text confirms that expected errors should not impact Apdex.

Expected errors MUST create traced errors and error events, but MUST NOT increment any of the Errors/* metrics in order to prevent error rate and Apdex modifications.

We'll need to determine where in the agent or back-end the denotation of an error as expected fails to prevent that error from impacting Apdex.

NOTE: The use of ignore_status_codes works as a workaround. Ignored errors are seen as not impacting Apdex. The drawback of the workaround is that ignored errors are not delivered to the Errors Inbox.

@workato-integration
Copy link

@kford-newrelic kford-newrelic added GTSE Issue associated with a previously logged support escalation estimate Issue needing estimation labels Apr 26, 2024
@kford-newrelic kford-newrelic moved this from Triage to In Quarter in Ruby Engineering Board Apr 26, 2024
@fallwith fallwith moved this from In Quarter to In Sprint in Ruby Engineering Board May 2, 2024
fallwith added a commit that referenced this issue May 2, 2024
If an error is either expected (which still creates an entry in the
Errors Inbox) or ignored (which does not), the transaction's Apdex
should not factor in the error.

A bug was discovered involving errors that were expected by way of
their HTTP response codes that would cause those errors to impact Apdex.
Errors that were expected via the agent's Noticed Errors API would
correctly not impact Apdex, but HTTP response code based expected errors
would incorrectly do so.

Now ignored errors and both kinds of expected errors will correctly not
impact Apdex when observed during a transaction.

resolves #2594
@florianpilz
Copy link

Thank you so much for taking the time and digging into the problem. :) While the linked PR #2619 would be good enough for what we need, as far as I can see this would still let expected_classes and expected_messages impact the Apdex score -- right? As the PR has added a check specifically for expected_status_codes but not the other "expected" configuration options. See https://github.com/newrelic/newrelic-ruby-agent/blob/dev/newrelic.yml#L309-L322

@fallwith
Copy link
Contributor Author

fallwith commented May 2, 2024

Thank you so much for taking the time and digging into the problem. :) While the linked PR #2619 would be good enough for what we need, as far as I can see this would still let expected_classes and expected_messages impact the Apdex score -- right? As the PR has added a check specifically for expected_status_codes but not the other "expected" configuration options. See https://github.com/newrelic/newrelic-ruby-agent/blob/dev/newrelic.yml#L309-L322

Hi @florianpilz! I'm very glad to hear that the linked PR will be helpful!

As for expected error classes and error messages, I believe those are now covered as well. I think the code changes cover everything, but my text summary was insufficient.

Previously, the only way an expected error would not impact Apdex would be if the notice_error API was used with an expected: true key/value pair argument. With PR #2619, that check for an expected: true attribute on the error is still being checked, but we are now additionally performing an ErrorCollector#expected? check on the exception and the HTTP response code (which can be nil). The exception and optional HTTP response code are sent to ErrorFilter#expected? which looks like this:

def expected?(ex, status_code = nil)
  @expected_classes.include?(ex.class.name) ||
    (@expected_messages.key?(ex.class.name) &&
    @expected_messages[ex.class.name].any? { |m| ex.message.include?(m) }) ||
    @expected_status_codes.include?(status_code.to_i)
end

That method will take care of checking for configured expected classes and messages by looking at the exception argument, and then check for an expected HTTP status code by looking at the optional status code argument.

The inclusion of these #expected? methods is the big change the PR introduced.

So I think what happened is that the code in the PR does everything we want, but my description was limited by only mentioning the status codes. If after taking another look you disagree, please let me know and I'd be happy to investigate farther.

We'll keep you posted via comments on this issue for when these changes make it into a new agent gem release.

@fallwith
Copy link
Contributor Author

fallwith commented May 2, 2024

@florianpilz I have updated the PR description text, the CHANGELOG, and the source code comments. Thanks very much for pointing out that the error classes and error messages are also possible ways to expect errors! See acb8297

@github-project-automation github-project-automation bot moved this from In Sprint to Code Complete/Done in Ruby Engineering Board May 3, 2024
@fallwith
Copy link
Contributor Author

fallwith commented May 3, 2024

Hi @florianpilz. The PR has now been approved and merged, so the bugfix should go out in the next release of the New Relic Ruby agent. Thank you again for bringing this issue to our attention!

@florianpilz
Copy link

Really nice! You are right, I only got distracted by HTTP codes being explicitly passed. Makes all sense now and looking forward to the next release. 🎉 ❤️

@kford-newrelic kford-newrelic added 2 Story Point Estimate and removed estimate Issue needing estimation labels May 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2 Story Point Estimate bug GTSE Issue associated with a previously logged support escalation
Projects
Archived in project
3 participants