Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

[release/3.1] Handle Counter Polling Interval of 0 #28180

Conversation

josalem
Copy link

@josalem josalem commented Jun 10, 2021

Backport of dotnet/runtime#53836 to release/3.1

Customer Impact

When listening to EventCounters there is a dedicated thread that is supposed to publish the metric values via EventSource on a periodic timer. However due a bug it is possible to put it into an infinite loop here. This occurs any time _pollingIntervalMilliseconds <= 0, which could occur for a few reasons:

  1. The listener specified EventCounterIntervalSec=0 after previously specifying a non-zero value. We would never expect a well-behaved listener to request a 0 sec interval, but the runtime shouldn't blindly trust this input.
  2. The listener enables and then disables the EventSource. Due to a race the infinite loop code could be run prior to re-checking _eventSource.IsEnabled() and it would observe _pollingIntervaMilliseconds=0

(from dotnet/runtime#53564)

Testing

The included test covers scenario 1, and manual testing was done for scenario 2.

Risk

This patch reduces overall risk by removing any looping based on user input. This should prevent scenario 1 and 2 above.

@josalem josalem requested review from noahfalk and jeffschwMSFT June 10, 2021 22:56
@josalem josalem changed the title Backport dotnet/runtime#53836 [release/3.1] Backport dotnet/runtime#53836 Jun 11, 2021
@josalem
Copy link
Author

josalem commented Jun 11, 2021

The test failures appear to be different from what this patch is intending to fix:

Return code:      1
Raw output file:      C:\h\w\B3E80993\w\BD5C09D1\e\tracing\eventcounter\Reports\tracing.eventcounter\gh53564\gh53564.output.txt
Raw output:
BEGIN EXECUTION
 "C:\h\w\B3E80993\p\corerun.exe" gh53564.exe 
[01:52:36.143] Setting interval to 1
[01:52:39.479] Setting interval to 0
[01:52:42.489] Setting interval to 1
[01:52:45.502] Setting ReadyToVerify
[01:52:45.507] Ready to verify
Test Failed - did not see one or more of the expected runtime counters.
Expected: 100
Actual: 1
END EXECUTION - FAILED
FAILED
Test Harness Exitcode is : 1
To run the test:
> set CORE_ROOT=C:\h\w\B3E80993\p
> C:\h\w\B3E80993\w\BD5C09D1\e\tracing\eventcounter\gh53564\gh53564.cmd

The failure I would expect from original issue would be for the test to hang after setting the interval to 1. I'm inclined to think this is a timing issue in the test when run in CI and on the older version of EventPipe without some of the performance patches in 5.

I'll update the test with something more explicit than just waiting after each step.

@janvorli
Copy link
Member

@josalem I've just hit the failure in this test in my PR in the runtime main branch in all Linux MUSL legs. See https://dev.azure.com/dnceng/public/_build/results?buildId=1183549&view=ms.vss-test-web.build-test-results-tab.

@josalem
Copy link
Author

josalem commented Jun 11, 2021

Alright, let me make a quick PR to fix that test upstream and I'll update this PR as well. Looks like my times are too tight for some CI legs.

@wtgodbe
Copy link
Member

wtgodbe commented Jun 14, 2021

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jeffschwMSFT jeffschwMSFT added area-Diagnostics Servicing-consider Issue for next servicing release review labels Jun 15, 2021
Copy link
Member

@jeffschwMSFT jeffschwMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved. We will take for consideration in 3.1.x

@jeffschwMSFT jeffschwMSFT added this to the 3.1.x milestone Jun 15, 2021
@danmoseley danmoseley changed the title [release/3.1] Backport dotnet/runtime#53836 [release/3.1] Handle Counter Polling Interval of 0 Jun 15, 2021
@danmoseley
Copy link
Member

fixing title to match our usual pattern...

@danmoseley
Copy link
Member

@danmoseley
Copy link
Member

The test failures are the cookie ones (will be fixed by parallel PR and can be ignored) and DynamicMethodGCStress.

@josalem
Copy link
Author

josalem commented Jun 15, 2021

@danmoseley I saw those. They look to be related to System.Globalization.Native:

2021-06-14T21:02:55.8649200Z clang: error: linker command failed with exit code 1 (use -v to see invocation)
2021-06-14T21:02:55.8738650Z make[2]: *** [src/corefx/System.Globalization.Native/System.Globalization.Native.dylib] Error 1
2021-06-14T21:02:55.8840410Z make[1]: *** [src/corefx/System.Globalization.Native/CMakeFiles/System.Globalization.Native.dir/all] Error 2

Based on the warnings they looks like the compiler is complaining of deprecated API usage.

...
2021-06-14T21:02:55.3122780Z /usr/local/opt/icu4c/include/unicode/umachine.h:96:52: note: expanded from macro 'U_ATTRIBUTE_DEPRECATED'
2021-06-14T21:02:55.3223370Z #    define U_ATTRIBUTE_DEPRECATED __attribute__ ((deprecated))
2021-06-14T21:02:55.3325030Z                                                    ^
2021-06-14T21:02:55.3426540Z /Users/runner/work/1/s/src/corefx/System.Globalization.Native/pal_icushim.c:17:1: warning: 'ucol_setVariableTop' is deprecated [-Wdeprecated-declarations]
2021-06-14T21:02:55.3527170Z FOR_ALL_ICU_FUNCTIONS
2021-06-14T21:02:55.3628830Z ^
2021-06-14T21:02:55.3730520Z /Users/runner/work/1/s/src/corefx/System.Globalization.Native/pal_icushim.h:137:24: note: expanded from macro 'FOR_ALL_ICU_FUNCTIONS'
2021-06-14T21:02:55.3830850Z     PER_FUNCTION_BLOCK(ucol_setVariableTop, libicui18n)
2021-06-14T21:02:55.3932760Z                        ^
2021-06-14T21:02:55.4034000Z /usr/local/opt/icu4c/include/unicode/ucol.h:1275:1: note: 'ucol_setVariableTop' has been explicitly marked deprecated here
2021-06-14T21:02:55.4134750Z U_DEPRECATED uint32_t U_EXPORT2 
2021-06-14T21:02:55.4236050Z ^
2021-06-14T21:02:55.4327550Z /usr/local/opt/icu4c/include/unicode/umachine.h:116:29: note: expanded from macro 'U_DEPRECATED'
2021-06-14T21:02:55.4429510Z #define U_DEPRECATED U_CAPI U_ATTRIBUTE_DEPRECATED
...

Is it possible a compiler version or the version of the icu4c library changed? This patch shouldn't affect those at all.

I couldn't build the release/3.1 branch at all on my local mac either, with or without my change.

@danmoseley
Copy link
Member

@josalem you're right, I see the same errors on #28179.

@dotnet/area-system-globalization can someone please advise? the build on mac has broken.

@leecow leecow added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels Jun 15, 2021
@leecow leecow modified the milestones: 3.1.x, 3.1.17 Jun 15, 2021
@danmoseley
Copy link
Member

@safern @tarekgh looks similar to dotnet/runtime#39583

@danmoseley
Copy link
Member

Yes, we need to port https://github.com/dotnet/runtime/pull/39900/files to CoreCLR release/3.1

@safern
Copy link
Member

safern commented Jun 15, 2021

@danmoseley you are correct. I can port it if help is needed.

@danmoseley
Copy link
Member

@safern would you mind? I wouldn't normally ask but this change needs to get in for an internal customer.

@tarekgh
Copy link
Member

tarekgh commented Jun 15, 2021

@danmoseley talked offline with @safern and he confirmed he is going to port the fix.

@safern
Copy link
Member

safern commented Jun 15, 2021

I created: #28181

@wtgodbe
Copy link
Member

wtgodbe commented Jun 15, 2021

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@danmoseley
Copy link
Member

The only failures are in System.Net.Primitives. It won't let me see the Test results (build not found?) but presumably it's the cookies issue, which will be fixed by the corefx ingestion PR that is up right now. When we merge that, we can rerun this.

Or, if we don't want to wait, maybe someone can recover the test failures and we can verify it's just the cookie ones. Then we can merge immediately.

@jeffschwMSFT
Copy link
Member

The failures were the Mac build issue (which should be resolved) and the System.Net failures.

@wtgodbe
Copy link
Member

wtgodbe commented Jun 15, 2021

There was also a failure in System.Text.RegularExpressions.Tests: https://dev.azure.com/dnceng/public/_build/results?buildId=1184134&view=ms.vss-test-web.build-test-results-tab&runId=35606250&resultId=168473&paneView=debug

I don't think we have time to get in the dependency updates from corefx/core-setup, then this. The current CI run for this is https://dev.azure.com/dnceng/public/_build/results?buildId=1189190&view=results - if that looks good I think we should merge this, and hold the dependency PRs for next month

@danmoseley
Copy link
Member

The regex failure will be also fixed by the corefx insertion (dotnet/corefx@23e15f8)

I'm fine merging this.

@jeffschwMSFT
Copy link
Member

I will merge and keep an eye on the run tonight.

@jeffschwMSFT jeffschwMSFT merged commit 711e612 into dotnet:release/3.1 Jun 15, 2021
@wtgodbe
Copy link
Member

wtgodbe commented Jun 15, 2021

Note that there was a merge conflict against internal/release/3.1 in the last PR, which @safern is fixing up manually

@danmoseley
Copy link
Member

Thank you @wtgodbe and @safern !

hoyosjs added a commit that referenced this pull request Jul 14, 2021
* Update branding to 3.1.17 (#28173)

* Port from 5.0: Fix Position Independent Code in CMake files (#28143)

* CoreCLR PR26323 Port: Fix PIE options

* Added missing PIE and RELRO compilation flags.

* Merged PR 15832: Port from 5.0: Fix Position Independent Code in CMake files (#28143)

Port from 5.0: Fix Position Independent Code in CMake files (#28143)

* CoreCLR PR26323 Port: Fix PIE options

* Added missing PIE and RELRO compilation flags.

* Fix System.Globalization.Native build on Big Sur (#28181)

* Fix System.Globalization.Native build on Big Sur

* Fix build

* Add flags for Linux

* [release/3.1] Handle Counter Polling Interval of 0 (#28180)

* Backport  dotnet/runtime#53836

* Fix test build

* update test

Co-authored-by: Jan Jahoda <[email protected]>
Co-authored-by: Ivan Diaz Sanchez <[email protected]>
Co-authored-by: Will Godbe <[email protected]>
Co-authored-by: Santiago Fernandez Madero <[email protected]>
Co-authored-by: John Salem <[email protected]>
hoyosjs added a commit that referenced this pull request Aug 11, 2021
* Update branding to 3.1.17 (#28173)

* Port from 5.0: Fix Position Independent Code in CMake files (#28143)

* CoreCLR PR26323 Port: Fix PIE options

* Added missing PIE and RELRO compilation flags.

* Merged PR 15832: Port from 5.0: Fix Position Independent Code in CMake files (#28143)

Port from 5.0: Fix Position Independent Code in CMake files (#28143)

* CoreCLR PR26323 Port: Fix PIE options

* Added missing PIE and RELRO compilation flags.

* Fix System.Globalization.Native build on Big Sur (#28181)

* Fix System.Globalization.Native build on Big Sur

* Fix build

* Add flags for Linux

* [release/3.1] Handle Counter Polling Interval of 0 (#28180)

* Backport  dotnet/runtime#53836

* Fix test build

* update test

* update branding to 3.1.18 (#28182)

* Update dependencies from https://github.com/dotnet/core-setup build 20210609.1 (#28178)

Microsoft.NETCore.App
 From Version 3.1.9-servicing.20459.3 -> To Version 3.1.17-servicing.21309.1

Co-authored-by: dotnet-maestro[bot] <dotnet-maestro[bot]@users.noreply.github.com>

* Merged PR 15716: [release/3.1] Use user read+write access for coredump file descriptor open

* [release/3.1] #28183 Fix OSX native dependency installation

* Avoid upgrading packages that are explicitly installed.
* Use a brewfile
* Change shebang to use bash and directly execute.

* [release/3.1] Update dependencies from dotnet/corefx (#28179)

[release/3.1] Update dependencies from dotnet/corefx


 - Add 3.1 AzDO feeds to the test build to ensure that the test build can correctly restore the CoreFX build.

It seems that the new CoreFX build isn't on the old blob feeds, so without this fix, we end up restoring the first 5.0.0-alpha.1 build instead of the build we want.

 - Exclude tests that depend on OOB assemblies.

 - Turn off CoreFX test legs in CI.

Now that the branch is on servicing, and the churn is low, exclude these as they
break far more often than they detect issues. These already run in CoreFX CI on release bits.
In case we want to bring them back for checked testing, we need to fix CoreFX.depproj.
It has a package - Microsoft.Private.CoreFx.OOB - that's supposed to bring in all deps
that are out of box. These are currently not getting restored and this ends up causing
File not found issues in the binder when compiling tests, making test exclusions impossible.
Please enter the commit message for your changes. Lines starting
with '#' will be ignored, and an empty message aborts the commit.

Co-authored-by: Jan Jahoda <[email protected]>
Co-authored-by: Ivan Diaz Sanchez <[email protected]>
Co-authored-by: Will Godbe <[email protected]>
Co-authored-by: Santiago Fernandez Madero <[email protected]>
Co-authored-by: John Salem <[email protected]>
Co-authored-by: Anirudh Agnihotry <[email protected]>
Co-authored-by: dotnet-bot <[email protected]>
Co-authored-by: dotnet-maestro[bot] <42748379+dotnet-maestro[bot]@users.noreply.github.com>
Co-authored-by: dotnet-maestro[bot] <dotnet-maestro[bot]@users.noreply.github.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-Diagnostics Servicing-approved Approved for servicing release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants