Error The WMI service or the WMI provider returned an unknown error: HRESULT 0x80041033 when importing ActiveSyncDeviceAccessRules #4982
Comments
Command I run: |
I'm still having this issue. It's only happening with EXO resources but not the same one consistently. The only consistency seems to be that the WMI Provider Host process hits about 4100 MB and then it crashes. If I run 1.24.403.1 the WMI Provider Host doesn't use nearly as much memory, and it appears to return memory to the system throughout the test. I've tried pwsh 7 as well as three other machines (Win11 and 2022) and it doesn't help. |
@jadamones Quick question: When you check the memory consumption in Task Manager, is the WMI Provider Host process running as 32-bit? If yes, that would explain why it fails, since 32-bit applications have a memory limit of 4GB. Unfortunately, I'm not aware of any way to switch the WMI process to 64-bit, which would mitigate that issue. |
I checked my system and WmiPrvSE.exe is a 64-bit application. After consuming a bit more than 4 GB of memory, the process status changes to "suspended" and the import then fails with the mentioned error. |
Another interesting point: after WmiPrvSE.exe crashes, the process starts again, even after the PowerShell command has reported the error: |
Hey @FabienTschanz I wondered that as well and should've mentioned that I already confirmed that. It is indeed the 64-bit provider. |
What changed for the EXO workload between 1.24.403.1 and the next version, where this problem started for me, is listed below. Nothing is jumping out at me, but I'm not sure what went into the Misc changes. Interestingly, @JbMachtMuschel - EXOActiveSyncDeviceAccessRules were introduced. I don't have those rules defined in any of the tenants where this is a problem, and I don't monitor or define the EXOMailboxSettings resource in any of my configurations, so in my case, it doesn't appear to be an issue with either of those resources.
EXOActiveSyncDeviceAccessRule
EXOMailboxSettings
MISC
Telemetry
|
@jadamones In that view, only the application name is visible, but not if it's 32-Bit or not. That is only visible in the "Processes" view of the Task Manager, when you scroll down to "Background Processes" --> WMI Provider Host (32 bit). If you go to details for a 32-bit process, you'll end up with the same view as you wrote. Unfortunately I don't have an Exchange infrastructure in my test lab (and I have no idea how to manage that 😅), so I'm not of much help even if I have your configuration. Still I will take a look at what changed in the code just as you did, maybe we can find something. |
Also @JbMachtMuschel I do not experience the loop. The WMI Provider just crashes and does not restart. |
Soon I will restart.. QueryString : iOS1 |
Hey @FabienTschanz thank you for helping out with this and looking into it! The 7th column over in that screenshot shows the system architecture as x64 for the process but regardless, I do not have the (32 bit) tag in the name under the processes view for any of them or the one that's running up the memory during a Test-DscConfiguration. Can also confirm that the file path is System32 and not SysWow64 for that particular process. |
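For anyone who wants to confirm the same thing from a console instead of Task Manager, here is a minimal sketch. It only reads process information and assumes an elevated session (without elevation, `Path` may come back empty). The column names `WorkingSetMB` and `PrivateMB` are made up for the output.

```powershell
# List WMI Provider Host processes with their image path and memory use.
# A path under System32 points to the 64-bit host, SysWOW64 to the 32-bit one.
Get-Process -Name 'WmiPrvSE' -ErrorAction SilentlyContinue |
    Select-Object -Property Id,
        Path,
        @{ Name = 'WorkingSetMB'; Expression = { [math]::Round($_.WorkingSet64 / 1MB) } },
        @{ Name = 'PrivateMB';    Expression = { [math]::Round($_.PrivateMemorySize64 / 1MB) } } |
    Sort-Object -Property PrivateMB -Descending
```

Running this repeatedly during a Test-DscConfiguration should show which host instance is the one growing towards the limit.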
@JbMachtMuschel Thanks, I will create some when I have a chance just to see if I get the loop. |
@jadamones Thank you. The loop is one problem, but the memory consumption of WmiPrvSE.exe seems to be the underlying problem. I checked https://learn.microsoft.com/en-us/troubleshoot/windows-server/system-management-components/scenario-guide-troubleshoot-wmiprvse-quota-exceeded-issues |
@jadamones Of course, silly me. Sometimes I'm like a blind chicken. Aaanyways, under the assumption that the WMI process has a memory limit of 4GB, we could try to increase the quota per process and overall like the following in an elevated Windows PowerShell session (according to https://techcommunity.microsoft.com/t5/ask-the-performance-team/wmi-high-memory-usage-by-wmi-service-or-wmiprvse-exe/ba-p/375491):
$quotaConfiguration = Get-WmiObject -Class __ProviderHostQuotaConfiguration -Namespace Root
$quotaConfiguration.MemoryAllHosts = 4 * 4GB
$quotaConfiguration.MemoryPerHost = 3 * 4GB
Set-WmiInstance -InputObject $quotaConfiguration
I don't know if a restart is required, but just to be sure, I would recommend one. Another approach we can take is to bump up the |
@jadamones Thank you... the quotaConfiguration change seems to be a possible approach, but I guess that this will reach the 16GB limit. I will let you know. Currently I am importing the rules and approx. 110 are already imported, but the memory consumption is very high: |
The EXO management module has been a known memory hog ever since it moved to the REST API, and it seems to get worse with each release; a quick search shows that several people complain about this. Heck, there's even an official article [0] on how to reduce its memory usage, which by the way doesn't solve the problem, since the module also leaks the huge amounts of memory it allocates and never frees them!!! In addition, the module is prone to spitting out random 0x800... errors, so to work around this I always loop the deployment, up to a maximum of 3 attempts, if it fails. I have this set up in DevOps pipelines that run on disposable containers, so they are memory constrained and always fail if no changes are made, but with the code below, which is a variation of what @FabienTschanz posted before, I can get it going along with all the other workloads I use. Please bear in mind that this allows all available memory of the machine to be allocated to the WMI process (memory is not allocated all at once, only when required), so you may need to do some math to calculate better values for your requirements.
#region Increase Memory Quota for WMI processes
try {
    $ComputerSystem = Get-CimInstance -ClassName "Win32_ComputerSystem"
}
catch {
    throw $_
}
if (![String]::IsNullOrEmpty($ComputerSystem)) {
    $TotalPhysicalMemory = $ComputerSystem.TotalPhysicalMemory
    if (![String]::IsNullOrEmpty($TotalPhysicalMemory)) {
        try {
            $Quota = Get-CimInstance -Namespace "Root" `
                -Class "__ProviderHostQuotaConfiguration"
        }
        catch {
            throw $_
        }
        if (![String]::IsNullOrEmpty($Quota)) {
            if ($Quota.MemoryAllHosts -ne $TotalPhysicalMemory) {
                $Quota.MemoryAllHosts = $TotalPhysicalMemory
                $Quota.MemoryPerHost = $TotalPhysicalMemory
                $Quota.HandlesPerHost = 8192
                $Quota.ThreadsPerHost = 512
                Write-Output "Increasing WMI processes memory quota"
                try {
                    Set-CimInstance -InputObject $Quota
                }
                catch {
                    throw $_
                }
                $WMIProcesses = Get-Process -Name "WMIPrvSE" `
                    -ErrorAction "SilentlyContinue"
                if ($WMIProcesses.Count -ne 0) {
                    Write-Output "Restarting WMI processes"
                    foreach ($WMIProcess in $WMIProcesses) {
                        try {
                            Stop-Process -Id $WMIProcess.Id -Force
                        }
                        catch {
                            throw $_
                        }
                    }
                }
            }
        }
    }
}
#endregion Increase Memory Quota for WMI processes |
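The "loop the deployment up to 3 attempts" part mentioned above is not shown in the snippet, so here is a rough sketch of what such a retry wrapper could look like. This is not the pipeline code from the comment: the function name `Invoke-WithRetry`, its parameters, the delay, and the configuration path in the usage comment are all assumptions for illustration.

```powershell
# Hypothetical helper: run a script block, retrying on terminating errors.
function Invoke-WithRetry {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [scriptblock] $ScriptBlock,

        [int] $MaxAttempts = 3,

        [int] $DelaySeconds = 30
    )

    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            # Return on the first successful run.
            return (& $ScriptBlock)
        }
        catch {
            # Give up after the last attempt and surface the original error.
            if ($attempt -eq $MaxAttempts) { throw }
            Write-Warning "Attempt $attempt of $MaxAttempts failed: $($_.Exception.Message)"
            Start-Sleep -Seconds $DelaySeconds
        }
    }
}

# Example usage (path is a placeholder); -ErrorAction Stop turns failures
# into terminating errors so the catch block above sees them:
# Invoke-WithRetry -ScriptBlock {
#     Start-DscConfiguration -Path 'C:\DSC\MyConfig' -Wait -Force -ErrorAction Stop
# }
```

Only terminating errors trigger a retry, which is why the usage example passes `-ErrorAction Stop`.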
@ricmestre Thank you for the information. |
I was able to import the ActiveSyncDeviceAccessRules now :) I did a retry to observe how DSC handles already created rules, and after a while the command threw this error: VERBOSE: Operation 'Invoke CimMethod' complete. Interestingly, the WMI process "never gives up": even after deleting all ActiveSyncDeviceAccessRules it starts to create them again. PowerShell is not involved, as I had terminated the shell. So I have now killed the WMI process. |
Thank you all for all the effort on this. Unfortunately, these haven't resolved my issue :( . I updated the host quota as suggested, and I attempted an upgrade of the Exchange Online module (maybe I'm missing something here). After upgrading the Exchange module to 3.5.1, I get this error |
@jadamones I have had the same error with the 3.5.1 module and certificate based login. Just now I am using 3.4.0. |
Oh SMH... 🤦♂️ I realized that I didn't reboot the machine after updating the quotas. That seems to have done the trick! Interesting that the documentation just says to restart the WMI service, but definitely seems to be working after a reboot now. Thank you all for your input here. Much appreciated! |
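To check after the reboot that the new quota values are actually in place, something like the following read-only query should do. It is a small sketch that uses the same class and property names as the snippets earlier in this thread.

```powershell
# Read the current WMI provider host quotas (no changes are made).
Get-CimInstance -Namespace 'Root' -ClassName '__ProviderHostQuotaConfiguration' |
    Select-Object -Property MemoryPerHost, MemoryAllHosts, HandlesPerHost, ThreadsPerHost
```

The memory values are reported in bytes, so dividing by `1GB` makes them easier to compare with what was set.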
@ricmestre Do we want to document this as a workaround somewhere? Just for the sake that it is documented and that there is something we can actually do to circumvent the issue? |
@FabienTschanz I wouldn't mind if you added something, for example here [0], which looks rather empty, and both you and I know that there are other issues out there with no documented fix or even workaround. For this particular case, though, I'd really like to have some kind of improvement in the module itself. @NikCharlebois @ykuijs @andikrueger @desmay @malauter Hi, is this something that any of you can take to the EXO team in order to solve it or at least improve the experience? Running a cmdlet here and there is one thing; it's another to import the whole workload to other tenants using M365DSC, or even just to deploy a single resource that contains hundreds or even thousands of children, which will only exacerbate this known memory issue. [0] https://microsoft365dsc.com/user-guide/get-started/troubleshooting/ |
@ricmestre I will add an entry in the troubleshooting section for this issue and raise a PR. |
Shall I open another issue regarding? |
That's how the LCM works, check https://learn.microsoft.com/en-us/powershell/dsc/managing-nodes/metaconfig?view=dsc-1.1 and settings ConfigurationMode and ConfigurationModeFrequencyMins |
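For reference, a minimal meta-configuration sketch touching the ConfigurationMode setting from that page. Whether `ApplyOnly` is the right choice depends on the setup, and it is only an assumption that this is what causes the behaviour described above. The configuration name `LcmApplyOnly` and the output path are placeholders. This needs an elevated Windows PowerShell 5.1 session.

```powershell
# Meta-configuration sketch: tell the LCM to apply a configuration once
# and not re-check or re-apply it on the consistency schedule.
[DSCLocalConfigurationManager()]
configuration LcmApplyOnly
{
    Node 'localhost'
    {
        Settings
        {
            ConfigurationMode = 'ApplyOnly'
        }
    }
}

# Compile the meta-configuration and hand it to the LCM.
LcmApplyOnly -OutputPath 'C:\DSC\LcmApplyOnly'
Set-DscLocalConfigurationManager -Path 'C:\DSC\LcmApplyOnly' -Verbose

# To stop a run that is still in progress and discard a pending document:
# Stop-DscConfiguration -Force
# Remove-DscConfigurationDocument -Stage Pending
```

`Get-DscLocalConfigurationManager` shows the current values of ConfigurationMode and ConfigurationModeFrequencyMins.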
Hi again, I did several tests with different ExchangeOnlineManagement module versions and only 3.4.0 is working - 3.5.0 and 3.5.1 are throwing "Unable to decode the header '[PII of type 'System.String' is hidden". Unfortunately the error is back. It seems that the WMI process does NOT release memory, as in the image below, and it crashes again. I then restarted the process and it released memory once, but a while afterwards it crashed again. @ricmestre Thanks for the info, I was not aware of this. |
That's what I expected it to be as well, but this would mean that the About the eventlog for MSCloudLoginAssistant: Instead of just flooding the event log with many events, why not "hide" it behind a custom environment variable that can be set, which will then trigger the |
Indeed, the If we cannot use the global scope, but the script scope is persisted, the logic of MSCloudLoginAssistant most likely has to cache all its state internally and expose it through public functions to retrieve. Currently, access to the $global scope is possible because every resource reauthenticates again, setting all of the connection information like Edit: To be blunt, this should be the approach: Caching relevant state internally and expose it through specific APIs is best practice and much easier than knowing which global variable is responsible for what without documentation. The best documentation from an API is through its clear naming convention. |
I'm not qualified to comment authoritatively on the 'big picture' solution here, but wanted to mention a couple of details which may be relevant: On verbose logging When trying to work out why Verbose Logging was so difficult to enable, I learned that in PowerShell, code executed in modules doesn't inherit the VerbosePreference from the calling script. Could we implement this from Stack Overflow in some of the modules so that it's not hard for end users to enable (or could we change modules like MSCloudLoginAssistant to log using Write-Debug, i.e. Verbose logging is used for resource-level logging, and Debug is used for modules like MSCloudLoginAssistant?) https://stackoverflow.com/questions/44900568/how-to-propagate-verbose-to-module-functions On On why debug mode may work differently a couple of possibly relevant quotes When debugging DSC
(so behaviour may change). And in general
(which may explain why $Global does not persist). https://devblogs.microsoft.com/powershell/debugging-powershell-dsc-class-resources/ Also someone asking what appears to be the same question (although in May 2016 so don't know if things changed e.g. when PowerShell 5.1 was released in August 2016)
https://forums.powershell.org/t/passing-variable-between-dsc-resources-at-runtime/6397/2 On general points Please be aware that all my logs above are from Microsoft365DSC 1.24.904.1 - it's a bit out of date, but I've found upgrades very painful in the past (not only due to issues with this project, but also assembly issues with the underlying PowerShell modules interacting with other PowerShell modules on the same server) - just be aware to avoid any head-scratching as you debug with the latest version. Also I am compiling everything using Azure Desired State Configuration so this is Windows PowerShell 5.1 throughout - I don't think this is relevant since I'm fairly sure this is an LCM-related issue and the LCM isn't even supported in PowerShell 7+, but just wanted to mention it. |
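As a rough illustration of the verbose-logging point, the simplest variant is to forward the preference explicitly at the call site. This is only a sketch: the module and function names (`VerboseSketch`, `Get-SketchData`, `Invoke-SketchResource`) are invented, and it is not how MSCloudLoginAssistant or M365DSC currently do it.

```powershell
# Stand-in for a module like MSCloudLoginAssistant (names are hypothetical).
$null = New-Module -Name 'VerboseSketch' -ScriptBlock {
    function Get-SketchData {
        [CmdletBinding()]
        param()
        Write-Verbose -Message 'Inside the module function'
        'data'
    }
}

# Stand-in for a resource function that calls into the module.
function Invoke-SketchResource {
    [CmdletBinding()]
    param()
    # Pass the caller's verbose preference on explicitly instead of
    # relying on the module to inherit it.
    Get-SketchData -Verbose:($VerbosePreference -eq 'Continue')
}

Invoke-SketchResource -Verbose
```

The downside is that every call into the module needs the extra parameter, which is why the Stack Overflow thread also discusses reading the caller's preference inside the module instead.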
Thank you @Borgquite for the two links, I was not aware of them. As you said, they confirm what I suspected unfortunately for |
@FabienTschanz No worries. Have we confirmed that the connections themselves to Microsoft Graph, Exchange Online et al are able to survive when the LCM resets the runspace? (i.e. it's not necessary to run Connect-* again after the reset?) If not, this is all a bit moot? |
@Borgquite |
@FabienTschanz Yep - I guess it would be good to specifically check Exchange too, as I know that module can work quite differently to the rest. (I'm also working on other DSC stuff, so I appreciate this is 'as we get time' :) ) NB: Even if the modules don't persist, the specific Exchange memory issue is still solvable by creating a new PowerShell process for each Exchange Online connection - per tip 3. This should solve the memory issue, but would be a last resort (as performance wise, it doesn't help). |
@Borgquite I can confirm that also the authentication context of ExchangeOnline seems to survive. I was able to call @ykuijs and @NikCharlebois: Before we start throwing around everything, what do you think of the following proposition:
This way, we would ensure that the whole thing is only modifiable inside of |
@ykuijs @NikCharlebois Please give us your opinion so that we can continue towards a working version. Thank you. |
Excellent work in discovering this. Never knew this and always assumed the Global scope acted the same way as in normal PS sessions.......assumptions 😄
I like that idea. This environment variable needs to be a system wide variable......it is possible to create a user session environment variable as well. Since the LCM is running as System, it doesn't have access to the user session variable. Just one question: Within M365DSC we already log a lot. If we add the verbose logging of the MSCloudLoginAssistant to that, doesn't the output become huge and therefore difficult to read? We could combine both ideas to only log to the event log after setting the env variable.
Great idea, totally agree! We should also add a function to reset the current state (Disconnect all sessions correctly, clear the variable data, etc). Just to make sure it is easy to start over. |
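To make the idea concrete, here is a small sketch of state kept in script scope and exposed only through functions, including a reset. All names (`ConnectionStateSketch`, the `*-SketchConnectionProfile` functions) are hypothetical and not the actual MSCloudLoginAssistant API.

```powershell
# Dynamic module used as a stand-in for a real .psm1 file.
$null = New-Module -Name 'ConnectionStateSketch' -ScriptBlock {
    # State lives in the module's script scope, not in $global.
    $script:ConnectionProfiles = @{}

    function Set-SketchConnectionProfile {
        param(
            [Parameter(Mandatory)] [string] $Workload,
            [Parameter(Mandatory)] [hashtable] $Settings
        )
        $script:ConnectionProfiles[$Workload] = $Settings
    }

    function Get-SketchConnectionProfile {
        param([Parameter(Mandatory)] [string] $Workload)
        # Returns $null when nothing is cached for the workload.
        $script:ConnectionProfiles[$Workload]
    }

    function Reset-SketchConnectionProfile {
        # Start over with an empty cache.
        $script:ConnectionProfiles = @{}
    }

    Export-ModuleMember -Function '*-SketchConnectionProfile'
}

Set-SketchConnectionProfile -Workload 'ExchangeOnline' -Settings @{ Connected = $true }
(Get-SketchConnectionProfile -Workload 'ExchangeOnline').Connected
Reset-SketchConnectionProfile
```

Callers never touch the hashtable directly, so the module can change how it stores the state later without breaking them.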
@ykuijs Thank you very much for your opinion. Then I'll go ahead and migrate from the global to script scope, expose some functions to fetch and clear the state and also move the output of |
@FabienTschanz @ykuijs Thanks guys! Hopefully this will improve performance as well as solve the high memory usage issue - a really impactful change. |
@ykuijs @NikCharlebois PR open: microsoft/MSCloudLoginAssistant#188. There is a dependency for some other functionality necessary: See microsoft/MSCloudLoginAssistant#185. I tested it on my machine and it seems like the connection now is cached and used for all subsequent runs of any |
Excellent work. Just reviewed the PR. Just one small question. After that, we can merge this PR and release a new version which can then be included into the next version of M365DSC. |
@FabienTschanz Nik and I just merged the PRs in the MSCloudLoginAssistant. During the release of the new version of M365DSC we discovered an issue that we didn't think of: Would you have the opportunity to change these to the new Get function? |
Man, you are quick 😄 👍 |
Indeed 😆 I was so surprised to see it being merged and just released, I expected it to not be released for about another day so we would have time to properly migrate and check if it actually works. I expect it to work, but don't want to guarantee it 😓 |
Soo, after using it for today, it seems very much okay. I ran it repeatedly against a quick setup of mine using a couple of EXO distribution groups and AAD groups, switching back and forth between them. Even after five consecutive runs, the memory stayed roughly the same (+/- 250MB at around 1.3GB) after the first run. I can't say what it looks like for a configuration file with, let's say, 300 items; my biggest setup was ~50 elements on my test tenant. Very curious to see what others get - a couple of optimizations in the module will follow. Please post your experiences with the new version here. |
@FabienTschanz I updated Microsoft365DSC to 1.24.1211.1 yesterday, and it was very promising! The configuration, which was previously taking multiple hours (increasing as the memory was consumed), now appears to consistently take 7 minutes :) Great work! I've set the system-wide environment variable 'MSCLOUDLOGINASSISTANT_WRITETOEVENTLOG' to $true to see what's happening - as far as I can tell, I still receive a lot of 'Connect-M365Tenant' 'Resetting connection profile' events though (using the configuration file I previously sent you via LinkedIn) - is that expected? |
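For anyone wanting to try the same thing, a sketch of setting that variable at machine scope from an elevated session. The variable name is the one quoted above; the exact value the module checks for is an assumption here (the string `true`).

```powershell
# Set the variable machine-wide so that the LCM, which runs as SYSTEM, can see it.
# Assumption: the module treats the string 'true' as "enabled".
[Environment]::SetEnvironmentVariable(
    'MSCLOUDLOGINASSISTANT_WRITETOEVENTLOG',
    'true',
    [EnvironmentVariableTarget]::Machine)

# Read it back from machine scope to confirm.
[Environment]::GetEnvironmentVariable(
    'MSCLOUDLOGINASSISTANT_WRITETOEVENTLOG',
    [EnvironmentVariableTarget]::Machine)
```

Processes that are already running keep their old environment, so the WMI provider host will likely only pick this up after it restarts or after a reboot.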
@Borgquite Thanks so much for testing and super cool it's now massively quicker. Some resources need to perform a reset of the connection profile, so that might be explicit, e.g.
# Need to force reconnect to Exchange for the new permissions to kick in.
if ($null -ne (Get-MSCloudLoginConnectionProfile -Workload ExchangeOnline))
{
    Write-Verbose -Message 'Waiting for 20 seconds for new permissions to be effective.'
    Start-Sleep 20
    Write-Verbose -Message 'Disconnecting from Exchange Online'
    Reset-MSCloudLoginConnectionProfileContext
}
I don't know if this is necessary. My PR #5565 was just merged, so only the requested context will be reset (instead of every workload). This will probably also help a bit for the future. Do you have a couple of logs to share? |
@FabienTschanz Yeah - the 'Resetting connection profile' events have 'Connect-M365Tenant' as the source, and many relate to MicrosoftGraph (not Reset-MSCloudLoginConnectionProfileContext or EXO) so I don't think that fix will help. I would share logs, but it's quite difficult to show how this relates to different resources running, since the Verbose logging for the resources in the PowerShell window is now separate from the MSCloudLoginAssistant event logs (I do wonder if we'd be better off making that environment variable just enable Verbose logging for MSCloudLoginAssistant, so it's easier to see the interaction, even despite the concern about noise). I did wonder if it might be the Compare-InputParametersForChange call is faulty. As far as I am aware, all my connection strings are identical (see the files I sent you earlier). If you can't immediately spot a potential issue let me know what logs you need to troubleshoot this and in what format, I can do my best to share. |
@FabienTschanz Have been looking through the logs & am fairly sure this 'reconnect lots and lots' issue is only affecting my Graph resources, not Exchange Online (I can see in the Azure sign-in logs that the LCM connects to EXO roughly every 15 minutes, i.e. once per LCM session, whereas the LCM connects to Microsoft Graph every few seconds).
Sorry - above statement is incorrect, I just wasn't seeing the TenantId in my configs for Graph, but it's there. Nonetheless there may be an issue with Connect-M365Tenant when it comes to Graph, which results in connection resets, that doesn't affect EXO. |
@Borgquite Alright, I'll take a look at it once I have time with a (hopefully) large enough sample of my tenant. |
@Borgquite I just created a PR that will fix those Microsoft Graph reconnections. It was indeed the |
@FabienTschanz Awesome! I'm on Christmas break (visiting Switzerland with family!) from now until January, but it feels like you've nailed it. I'll test and confirm when I get back :) Happy Christmas! |
Description of the issue
I am trying to import more than 400 ActiveSyncDeviceAccessRules into another tenant, but after approx. 90 rules this error appears:
The WS-Management service cannot process the request. The WMI service or the WMI provider returned an unknown error: HRESULT 0x80041033
+ CategoryInfo : ResourceUnavailable: (root/Microsoft/...gurationManager:String) [], CimException
+ FullyQualifiedErrorId : HRESULT 0x80041033
+ PSComputerName : localhost
Without setting this: Set-WSManInstance -ResourceURI winrm/config -ValueSet @{MaxEnvelopeSizekb = "1048576"}
an import stops almost immediately.
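A small sketch for checking the envelope size before and after changing it. The `Set-WSManInstance` line is the one from the description above; the `WSMan:` drive path is the standard location for this setting. Both need an elevated session with the WinRM service running.

```powershell
# Show the current maximum SOAP envelope size (in KB).
Get-Item -Path 'WSMan:\localhost\MaxEnvelopeSizekb'

# Raise the limit as described above, then read it back.
Set-WSManInstance -ResourceURI winrm/config -ValueSet @{ MaxEnvelopeSizekb = '1048576' }
Get-Item -Path 'WSMan:\localhost\MaxEnvelopeSizekb'
```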
Microsoft 365 DSC Version
1.24.731.1
Which workloads are affected
Exchange Online
The DSC configuration
Verbose logs showing the problem
Environment Information + PowerShell Version