Crash at winrt::Windows::AI::MachineLearning::implementation::LearningModelSession::GetResults during inferencing #16988

venki-thiyag · 2023-08-03T10:47:34Z

Describe the issue

Inferencing works fine for some time, but then it suddenly crashes, and the call stack points to the following:

` Windows.AI.MachineLearning.dll!winrt::Windows::AI::MachineLearning::implementation::LearningModelSession::GetResults(struct winrt::com_ptr,struct winrt::hstring const &,unsigned __int64) Unknown

Windows.AI.MachineLearning.dll!winrt::Windows::AI::MachineLearning::implementation::LearningModelSession::EvaluateAsync$_ResumeCoro$1() Unknown
Windows.AI.MachineLearning.dll!winrt::impl::resume_background_callback(void *,void *) Unknown
ntdll.dll!TppSimplepExecuteCallback() Unknown
ntdll.dll!TppWorkerThread() Unknown
KERNEL32.DLL!BaseThreadInitThunk() Unknown
ntdll.dll!RtlUserThreadStart() Unknown
`

Exception message:
Unhandled exception at 0x00007FF8A75401CB (Windows.AI.MachineLearning.dll) in C_Usersdmytro.davydovAppDataRoamingRingCentralCrashpadreportsef626f60-0d2c-4fcc-add0-59552b00a164.dmp: 0xC0000005: Access violation reading location 0x0000000000000000.
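For readers triaging a message like the one above, here is a minimal sketch of how the exception code and faulting address are commonly interpreted. `DescribeFault` and `CheckReportedFault` are hypothetical helpers written for illustration and are not part of WinML or ONNX Runtime. The sketch assumes the usual Windows convention that the lowest 64 KiB of the address space is never mapped, so a read at or near address 0 typically indicates a null pointer. It does not identify which object inside `GetResults` was null.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// NTSTATUS value shown in the exception message (STATUS_ACCESS_VIOLATION).
constexpr std::uint32_t kAccessViolation = 0xC0000005u;

// Assumption: Windows keeps the first 64 KiB of the address space unmapped,
// so faulting addresses below this limit usually stem from a null pointer.
constexpr std::uint64_t kNullPageLimit = 0x10000u;

// Hypothetical helper: gives a rough classification of a fault from the
// exception code and the address that was being read.
std::string DescribeFault(std::uint32_t code, std::uint64_t address) {
    if (code != kAccessViolation) {
        return "not an access violation";
    }
    if (address == 0) {
        return "null pointer dereference";
    }
    if (address < kNullPageLimit) {
        // A small non-zero address usually means a field was read
        // through a null object pointer.
        return "null pointer plus member offset";
    }
    return "invalid non-null pointer";
}

// Usage with the values reported in this issue:
// code 0xC0000005, reading location 0x0000000000000000.
inline void CheckReportedFault() {
    assert(DescribeFault(0xC0000005u, 0x0) == "null pointer dereference");
}
```

Under these assumptions, the reported crash falls into the first category: something dereferenced during `GetResults` was null at the time of the read.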

To reproduce

It's simple inferencing; sorry, I'm unable to share the model.

Attached crash dump.
winml_gpu_crash.zip

CPU description:
AMD Ryzen 5 3500U with Radeon Vega Mobile Gfx

Urgency

No response

Platform

Windows

OS Version

Windows 10.0.22621

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.12.20220803.2.7048164

ONNX Runtime API

WinML

Architecture

X64

Execution Provider

DirectML

Execution Provider Library Version

No response

fdwr · 2023-08-03T20:43:58Z

To confirm, does this only happen when running WinML with MachineLearning::LearningModelDeviceKind::DirectX (using the DirectML inside ONNX Runtime), and never with CPU?
If you have any other machines with different GPUs, does it repro on them too?
Given the stack and crash point, winrt::Windows::AI::MachineLearning::implementation::LearningModelSession::GetResults, I'll pester someone familiar with WinML as a starting point...
🤔 Most threads I saw in the dump were idle, waiting for some condition, except the one you shared and thread # 0 below:

 # Child-SP          RetAddr               Call Site
00 000000c7`7fbfdc78 00007ff8`9baec2ec     win32u!NtGdiDdDDIDestroyAllocation2+0x14
01 000000c7`7fbfdc80 00007ff8`9bafbe41     D3D12Core!CallAndLogImpl<long (__cdecl*)(_D3DKMT_DESTROYALLOCATION2 const * __ptr64),_D3DKMT_DESTROYALLOCATION2 * __ptr64>+0x28
02 000000c7`7fbfdce0 00007ff8`6effbc70     D3D12Core!NDXGI::CDevice::DeallocateCB_0022+0xb1
03 000000c7`7fbfdd30 0000022e`bdaf50b0     amdxc64+0x39bc70
04 000000c7`7fbfdd38 00000000`00000000     0x0000022e`bdaf50b0

venki-thiyag · 2023-08-04T05:51:47Z

@fdwr
#1 Only WinML GPU was attempted; WinML CPU was not tried.
#2 So far we have seen this on only one machine. Also, the same WinML GPU was working fine for some time, and then suddenly it crashed.
#3 OK.
#4 It's an Electron app; this thread crashed and the app was blocked.

fdwr · 2023-08-04T18:41:47Z

> the same WinML GPU was working fine for some time, and then suddenly it crashed.

Do you mean within the same session (e.g. it runs fine for a minute, then crashes), or do you mean within a larger time range (e.g. it worked on that machine for a week, but then after a restart or automatic driver update, it started failing)?

github-actions bot added the ep:DML (issues related to the DirectML execution provider), platform:mobile (issues related to ONNX Runtime mobile; typically submitted using template), and platform:windows (issues related to the Windows platform) labels · Aug 3, 2023

skottmckay removed the platform:mobile (issues related to ONNX Runtime mobile; typically submitted using template) label · Oct 3, 2023
