[gfx1030][FP16][ROCm5.2] test_lrn_test failure due to hipRTC issue #1674
@atamazov could you also take a look?
@muralinr from these two tickets, it looks like this might be a problem with the test itself. Could you check on it?
Hi @junliume, I looked at these errors. The "test_lrn_test" and "test_handle_test" failures are related to hipRTC compilation errors. We should ask Artem or Paul to look at this issue.
test_lrn_test ..........................................***Failed  Error regular expression found in output. Regex=[FAILED]  414.24 sec
Test #27: test_handle_test .......................................***Failed    1.25 sec
@junliume @atamazov
@muralinr it seems that we need to reopen one of the tickets mentioned above. I think it should be assigned to the compiler team.
Thanks @shurale-nkn, I think we need to fix this testing defect too! :)
Taken from the official documentation: shall we change the line to the following?
@junliume The real problem is the variability of the words used in our output.
On the one hand, if the error is not critical and the program can complete the task by another method, there is no need to terminate it. On the other hand, we should be able to track it in our CI, so in that case we should always get an Error message. The mechanism is similar to the current behavior of the
[Informative] The current convention is that the failing test should print
@junliume Build errors in
I am looking into
In this case, the CI should check the output for an error and verify that the error was handled correctly.
The previous comment is partially incorrect. @junliume In which docker container did you find the issue with test_handle_test?
[Informative] I used the 5.2 container with MIOpen installed. Rebuilding the library made no difference because the tests still use the installed library. The tests were built with BUILD_DEV=On and the warnings test enabled, but the installed library was built with BUILD_DEV=Off and could not emit build warnings. The test therefore failed, but the failures were actually false positives, and they gave me the impression that there were compiler problems in 5.2.
@junliume Thank you. I have a gfx1030 on hand, but the base driver is 4.3.0, so I may need to upgrade the node. Please send me your results with a docker container.
[ENV]:
[Observations]:
@atamazov it all looks related to
[Informative] Not reproducible with the 5.2 release docker and the 4.3.0 base driver.
@atamazov let's put this issue on hold, then. I just checked my last run with this docker and cannot reproduce the issues either. We are trying to update the base OS ROCm to 5.2 and the docker ROCm to the same version; let's see whether gfx1030 FP16 stability changes. Thanks!
Results of running
More info on the guilty GCC extension: as an extension, the preprocessor accepts linemarkers in non-assembler input files (see https://debrouxl.github.io/gcc4ti/cpp.html#SEC43 for details).
@junliume Please add
I see some issues in the pooling kernels (build errors due to unused variables). I am going to continue fixing the issues until the DEV builds pass.
@junliume In DEV builds:
640 /root/MIOpen/build/bin/test_lrn_test --half --input 1, 16, 4096, 4096 --N 5 --alpha 1 --beta 1 --K 1 --mode Within_Channel
641 FAILED: /root/MIOpen/src/hip/handlehip.cpp:85: Memory not available to allocate buffer: 536870912
...
431 1: /root/MIOpen/build/bin/test_activation --half --input 1, 16, 4096, 4096 --alpha 0.95 --beta 2.3 --gamma 3.4 --mode CLIPPEDRELU --packed 0
432 1: FAILED: /root/MIOpen/src/hip/handlehip.cpp:85: Memory not available to allocate buffer: 536870912
This is SWDEV-345683.
@shurale-nkn What is SWDEV-345683 about? Some quirk of hipMemGetInfo or hipFree, or...? @junliume I can prepare the follow-up PR, but I need 5.3.0-27, where the problem is expected to be fixed. I recommend removing
RuntimeError: HIP out of memory while running pytorch resnext101 for FP32
@junliume Is this fixed with the latest ROCm 6.0.2 (HIP 6.0.32831)? Thanks!
@ppanchad-amd I think this can be closed.
[How To Reproduce]:
test_lrn_test (Failed)
[More Details]: