TREX: Can’t Stop Device #557
Comments
Please upload the full log file
@trexminer so you are a support tech?
@klokit03 I'm one of the miner developers
I'm seeing this exact same error. Also ONLY on device with ID=2, GPU #2.
I'm afraid I can't provide more info unless someone uploads the log file
@trexminer Thank you for your response. I'm not sure if this is sufficient. If not, I can try to download the full log file. Just to clarify, which log file: the miner log, the sys log, or the hive-agent log? The error code is all the way at the bottom.
What's also weird is that when I get this error the hive watchdog stops working, the miner takes my rig offline, and my other, separate rig goes offline too (they are connected to the same ethernet switch). The only way to bring my rig back is to physically power it off and on with the PC power switch. When I restart the rig that generated the error, the other rig automatically comes back online.
My OC settings are 1200 core, 1850 mem and 255 PL. I started with a 2000 memory clock and keep lowering it by 25 each time, and I'm still getting this error. The GPU is an EVGA RTX 3080 XC3. Any way you can help?
---------------20210807 21:51:32 ---------------
20210807 21:51:34 ethash epoch: 432, diff: 10.00 G
20210807 21:52:26 [ OK ] 757/757 - 592.87 MH/s, 28ms ... GPU #3
20210807 21:52:42 [ OK ] 758/758 - 592.86 MH/s, 27ms ... GPU #2
20210807 21:53:22 [ OK ] 761/761 - 592.88 MH/s, 19ms ... GPU #5
---------------20210807 21:53:35 ----------------
20210807 21:53:35 Main loop finished. Cleaning up resources...
Thanks. That seems like an overclock issue. Try running with no OC at all on GPU #2 and see if it crashes again. If it doesn't, then the overclock is your root cause.
20210804 05:50:27 TREX: Can't stop device [ID=2, GPU #2], cuda exception: CUDA_ERROR_LAUNCH_FAILED, try to reduce overclock to stabilize GPU state
I keep getting the above error. I’ve reduced the overclock just like it suggests, but I still keep getting errors and my rig goes offline at random times. I brought the OC down to 1850 (HiveOS) and it's still crashing. When the rig crashes and goes offline, it takes down the other rig that’s connected to the same Ethernet switch. Please help.
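For anyone trying to confirm which card is at fault before uploading the full log, here is a minimal Python sketch that counts "Can't stop device" errors per GPU. The regular expression is modeled only on the single error line quoted in this thread, so treat the format as an assumption; other miner versions may word the message differently. The helper names `parse_stop_errors` and `errors_per_gpu` are made up for illustration and are not part of T-Rex.

```python
import re
from collections import Counter

# Pattern modeled on the one error line quoted in this thread (assumption:
# other T-Rex versions may format the message differently).
# Accepts both a straight and a curly apostrophe in "Can't".
ERROR_RE = re.compile(
    r"^(?P<date>\d{8}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"TREX: Can['’]t stop device \[ID=(?P<dev_id>\d+), GPU #(?P<gpu>\d+)\], "
    r"cuda exception: (?P<error>[A-Z_]+)"
)


def parse_stop_errors(lines):
    """Return one dict (date, time, dev_id, gpu, error) per matching line."""
    hits = []
    for line in lines:
        match = ERROR_RE.match(line.strip())
        if match:
            hits.append(match.groupdict())
    return hits


def errors_per_gpu(lines):
    """Count matching error lines per GPU number."""
    return Counter(int(hit["gpu"]) for hit in parse_stop_errors(lines))


# Sample input: the error line and one unrelated log line from this thread.
sample = [
    "20210804 05:50:27 TREX: Can't stop device [ID=2, GPU #2], "
    "cuda exception: CUDA_ERROR_LAUNCH_FAILED, try to reduce overclock "
    "to stabilize GPU state",
    "20210807 21:53:35 Main loop finished. Cleaning up resources...",
]

print(errors_per_gpu(sample))
```

To run it against a real log, pass the open file object instead of `sample`. If one GPU number dominates the count, that is the card to test with no OC first.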