[RESOLVED in 0.26.1] trex 0.26.0 - GPU Stability Issues - Linux (Hive) #1296

Open
sublimeBradley opened this issue May 10, 2022 · 1 comment

Comments

sublimeBradley commented May 10, 2022

##################################################

Update 14 May 2022

The issue appears to be resolved in the trex 0.26.1 release, tested on HiveOS 0.6-217@220513 as well as @220511.

##################################################

Original Ticket

Using trex miner v0.26.0 on a rig with two LHR cards and two non-LHR cards, stability issues occur that severely degrade the mining performance of the LHR cards.

Initially, mining performance is much higher than on v0.25.15 and appears stable. Typically after 15-20 minutes of uptime, what appears to be a CUDA/driver crash occurs on a per-GPU basis, reducing the affected card's hashrate by approximately 60% until the rig is rebooted. The longest observed uptime on this rig before a crash is 25 minutes, with the second LHR card crashing at the 40-minute mark.
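The per-GPU drop of roughly 60% described above could be flagged automatically. The following is a minimal sketch, assuming per-GPU hashrates have already been collected by some other means. The function name `find_degraded_gpus`, the 50% threshold, and the sample values are all hypothetical and are not taken from the attached logs.

```python
from typing import Dict, List


def find_degraded_gpus(
    baseline: Dict[int, float],
    current: Dict[int, float],
    drop_fraction: float = 0.5,
) -> List[int]:
    """Return GPU indices whose hashrate fell by at least drop_fraction.

    baseline and current map a GPU index to a hashrate in any consistent
    unit (for example MH/s). A GPU missing from current is treated as 0.
    """
    degraded = []
    for gpu_id, base in sorted(baseline.items()):
        if base <= 0:
            # No meaningful baseline for this GPU; skip it.
            continue
        now = current.get(gpu_id, 0.0)
        if (base - now) / base >= drop_fraction:
            degraded.append(gpu_id)
    return degraded


# Hypothetical sample values: GPU 1 drops from 48.0 to 19.0 (about 60%).
baseline = {0: 60.0, 1: 48.0, 2: 100.0}
current = {0: 59.5, 1: 19.0, 2: 99.0}
print(find_degraded_gpus(baseline, current))  # prints [1]
```

A threshold below the observed 60% leaves some margin for normal hashrate fluctuation while still catching the post-crash state.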

This issue has been observed on HiveOS with Nvidia driver versions 510.60.02 and 510.68.02.

Reverting the driver to 510.60.02 did not help. Adding some Nvidia-specific tweaks to the initramfs options and rebuilding the initramfs was also unsuccessful. Overclock settings for the LHR cards have also been greatly reduced, with no effect.

Attached to this ticket are:

screenfetch log: screenfetch.log
dmesg log after crash (crash noted in the bottom-most lines): dmesg_aftercrash.log
nvidia-smi log prior to crash: nvidia_smi_ok.log
nvidia-smi log after crash (crash noted on GPU #1): nvidia_smi_err.log
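When reviewing a dmesg log after a suspected driver crash, one common step is to look for NVIDIA Xid messages, which the kernel driver logs in the form `NVRM: Xid (PCI:<bus id>): <code>, ...`. The sketch below extracts those events from log lines. The sample lines are hypothetical and are not copied from the attached dmesg_aftercrash.log, which may or may not contain Xid entries.

```python
import re
from typing import Iterable, List, Tuple

# Matches the PCI bus id and numeric Xid code in an NVRM kernel log line.
XID_RE = re.compile(r"NVRM: Xid \(PCI:([0-9a-fA-F:.]+)\): (\d+)")


def extract_xid_events(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (pci_bus_id, xid_code) for every Xid message found in lines."""
    events = []
    for line in lines:
        match = XID_RE.search(line)
        if match:
            events.append((match.group(1), int(match.group(2))))
    return events


# Hypothetical sample lines, for illustration only.
sample = [
    "[ 1201.000000] usb 1-1: new high-speed USB device number 3",
    "[ 1520.123456] NVRM: Xid (PCI:0000:01:00): 79, pid=1234, GPU has fallen off the bus.",
]
for bus_id, code in extract_xid_events(sample):
    print(f"GPU at PCI {bus_id} reported Xid {code}")
```

To run it against a real log, pass the lines of the file to the function, for example `extract_xid_events(open("dmesg_aftercrash.log"))`.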

wangzeming666 commented May 10, 2022

Same issue here.
I am seeing GPU crashes on 3080 and 3060 cards, and they happen far too often.
I lowered the memory overclock by 100, but nothing changed.
The driver version is 512.15.
The error that t-rex miner showed me was 'Can't stop device xxx, cuda exception: CUDA_ERROR_UNKNOWN'.

sublimeBradley changed the title from "trex 0.26.0 - GPU Stability Issues - Linux (Hive)" to "[RESOLVED in 0.26.1] trex 0.26.0 - GPU Stability Issues - Linux (Hive)" on May 14, 2022