The Foldaholics



Folding@Home Staff - Team Lead
1.1k 1,085
17 hours ago, Avacado said:

More than happy to help you mod that 750. Let me know when you are free and we can link up over Discord video. Create a file of the original vBIOS via GPU-z and save it and send it to me. 

Maybe Saturday morning. I sent you 2 files on discord.

CPU: Ryzen 7 5800X
MOTHERBOARD: B550 PG Velocita
RAM: 4x8GB Ballistix
GPU: RX 6900 XT
PSU: LEADEX V Platinum 1KW
CASE: QUBE 500
SSD/NVME: T-FORCE CARDEA A440 PRO
OPERATING SYSTEM: 11 Pro

  • 2 weeks later...

980 crashed again.  I'm not sure if it's a hardware issue or an issue with the WUs.  The crash has only happened on a specific set of beta WUs.  Back up and running now.  I have reported the error, so we'll see if anyone else has hit the same problem.

Edited by tictoc

12 minutes ago, tictoc said:

980 crashed again.  I'm not sure if it's a hardware issue or an issue with the WUs.  The crash has only happened on a specific set of beta WUs.  Back up and running now.  I have reported the error, so we'll see if anyone else has hit the same problem.

Remove the beta flag? What's the difference between normal and beta PPD?  Do you have an HFM log?


I did remove the beta flag for now on the 980, but since I am fairly active testing beta projects on the beta team, I try to keep at least one GPU running beta tasks.  The error is a CUDA kernel error, but it is fairly random and after a restart I can complete the task that had errored out.


1 hour ago, tictoc said:

I did remove the beta flag for now on the 980, but since I am fairly active testing beta projects on the beta team, I try to keep at least one GPU running beta tasks.  The error is a CUDA kernel error, but it is fairly random and after a restart I can complete the task that had errored out.

Hmm. When was the last time you updated drivers on it?


I'm on 470.74. 470.82 was just released today and hasn't hit the Arch repos yet.

 

Nothing much in the release notes for the driver released today, but NVIDIA is not always great about being thorough on release notes. I'll try the newest driver as soon as it's available, which will probably be later today or tomorrow.

  • 1 month later...

Last week I pulled the 980 out of the machine with my 2080S, and the client, in its infinite wisdom, somehow managed to swap my passkey from the removed 980 to the 2080's slot.  I got it sorted now, but that's why I had no points on the 2080S over the last few days.  There's no way I'll be able to make up the ground and get back ahead of @Bastiaan_NL😠

 

 


Folding@Home Staff
1.3k 661
1 hour ago, tictoc said:

Last week I pulled the 980 out of the machine with my 2080S, and the client, in its infinite wisdom, somehow managed to swap my passkey from the removed 980 to the 2080's slot.  I got it sorted now, but that's why I had no points on the 2080S over the last few days.  There's no way I'll be able to make up the ground and get back ahead of @Bastiaan_NL😠

 

 

 

Did HFM capture your estimated points in its log?

CPU: 5600x
GPU: EVGA RTX 3090 FTW3 Ultra
GPU 2: EVGA RTX 3080ti FTW3 Ultra
GPU 3: EVGA RTX 3080ti XC3 Hybrid
GPU 4: EVGA RTX 3070ti FTW3 Ultra
GPU 5: MSI RTX 3070 Gaming X Trio
GPU 6: Asus RTX 2080ti ROG STRIX
GPU 7: EVGA RTX 3080ti FTW3 Ultra

Premium Platinum - Lifetime
1.3k 1,294
8 hours ago, tictoc said:

Last week I pulled the 980 out of the machine with my 2080S, and the client, in its infinite wisdom, somehow managed to swap my passkey from the removed 980 to the 2080's slot.  I got it sorted now, but that's why I had no points on the 2080S over the last few days.  There's no way I'll be able to make up the ground and get back ahead of @Bastiaan_NL😠

 

 

I was wondering why there was such a difference, but I had no idea that you had posted no points.

And I always check the client twice after changing something; it wouldn't be the first time I lost my settings.


9 hours ago, BWG said:

 

Did HFM capture your estimated points in its log?

 

No HFM, but I do have the log.  I'm not too worried about getting my points back for the ETF; I figure part of being in the competition is keeping on top of what's happening, especially since I am a team captain.  I'll just have to up the clocks a few notches to make sure I can chart a win next month. 😉

 

I knew that the client would bork my slots, but when I double checked it I didn't compare the config to my list of my passkeys.  I just saw that it correctly kept all of my options, and didn't think to double check that the passkey in the slot was the correct passkey.  
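
For anyone who wants to automate that last check, here is a minimal sketch in Python. It assumes a v7-style `config.xml` where a global `<passkey v='...'/>` sits under `<config>` and a slot may override it with its own `<passkey>` child; that layout, the function names, and the placeholder keys are assumptions for illustration, so compare against your own config before trusting it.

```python
import xml.etree.ElementTree as ET


def slot_passkeys(config_xml: str) -> dict:
    """Return {slot id: passkey in effect} for every <slot> in the config.

    Assumed layout: a slot-level <passkey v='...'/> overrides the global one.
    """
    root = ET.fromstring(config_xml)
    global_el = root.find("passkey")  # direct child of <config> only
    global_key = global_el.get("v", "") if global_el is not None else ""

    result = {}
    for slot in root.findall("slot"):
        el = slot.find("passkey")
        key = el.get("v", "") if el is not None else global_key
        result[slot.get("id", "?")] = key
    return result


def mismatched_slots(config_xml: str, expected: dict) -> dict:
    """Return {slot id: (expected key, actual key)} for slots that differ."""
    actual = slot_passkeys(config_xml)
    return {
        slot_id: (expected.get(slot_id), key)
        for slot_id, key in actual.items()
        if expected.get(slot_id) != key
    }


# Placeholder keys, not real passkeys.
SAMPLE = (
    "<config>"
    "<passkey v='global-key'/>"
    "<slot id='0' type='GPU'><passkey v='key-for-2080s'/></slot>"
    "<slot id='1' type='GPU'/>"
    "</config>"
)
```

Running `mismatched_slots` against your own list of slot-to-passkey pairs after pulling or adding a card would flag a swapped key before any work units are credited to the wrong slot.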

34 minutes ago, tictoc said:

 

No HFM, but I do have the log.  I'm not too worried about getting my points back for the ETF; I figure part of being in the competition is keeping on top of what's happening, especially since I am a team captain.  I'll just have to up the clocks a few notches to make sure I can chart a win next month. 😉

 

I knew that the client would bork my slots, but when I double checked it I didn't compare the config to my list of my passkeys.  I just saw that it correctly kept all of my options, and didn't think to double check that the passkey in the slot was the correct passkey.  

 

I can give you the missed points if you want to add them up. Up to you.


12 minutes ago, BWG said:

 

I can give you the missed points if you want to add them up. Up to you.

 

Right on.  Logs attached for posterity, but no need to add the points (30.2M) to the official total.  We'll just use this as a teaching moment. lol

2080S-etf_log.txt 2080S-etf_points.txt


Folding@Home Staff
730 374
2 hours ago, tictoc said:

The Radeon VII was down for a day, but it is back up and running now.  I also bumped the clocks up a notch or two on the 2080S, and it has been stable for the last few days. 🙂

 

Nice, this was a bad time for the month swap as I was swamped with work, so the 750 Ti will run another month at stock in Windows 10 as it has been since I started it up in October.  Should hopefully provide at least a nice stable baseline of points.


So I swapped out the ROCm drivers on the VII for the old closed-source amdgpu-pro drivers, after remembering that performance was much better on the old drivers the last time I tested them.  The card might actually be somewhat competitive in the AMD category now. 😁  p18201 | ROCm 4.5: 1.9M ppd | amdgpu-pro 20.30: 3M ppd
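
To put a number on that p18201 comparison, a quick sketch of the relative gain in Python. The helper name is made up for illustration, and the inputs are just the rounded PPD figures quoted above, so treat the result as approximate.

```python
def ppd_gain_percent(before: float, after: float) -> float:
    """Relative PPD change from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("baseline PPD must be positive")
    return (after - before) / before * 100.0


# ROCm 4.5 (1.9M ppd) vs amdgpu-pro 20.30 (3M ppd) on p18201
print(f"{ppd_gain_percent(1.9e6, 3.0e6):.1f}%")  # prints 57.9%
```

A single project on a single day is a small sample, so a figure like this is a rough guide rather than a benchmark.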

On 04/12/2021 at 17:35, tictoc said:

So I swapped out the ROCm drivers on the VII for the old closed-source amdgpu-pro drivers, after remembering that performance was much better on the old drivers the last time I tested them.  The card might actually be somewhat competitive in the AMD category now. 😁  p18201 | ROCm 4.5: 1.9M ppd | amdgpu-pro 20.30: 3M ppd

 

I made a driver change before the last FAT on my 6700XT and man the PPD went from 1.8 mill to 3.6 mill


I had to restart because of a UPS battery fault, so I took that chance to quickly install Precision X1 and give the 750 Ti a quick 100 MHz OC. It went from 160k PPD in the FAH client to 185k PPD, so we will see if that actually turns into a noticeable bump in points once it levels off after a couple of days.

2 hours ago, axipher said:

I had to restart because of a UPS battery fault, so I took that chance to quickly install Precision X1 and give the 750 Ti a quick 100 MHz OC. It went from 160k PPD in the FAH client to 185k PPD, so we will see if that actually turns into a noticeable bump in points once it levels off after a couple of days.

 

a 100 MHz OC! 😮

 

 


On 07/12/2021 at 19:51, BWG said:

 

I made a driver change before the last FAT on my 6700XT and man the PPD went from 1.8 mill to 3.6 mill

The only downside to the old driver I'm running in Linux is that it will randomly crash and not recover, and the only way to get the GPUs back online is a full system reboot. That led to a bit of downtime a few days ago, because I didn't notice that the GPUs had halted until I got home and noticed how cold it was in my office.


24 minutes ago, tictoc said:

The only downside to the old driver I'm running in Linux is that it will randomly crash and not recover, and the only way to get the GPUs back online is a full system reboot. That led to a bit of downtime a few days ago, because I didn't notice that the GPUs had halted until I got home and noticed how cold it was in my office.

 

Sounds like you need to get an Arduino with a 120 VAC relay and a temperature sensor, or just a relay to control the motherboard power switch input or reset input.


I have the pieces and parts to add a pi-kvm to the main workstation, but like my other current projects, it is just waiting until work/life slows down a bit.

 

If I had waited until I was done with the new home server to offline my other machines, this wouldn't have been an issue since I had a monitoring and alerting stack in place, along with remote ssh access via my home VPN. I hope to get all that sorted out and everything back online in the next month.
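
Until a full monitoring stack is back, a stopgap could be as small as checking whether the client log has gone quiet. Below is a minimal sketch in Python; it assumes the log file is appended to regularly while work units are progressing and stops changing when the GPUs halt, which is an assumption about the setup and not something confirmed above. The function name, log path, and threshold are hypothetical.

```python
import os
import time


def is_stale(path, max_age_seconds, now=None):
    """Return True if `path` was last modified more than `max_age_seconds` ago.

    `now` defaults to the current time; passing it explicitly makes the
    check easy to test.
    """
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path) > max_age_seconds


# Example use from cron or a systemd timer (path and threshold are placeholders):
#
#   if is_stale("/path/to/fah/log.txt", 30 * 60):
#       print("FAH log has not changed in 30 minutes; GPUs may have halted")
```

Swapping the `print` for a mail or chat notification would give an alert well before the office gets cold.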


I had another 7 hours of nothing on the Radeon VII today. 🤬

 

I'm going to see if there is an older ROCm driver that has ppd at least in the neighborhood of this old amdgpu-pro driver.  Even if the ppd is less, I know that every ROCm version after 4.0 was rock stable, so there won't be any downtime.
