
The Foldaholics


tictoc


1 hour ago, Avacado said:

Ok BOIS! Up and running.

 

[attached screenshot: k4000.png]

 

:cheers: :wheee:

 

OP is updated with your proper card. 

My 980 has been cruising along at about 900k PPD.  I'm not quite stable with the CUDA WUs at the clocks I used to run back in the day for TC.  I might be able to eke out a few more MHz if I put it back under water.

Edited by tictoc

Settled in at 150 core and -300 mem. I can get more, but she was locking up the system from 175-190. I need to catch up and reclaim some lost points, and I'm shooting for stability. At this rate, if the COVID moonshot WUs keep coming, I can snag close to 10 million points per day. She is holding at or around 160k PPD x 60 multi = 9.6 million PPD. Not a bad OC at 19%; I'd like to find a way to stay at 20%.

 

Heat doesn't seem to be the culprit, rock solid at 60C, but the power draw might be an issue; she's only pulling 60-70%. When I tried a 175 clock, it was hitting mid-80s power draw but ended up freezing the system. For whatever reason, I am using integrated graphics and the K4000 is only running at PCIe 2.0 x8. @tictoc do you think running the main graphics off the K4000 and trying to get the full x16 would help here? I am concerned that running the monitor might inhibit my OC.

 

There is absolutely no way to catch up to BWG; with his card getting the PPD it is and a 30 multi, he's pulling in 17 million PPD. Seems like a bit of a ringer.

Edited by Avacado

My PPD has taken a big hit over the last few days.  My monitoring/alert server is down while I'm working on getting my network put back together and transitioning everything over to my new all-in-one home server.

I had been periodically looking at HFM, and everything seemed fine at 875k to 1M PPD.  Just looked at my EOC stats, and PPD for the last few days is way down.

Checking my HFM config, I had it set to report PPD based on the last 3 frames.  I must have changed it while I was testing some other cards and never switched it back. 🤦‍♂️

Looking through my FAH log, it appears that my rock-solid OC on the 980 is not so rock solid.  I dialed the clocks back a bit, and hopefully it will stay stable.


1 hour ago, tictoc said:

My PPD has taken a big hit over the last few days.  My monitoring/alert server is down while I'm working on getting my network put back together and transitioning everything over to my new all-in-one home server.

I had been periodically looking at HFM, and everything seemed fine at 875k to 1M PPD.  Just looked at my EOC stats, and PPD for the last few days is way down.

Checking my HFM config, I had it set to report PPD based on the last 3 frames.  I must have changed it while I was testing some other cards and never switched it back. 🤦‍♂️

Looking through my FAH log, it appears that my rock-solid OC on the 980 is not so rock solid.  I dialed the clocks back a bit, and hopefully it will stay stable.

 

I'm not sure how yours is set up, but the heatsink power TIM pad on my 960 all but disintegrated over the last 3 months of testing, and it acted up like that at month end.

 

Now it clocks even higher after replacement.


31 minutes ago, BWG said:

 

I'm not sure how yours is set up, but the heatsink power TIM pad on my 960 all but disintegrated over the last 3 months of testing, and it acted up like that at month end.

 

Now it clocks even higher after replacement.

 

You guys don't check thermal paste once a month?


12 minutes ago, axipher said:

 

You guys don't check thermal paste once a month?

Hell no. Is there going to be a day of downtime each month for the card to perform maintenance?

Edited by Avacado

20 minutes ago, axipher said:

 

You guys don't check thermal paste once a month?

 

8 minutes ago, Avacado said:

Hell no. Is there going to be a day of downtime each month for the card to perform maintenance?

 

You just need to speed run your thermal paste checks.  It can be a comp within a comp, to see who can swap out pads and paste the quickest. 🙂


12 minutes ago, Avacado said:

Hell no. Is there going to be a day of downtime each month for the card to perform maintenance?

 

That's actually something I was going to bring up with the team captains: possibly switching to a 4-week comp with a few days of downtime before the next comp starts up (e.g. the comp runs from the 1st to the 28th each month).  It would give everyone time to perform maintenance outside the comp, and it might be helpful to @BWG when preparing for the next month's comp.  I'm not sure what the other captains will think about the idea, but I'll bring it up.


1 hour ago, firedfly said:

 

That's actually something I was going to bring up with the team captains: possibly switching to a 4-week comp with a few days of downtime before the next comp starts up (e.g. the comp runs from the 1st to the 28th each month).  It would give everyone time to perform maintenance outside the comp, and it might be helpful to @BWG when preparing for the next month's comp.  I'm not sure what the other captains will think about the idea, but I'll bring it up.

 

We can talk about that. Let's get the dynamics of it discussed with @zodac. It's honestly what I wanted to do, but we've always run all month in the past.


9 hours ago, axipher said:

Yeah, I think I had also suggested the idea of only counting the 28 highest days in a month for each folder, so that gives 0-3 days of downtime depending on the month...

 

I think this would be simple enough to implement. Currently, to calculate a team's points, I get a list of all points for a user and add them all up. It should be trivial to instead sort that list and truncate it to the top 28 (or however many days we choose).

 

It would probably mess up the daily team stats, since I don't think it could reconcile the best 28 days for users across different days (the total stats are calculated on each update, but the daily/hourly stats are calculated and stored each day).

 

So if you're OK with daily/hourly user and team stats showing all points, and the total stats only showing the top 28 days, I think it could be done quite easily. 
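A minimal sketch of that truncation in Python, for illustration only; the function names and data layout are hypothetical, assuming each user's points have already been grouped into per-day totals:

def user_total(daily_points: list[int], best_days: int = 28) -> int:
    # Sum only a user's best_days highest-scoring days.
    return sum(sorted(daily_points, reverse=True)[:best_days])

def team_total(team_daily_points: dict[str, list[int]], best_days: int = 28) -> int:
    # A team's total is the sum of each member's truncated total.
    return sum(user_total(days, best_days) for days in team_daily_points.values())

# Example: a 31-day month where one folder lost three days to maintenance.
team = {
    "folder_a": [900_000] * 28 + [0, 0, 0],  # the three zero-point days get dropped
    "folder_b": [9_600_000] * 31,            # only the best 28 of 31 days count
}
print(team_total(team))  # 25_200_000 + 268_800_000

Sorting a month of per-day totals is trivial, which matches the point above that only the total-stats path would need to change.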


Card was doing so good, no idea what happened. Came downstairs and the PPD had dropped to half. Realized the card was running a very slow core clock and had switched over to OpenCL after CUDA failed to initialize. After re-installing the drivers and FAH, nothing seems to fix the issue. Good thing I have a card that should be here tomorrow.

 

23:30:46:WU00:FS02:0x22:Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)

 

Any ideas?

Edited by Avacado

Just picking the 28 most productive days of each month seems practical and reasonable, and I'm not opposed to that at all, but I also like the idea of just starting on the 3rd or 4th of each month or something and giving everyone the same downtime opportunity. Makes things feel more... definitive, with everyone starting and finishing at the same time. I dunno, it sounds better in my head, but when I put it into words it's not quite the same.


13 hours ago, Avacado said:

Card was doing so good, no idea what happened. Came downstairs and the PPD had dropped to half. Realized the card was running a very slow core clock and had switched over to OpenCL after CUDA failed to initialize. After re-installing the drivers and FAH, nothing seems to fix the issue. Good thing I have a card that should be here tomorrow.

 

23:30:46:WU00:FS02:0x22:Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)

 

Any ideas?

 

What version of CUDA do you have installed, and what WU failed to build and reverted to OpenCL?


I seem to be folding into a black hole.  I have completed and uploaded many WUs over the last few days, but my stats have stayed the same.  This is not an issue with the ETF stats; my stats are not updating anywhere.  I've never had this issue before.

 

I was running an older version of the client, so I just updated to 7.6.21.  Downloading my first WU with this client, so we'll see if this fixes the issue.

Edited by tictoc

20 minutes ago, tictoc said:

 

What version of CUDA do you have installed, and what WU failed to build and reverted to OpenCL?

The K4000 runs CUDA Compute 3.0. All of the WUs are running OpenCL. It was running the same moonshot units just fine until last night. I even did a fresh Windows install, no dice. I have no idea what happened. I didn't modify anything; just came downstairs and it was gimped. Took my PPD down from 170k to 60k just like that.

 

I tried 3 different drivers, all clean installs, with the same outcome. The card never draws above 70%; I don't suppose I killed it? Maybe reflash the VBIOS?

 

The 750Ti arrives today, but I don't want to waste my card change. The surprise card comes tomorrow and I want to see if I can get that Kraken AIO installed on her. Might just have to bite the bullet and not have any real points until Wednesday when I don't have my children.

Edited by Avacado

7 minutes ago, Avacado said:

It was running the same moonshot units just fine until last night.

 

They updated core_22 to 0.0.16 sometime yesterday.  It caused @BWG's and my GTX 960s to start failing WUs, as the new core requires a newer driver/CUDA version.  My 960 was able to fold the first WU on the new core successfully, and then started failing every subsequent WU that used core_22.  After a reboot, the 960 is folding again, but I expect it to fail again a few minutes after it completes the first WU since the reboot.  If that happens, I'll be updating to a newer driver.

 

I expect you are having a similar issue.  Are there any newer drivers available?

 

Relevant info from Totow on Discord:
 

Quote

If you look closely, you'll find that these projects are now running core 22 v0.0.16, which is built with CUDA 11.x to support CUDA on RTX 3000, hence the requirement for 456.38 drivers ... the previous version (0.0.15) was built using CUDA 10, which required an older driver. Versions before that were built with CUDA 9.

 

Edited by firedfly

10 minutes ago, Avacado said:

The K4000 runs CUDA Compute 3.0. All of the WUs are running OpenCL. It was running the same moonshot units just fine until last night. I even did a fresh Windows install, no dice. I have no idea what happened. I didn't modify anything; just came downstairs and it was gimped. Took my PPD down from 170k to 60k just like that.

 

It is Compute 3.0 capable, but do you know which version of CUDA is installed?  If it's the latest driver (R470 U3 - 471.68), then it should be CUDA 11.4, and that should work with the new core that was released yesterday.  
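One quick way to check is to ask the driver itself, since it reports the highest CUDA version it supports. A minimal sketch using NVIDIA's NVML Python bindings, assuming the nvidia-ml-py package is installed (on some binding versions these calls return bytes rather than str):

import pynvml

pynvml.nvmlInit()
driver = pynvml.nvmlSystemGetDriverVersion()    # e.g. "471.68"
cuda = pynvml.nvmlSystemGetCudaDriverVersion()  # e.g. 11040 -> CUDA 11.4
print(f"Driver {driver}, CUDA {cuda // 1000}.{cuda % 1000 // 10}")
pynvml.nvmlShutdown()

nvidia-smi prints the same driver/CUDA pair in its header, if you'd rather not script it.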


25 minutes ago, tictoc said:

 

It is Compute 3.0 capable, but do you know which version of CUDA is installed?  If it's the latest driver (R470 U3 - 471.68), then it should be CUDA 11.4, and that should work with the new core that was released yesterday.  

I'll double-check. There are two versions, the enterprise one or the beta, I think. I will look again tonight; I had downloaded the beta one, and will try the other to see if it helps.


9 hours ago, tictoc said:

I seem to be folding into a black hole.  I have completed and uploaded many WUs over the last few days, but my stats have stayed the same.  This is not an issue with the ETF stats; my stats are not updating anywhere.  I've never had this issue before.

 

I was running an older version of the client, so I just updated to 7.6.21.  Downloading my first WU with this client, so we'll see if this fixes the issue.

 

I seem to have come out the other side of whatever black hole I was folding into.  Earlier today I had a 27.6M (3.3M actual) update on the 980, or about 3+ days worth of folding on that GPU. :wheee:

 

Not sure what actually happened, since it doesn't look like anyone else had the same thing going on.  My logs showed zero upload errors, and other than no stats registering, everything looked fine.


4 minutes ago, tictoc said:

 

I seem to have come out the other side of whatever black hole I was folding into.  Earlier today I had a 27.6M (3.3M actual) update on the 980, or about 3+ days worth of folding on that GPU. :wheee:

 

Not sure what actually happened, since it doesn't look like anyone else had the same thing going on.  My logs showed zero upload errors, and other than no stats registering, everything looked fine.

 

One of the work servers was not reporting completed work to the stats server.  They resolved that issue today and the backlog of pending updates came through.

 

Some details that were shared on the FAH Discord:

Update: The good news is I figured out the issue - the stats server isn't able to SSH onto the WSes and get the updates!
The not-as-good news: I can't fix it just by myself. The login password for the WS ssh needs changing (it's been that long I guess) and the security settings won't allow ssh access until the password gets changed :smile:. But we are on it!

 


16 hours ago, firedfly said:

 

One of the work servers was not reporting completed work to the stats server.  They resolved that issue today and the backlog of pending updates came through.

 

Some details that were shared on the FAH Discord:

Update: The good news is I figured out the issue - the stats server isn't able to SSH onto the WSes and get the updates!
The not-as-good news: I can't fix it just by myself. The login password for the WS ssh needs changing (it's been that long I guess) and the security settings won't allow ssh access until the password gets changed :smile:. But we are on it!

 

 

Thanks.  I must have been pulling all my work from that one server, and I didn't notice points go missing for anyone else, so I assumed it was something on my end.

Edited by tictoc

Same issue as the K4000, @tictoc. No idea how to fix the failed CUDA initialization. At least the 750Ti is working as expected, so I will be running that, I suppose.

 

I really don't know what's happening, same issue on the 750Ti now? I can only assume that having Quadro drivers and GeForce drivers on the same machine is messing with it somehow.

 

[attached screenshot: Errors.png]

 

 

Edited by Avacado
