
The Foldaholics


tictoc


1 hour ago, Avacado said:

Ok BOIS! Up and running.

 

[attached screenshot: k4000.png]

 

:cheers: :wheee:

 

OP is updated with your proper card. 

My 980 has been cruising along at about 900k PPD.  I'm not quite stable with the CUDA WUs at the clocks I used to run back in the day for TC.  I might be able to eke out a few more MHz if I put it back under water.

Edited by tictoc

Settled in at 150 core and -300 mem. I can get more, but she was locking up the system from 175-190. I need to catch up and reclaim some lost points, and I'm shooting for stability. At this rate, if the COVID moonshot WUs keep coming, I can snag close to 10 million points per day. She is holding at or around 160k PPD x 60 multi = 9.6 million PPD. Not a bad OC at 19%; I'd like to find a way to stay at 20%.

 

Heat doesn't seem to be the culprit, rock solid at 60C, but the power draw might be an issue; she's only pulling 60-70%. When I tried a 175 clock, it was hitting mid-80s power draw but ended up freezing the system. For whatever reason, I am using integrated graphics and the K4000 is only running at PCIe 2.0 x8. @tictoc do you think running the main graphics off the K4000 and trying to get the full x16 would help here? I am concerned that running the monitor might inhibit my OC.

 

There is absolutely no way to catch up to BWG; with his card getting the PPD it is and a 30 multi, he's pulling in 17 million PPD. Seems like a bit of a ringer.

Edited by Avacado

My PPD has taken a big hit over the last few days.  My monitoring/alert server is down while I'm working on getting my network put back together and transitioning everything over to my new all-in-one home server.

I had been periodically looking at HFM, and everything seemed fine at 875k to 1M PPD.  Just looked at my EOC stats, and PPD for the last few days is way down.

Checking my HFM config, I had it set to report PPD based on the last 3 frames.  I must have changed it while I was testing some other cards and never switched it back. 🤦‍♂️

Looking through my FAH log, it appears that my rock-solid OC on the 980 is not so rock solid.  I dialed the clocks back a bit, and hopefully it will stay stable.


1 hour ago, tictoc said:

My PPD has taken a big hit over the last few days.  My monitoring/alert server is down while I'm working on getting my network put back together and transitioning everything over to my new all-in-one home server.

I had been periodically looking at HFM, and everything seemed fine at 875k to 1M PPD.  Just looked at my EOC stats, and PPD for the last few days is way down.

Checking my HFM config, I had it set to report PPD based on the last 3 frames.  I must have changed it while I was testing some other cards and never switched it back. 🤦‍♂️

Looking through my FAH log, it appears that my rock-solid OC on the 980 is not so rock solid.  I dialed the clocks back a bit, and hopefully it will stay stable.

 

I'm not sure how yours is set up, but the heatsink power TIM pad on my 960 all but disintegrated over the last 3 months of testing, and it acted up like that at month end.

 

Now it clocks even higher after replacement.


31 minutes ago, BWG said:

 

I'm not sure how yours is set up, but the heatsink power TIM pad on my 960 all but disintegrated over the last 3 months of testing, and it acted up like that at month end.

 

Now it clocks even higher after replacement.

 

You guys don't check thermal paste once a month?


12 minutes ago, axipher said:

 

You guys don't check thermal paste once a month?

Hell no. Is there going to be a day of downtime each month for the card to perform maintenance?

Edited by Avacado

20 minutes ago, axipher said:

 

You guys don't check thermal paste once a month?

 

8 minutes ago, Avacado said:

Hell no. Is there going to be a day of downtime each month for the card to perform maintenance?

 

You just need to speed run your thermal paste checks.  It can be a comp within a comp, to see who can swap out pads and paste the quickest. 🙂


12 minutes ago, Avacado said:

Hell no. Is there going to be a day of downtime each month for the card to perform maintenance?

 

That's actually something I was going to bring up with the team captains: possibly switching to a 4-week comp with a few days of downtime before the next comp starts up (e.g. the comp runs from the 1st to the 28th each month).  It would give everyone time to perform maintenance outside the comp, and it might be helpful to @BWG when preparing for the next month's comp.  I'm not sure what the other captains will think about the idea, but I'll bring it up.


1 hour ago, firedfly said:

 

That's actually something I was going to bring up with the team captains: possibly switching to a 4-week comp with a few days of downtime before the next comp starts up (e.g. the comp runs from the 1st to the 28th each month).  It would give everyone time to perform maintenance outside the comp, and it might be helpful to @BWG when preparing for the next month's comp.  I'm not sure what the other captains will think about the idea, but I'll bring it up.

 

We can talk about that. Let's get the dynamics of it discussed with @zodac. It's honestly what I wanted to do, but we've always run all month in the past.


9 hours ago, axipher said:

Yeah, I think I had also suggested the idea of only counting the 28 highest days in a month for each folder, so that gives 0-3 days of downtime depending on the month...

 

I think this would be simple enough to implement. Currently, to calculate a team's points, I get a list of all points for a user and add them all up. It should be trivial to instead sort that list and truncate it to the top 28 (or however many days we choose).

 

It would probably mess up the daily team stats, since I don't think it could reconcile the best 28 days for users across different days (the total stats are calculated on each update, but the daily/hourly stats are calculated and stored each day).

 

So if you're OK with daily/hourly user and team stats showing all points, and the total stats only showing the top 28 days, I think it could be done quite easily. 
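A minimal sketch of that truncation in Python, for illustration only; the function names and data layout are hypothetical, assuming each user's points have already been grouped into per-day totals:

def user_total(daily_points: list[int], best_days: int = 28) -> int:
    # Sum only a user's best_days highest-scoring days.
    return sum(sorted(daily_points, reverse=True)[:best_days])

def team_total(team_daily_points: dict[str, list[int]], best_days: int = 28) -> int:
    # A team's total is the sum of each member's truncated total.
    return sum(user_total(days, best_days) for days in team_daily_points.values())

# Example: a 31-day month where one folder lost three days to maintenance.
team = {
    "folder_a": [900_000] * 28 + [0, 0, 0],  # the three zero-point days get dropped
    "folder_b": [9_600_000] * 31,            # only the best 28 of 31 days count
}
print(team_total(team))  # 25_200_000 + 268_800_000

Sorting a month of per-day totals is trivial, which matches the point above that only the total-stats path would need to change.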


Card was doing so good, no idea what happened. Came downstairs and the PPD had dropped to half. Realized the card was running a very slow core clock and had switched over to OpenCL after CUDA failed to initialize. After re-installing the drivers and FAH, nothing seems to fix the issue. Good thing I have a card that should be here tomorrow.

 

23:30:46:WU00:FS02:0x22:Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)

 

Any ideas?

Edited by Avacado

Just picking the 28 most productive days of each month seems practical and reasonable, and I'm not opposed to that at all, but I also like the idea of just starting on the 3rd or 4th of each month or something and giving everyone the same downtime opportunity. Makes things feel more... definitive, with everyone starting and finishing at the same time. I dunno, it sounds better in my head, but when I put it into words it's not quite the same.


13 hours ago, Avacado said:

Card was doing so good, no idea what happened. Came downstairs and the PPD had dropped to half. Realized the card was running a very slow core clock and had switched over to OpenCL after CUDA failed to initialize. After re-installing the drivers and FAH, nothing seems to fix the issue. Good thing I have a card that should be here tomorrow.

 

23:30:46:WU00:FS02:0x22:Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)

 

Any ideas?

 

What version of CUDA do you have installed, and what WU failed to build and reverted to OpenCL?


I seem to be folding into a black hole.  I have completed and uploaded many WUs over the last few days, but my stats have stayed the same.  This is not an issue with the ETF stats; my stats are not updating anywhere.  I've never had this issue before.

 

I was running an older version of the client, so I just updated to 7.6.21.  Downloading my first WU with this client, so we'll see if this fixes the issue.

Edited by tictoc

20 minutes ago, tictoc said:

 

What version of CUDA do you have installed, and what WU failed to build and reverted to OpenCL?

The K4000 runs CUDA Compute 3.0. All of the WUs are running OpenCL. It was running the same moonshot units just fine until last night. I even did a fresh Windows install, no dice. I have no idea what happened. I didn't modify anything; just came downstairs and it was gimped. Took my PPD down from 170k to 60k just like that.

 

I tried 3 different drivers, all clean installs, with the same outcome. The card never draws above 70%; I don't suppose I killed it? Maybe reflash the VBIOS?

 

The 750Ti arrives today, but I don't want to waste my card change. The surprise card comes tomorrow and I want to see if I can get that Kraken AIO installed on her. Might just have to bite the bullet and not have any real points until Wednesday when I don't have my children.

Edited by Avacado

7 minutes ago, Avacado said:

It was running the same moonshot units just fine until last night.

 

They updated core_22 to 0.0.16 sometime yesterday.  It caused @BWG's and my GTX 960s to start failing WUs, as the new core requires a newer driver/CUDA version.  My 960 was able to fold the first WU on the new core successfully, and then started failing every subsequent WU that used core_22.  After a reboot, the 960 is folding again, but I expect it to fail again a few minutes after it completes the first WU since the reboot.  If that happens, I'll be updating to a newer driver.

 

I expect you are having a similar issue.  Are there any newer drivers available?

 

Relevant info from Totow on Discord:
 

Quote

If you look closely, you'll find that these projects are now running core 22 v0.0.16, which is built with CUDA 11.x to support CUDA on RTX 3000, hence the requirement for 456.38 drivers ... the previous version (0.0.15) was built using CUDA 10, which required an older driver. Versions before that were built with CUDA 9.

 

Edited by firedfly

10 minutes ago, Avacado said:

The K4000 runs CUDA Compute 3.0. All of the WUs are running OpenCL. It was running the same moonshot units just fine until last night. I even did a fresh Windows install, no dice. I have no idea what happened. I didn't modify anything; just came downstairs and it was gimped. Took my PPD down from 170k to 60k just like that.

 

It is Compute 3.0 capable, but do you know which version of CUDA is installed?  If it's the latest driver (R470 U3 - 471.68), then it should be CUDA 11.4, and that should work with the new core that was released yesterday.  
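One quick way to check is to ask the driver itself, since it reports the highest CUDA version it supports. A minimal sketch using NVIDIA's NVML Python bindings, assuming the nvidia-ml-py package is installed (on some binding versions these calls return bytes rather than str):

import pynvml

pynvml.nvmlInit()
driver = pynvml.nvmlSystemGetDriverVersion()    # e.g. "471.68"
cuda = pynvml.nvmlSystemGetCudaDriverVersion()  # e.g. 11040 -> CUDA 11.4
print(f"Driver {driver}, CUDA {cuda // 1000}.{cuda % 1000 // 10}")
pynvml.nvmlShutdown()

nvidia-smi prints the same driver/CUDA pair in its header, if you'd rather not script it.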


25 minutes ago, tictoc said:

 

It is Compute 3.0 capable, but do you know which version of CUDA is installed?  If it's the latest driver (R470 U3 - 471.68), then it should be CUDA 11.4, and that should work with the new core that was released yesterday.  

I'll double-check. There are two versions, the enterprise one or the beta, I think. I will look again tonight; I had downloaded the beta one, and will try the other to see if it helps.


9 hours ago, tictoc said:

I seem to be folding into a black hole.  I have completed and uploaded many WUs over the last few days, but my stats have stayed the same.  This is not an issue with the ETF stats; my stats are not updating anywhere.  I've never had this issue before.

 

I was running an older version of the client, so I just updated to 7.6.21.  Downloading my first WU with this client, so we'll see if this fixes the issue.

 

I seem to have come out the other side of whatever black hole I was folding into.  Earlier today I had a 27.6M (3.3M actual) update on the 980, or about 3+ days worth of folding on that GPU. :wheee:

 

Not sure what actually happened, since it doesn't look like anyone else had the same thing going on.  My logs showed zero upload errors, and other than no stats registering, everything looked fine.


4 minutes ago, tictoc said:

 

I seem to have come out the other side of whatever black hole I was folding into.  Earlier today I had a 27.6M (3.3M actual) update on the 980, or about 3+ days worth of folding on that GPU. :wheee:

 

Not sure what actually happened, since it doesn't look like anyone else had the same thing going on.  My logs showed zero upload errors, and other than no stats registering, everything looked fine.

 

One of the work servers was not reporting completed work to the stats server.  They resolved that issue today and the backlog of pending updates came through.

 

Some details that were shared on the FAH Discord:

Update: The good news is I figured out the issue - the stats server isn't able to SSH onto the WSes and get the updates!
The not-as-good news: I can't fix it just by myself. The login password for the WS ssh needs changing (it's been that long I guess) and the security settings won't allow ssh access until the password gets changed :smile:. But we are on it!

 


16 hours ago, firedfly said:

 

One of the work servers was not reporting completed work to the stats server.  They resolved that issue today and the backlog of pending updates came through.

 

Some details that were shared on the FAH Discord:

Update: The good news is I figured out the issue - the stats server isn't able to SSH onto the WSes and get the updates!
The not-as-good news: I can't fix it just by myself. The login password for the WS ssh needs changing (it's been that long I guess) and the security settings won't allow ssh access until the password gets changed :smile:. But we are on it!

 

 

Thanks.  I must have been pulling all my work from that one server, and I didn't notice points go missing for anyone else, so I assumed it was something on my end.

Edited by tictoc

Same issue as the K4000, @tictoc. No idea how to fix the failed CUDA initialization. At least the 750Ti is working as expected, so I will be running that, I suppose.

 

I really don't know what's happening, same issue on the 750Ti now? I can only assume that having Quadro drivers and GeForce drivers on the same machine is messing with it somehow.

 

[attached screenshot: Errors.png]

 

 

Edited by Avacado
