
Threadripper Home Server


tictoc

On 10/05/2021 at 22:54, AllenG said:

That is great news that they finally got around to fixing it properly. At least they didn't drop your request to fix it like they did for mine; not sure if the tech I was working with just didn't want to deal with it or if my testing notes weren't up to par. Based on your posted tests, that looks good, and it seems everything is finally functioning properly. Hopefully they actually release this BIOS now.

 

I guess it's time for me to open another ticket to try and get the version you have... the tech I was dealing with hasn't responded in two weeks now.

 

Rack setup is looking good! Good catch on the arrangement to keep cables fanning out proportionally... I see it backwards like that a lot in real life, actually; funny when you know it was that simple of a change.

 

 

@AllenG ASRock Rack has released an official BIOS with the additional PCIe bifurcation options. I haven't tested it yet to see how it compares to the beta BIOS I got from ASRock, but I assume it is basically the same.

 

I will be moving forward with this build over the next few days, so if anyone is still interested, there will actually be some progress incoming. 🙂


The latest BIOS looks to be an identical copy to the beta BIOS that ASRock sent me back in May. 

I was actually able to save my settings before flashing to the new BIOS, and all of the settings were intact after the BIOS update. 🙂


16 hours ago, tictoc said:

The latest BIOS looks to be an identical copy to the beta BIOS that ASRock sent me back in May. 

I was actually able to save my settings before flashing to the new BIOS, and all of the settings were intact after the BIOS update. 🙂

Loving how this build is progressing so far. Really like your 12U cabinet cable management. My OCD is satisfied haha.


Progress has been slow to zero over the last few months.  I did get the pumps installed, and plumbed up the res, pumps, and radiators last night.  Hopefully I'll drop the board in tonight and get the CPU and GPU in the loop for leak testing.

 

The down-sizing from my 42U rack to the 12U rack is mostly complete.  Once I have the server done I'll finish up the rack which will be housing the following gear: two 2U UPS, one 1U UPS, one 24 port patch panel, one 12 port patch panel, 24 port 1G switch, 8 port 10G switch, 1.5U router/firewall, and 1U PDU.  I'll probably throw up a thread on the router/firewall and the build-out for the rack once this machine is up and running.


29 minutes ago, tictoc said:

Progress has been slow to zero over the last few months.  I did get the pumps installed, and plumbed up the res, pumps, and radiators last night.  Hopefully I'll drop the board in tonight and get the CPU and GPU in the loop for leak testing.

 

The down-sizing from my 42U rack to the 12U rack is mostly complete.  Once I have the server done I'll finish up the rack which will be housing the following gear: two 2U UPS, one 1U UPS, one 24 port patch panel, one 12 port patch panel, 24 port 1G switch, 8 port 10G switch, 1.5U router/firewall, and 1U PDU.  I'll probably throw up a thread on the router/firewall and the build-out for the rack once this machine is up and running.

Darn, you went from 42U to 12U. You must have had a good old clear-out!


  • 4 weeks later...

No actual build progress, but I have had most of the software stack up and running for testing. 

 

This hit a bit of a stall as I ran into some networking performance regressions with newer kernels.  I had previously been running on the 5.10.xx LTS kernels.  There are some btrfs improvements in newer kernels that haven't been back-ported to the LTS kernel, so I decided to switch over to the stable kernel.  This led me down the rabbit hole of figuring out why my 10Gbps ports were maxing out at 1Gbps.

 

I have bisected the kernel back to the offending commit, and it is a bit of an oddball. It looks to be a Threadripper-specific issue, due to some timer/clocksource changes. Reverting this commit: https://lore.kernel.org/all/162437391793.395.1913539101416169640.tip-bot2@tip-bot2/ allows the 10Gbps ports to operate at full speed. I'm playing around with it this evening, and I'll submit a bug report and/or a patch sometime tomorrow.

 

I have a few 10Gbps NICs with the same controller as the onboard ports on the X399D8A-2T.  I tested those NICs on two other systems, and those systems did not have the same issue as my Threadripper X399 machine.  The regression occurs with the onboard 10Gbps ports and with the PCIe X550-T2 NIC on the Threadripper machine. 
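
For anyone who wants to check whether their own box is hitting the same thing, here is a minimal sketch (Python, reading the standard sysfs paths; the interface name is just a placeholder) that reports the active clocksource and a NIC's negotiated link speed:

```python
#!/usr/bin/env python3
"""Report the active kernel clocksource and a NIC's negotiated link speed.

Minimal sketch for spotting the tsc -> hpet fallback described above.
The interface name below is a placeholder; substitute whatever `ip link`
shows for your 10GbE ports.
"""
from pathlib import Path

CLOCKSOURCE = Path("/sys/devices/system/clocksource/clocksource0/current_clocksource")
IFACE = "enp65s0f0"  # placeholder interface name

def main() -> None:
    source = CLOCKSOURCE.read_text().strip()
    print(f"current clocksource: {source}")
    if source != "tsc":
        print("warning: kernel is not using tsc; expect degraded network throughput")

    speed = Path(f"/sys/class/net/{IFACE}/speed").read_text().strip()
    print(f"{IFACE} negotiated speed: {speed} Mb/s")  # a healthy 10GbE link reports 10000

if __name__ == "__main__":
    main()
```

Note that in my case the link still negotiates at 10Gbps; the regression shows up in actual throughput (iperf3 and real transfers), so the clocksource reading is the telling part.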


2 hours ago, tictoc said:

No actual build progress, but I have had most of the software stack up and running for testing. 

 

This hit a bit of a stall as I ran into some networking performance regressions with newer kernels.  I had previously been running on the 5.10.xx LTS kernels.  There are some btrfs improvements in newer kernels that haven't been back-ported to the LTS kernel, so I decided to switch over to the stable kernel.  This led me down the rabbit hole of figuring out why my 10Gbps ports were maxing out at 1Gbps.

 

I have bisected the kernel back to the offending commit, and it is a bit of an oddball. It looks to be a Threadripper-specific issue, due to some timer/clocksource changes. Reverting this commit: https://lore.kernel.org/all/162437391793.395.1913539101416169640.tip-bot2@tip-bot2/ allows the 10Gbps ports to operate at full speed. I'm playing around with it this evening, and I'll submit a bug report and/or a patch sometime tomorrow.

 

I have a few 10Gbps NICs with the same controller as the onboard ports on the X399D8A-2T.  I tested those NICs on two other systems, and those systems did not have the same issue as my Threadripper X399 machine.  The regression occurs with the onboard 10Gbps ports and with the PCIe X550-T2 NIC on the Threadripper machine. 

 

I look forward to seeing what comes out of the LTS kernel 10 Gbps / X399 'rabbit hole' you described! I just got external 1 Gbps (symmetric up/down) service installed and (mostly) working in my home office, with a fixed IP (work related), a dynamic IP (the rest), as well as TV-over-internet.

[Image: Fr1GbsUU.jpg]

Down the line, symmetric 2.5 Gbps external service might become an option here as well, and I am looking at upgrading my switches and routers to at least 2.5 Gbps if not 10 Gbps. I also want to add a separate 10 Gbps home network on either the MSI X399 Creation or the Asus X79 E-WS with multi-port 10 Gbps add-in cards. For now, decent-value 10 Gbps routers and the like are still a rare find, though 10 Gbps multi-port add-in cards are a different matter.

 

Switching gears but somewhat related: speaking of the MSI X399 Creation and Asus X79 E-WS, what are your thoughts on PCIe NVMe add-in cards? The X399 Creation came with a PCIe 3.0-based M.2 XPANDER-AERO (4x M.2, fan-cooled, extra PCIe 6-pin), and I am planning to run 4x 2TB NVMe in that as 2x RAID 1, plus 2x 1TB M.2 NVMe on the motherboard in RAID 1 for the OS (likely Windows Server 2019, though not certain yet). Too complex a plan? It also needs to be robust.


For now I just reverted the commit with a patch, and I'm using that patch to build/run stable and mainline kernels for testing.  I also have a few Aquantia 10Gbps NICs, and I'll throw one of those in the server tomorrow to see if it suffers from the same issue as the Intel NICs.

 

Currently I am running 4x 1TB NVMe on an ASUS Hyper M.2 card, in md RAID-0 for a steamcache, squid, and other "victim" data. The OS is running on 2x 1TB NVMe drives in a btrfs RAID1 array. I'm not well versed in running something like that on Windows, so I'm not sure how complex and robust it would be. I know that there were (and maybe still are?) quite a few issues with the fake RAID (motherboard/BIOS) on AMD, but I've never used it.
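
For reference, that layout boils down to a couple of commands. A rough sketch follows (device names are placeholders, and these commands wipe whatever is on the listed drives, so double-check against lsblk first):

```python
#!/usr/bin/env python3
"""Rough sketch of the storage layout above: md RAID-0 for the cache/"victim"
data and btrfs RAID1 for the OS drives.

Device names are placeholders -- verify with `lsblk` before running anything
like this, since both commands destroy existing data on the listed devices.
"""
import subprocess

CACHE_DEVICES = ["/dev/nvme2n1", "/dev/nvme3n1", "/dev/nvme4n1", "/dev/nvme5n1"]
OS_DEVICES = ["/dev/nvme0n1", "/dev/nvme1n1"]

def run(cmd: list[str]) -> None:
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Stripe the four cache drives into /dev/md0 (no redundancy -- disposable data only).
run(["mdadm", "--create", "/dev/md0", "--level=0",
     f"--raid-devices={len(CACHE_DEVICES)}", *CACHE_DEVICES])
run(["mkfs.btrfs", "-f", "/dev/md0"])

# Mirror the two OS drives with btrfs RAID1 (both data and metadata).
run(["mkfs.btrfs", "-f", "-d", "raid1", "-m", "raid1", *OS_DEVICES])
```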


5 hours ago, tictoc said:

For now I just reverted the commit with a patch, and I'm using that patch to build/run stable and mainline kernels for testing.  I also have a few Aquantia 10Gbps NICs, and I'll throw one of those in the server tomorrow to see if it suffers from the same issue as the Intel NICs.

 

Currently I am running 4x 1TB NVMe on an ASUS Hyper M.2 card, in md RAID-0 for a steamcache, squid, and other "victim" data. The OS is running on 2x 1TB NVMe drives in a btrfs RAID1 array. I'm not well versed in running something like that on Windows, so I'm not sure how complex and robust it would be. I know that there were (and maybe still are?) quite a few issues with the fake RAID (motherboard/BIOS) on AMD, but I've never used it.

  
Thanks!


Sounds like it could be a bug in how the newer kernel calls for PCI Express lanes to be throttled or cut back (power-management related). I've noticed in the past that X550s will work on too few lanes, but they won't do 10G. I've seen the X550s do some odd things; I still tend to favor the X540s, as those have always been rock solid for me.
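
If it were lane-related it should show up in sysfs. A quick check (just a sketch; the interface name is a placeholder) is to compare the NIC's current PCIe link speed/width against its maximum:

```python
#!/usr/bin/env python3
"""Compare a NIC's current PCIe link speed/width against its maximum.

If the current width is below the maximum (say x1 instead of x4), the link
has trained down and the lane-throttling theory would fit. The interface
name is a placeholder.
"""
from pathlib import Path

IFACE = "enp65s0f0"  # placeholder; substitute your 10GbE interface

device = Path(f"/sys/class/net/{IFACE}/device")
for attr in ("current_link_speed", "max_link_speed",
             "current_link_width", "max_link_width"):
    node = device / attr
    if node.exists():
        print(f"{attr}: {node.read_text().strip()}")
```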


So, I had a few minutes this morning to play around with the commit that I had reverted to solve the networking issue.  The core of the problem is that the tsc clocksource was being marked unstable at boot, which caused the kernel to fall back to hpet, resulting in the abysmal network performance. 

 

Initially I just increased the MAX_SKEW back to 100µs and the problem went away. I then played with it a bit and found that 70µs was sufficient to keep the kernel from switching to hpet. I was getting ready to file a bug report, and it looks like some patches have already been worked on over the last few weeks. For now I'll just roll with my small patch and see what the final patches look like: https://lkml.org/lkml/2021/11/18/938
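
For anyone following along, the fallback is easy to spot in the kernel log. Here's a small sketch (the exact message wording varies between kernel versions, so the filter is deliberately loose):

```python
#!/usr/bin/env python3
"""Scan the kernel log for the clocksource-watchdog messages that mark tsc
unstable and trigger the fallback to hpet described above.

The exact wording varies between kernel versions, so the match below is
approximate. Reading the kernel ring buffer may require root.
"""
import subprocess

log = subprocess.run(["dmesg"], capture_output=True, text=True, check=True).stdout
hits = [line for line in log.splitlines()
        if "clocksource" in line and ("unstable" in line or "watchdog" in line)]

if hits:
    print("\n".join(hits))
else:
    print("no clocksource watchdog messages found; tsc was not demoted")
```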

 

 


  • 3 weeks later...

Small update.

 

Pumps installed, and tubing run for everything except for pump->CPU and GPU->radiator. Not sure if I want to keep the drain on the pump inlet or swap it to the outlet. 🤔 Either way, there will be some tipping/flipping of the case in order to get the CPU and GPUs drained for the inevitable GPU swap-out.

 

[Image: loopFans_1.jpg]

[Image: loopFans_2.jpg]

[Image: loopFans_3.jpg]

 

 

Looking to pull the machine off my test bench later today.  Then I'll crack open the CPU/GPU blocks and give them a good cleaning.  After that I'll get the system installed into the case, leak test it, and then give it a final clean and flush with Blitz Part 2.  I might actually have this thing up and running in the next couple of days. 🙂


  • 1 month later...

I didn't get this up and running, but I do have everything except for the wiring done. I will post some pics when I get home this evening.

 

Part of my goal in downsizing was to offline some data from my main file server, but I didn't get rid of as much data as I thought I would.

The final main storage pool will be an 8-drive btrfs raid10 with 4x 8TB Seagate EXOS and 4x 16TB Seagate EXOS drives. I'll update the OP once I finalize the rest of the storage.
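
Creating a pool like that is basically a one-liner with mkfs.btrfs; here's a sketch (placeholder device names, and it's destructive, so check lsblk first):

```python
#!/usr/bin/env python3
"""Sketch of creating the 8-drive btrfs raid10 pool described above.

Device names are placeholders -- verify with `lsblk` first, since mkfs.btrfs
destroys whatever is on the listed drives.
"""
import subprocess

# 4x 8TB + 4x 16TB Seagate EXOS drives (placeholder names)
DRIVES = [f"/dev/sd{letter}" for letter in "abcdefgh"]

# raid10 for both data and metadata: every chunk is mirrored and striped.
subprocess.run(["mkfs.btrfs", "-f", "-d", "raid10", "-m", "raid10", *DRIVES],
               check=True)
```

btrfs allocates chunks per device, so the mixed 8TB/16TB drives are fine; usable space ends up around half of whatever the allocator can spread across at least four devices at a time.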


Ready to leak test.  All the difficult to access wiring is done, and I'll slap the drives in after it gets a leak test and Blitz Part 2.

 

This case really isn't big enough for everything that is going in it, but since I've come this far, I'm just going to go ahead with it for now.  I'll probably end up dropping everything in a different case in the not too distant future.

 

Here's a few not so great pics. 🙂

 

[Image: leakTesting1.jpg]

[Image: leakTesting2.jpg]

 

Edit: Looking at the pics, I just noticed that one of the HDD fans is backwards. :confused_frusty2:


After getting the 10 HDDs mounted, I decided to do some temperature testing to set the fan curve for the case fans.  Had a derp moment while putting the front cover back on, and now I'll be installing a new front intake fan.

 

[Image: brokenFan.jpg]

  

It might actually be a good thing, since I ordered a SilverStone AP183, which should push quite a bit more air than the Noctua.

 

I'll be cutting out all the unnecessary mounting points behind the fan, which should help to increase airflow and reduce noise.


1 hour ago, Diffident said:

That's crazy.  I've never broken blades off of a fan. 😇 

 

Don't worry about fan curves, servers are supposed to sound like jet engines. 😀

The fan was spinning at 800 RPM when "someone" decided to put the front cover on the case, and might have inserted one of the cover standoffs directly into the path of the fan blades. 🙂

 

All the jet engines are going out to pasture, and I will basically end up with nearly the same overall performance and sweet whisper silence. 


  • 2 weeks later...

Everything is up and running, and it will be getting a two week long stress test with the CPU and GPUs running 24/7, all out, for the BOINC Pentathlon. :naughty_devil2:

 

Setting up the whole software stack for the server will have to wait, so for now it's just running a minimal headless Arch setup for crunching.

 



One more NVMe drive.  🙂

 


 

It took me a minute to get the second drive to show up.  The PCIe bifurcation options are indeed all working, but the labeling in the UEFI is a little misleading.  

 

Build will be complete this weekend.  Swapping out the Vega 64's for a pair of Radeon VII's.  Once that is complete I'll post up some final pics. 


Forgot about this build; rediscovering a cool project is always fun 😅. What are you doing with this now? Definitely makes me wish I'd gone deeper on my TR build, ha.


  • 1 month later...

Here is the almost final build. 

 

Missing in these pics are all of the final HDDs and SATA SSDs, and I am also going to drain the loop to add some QDCs at the GPUs. I meant to add the QDCs originally, but then I got in a bit of a rush to get everything up and running for the Pentathlon and spaced it out. One of the NVMe boot drives is under the GPUs, and it cannot be removed without disconnecting the blocks due to the limited clearance between the GPUs and the radiators.

 

[Image: final1.jpg]

[Image: final2.jpg]

[Image: final3.jpg]

[Image: final4.jpg]

