HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
Building a $5,000 Machine Learning Workstation with a NVIDIA TITAN RTX and RYZEN ThreadRipper

Jeff Heaton · Youtube · 128 HN points · 0 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention Jeff Heaton's video "Building a $5,000 Machine Learning Workstation with a NVIDIA TITAN RTX and RYZEN ThreadRipper".
Youtube Summary
NVIDIA was kind enough to provide my YouTube channel with a TITAN RTX. This video shows the process that I went through to plan and build a computer based on a TITAN RTX and a Ryzen Threadripper 24-core 3.8 GHz CPU.

Follow Me on Social Media!
GitHub: https://github.com/jeffheaton
Twitter: https://twitter.com/jeffheaton
Instagram: https://www.instagram.com/jeffheatondotcom/
Discord: https://discord.gg/3bjthYv
Patreon: https://www.patreon.com/jeffheaton

NVIDIA TITAN RTX:
https://nvda.ws/2OoXLG7

PC-Part Picker Build:
https://pcpartpicker.com/b/h6DxFT

Windows Media Creation Tool (to install Windows):
https://www.microsoft.com/en-us/software-download/windows10ISO

My Favorite Hardware YouTubers:
https://www.youtube.com/user/Jayztwocents
https://www.youtube.com/user/LinusTechTips
HN Theater Rankings

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
Jul 15, 2020 · 128 points, 157 comments · submitted by jeffheaton
sabalaba
Good choice on the 24 GB Titan RTX (so you can run at least batch size = 1 for BERT-Large). Not sure if that's the reason it was chosen, though, to be honest. If you want to do convnets only, then you would do better with NVLink'd 2080 Tis.

Secondarily, I would suggest that you guys not use Windows but instead Ubuntu 18.04 or 20.04 LTS, and just install Lambda Stack (https://lambdalabs.com/lambda-stack-deep-learning-software). It's a Debian PPA that we maintain at Lambda to keep all of your deep learning drivers, CUDA, cuDNN, TensorFlow, and PyTorch up to date with just apt. It's free!
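As a quick way to confirm a stack like that came up correctly, a minimal check might look like the sketch below (assuming a PyTorch install; this is not from Lambda's docs):

```python
# Sanity-check that the driver / CUDA / framework chain can see the GPU.
import torch

print(torch.cuda.is_available())          # True if the driver and CUDA runtime line up
print(torch.cuda.get_device_name(0))      # e.g. "TITAN RTX"
props = torch.cuda.get_device_properties(0)
print(f"{props.total_memory / 2**30:.1f} GiB VRAM")
```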

icelancer
Been very happy with Lambda Stack at our company!
mushufasa
How does that compare to the Pop!_OS NVIDIA drivers, default on their downstream-from-Ubuntu distro?
proverbialbunny
It's a bit more straightforward on Pop!_OS with two commands: `sudo apt install system76-cuda-latest` and `sudo apt install system76-cudnn-10.2`
mastazi
Interesting! Is Lambda Stack going to work on 20.04? The link mentions only 16.04 and 18.04
hughdbrown
I've asked Lambda Labs 2-3 times if they are going to do an Ubuntu 20.04 stack, but I have not had a reply yet.
sabalaba
It works for 20.04. Sorry if you didn’t get a reply, I’ll look into it. Where did you email us?
sabalaba
It works for 20.04 LTS and we've updated the website to reflect this, thanks for bringing it to my attention!
neilv
If you only have $1K or less to spend, and you don't already have a sufficient PC that you can upgrade with a big GPU...

A non-Threadripper Ryzen, a big GPU, and a big PSU in a big case will go most of the way for most people, and leave you with an easy incremental upgrade path for bigger GPUs (or maybe add a second GPU).

Slightly dated info for my current ML server, which is nicely quiet in my living room, thanks to Noctua: https://www.neilvandyke.org/machine-learning/

(Side note that's not on that page: I like to use older ThinkPads with transplanted vintage keyboards for my workstations, so I needed to make a separate box for the GPU. But life would be easier, with a lot less juggling complexity, if I simply had the big GPU in my laptop instead.)

CoolGuySteve
If you do plan on buying a GPU you should wait for the Ampere-based 3000 series to come out sometime in the next few months.

It's a process shrink so the performance gain per dollar should be comparable to the 900 -> 1000 series transition.

juped
Threadripper has its own socket type, so I'd go with a cheaper or older one of those. Though I think third-gen Threadripper is another socket entirely.
junar
The problem is that Threadripper mobos are expensive, and as you said, 3000-series Threadripper don't use the same socket as earlier Threadrippers. Under $1000, it's better to optimize the budget for GPU and RAM, and avoid spending too much on the CPU+mobo. You can always upgrade the latter two together if needed.
rmrfstar
You can pick up a refurbished K80 for $200. Like the Titan RTX, it has ~5000 CUDA cores and 24 GB of RAM.
disgruntledphd2
I recently bought a P73 thinkpad specced out like this, and it's great.

However, putting a GPU and lots of RAM into a laptop makes it very, very heavy so it's worth thinking about if that's acceptable for you.

0xfaded
I just bought a specced out 1950x on a x399 with 64gb ram (plus the box, power, etc) for about $800, which I think is the fair price for 3 year old hardware. It needs a GPU, but for my usecase it's perfect.

I'm also in Europe, so prices are higher.

disgruntledphd2
Yeah, me too. I spent a lot of money on this machine, but amortised over about five years (which is how long my last one lasted), it's acceptable (that's what I keep telling myself anyway).
m0zg
Here's my recommendation (I've built several such machines for my own use):

1. Go with a 1600W PSU from EVGA or Corsair. Other brands are hit or miss if you ever need very high current on the rails. This will manifest as your machine suddenly powering off when all 4 GPUs are hit with data at once (as is typical at the start of an epoch).

2. Use a mobo with evenly spaced GPU slots, such as ASRock TRX40 Creator. That way you can install 4 GPUs eventually and use that 1600W PSU. You also get 10GbE for distributed training, which is nice.

3. Don't waste money on Titan RTX, get 2x2080ti's instead. Then after a while get two more. Buy blower cards which blow hot air _out_ of the case.

4. Use an extension cable to install SSD and do not install it under a GPU - it'll die eventually due to overheating.

5. Air cooling is fine

6. If you have more than 2 GPUs, learn how to adjust fan speeds on GPUs. Crank them to 85-100% while training to prevent throttling (one way to script this is sketched below).
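A hedged sketch of tip 6, shelling out to `nvidia-settings` (this assumes Coolbits is enabled and an X session is running; fan indices do not always map one-to-one to GPUs on multi-fan cards):

```python
# Pin each GPU's fans at a fixed duty cycle before a long training run.
import subprocess

def set_fan_speed(index: int, percent: int) -> None:
    # GPUFanControlState=1 enables manual control; GPUTargetFanSpeed sets the duty cycle.
    subprocess.run([
        "nvidia-settings",
        "-a", f"[gpu:{index}]/GPUFanControlState=1",
        "-a", f"[fan:{index}]/GPUTargetFanSpeed={percent}",
    ], check=True)

for i in range(4):   # assumption: a 4-GPU box with one fan index per GPU
    set_fan_speed(i, 90)
```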

brian_herman__
Here is their list:

PCPartPicker Part List: https://pcpartpicker.com/list/Jhyzcq

CPU: AMD Threadripper 3960X 3.8 GHz 24-Core Processor ($1348.00 @ Amazon)

CPU Cooler: be quiet! Dark Rock Pro TR4 59.5 CFM CPU Cooler ($89.90 @ Amazon)

Motherboard: MSI TRX40 PRO WIFI ATX sTRX4 Motherboard ($389.99 @ B&H)

Memory: Corsair Vengeance RGB Pro 64 GB (4 x 16 GB) DDR4-3200 CL16 Memory ($329.99 @ Amazon)

Storage: Sabrent Rocket 4.0 2 TB M.2-2280 NVME Solid State Drive ($399.98 @ Amazon)

Video Card: NVIDIA TITAN RTX 24 GB Video Card ($2499.99 @ Newegg)

Case: Corsair Crystal 570X RGB ATX Mid Tower Case ($179.99 @ B&H)

Power Supply: Corsair RMx 1000 W 80+ Gold Certified Fully Modular ATX Power Supply ($204.99 @ Best Buy)

Case Fan: Corsair LL120RGB LED 43.25 CFM 120 mm Fans 3-Pack ($120.99 @ Best Buy)

Total: $5563.82

Prices include shipping, taxes, and discounts when available

Generated by PCPartPicker 2020-07-15 11:13 EDT-0400

p1esk
You could spend half as much on every single one of the listed components with zero impact on your ML productivity. $330 for 64 GB of RAM, really?
akiselev
That is high-end RAM binned at 3200 MHz. Consumer RAM is mostly 2133/2400, with servers often using 2666. RAM at 3200 (PC4-25600) gives you about 25% more peak bandwidth than RAM at 2400 (PC4-19200) and about 16% more than 2666 (PC4-21333).
p1esk
That might be true, but you don't usually care about that for ML workloads running on GPU. Bottlenecks are typically elsewhere.
formerly_proven
3200 and 3600 are the default choice on desktops these days.
junar
Not true, 3200 CL16 can be nearly as cheap as slower RAM nowadays, with multiple brands at sub-$60 per 16GB stick[1]. OP is paying extra for RGB, as another commenter points out.

[1] https://pcpartpicker.com/products/memory/#sort=price&U=4&Z=1...

fokinsean
$120 for 3 fans lmao
jeffheaton
Well, it did include the RGB controller, if that makes you feel slightly better. :-)
alfonsodev
Could you point to a more ML cost efficient build? I’m having problems finding good info resources
p1esk
https://timdettmers.com/2018/12/16/deep-learning-hardware-gu...
GaryNumanVevo
Computers are multi-purpose machines
el_oni
The RGB makes ML models train faster
Datenstrom
I don't know if they are still running the deal but Nvidia was offering $500 off the Titan RTXs if you sign up for their developer program.

Edit:

Note that they can't be used with multi-gpu builds because they (purposefully) do not have a blower configuration. Unless you can source 2080ti blowers which have the same layout or do a water cooling build it will cause thermal throttling.

ericd
I think you can do 2 if you leave a slot in between and put a bit of thought into the airflow, but 4 would be tough without blowers/water.
paol
It's worth noting that if your ML work is entirely CUDA based (as often happens), you likely won't benefit from a Threadripper CPU. Downgrading to a Ryzen 9 or even 7 will reduce costs by a good bit. The savings can be pocketed or put toward a second Titan RTX + NVLink (48 GB usable VRAM).
Alupis
> It's worth noting that if your ML work is entirely CUDA based (as often happens), you likely won't benefit from a Threadripper CPU

Perhaps for the actual ML part, yes, but a ton of work must be done first to organize and filter the data, which is where all those cores would come in handy.

csdreamer7
Most Ryzen consumer motherboards have a limit of 128 GB of RAM and 16-20 direct-to-CPU PCIe lanes. Are 128 GB of RAM and x8 PCIe lanes per GPU (for dual GPUs) a bottleneck for ML workloads?

I can see the lanes not being an issue for the next-gen Titans, which will likely use PCIe 4.0, but that is months away.

Asking as someone outside the ML field.

paol
The reduction from 16x to 8x PCIe lanes is usually not a bottleneck for ML. Still, it's always a good idea to benchmark and validate the configuration, especially if you're planning to spend a lot of money on a bunch of identical systems.

As for RAM, only you can know how big your datasets are. But if you're training models on GPUs the bottleneck is almost certainly going to be GPU RAM, not system RAM.

proverbialbunny
In order, the bottlenecks are: GPU RAM, CPU RAM, then PCIe lanes.

There is a big delay moving memory from RAM to VRAM to run a task on the GPU, so much so that you'd be better off running the task on the CPU if you can't fit it all in the GPU, unless you are very clever about how data is buffered, which isn't an option for neural networks. Because of this, the PCIe lanes are not saturated except when first sending the data to VRAM. PCIe 3.0 x8 runs at 7,880 MB/s, so if your GPU has 16 GB of VRAM, the difference between x8 and x16 is about 1 second, while a task can typically take 8+ hours to complete.
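Spelling out the arithmetic behind that claim:

```python
# Time to fill 16 GiB of VRAM over PCIe 3.0 at x8 vs x16.
x8_bw = 7.88e9        # bytes/s, per the figure above
x16_bw = 2 * x8_bw    # twice the lanes, twice the bandwidth
vram = 16 * 2**30     # 16 GiB in bytes

t8, t16 = vram / x8_bw, vram / x16_bw
print(f"x8: {t8:.1f} s, x16: {t16:.1f} s, difference: {t8 - t16:.1f} s")
# ~2.2 s vs ~1.1 s: about one second, paid once, against hours of training.
```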

topspin
Yes, thats about $1000 savings. Also, the 80+ Gold power supply is an inefficient choice given then lack of a second GPU; without the second Titan that 1000W power supply will never see 50% load. If you're over-buying power supplies for future expansion then use an efficient titanium rated supply which will waste less power at low loads. The price difference is $80.
Alupis
I thought the 80+ Certifications were about how efficient the PSU was at not converting electricity into heat, ie. loss? Perhaps I was wrong?
topspin
> Perhaps I was wrong?

No, you're not wrong. Not sure how what I wrote conflicts with that. I don't think it does.

Alupis
> Also, the 80+ Gold power supply is an inefficient choice given then lack of a second GPU

That's what tripped me up, I think. Even without a second GPU, 80+ Gold or better is a good choice.

Your next sentence makes sense though: 1 kW PSUs are usually overkill, even if you like to oversize your PSU like I do.

topspin
> 80+ Gold or better is a good choice.

The selection was "gold", specifically. And that's not as bad as it might be, but titanium is better across the board and much better at low load. A titanium supply is more efficient at 20% load than a gold supply at 50%, for instance.

If you're over-sizing your power supply by ~60% (as is the case here) then this is significant.

Alupis
I'll keep that in mind on my next build. The pricing steps up quite steeply though, it seems.

But build enough "rigs" and you learn not to skimp on certain components like PSUs, cases, and motherboards... which is normally where new builders cut corners.

freeqaz
IIRC PC power supplies are most efficient at around 80% utilization. Below that they are not able to hit their rated efficiencies.
Alupis
Hmm, interesting. I've always oversized my PSUs as a matter of course, since I've always thought working at 60% capacity is better than 90% or whatever.

I usually drop a 750 watt 80+ Gold into most of my builds, even though a 500 watt or even a 450 watt would be sufficient with a single GPU, and have no plans for a second GPU.

paulmd
aiming for 50-60% capacity during typical operation is the standard recommendation.

Efficiency usually starts tapering off below 50% and below 30% it falls off a cliff - however, that just means instead of an ideal 10W power consumption you're actually pulling 30W or something like that, it is usually not a big deal in absolute terms.

(There are also some exceptions; some of the platinum/titanium PSUs actually can hold pretty decent efficiencies right down into the basement.)

750W is a good "standard" recommendation, that's enough for any one GPU on the market.

The rule of thumb is really more to guide people not to buy 1600W or 2000W monster PSUs just because "bigger number is better!".

(Although those giant PSUs do have the advantage that they can often run completely passively under load, they won't kick fans on until 50% or 60% load, which for a 1600W PSU means you can comfortably run a high-end GPU and a high-end CPU completely passively.)

Klinky
Not really that inefficient; more like a 3-5% difference between Gold and Titanium [1]. Additionally, the Corsair RM1000x actually breaks into 80+ Platinum territory in testing [2]. Also, I am skeptical of a 1 kW Titanium-rated PSU for under $300 that's actually in stock.

1. https://en.wikipedia.org/wiki/80_Plus#Efficiency_level_certi...

2. https://www.jonnyguru.com/blog/2015/10/25/corsair-rm1000x-10...

dodobirdlord
A caveat is that if you're going to use multiple GPUs, it's essential to get something like a Threadripper or a Xeon that has the PCIe lanes to provide the full 16 lanes, or at least 8 lanes, to each GPU.
ericd
I’ve found that it’s really nice for things like image augmentation, and running RL environments in parallel. But maybe I should be doing augmentation in Dali.
confuseshrink
It depends on how intensive your pre-processing pipeline is. With a really fast accelerator you can quite easily start to be bottlenecked by your CPU.
paol
True, but Threadrippers start at 24 cores and go up from there. That's got to be some intense pre-processing. Not impossible I'm sure, but it would be unusual.
paulmd
Threadripper is the only way to get more than the standard 20 PCIe lanes (and really only 16 lanes to the slots, on all but one board). It's possible that OP would have gone with a lower core-count version if one existed, but the minimum buy-in on the Threadripper 3000 series is the 24-core model.

tbh this is kind of one of the ideal use cases for Epyc. And with the way AMD has set up their pricing, it's actually no longer cheaper to use the workstation processors; in some situations it's significantly more expensive, as they are really charging a premium for clock speed while removing a bunch of other features in the process (RDIMM/LRDIMM support, etc.). I strongly encourage everyone building homelab and home ML rigs and similar to really think about whether they want Threadripper, bearing in mind that Threadripper is often more expensive than Epyc. It's no longer an obvious choice that server processors are for servers and home users can only afford workstation parts; it is the other way around.

AMD offers some low-core-count single-socket Epycs that are ideal for "lighting up the platform" tasks like this. The 7232P is a $450 processor and the 7402P is $1150, and they don't offer anything like that on Threadripper. They clock slower, sure, but you're not really using the CPU anyway. And that gets you a full 128 PCIe lanes, octo-channel memory, and RDIMM/LRDIMM support so you can stack in the memory.

If they want to game on it in their spare time then sure, Threadripper is probably the way to go.

p1esk
Threadripper Pro is an ideal processor (a higher-clocked Epyc). Unfortunately it's OEM-only at this point.
fomine3
+1. EPYC 7282/7302P/7402P is cheaper than you'd expect and gets you massive RAM/IO capabilities, and motherboards are also not that expensive. The downside is that the higher-clocked EPYC SKUs are expensive.
colincooke
Should note (as someone who has a few of these systems in my lab) that unfortunately the consumer RTX cards don't do memory pooling. This means that although NVLink is good for inter-GPU comms, it doesn't actually allow you to run giant models that need the entire 48 GB of memory for a backward pass (treating the combined cards as "one card"). Not typically a problem for most people, but worth mentioning.
paol
From https://www.nvidia.com/en-us/deep-learning-ai/products/titan...:

"NVIDIA TITAN RTX NVLink Bridge

The TITAN RTX NVLink™ bridge connects two TITAN RTX cards together over a 100 GB/s interface. The result is an effective doubling of memory capacity to 48 GB, so that you can train neural networks faster, process even larger datasets, and work with some of the biggest rendering models."

p1esk
That’s not the type of memory pooling they are talking about.
colincooke
Yeah, you're not wrong, but it's a bit misleading. This allows you to run faster, but it does so by allowing you to use a larger batch size (arguably not best practice, but your mileage may vary). Memory pooling is a bit different, in that you can treat the combined cards as a single card from TF/PyTorch.
ivalm
But batch size is probably the least of your problems, since you can do data parallelism (send half the batch to each GPU, combine on the CPU).

I think a model bigger than GPU memory is the only case where you really wish for NVLink on V100s.
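For reference, single-node data parallelism of the kind described here is only a few lines in PyTorch. A minimal sketch with a stand-in model (not code from the video):

```python
# Each GPU gets a slice of the batch; gradients are reduced back on GPU 0.
import torch
import torch.nn as nn

model = nn.Linear(1024, 10)              # stand-in for a real network
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)       # splits dim 0 of the input across GPUs
model = model.cuda()

x = torch.randn(64, 1024).cuda()         # batch of 64 -> 32 per GPU on a 2-GPU box
model(x).sum().backward()
```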

sabalaba
Memory pooling is irrelevant for DL training. 24 GB is enough to run a batch size of 1 for BERT-Large, so honestly this is a good choice. Some folks are saying that 2x 2080 Tis would have been better, and that's true if you're doing convnets, but for any large-scale language model fine-tuning you'll want at least 24 GB of VRAM.
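To make the "batch size 1" point concrete, here is a rough sketch of a single BERT-Large training step that reports peak VRAM (assuming a recent Hugging Face transformers install, which is not part of the video):

```python
import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-large-uncased").cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

input_ids = torch.randint(0, 30000, (1, 512)).cuda()  # batch size 1, full 512-token sequence
labels = torch.tensor([0]).cuda()

loss = model(input_ids=input_ids, labels=labels).loss
loss.backward()
optimizer.step()
print(f"peak VRAM: {torch.cuda.max_memory_allocated() / 2**30:.1f} GiB")
```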
p1esk
You contradict yourself. Memory pooling is precisely what would allow you to train your BERT-Large on two 2080 Tis.
sabalaba
No, my comment says that the two 2080 Tis would be better for convnets / situations where you don't need to train BERT-Large. If you're sure about memory pooling working for DL, please share code and examples; we would love to see them.
WrtCdEvrydy
Yeah, Quadros ... the cocaine of the ML world.
colincooke
I think the nice Volta cards (V100) do it "properly". But they're out of reach for most small-scale setups (academic labs, prosumers, independent researchers, etc.).

Unfortunately the best option for high-memory use cases is to just rent from GCP.

p1esk
None of the ML frameworks support memory pooling, so unless you write CUDA code yourself, this point is moot.
_5659
I'm a bit concerned that the build uses a Gold-certified power supply unit.

Even for cheaper, non-ML workstation builds I would still only use Platinum and nothing less. I've been told Titanium is excessive, but I leave these things on for a while, and power is expensive.

For the DIY enthusiast or the WFH researcher, the amount of heat involved can also be a considerable cooling/utility cost, which varies substantially by floor of a building. It's probably not good, but not that bad, to air-cool this many GPUs, as I've done in the past; it definitely means I'm paying a lot for A/C in the summer but almost nothing in the winter.

I think Smerity even said he heated his small bedroom through the San Francisco winter off of one GPU while researching YOLO.

Point: These things get hot. They require a lot of electricity. You should care about a good PSU even for smaller builds. My energy cost for a 6-GPU rig ran me about 1/3 of my total rent for a small apartment. That's electricity BEFORE I calculated my A/C bill, which was separate and also substantial. My landlord hates me because I initially talked him into including electricity with my rent.

All in all, it still makes sense to keep investing in local workstations and on-premises builds. No security concerns about a cloud, no futzing around with integrated notebooks, you own it and you control it, and the up-front price point is extremely attractive compared to base rates for cloud computing, even on specialized hardware like a TPU.

The numbers I come up with for batch workloads still show a gap of several thousand USD most of the time, and then there's how much time it takes and how likely it is that their service breaks.

So kudos to the person who put in the effort to build this and share it. Any and all effort toward making ML/DS affordable and DIY is a tide that lifts all boats.

Question to the audience: does anyone still build GPU rigs like this for cryptocurrency? I was only able to build a workstation once the price of GPU cards crashed.

Covzire
Not sure about the latest PSUs, but in the past high-efficiency PSUs had to make trade-offs with increased ripple/noise, so they weren't ideal where maximum stability was desired.
_5659
That's my understanding as well, that Platinum is more of a sweet spot compared to Titanium or Gold for these tradeoffs.
CyberDildonics
> he heated his small bedroom through the San Francisco winter off of one GPU while researching

He must not have needed much heat, since a huge GPU would still be 1/6th of a space heater.

juped
San Francisco winters are exactly the same as San Francisco summers: in the 50s or 60s.
tedunangst
What is the actual difference in expense for running a 500W load for a year with a Gold vs. a Platinum power supply?
darkarmani
Quick approximation: maybe $10 if it really is 92% vs 94% efficiency (at a low $0.10/kWh)? Although other commenters say it is a bigger efficiency difference.
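The arithmetic behind that approximation, for anyone who wants to plug in their own rates:

```python
# 500 W delivered to the components, Gold (~92%) vs Titanium (~94%)
# at this load point, running 24/7 at $0.10/kWh.
load_w, hours_per_year, price_per_kwh = 500, 8760, 0.10
wall_gold = load_w / 0.92        # watts drawn at the wall
wall_titanium = load_w / 0.94

delta_kwh = (wall_gold - wall_titanium) * hours_per_year / 1000
print(f"{delta_kwh:.0f} kWh/year -> ${delta_kwh * price_per_kwh:.2f}/year")
# ~101 kWh/year, i.e. roughly $10/year at these rates.
```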
gameswithgo
The efficiency delta between Gold and Titanium is really small. Optimizing that for heat reasons would be optimizing less than 1% of total heat output. And most cases keep the power supply thermally separate from the rest of the components anyway.

This guy has a very oversized Gold power supply; the efficiency would be ~92% with Gold vs ~94% with Platinum. Maybe a smaller Titanium one would be a better overall choice, I guess.

_5659
Generally I think going jumbo is good for cooling, and honestly the weight for this build probably doesn't matter, as I imagine it's not getting transported often.

Conversely, this is not a build for overclocking per se. However, I think it's a safe assumption that we are running at over 90% capacity for multiple days or even weeks. If batches run for over a month, it's probably time to get a server rack instead?

It is worthwhile to note you're not going to save any money in efficiency for computation you don't use.

tedunangst
There's no way this build draws 900W.
svnpenn
What do people use ML for these days? I do computer programming, and I have done some work with video encoding, but this just seems like a huge investment money-wise. So I am curious what use it is.

For my needs, the most intensive things I do are compiling some large programs or encoding some large videos, and you can get a computer for that for like $800.

aunty_helen
Here's an example for what I'm using it for: https://news.ycombinator.com/item?id=23608360

I explained the technical details in the sub comment.

I was looking at buying one of these Titan cards a few weeks back, but then NVIDIA announced that the next-gen processors were coming out, so I've decided to wait until they refresh the 2-year-old Titan line instead of paying full price for an almost out-of-date card.

When training models for object detection, the current algo we're using isn't focused on memory efficiency, so the 8 GB card we currently use to train models is unable to process images at the correct resolution. We have to downscale by about half to get them to fit. With the Titan RTX you get 22 GB, which is enough.

On another note, the Titan cards aren't the same as the normal GeForce cards. NVIDIA has gone to great lengths to ensure product differentiation, so they can charge power users with business budgets more than people sitting at home playing games. One of the good things about the Titan cards is that they have a dual memory controller, so you can write and read at the same time, which improves your fill rate.

p1esk
People use ML for pretty much anything these days, including compiling programs [1] and encoding videos [2].

[1] https://arxiv.org/abs/1805.03441

[2] https://arxiv.org/abs/1904.12462

proverbialbunny
ML is typically used to find correlations in data. If something has happened over and over again, there is a good chance it will happen again, and an algorithm that has identified the correlation can predict when. This enables what is called predictive analytics.

This can be as simple as identifying when a customer will end their service with a business: there may be a pattern in how previous customers behaved before leaving, so you can predict which new customers are about to leave and give them a coupon or similar right before they would otherwise go. This problem is called customer churn.

It can be as complex as predicting ahead of time when hardware will fail, or even "bio-ware". For example, I did a project that predicted, with a high accuracy rate, when people were falling into depression before they themselves could tell. I also predicted other future medical issues ahead of time, like the probability that an elderly person will fall over within the next handful of days.

On the business side there are a lot of use cases for ML, but it falls more into analytics than engineering, as it's about predictive insight.
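As a toy illustration of the churn example above (synthetic data and scikit-learn assumed; nothing here is from the commenter's actual projects):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Fake customer features with roughly 10% churners.
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
churn_risk = clf.predict_proba(X_te)[:, 1]   # probability each customer leaves
print("customers to target with a retention offer:", (churn_risk > 0.5).sum())
```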

darknoon
Now is a particularly bad time to build a rig, since new NVIDIA cards are launching in a couple months. The value of a used 2080Ti (Turing) will tank, because Ampere cards will be available with similar performance for half the price.
CarbyAu
Agreed. I need to update my gaming rig. Waiting for: Ryzen 3, the next round of GPUs from both vendors (although ML folks will likely stay NVIDIA, of course), and, with luck, a better PCIe 4 SSD by then too.

I really wouldn't build one now unless I had to.

a2h
Interesting video, thanks for sharing. Just curious if you have one with tests or benchmarks for the completed build and/or temps at high loads? Would be cool to see :)
jeffheaton
Those will be coming!
dodo6502
I think that tape-like piece that you removed from the SSD compartment is actually the thermal pad that makes contact between the SSD and the MSI heat sink cover so you may actually want that!
highfrequency
Thanks for the video! Could you comment on the differences between the Titan RTX and the V100? I am a bit confused because the V100 is significantly more expensive ($7k on Amazon even for the 16GB version) and has a slower clock speed, yet it is the standard in ML research papers. I see that it has ~10% more CUDA cores, but it doesn't seem like this would warrant a 3x price increase.
smabie
It's price discrimination. There's no reason why anyone would want a V100 besides that Nvidia doesn't "allow" you to use GeForce cards for ML research on servers, assuming you're big enough.
DoctorOetker
how exactly is this enforced?

I don't have a ML box yet, let alone a load of servers, but I am contemplating assembling my first rig for ML. If I buy (perhaps secondhand) some GPU, do I risk the thing refusing to work if it incorrectly thinks I'm a server farm?

I have no idea how this could work. Is it just limited to 1 GPU per box? Does the proprietary driver phone home? Or are certain CPU/mobo chipsets detected, so it refuses to run, even if it's your only box?

smabie
Nah, you're good. Don't worry about it.
plasticchris
Don't worry, it's gated by a few features (just Google for it) but mostly by contracts. Building one at home won't trigger those.
freeqaz
It's price segmentation. If you NEED the slight increase in power (and specific features like float64 at full speed), then NVIDIA charges significantly more for that. Gamers are more price sensitive than ML developers.
Uehreka
The other commenter mentioned the “pro-level” tradeoffs, but there’s something else too: Nvidia’s licensing won’t let you use GeForce cards in the cloud. If you’re building a datacenter, you have to use the Teslas.
highfrequency
Very interesting. Do you have a link that describes this policy?
p1esk
2x 2080 Ti would be faster than a Titan RTX, provide the same total amount of memory, and be cheaper.
SloopJon
If NVIDIA gave me a Titan RTX for free, I would use it too.
formerly_proven
I'd even buy my own waterblock in that case.
Sholmesy
Lots of drawbacks with this approach:

- More heat
- More power consumption
- More noise
- The GPU memory isn't addressable as a single unit
paol
> provide the same amount of memory

Are you sure? The last time I checked the situation with NVLink memory pooling with 2080ti cards was very unclear.

colincooke
Unfortunately multi-GPU training doesn't scale linearly yet [1], so it's often a better call to get one larger card than two smaller ones, at least for the single-model case.

[1] https://github.com/keras-team/keras/issues/9204

BadInformatics
(non-TF) Keras has notoriously bad multi-GPU support, though, and was generally not well optimized; case in point, the latest version just re-exports/forwards to tf.keras.

Looking at something like https://lambdalabs.com/deep-learning/gpu-benchmarks or https://github.com/tensorpack/benchmarks/tree/master/other-w..., multi-gpu scaling on 2080tis seems pretty darn close to linear. Plus, there are benefits to having more than one accelerator handy on a local workstation. For one, it's much easier to have multiple experiments running simultaneously or to run parallel training (e.g. hyperparameter search or RL episodes). Given that only the uber-expensive enterprise cards have proper virtualization/time sharing, trying this workflow on a Titan RTX will most likely be suboptimal unless you always run models that can make use of most of the memory and compute (no RNNs, no Neural ODEs, etc.)
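The "multiple experiments running simultaneously" workflow mentioned above needs no framework support at all; a common pattern is to pin each run to a card via CUDA_VISIBLE_DEVICES. A minimal sketch (`train.py` and its `--lr` flag are hypothetical stand-ins):

```python
import os
import subprocess

procs = []
for gpu, lr in enumerate(["1e-3", "3e-4"]):   # one hyperparameter setting per GPU
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": str(gpu)}
    procs.append(subprocess.Popen(["python", "train.py", "--lr", lr], env=env))

for p in procs:
    p.wait()                                   # wait for all runs to finish
```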

zmmmmm
I am curious about the opposite end of the spectrum. What is the smallest and cheapest self-contained setup that can be a serviceable development box for someone doing ML/AI type work? It does not need to run the production load, but it has to be capable enough for local development that is still representative.

So far the best I have identified is an Intel NUC8 + NVIDIA GPU via Thunderbolt. But it is still $1000 at least by the time you have it all together.

NB: I know lots of people will say to just do it in the cloud, but I work in a setting where much of my data cannot be put in the cloud, and where the funding structure allows for fixed capital expenditure but not variable cloud costs.

p1esk
This depends entirely on the specific ML work you want to do. Smallest and cheapest could be something like a Raspberry Pi or Jetson Nano.

By the way, a $5k ML workstation is still on the cheaper end of the spectrum. An 8x A100 machine will set you back at least $100k, and even that won't be enough to fine-tune GPT-3.

fomine3
Buy a used ATX tower desktop PC of the Skylake generation (or buy new parts for a Ryzen 3500 build), buy a GPU (2070 SUPER for budget/perf?), buy a new 750W PSU, and put the parts together.

GPU via Thunderbolt looks like the most expensive way.

plasticchris
Just buy a case, motherboard, CPU, GPU, RAM, and PSU, and build it. At the extreme low end, you can buy a refurb Dell tower and drop in a new GPU.
Jestar342
With the size of air coolers these days, and how they all have integrated heat pipes, I'm beginning to wonder if the distinction from liquid coolers still holds.

Holy moly is that a big heatsink.

gameswithgo
Yeah, ultimately the benefit of water cooling is just to route the heat to where you have room for a bigger radiator. If you squish a big enough radiator on top of the CPU and have good airflow through the case there, you can generally get about the same cooling performance as most sane liquid cooling solutions.
CarbyAu
^^ This ^^ I like water cooling, but there is no replacement for displacement.

You can have higher tech radiators with better fins along with better fans and airflow modelling etc of course.

But it is always better if there is more of it.

capableweb
To be fair, not all air coolers are of that size. The person who built this computer is probably overdoing most things, including the cooling, even if it's running constantly.

But it also depends on where in the world you live and what the climate is like.

gameswithgo
That isn't an overkill size for a 24-core CPU. However, that particular heatsink is not great on Threadrippers. It is probably OK for the 24-core TR because it doesn't have chiplets on the edges. The Dark Rock TR4 model doesn't have heat pipes on the edge of the heat plate; they just made the heat plate wider, and it suffers on the 3990X for it. The Noctua equivalent is a better choice for these CPUs.

For the normal desktop cpus, I like the Dark Rock better. Cooling is the same as noctua but was quieter for me.

mjayhn
Yeah, I actually stopped doing water cooling (usually Corsair AIOs) this year and went with this HSF just to reduce complexity, not that I ever had any problems with my AIOs.

I should note it's WAY louder than I thought it'd be, and it's made getting to my NVMe drives a bit difficult.

I didn't do much research beyond "best analog HSF vs watercoolers", and when it showed up I couldn't believe how big it was.

whywhywhywhy
I went from an AIO to air cooling quite recently, after the pump on my AIO failed. Honestly, not sure if I was just going from a bad AIO to a good cooler (a smaller version of the Dark Rock Pro), but I actually found it to be quieter, and the performance difference was negligible.

Honestly, the best bit though was the peace of mind of not having liquid inside my computer anymore. The idea of the pump leaking down onto my GPUs when I'm not around was stressful.

jhloa2
It does seem like watercooling might be a better option for these high-TDP CPUs. I went back to air cooling on my latest build because I went with a Ryzen 3600 which only puts out around 65W under load. IMO, the cost/benefit of watercooling just isn't there for normal use cases because of the risk of leakage.

Building an open water loop would be awesome for the GPU cooling. I just can't justify the expense.

Alupis
Liquid cooling excels in flattening temperature spikes, usually from bursting CPU frequencies, since it has more thermal mass and can absorb spikes without much change in coolant temperature.

Air cooling excels mostly everywhere else.

Neither can cool your system to below ambient room temperature.

mamcx
Aren't liquid coolers less noisy? That's the bit I worry about most.
CarbyAu
The AIOs are not really worth it. Custom loops can be, if you do it right. My basic rule is: there is no replacement for displacement. Get a big, like properly big, 200mm (in both directions) or larger radiator and a single big fan.

Ultimately, you want more surface area to blow air across to cool things. "Water cooling" is really "water transports heat to fins for cooling"; just a bigger version of the heat pipe, really.

If the fins are not well suited to airflow cooling, then it won't help you.

The AIO radiators don't add much more fin area than a good air cooler, if any at all. But you can put them somewhere outside the otherwise warm case if that helps. Maybe if you live somewhere cold and your desk is just right, you could dangle it out the window?

Also, for the noise conscious: with two or more fans (often on the 280mm AIOs), "beat frequency" is a thing. That's also why I don't like push-pull configs on air coolers.

jrockway
Air cooling is wonderful. I didn't watch this video but I use a giant Noctua heatsink on my system (Intel 6950X, not quite as demanding as Threadripper but hardly a cold-running CPU) and I have zero regrets. I have used all-in-one watercoolers, and ... they just die after a few years. I have owned three and all three of them died in less than 2 years each. Each time, I very sadly woke up in the morning with the realization that I'm not going to be using my computer today. AIOs have not, in my experience, been quieter or yielded better temps than air cooling. So I am not sure it's worth it.

Obviously a homebuilt watercooling rig is going to be way better than AIOs, but it requires extensive maintenance. It's water. Stuff will grow in there. It evaporates. Joints loosen over time and could end up shooting water all over your $6000 workstation.

To me, it's not worth it. Threadripper requires a big heatsink. So be it.

CyberDildonics
My experience is the opposite. I'm sure all-in-one water cooling varies, but water cooling can be done mostly with cheap parts from a hardware store. The most exotic part is the water block, followed by the pump. Radiators can be used, but a lot of computers can probably be cooled using soft copper spools and a 200mm fan.

The pump does not need airflow to do its job, and airflow is what carries noise. The water just has to flow through the loop; pretty much any flow rate will push enough water to move heat away. If you hold your hand on a water block with the pump turned off, let it warm up, and then turn the pump on, it feels like the water is running over your hand; the temperature change is immediate. If you turn off the pump while the CPU is running, the temperature rises a few degrees each second. Once the pump goes back on, the temperature drops immediately.

There is no reason the liquid should evaporate unless there are leaks, and joints don't necessarily loosen over time. Leaks are also unlikely to manifest as something bursting and spraying water; that would take more pressure than your pump produces (a pump can probably only push water four or five feet higher than itself, even if it is more powerful than the loop needs).

I have never had to maintain or fix something after it is together. Use vinyl tubing from the hardware store over the barbs and use small pipe clamps on top of them.

CarbyAu
I am a fan of water cooling too. I built a gaming rig for my brother. He had an issue and posted it to me over 3000 km (2000 miles) while it was full of water.

It was fine. I fixed it, refilled it, and posted it back! Still running 5 years later.

But I do go a little more paranoid and insist on more expensive silicone tubing(some $$$). Fits nicer, feels nicer, less concern over perishing.

smabie
Linus Tech Tips did a video in which they were unable to beat a high-end air-cooled Noctua setup with any sort of water cooling solution. And by unable to beat, I mean both temp-wise and sound-wise. Maybe, just maybe, you might be able to beat air cooling on one dimension with a very expensive custom loop, but it's totally not worth it.

Water-cooled PCs do look pretty cool though.

mywittyname
Those air-cooled units with the heat pipes are effectively heat pumps, and heat pumps are crazy efficient at cooling. They rely on evaporative cooling, which soaks up substantially more energy, so much so that it's possible to get temperatures noticeably below ambient using this method.

Converting water to steam takes (off the top of my head) about 5x more energy than it takes to raise its temperature by one degree. I imagine the coolant used in these heat sinks is at least this efficient, if not more, and is something that evaporates well below 100°C.

If heat sinks were like using cars and roads to transport people, a water-cooled setup would be like expanding the roads to increase the number of cars that can transport people, while those heat pipe coolers are like replacing cars with trains, so you can move a lot more in the same amount of space.

nkurz
> Converting water to steam takes about (from the top of my head) 5x more energy than it takes to increase it by one degree.

I think this greatly underestimates the potential of evaporative cooling. The "specific heat of water" is 1 calorie/gram °C --- that is, one calorie can heat one gram of water by one degree Celsius if no phase change is involved. The "heat of vaporization of water" is more than 500 calorie/gram at 100°C. That is, the energy necessary to convert a given quantity of liquid water from just below boiling to steam is not 5x the energy necessary to raise that amount of water 1°C, but 500x! You are probably remembering that this is 5x the energy necessary to take liquid water all the way from 0°C (almost frozen) to 100°C (almost boiling).
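Checking those numbers with the figures given in the correction above:

```python
specific_heat = 1.0          # cal per gram per degC for liquid water
heat_of_vaporization = 540   # cal per gram at 100 degC (approximate)

print(heat_of_vaporization / specific_heat)            # 540x heating by one degree
print(heat_of_vaporization / (specific_heat * 100))    # ~5.4x the full 0-100 degC heat-up
```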

andrewon
When he said training on Google Colab took one day and on his computer took 20 minutes, did he compare against the Google Colab CPU? The difference seems too large.
colordrops
Being unfamiliar with ML work, when does it make sense to build one of these vs spinning up some instances on AWS or gcloud?
bob1029
I think it really depends on how much you care about ML and how performant you actually need it to be. If you are a hobbyist or prototyping something speculatively for work, perhaps a cloud instance is prudent. If ML is your life's work, I'd probably consider throwing down for a proper rig so you don't get killed on cloud hosting fees.
mikece
Does anyone measure how long it would take such a workstation to pay for itself (including some nominal amount of operational cost for electricity) compared to simply doing ML on AWS/Azure/GCP? It seems like such a metric could be a useful way to compare these machines.
CoolGuySteve
A comparable workstation costs about as much as a month of on-demand EC2 time or 3 months of spot instance time.

AWS GPU instances are really expensive.

The most cost effective imo is to build a workstation for development and then deploy to AWS spot if you need a cluster.

If you can't use a workstation for whatever reason, then use the new AWS feature to "stop" spot instances and use a spot instance as your workstation, while being conscious of the high hourly cost and shutting it down when you're not working.
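A hedged back-of-envelope version of that comparison; the exact breakeven depends heavily on which instance you compare against, and the hourly rate below is an assumption, so check current pricing:

```python
workstation_cost = 5564.0            # the build total from this thread
on_demand_per_hr = 3.06              # assumed single-V100 on-demand rate
spot_per_hr = on_demand_per_hr / 3   # spot is often ~1/3 of on-demand

hours_per_month = 730
print("on-demand months:", workstation_cost / (on_demand_per_hr * hours_per_month))
print("spot months:", workstation_cost / (spot_per_hr * hours_per_month))
# At full utilization with these numbers: roughly 2.5 months on-demand, ~7.5 on spot.
```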

FridgeSeal
Azure ML/GPU instances are also really expensive.

I did the maths recently and figured out I could put together a machine with a couple of 2080 Tis and have it pay for itself in a couple of months.

I'm very seriously considering doing it, especially as I'm the only data scientist. If I had a team, I'd be more in favour of going to the effort of setting up cloud-based training jobs etc.

CoolGuySteve
That's what my partners and I did. But we bought refurbished 1080 cards for about $300 each and Ryzen 9 hardware.

We're waiting for the 3000 series to come out which should be a large performance/dollar improvement over the current gen cards due to the smaller transistor size.

mpfundstein
I have a Threadripper 1920X with 2x 2080 Ti.

When running cpuburn I get around 65°C Tdie, and with gpuburn the upper card gets to around 86°C and the lower one to 81°C.

Right now I have a water cooler for the CPU, 3 intake fans (bottom/back) and 2 exhaust fans through the water cooler on top. I was wondering what temperatures I should aim for and what an optimal fan configuration is. I have a couple of fans lying around.

The case is a Lian Li O11 Air and the mobo is a Taichi X399.

Anyone have any tips?

Also, I would want to use SLI, but then I would have to remove the fans on the GPUs. Do I then need to water cool the GPUs, or what is the solution?

If any of the modding pros here can help, that would be awesome :-)

potiuper
Please fix title Tiitan typo.
dzink
A bigger box helps reduce cooling and power expenses. I built a Threadripper tower with a 2080 Ti last year and used a BeQuiet 900 for it, with very nice results.
peterpost2
Did not expect a video that wholesome.
rcgorton
Did you consider NOT using NVIDIA? Both the capital cost and the operational costs are huge compared to fairly high-end Radeon cards. (No, I'm not an AMD employee; I merely choose based on my personal budget.)
gameswithgo
If you go with air cooling on a Threadripper, I suggest going with a Noctua cooler instead of the Dark Rock. Dark Rock extended the size of the heat plate to match the TR CPU size, but they didn't cover it with heat pipes; Noctua did. Cooling performance really suffers on the 3990X because there are chiplets on the edge of the CPU; on the 32- and 24-core models it may not matter so much.

See: https://www.kitguru.net/components/cooling/luke-hill/threadr...

On non-Threadripper CPUs I actually like the Dark Rock better. Cooling is the same as Noctua, but it looks cooler and was quieter for me.

wincy
Also consider the new IceGiant ProSiphon Elite [0]. Mine is arriving in September (due to coronavirus-related delays), but initial tests of prototypes by LTT [1] and others showed better cooling than AIO water cooling. And since it uses a dielectric fluid, there's no risk of a leak frying your expensive computer. It's $169 MSRP, but based on what I've seen it seems worth it. I'm not associated with them in any way; I just think it looks like a cool new product!

[0] https://www.icegiantcooling.com/

[1] https://m.youtube.com/watch?v=M13dWRL9qkc

p1esk
Yes, if it is as good as it seems, I don’t see any reason to use water cooling anymore.
ADent
If you believe their copy, it won't work in most tower cases, since the CPU isn't at the bottom and it uses a gravity feed to return the liquid.
p1esk
This used to be the main problem with this technology and they claim they solved it.
fomine3
Noctua's cooler looks the nicest to me, so there's no problem.
wincy
One drawback of AIOs that I've read about is that you're no longer air cooling the VRMs surrounding the socket, which can decrease the life of your motherboard.

They also recently decided to include 4 120mm fans in the package, two push and two pull, which is a pretty crazy amount of air cooling.

bob1029
Noctua is an automatic default for me now. I've got the NH-U14S on my 2950X and an NH-D15 on my 1800X. Never have any problems with these. Easy to install and maintain. Will probably reuse both when I upgrade my CPUs.
logjammin
I think this is good advice in general -- you can't really ever go wrong with Noctua.

I built a Threadripper workstation last year but went with liquid cooling. However, I put 3 Noctua fans on the radiator and haven't looked back. Terrific company.

trzeci
My addition is that the Dark Rock Pro TR4 is pretty bulky, and I have a problem with my Asus Zenith Extreme (please note it's the older generation, for the 2950X): the radiator covers my PCIe #1 slot, so I can't put a graphics card there.
fsociety
Noctua is even more bulky, unless you get the slim version.
keeganpoppen
Yep, I went with Noctua with dual fans in a push/pull configuration on a 3960X, and it's worked great so far. Except that the two fans aren't quite the same color, sigh.
bicknyers
Also, if you go air cooling and are confident in your abilities, consider delidding to drop temps further (5 to 20°C). If you go air cooling, I assume it's on the basis of long-term stability, so don't use liquid metal either. Also invest in a nice PSU (Gold minimum), with your peak load pulling only 75% of the rated max wattage.

Edit: Like most things, look at components' real-world testing figures (in this case, wattage) as opposed to TDP when planning.

eightysixfour
Ryzen CPUs are soldered, delidding has minimal impact.
bicknyers
My mistake, I thought Ryzen CPUs were still bonded with silicone and a ??? grade paste on the IHS
formerly_proven
To be honest I can't recall any AMD CPUs that weren't soldered... apart from those that didn't have an IHS in the first place ;)

Intel was just like "we can save $4 of BOM cost on a $500 part there" with their 4th-9th generations of CPUs. No thermal headroom? No problem.

formercoder
0.8%? Could be worth billions.
formerly_proven
TR is a nine-die CPU that is pretty difficult to delid, and the improvement is minimal, nowhere near "5 to 20°C". It's not a 7700K.
bicknyers
Yeah to be fair I was basing my numbers on past projects namely Intel. I haven't looked at TR specifically just offered up the suggestion for those who have never heard of it to look into
switchbak
I went with the U14S for my 24 core TR, thinking I was crazy due to AMD's recommendation for robust water cooling.

I was worried at first when running heavy multicore benchmarks because the heat spiked so quickly. Turns out my workload scales quite poorly (boo), so I'm rarely pushing the temp envelope at all.

I did notice that a two fan setup on this was pretty noisy though, too much to bear sitting next to, so I threw it in the garage and ran some cables. Nice for summer temps and no AC in the house too!

I'm plenty happy with it now, even if the Noctua doesn't quite fit my case.

paulmd
Water cooling isn't really that much more efficient than giant air coolers.

At the end of the day, dissipating heat is a function of radiator fin area and air movement; the water is just a working fluid that moves the heat to the radiator fins. Apart from having more thermal mass (it takes longer to heat up and cool down), it isn't inherently more efficient than a heat pipe. It's water moving the heat either way, and in some ways the heat pipe is actually more efficient (evaporation moves more heat than raising the temperature of the water a few degrees).

People don't realize it, but a D15 (a dual-tower cooler, a bit bigger than your U14S) is about on par with a 240mm or 280mm AIO. A 120mm or 140mm AIO is worse than your U14S.

qes
> Apart from having more thermal mass (takes longer to heat up/cool down)

_Much_ more thermal mass, and it takes much longer to heat up; in most setups it will never reach the same temperature as the heat pipes in an air cooler, so the water block is cooler both at idle and under load. I would think that larger temp delta allows heat to conduct away from the CPU package more quickly.

My 3960X sees 125MHz higher all core full load, almost 100MHz all core average work load, and 50MHz higher single core boost clocks under a custom loop (280+140 rads & Optimus TR3+ block) than under a Noctua NH-U14S TR4-SP3.

derefr
> at the end of the day, dissipating heat is a function of the radiator fin area and air movement, the water is just a working fluid to move the heat to the radiator fin

...in a fluid-recycling system (as all consumer PC cooling is.)

Water cooling is a lot more efficient than air cooling in a fluid exchange system. Like if your HPC data center is on the ocean, and you can just pump in cold ocean water, "spend" it by heating it up, and then pump the hot water to an outtake far enough away that it's not heating up your intake water.

(An even-denser fluid would work even better, but we don't have oceans of even-denser fluids lying about.)

m3at
You're right. However, one advantage of water cooling your CPU (especially for the multi-GPU builds that are common in ML setups) is that it moves the heat dissipation away from a crowded area.
m463
I always thought water cooling was about noise.

You could move the heat to a giant radiator, which could have a giant fan. The bigger fan moves far more air with far less noise.

Meanwhile air cooling seems to require a lot more engineering, and you end up with something that looks like an oil refinery in the center of your motherboard.

ben-schaaf
This is only partially true. A larger radiator needs a faster/larger pump, leading to more pump noise.

Interestingly, most air cooling solutions are really pump-less liquid cooling, as the liquid in the heat pipes is what transfers most of the heat away from the CPU.

paulmd
It comes down to how much surface area your cooler has (whether that's radiator or a cooler tower) and how much air is moving over it.

There is no magic about a water cooler that reduces noise; it is a thermodynamically determined process. The fin stack has a certain temperature, a certain amount of surface area, and a certain amount of air moving over it (at a certain temperature). A radiator really gives you no significant advantage in any of those areas. In fact, a lot of radiators actually have less surface area than some of the really big air coolers like the NH-D15. They are only maybe a couple of centimeters thick, while the D15 is about the same size as a 140mm cooler but with fins four times as thick.

On top of that you have pump noise. An expensive custom loop with a nice quiet pump can reduce that significantly, but AIOs in particular are just never going to be quiet because of the pump. The "pump" on an air cooler is evaporation itself: the working fluid vaporizes on the cold plate (cooling it) and then condenses on the cooler part of the heat pipe, where the fins cool it. This is basically a silent pump.

The best you can say about a radiator is that (a) by putting it on an intake you can guarantee that the air it's sucking is slightly cooler rather than being warmed by the ambient heat of the case, albeit with the downside that you are now pumping hot air over your other components, and (b) you can put the radiator in a more convenient location than sticking straight out of your motherboard.

(you can of course skip fans entirely! check out the HDPlex H5, it is a cool case where the entire thing is a heatsink, it uses heatpipes to move the heat to the chassis and the chassis itself is finned for dissipation. I have no imminent need for it but I lust for it anyway, it's just so damn cool. https://hdplex.com/hdplex-h5-fanless-computer-case.html )

Retric
You're working from an overly simplistic model of cooling. The delta between the fin temperature and the CPU temperature makes a huge difference in cooling efficiency. Two sets of fins may be moving the same amount of heat while the CPUs are at different temperatures, due to differences in plumbing. Also, the way your working fluid flows through a radiator is a big deal: ideally you want the coldest part of the radiator on a return loop, rather than heating the fluid as it's passed to the CPU block.

Now, sure under ideal conditions and stock settings it’s not a big deal. But, in practice things get tricky.

cjbprime
Or for an even prettier one, Streacom DB4!

https://fabiensanglard.net/the_beautiful_machine/index.html

HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.