🧮 How Much RAM Do You Need for Local LLMs?

A complete guide to RAM requirements for running LLMs locally: how much system RAM and GPU VRAM you need for 7B, 13B, 30B, and 70B models at different quantizations.

Last updated: February 7, 2026

🎯 Why This Matters

RAM is the gatekeeper of local AI. If your model doesn't fit in memory (GPU VRAM or system RAM), it won't run, or it'll be painfully slow due to disk swapping. Understanding RAM requirements saves you from buying hardware that can't handle your target models. The rule of thumb: GPU VRAM for speed, system RAM for flexibility.

๐Ÿ† Our Recommendations

Tested and ranked by real-world AI performance

💚 Budget

32GB DDR5-5600 Kit (2x16GB)

$79
Specs: 2x16GB DDR5-5600 CL36, dual-channel, Intel XMP / AMD EXPO
Performance: Enables 7B-13B CPU inference at 8-14 tok/s
Best For: 7B-13B models, GPU offloading, general use

✅ Pros

  • Cheapest useful amount for LLMs
  • Handles 7B models fully in RAM
  • 13B Q4 fits with room to spare
  • Good for GPU + CPU split inference

โŒ Cons

  • Can't run 30B+ models
  • No room for 13B at higher quantization
  • May need upgrade if you get serious about AI
Check Price on Amazon →
💙 Mid-Range

64GB DDR5-5600 Kit (2x32GB)

$159
Specs: 2x32GB DDR5-5600 CL36, dual-channel
Performance: Enables 30B CPU inference at 5-8 tok/s, 13B at 10-14 tok/s
Best For: 30B models on CPU, comfortable 13B usage, future-proofing

✅ Pros

  • Fits 30B Q4 models in RAM
  • Comfortable headroom for 13B at any quantization
  • Good sweet spot for enthusiasts
  • DDR5 bandwidth helps CPU inference

โŒ Cons

  • 30B CPU inference is still slow (5-8 tok/s)
  • Overkill if you have a 16GB+ GPU
  • Slightly more expensive per GB than 32GB kits
Check Price on Amazon →
💜 High-End

128GB DDR5-5600 Kit (2x64GB or 4x32GB)

$329
Specs: 128GB DDR5-5600, quad or dual-channel depending on config
Performance: Enables 70B CPU inference at 3-5 tok/s, 30B at 7-10 tok/s
Best For: 70B models on CPU, AI server builds, maximum flexibility

✅ Pros

  • Can run nearly any open-source model
  • 70B Q4 fits with room to spare
  • Great for AI server builds
  • Future-proof for years

โŒ Cons

  • $329 is a significant investment
  • 70B on CPU is slow (3-5 tok/s)
  • Needs a motherboard with 4 DIMM slots for some configs
  • Overkill unless you need 70B
Check Price on Amazon →

💡 Prices may vary. Links may earn us a commission at no extra cost to you. We only recommend products we'd actually use.


โ“ Frequently Asked Questions

GPU VRAM vs System RAM โ€” which matters more?

GPU VRAM is 10-20x faster for inference. If your model fits in VRAM, use that. System RAM is the fallback for CPU inference or partial GPU offloading. Ideally, have enough VRAM for your model AND enough system RAM as a buffer (at least model size + 4-8GB for OS).
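
For partial offloading, most local runtimes let you pick how many transformer layers live in VRAM while the rest stay in system RAM. Here's a minimal sketch using llama-cpp-python; the model path and layer count are placeholder assumptions you'd tune to your own GPU.

```python
# Minimal sketch of split GPU/CPU inference with llama-cpp-python.
# The model path and n_gpu_layers value are placeholders for illustration.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-13b.Q4_K_M.gguf",  # hypothetical local GGUF file
    n_gpu_layers=24,  # layers offloaded to VRAM; the rest run from system RAM
    n_ctx=4096,       # context window; the KV cache grows with this setting
)

result = llm("Explain dual-channel memory in one sentence.", max_tokens=64)
print(result["choices"][0]["text"])
```

Set n_gpu_layers to -1 to offload everything when the whole model fits in VRAM, or 0 for pure CPU inference.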

How much RAM does each model size need?

At Q4 quantization: 7B → 6GB, 13B → 10GB, 30B → 20GB, 70B → 40GB. At FP16 (full precision), weights alone take about 2GB per billion parameters, so figure roughly 14GB for a 7B model and 140GB for 70B. Always add 2-4GB overhead for the inference engine and KV cache.
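
As a back-of-the-envelope check, you can estimate memory from parameter count and bits per weight. The sketch below uses assumed effective bit-widths and a flat overhead allowance, so treat the outputs as ballpark figures rather than exact GGUF file sizes.

```python
# Rough RAM estimate: weights (params x bits / 8) plus a flat overhead allowance.
# The bit-widths and 2.5 GB overhead are assumptions for illustration only.
BITS_PER_WEIGHT = {"Q4": 4.5, "Q8": 8.5, "FP16": 16.0}

def ram_needed_gb(params_billions: float, quant: str = "Q4", overhead_gb: float = 2.5) -> float:
    weights_gb = params_billions * BITS_PER_WEIGHT[quant] / 8  # weight storage
    return round(weights_gb + overhead_gb, 1)                  # + engine and KV cache

for size in (7, 13, 30, 70):
    print(f"{size}B @ Q4 ~ {ram_needed_gb(size)} GB")
# Prints roughly 6.4, 9.8, 19.4, and 41.9 GB, in line with the figures above.
```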

Is DDR5 worth it over DDR4 for AI?

Yes, for CPU inference. DDR5-5600 delivers about 75% more peak bandwidth than DDR4-3200 (roughly 90 GB/s vs 51 GB/s dual-channel), which directly translates to faster token generation on CPU. If building new, always go DDR5. If upgrading existing DDR4, the difference isn't worth a platform change unless you're also upgrading the CPU.
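
The reason bandwidth matters so much: on CPU, generating each token streams essentially the whole weight file through the memory bus, so bandwidth sets a hard ceiling on tokens per second. A quick sketch of the theoretical peaks, assuming dual-channel memory and an 8GB 13B Q4 model:

```python
# Theoretical peak bandwidth and the tokens/s ceiling it implies for CPU inference.
# Dual-channel memory and an ~8 GB 13B Q4 model are assumptions for illustration.
def peak_bandwidth_gbs(mt_per_s: int, channels: int = 2, bus_width_bytes: int = 8) -> float:
    return mt_per_s * bus_width_bytes * channels / 1000  # GB/s

def tok_per_s_ceiling(bandwidth_gbs: float, model_gb: float) -> float:
    return bandwidth_gbs / model_gb  # each token reads roughly the whole model once

ddr4 = peak_bandwidth_gbs(3200)  # ~51.2 GB/s
ddr5 = peak_bandwidth_gbs(5600)  # ~89.6 GB/s
print(f"13B Q4 (~8 GB): DDR4 ceiling {tok_per_s_ceiling(ddr4, 8):.1f} tok/s, "
      f"DDR5 ceiling {tok_per_s_ceiling(ddr5, 8):.1f} tok/s")
# ~6.4 vs ~11.2 tok/s; real-world throughput lands somewhat below these ceilings.
```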

Can I mix RAM sizes?

Technically yes, but it may disable dual-channel mode, cutting bandwidth in half. For AI workloads where bandwidth matters, always use matched pairs (2x16GB, 2x32GB, etc.).

Ready to build your AI setup?

Pick your hardware, install Ollama, and start running models in minutes.
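
Once the hardware is in place, a few lines of Python against a locally running Ollama instance make a quick smoke test. This sketch assumes the ollama Python package is installed and that a model (llama3 here, purely as an example) has already been pulled.

```python
# Quick smoke test against a local Ollama server; the model name is just an example.
import ollama

reply = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": "How much RAM does a 13B model need at Q4?"}],
)
print(reply["message"]["content"])
```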