🧮 How Much RAM Do You Need for Local LLMs?

A complete guide to RAM requirements for running LLMs locally: how much system RAM and GPU VRAM you need for 7B, 13B, 30B, and 70B models at different quantizations.

Last updated: February 7, 2026

🎯 Why This Matters

RAM is the gatekeeper of local AI. If your model doesn't fit in memory (GPU VRAM or system RAM), it won't run, or it'll be painfully slow due to disk swapping. Understanding RAM requirements saves you from buying hardware that can't handle your target models. The rule of thumb: GPU VRAM for speed, system RAM for flexibility.

๐Ÿ† Our Recommendations

Tested and ranked by real-world AI performance

💚 Budget

32GB DDR5-5600 Kit (2x16GB)

$79
Specs: 2x16GB DDR5-5600 CL36, dual-channel, Intel XMP / AMD EXPO
Performance: Enables 7B-13B CPU inference at 8-14 tok/s
Best For: 7B-13B models, GPU offloading, general use

✅ Pros

  • Cheapest useful amount for LLMs
  • Handles 7B models fully in RAM
  • 13B Q4 fits with room to spare
  • Good for GPU + CPU split inference

โŒ Cons

  • Can't run 30B+ models
  • No room for 13B at higher quantization
  • May need upgrade if you get serious about AI
Check Price on Amazon →
💙 Mid-Range

64GB DDR5-5600 Kit (2x32GB)

$159
Specs: 2x32GB DDR5-5600 CL36, dual-channel
Performance: Enables 30B CPU inference at 5-8 tok/s, 13B at 10-14 tok/s
Best For: 30B models on CPU, comfortable 13B usage, future-proofing

✅ Pros

  • Fits 30B Q4 models in RAM
  • Comfortable headroom for 13B at any quantization
  • Good sweet spot for enthusiasts
  • DDR5 bandwidth helps CPU inference

โŒ Cons

  • 30B CPU inference is still slow (5-8 tok/s)
  • Overkill if you have a 16GB+ GPU
  • Slightly more expensive per GB than 32GB kits
Check Price on Amazon →
💜 High-End

128GB DDR5-5600 Kit (2x64GB or 4x32GB)

$329
Specs: 128GB DDR5-5600, quad or dual-channel depending on config
Performance: Enables 70B CPU inference at 3-5 tok/s, 30B at 7-10 tok/s
Best For: 70B models on CPU, AI server builds, maximum flexibility

✅ Pros

  • Can run nearly any open-source model
  • 70B Q4 fits with room to spare
  • Great for AI server builds
  • Future-proof for years

โŒ Cons

  • $329 is a significant investment
  • 70B on CPU is slow (3-5 tok/s)
  • Needs a motherboard with 4 DIMM slots for some configs
  • Overkill unless you need 70B
Check Price on Amazon →

💡 Prices may vary. Links may earn us a commission at no extra cost to you. We only recommend products we'd actually use.


โ“ Frequently Asked Questions

GPU VRAM vs System RAM โ€” which matters more?

GPU VRAM is 10-20x faster for inference. If your model fits in VRAM, use that. System RAM is the fallback for CPU inference or partial GPU offloading. Ideally, have enough VRAM for your model AND enough system RAM as a buffer (at least model size + 4-8GB for OS).
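
For partial offloading, most local runtimes let you pick how many transformer layers live in VRAM while the rest stay in system RAM. Here's a minimal sketch using llama-cpp-python; the model path and layer count are placeholder assumptions you'd tune to your own GPU.

```python
# Minimal sketch of split GPU/CPU inference with llama-cpp-python.
# The model path and n_gpu_layers value are placeholders for illustration.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-13b.Q4_K_M.gguf",  # hypothetical local GGUF file
    n_gpu_layers=24,  # layers offloaded to VRAM; the rest run from system RAM
    n_ctx=4096,       # context window; the KV cache grows with this setting
)

result = llm("Explain dual-channel memory in one sentence.", max_tokens=64)
print(result["choices"][0]["text"])
```

Set n_gpu_layers to -1 to offload everything when the whole model fits in VRAM, or 0 for pure CPU inference.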

How much RAM does each model size need?

At Q4 quantization: 7B → 6GB, 13B → 10GB, 30B → 20GB, 70B → 40GB. At FP16 (full precision), weights alone take about 2GB per billion parameters, so figure roughly 14GB for a 7B model and 140GB for 70B. Always add 2-4GB overhead for the inference engine and KV cache.
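
As a back-of-the-envelope check, you can estimate memory from parameter count and bits per weight. The sketch below uses assumed effective bit-widths and a flat overhead allowance, so treat the outputs as ballpark figures rather than exact GGUF file sizes.

```python
# Rough RAM estimate: weights (params x bits / 8) plus a flat overhead allowance.
# The bit-widths and 2.5 GB overhead are assumptions for illustration only.
BITS_PER_WEIGHT = {"Q4": 4.5, "Q8": 8.5, "FP16": 16.0}

def ram_needed_gb(params_billions: float, quant: str = "Q4", overhead_gb: float = 2.5) -> float:
    weights_gb = params_billions * BITS_PER_WEIGHT[quant] / 8  # weight storage
    return round(weights_gb + overhead_gb, 1)                  # + engine and KV cache

for size in (7, 13, 30, 70):
    print(f"{size}B @ Q4 ~ {ram_needed_gb(size)} GB")
# Prints roughly 6.4, 9.8, 19.4, and 41.9 GB, in line with the figures above.
```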

Is DDR5 worth it over DDR4 for AI?

Yes, for CPU inference. DDR5-5600 delivers about 75% more peak bandwidth than DDR4-3200 (roughly 90 GB/s vs 51 GB/s dual-channel), which directly translates to faster token generation on CPU. If building new, always go DDR5. If upgrading existing DDR4, the difference isn't worth a platform change unless you're also upgrading the CPU.
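
The reason bandwidth matters so much: on CPU, generating each token streams essentially the whole weight file through the memory bus, so bandwidth sets a hard ceiling on tokens per second. A quick sketch of the theoretical peaks, assuming dual-channel memory and an 8GB 13B Q4 model:

```python
# Theoretical peak bandwidth and the tokens/s ceiling it implies for CPU inference.
# Dual-channel memory and an ~8 GB 13B Q4 model are assumptions for illustration.
def peak_bandwidth_gbs(mt_per_s: int, channels: int = 2, bus_width_bytes: int = 8) -> float:
    return mt_per_s * bus_width_bytes * channels / 1000  # GB/s

def tok_per_s_ceiling(bandwidth_gbs: float, model_gb: float) -> float:
    return bandwidth_gbs / model_gb  # each token reads roughly the whole model once

ddr4 = peak_bandwidth_gbs(3200)  # ~51.2 GB/s
ddr5 = peak_bandwidth_gbs(5600)  # ~89.6 GB/s
print(f"13B Q4 (~8 GB): DDR4 ceiling {tok_per_s_ceiling(ddr4, 8):.1f} tok/s, "
      f"DDR5 ceiling {tok_per_s_ceiling(ddr5, 8):.1f} tok/s")
# ~6.4 vs ~11.2 tok/s; real-world throughput lands somewhat below these ceilings.
```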

Can I mix RAM sizes?

Technically yes, but it may disable dual-channel mode, cutting bandwidth in half. For AI workloads where bandwidth matters, always use matched pairs (2x16GB, 2x32GB, etc.).

Ready to build your AI setup?

Pick your hardware, install Ollama, and start running models in minutes.
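
Once the hardware is in place, a few lines of Python against a locally running Ollama instance make a quick smoke test. This sketch assumes the ollama Python package is installed and that a model (llama3 here, purely as an example) has already been pulled.

```python
# Quick smoke test against a local Ollama server; the model name is just an example.
import ollama

reply = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": "How much RAM does a 13B model need at Q4?"}],
)
print(reply["message"]["content"])
```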