How Much RAM Do You Need for Local LLMs?
Complete guide to RAM requirements for running LLMs locally. How much system RAM and GPU VRAM you need for 7B, 13B, 30B, and 70B models with different quantizations.
Last updated: February 7, 2026
🎯 Why This Matters
RAM is the gatekeeper of local AI. If your model doesn't fit in memory (GPU VRAM or system RAM), it won't run, or it'll be painfully slow due to disk swapping. Understanding RAM requirements saves you from buying hardware that can't handle your target models. The rule of thumb: GPU VRAM for speed, system RAM for flexibility.
🏆 Our Recommendations
Tested and ranked by real-world AI performance
32GB DDR5-5600 Kit (2x16GB)
✅ Pros
- Cheapest useful amount for LLMs
- Handles 7B models fully in RAM
- 13B Q4 fits with room to spare
- Good for GPU + CPU split inference
❌ Cons
- Can't run 30B+ models
- Little headroom for 13B at higher-precision quants (Q8/FP16)
- May need upgrade if you get serious about AI
64GB DDR5-5600 Kit (2x32GB)
✅ Pros
- Fits 30B Q4 models in RAM
- Comfortable headroom for 13B at any quantization
- Good sweet spot for enthusiasts
- DDR5 bandwidth helps CPU inference
❌ Cons
- 30B CPU inference is still slow (5-8 tok/s)
- Overkill if you have a 16GB+ GPU
- Slightly more expensive per GB than 32GB kits
128GB DDR5-5600 Kit (2x64GB or 4x32GB)
✅ Pros
- Can run nearly any open-source model (70B-class and below)
- 70B Q4 fits with room to spare
- Great for AI server builds
- Future-proof for years
❌ Cons
- $329 is a significant investment
- 70B on CPU is slow (3-5 tok/s)
- Needs a motherboard with four DIMM slots for the 4x32GB config
- Overkill unless you need 70B
💡 Prices may vary. Links may earn us a commission at no extra cost to you. We only recommend products we'd actually use.
🤖 Compatible Models
Models you can run with this hardware
- DeepSeek R1 14B · 14B params · 10 GB min VRAM · DeepSeek
- DeepSeek R1 32B · 32B params · 20 GB min VRAM · DeepSeek
- DeepSeek R1 70B · 70B params · 40 GB min VRAM · DeepSeek
- Gemma 2 27B · 27B params · 18 GB min VRAM · Google
- DeepSeek R1 7B · 7B params · 6 GB min VRAM · DeepSeek
- Mistral 7B · 7B params · 6 GB min VRAM · Mistral AI
- Llama 3.3 70B · 70B params · 40 GB min VRAM · Meta
- Phi-4 · 14B params · 10 GB min VRAM · Microsoft
- Qwen 2.5 14B · 14B params · 10 GB min VRAM · Alibaba
- Qwen 2.5 72B · 72B params · 44 GB min VRAM · Alibaba
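To turn this list into a quick compatibility check, here is a minimal sketch that filters it by the memory you have available; the GB figures are the minimums quoted above, and the helper name is just illustrative.

```python
# Minimum VRAM (GB) per model, taken from the list above
MODELS = {
    "DeepSeek R1 7B": 6, "Mistral 7B": 6,
    "DeepSeek R1 14B": 10, "Phi-4": 10, "Qwen 2.5 14B": 10,
    "Gemma 2 27B": 18, "DeepSeek R1 32B": 20,
    "DeepSeek R1 70B": 40, "Llama 3.3 70B": 40, "Qwen 2.5 72B": 44,
}

def runnable(memory_gb: float) -> list[str]:
    """Models whose minimum requirement fits in the given memory budget."""
    return [name for name, need in MODELS.items() if need <= memory_gb]

print(runnable(12))  # a 12 GB GPU covers every 7B and 14B entry above
```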
❓ Frequently Asked Questions
GPU VRAM vs System RAM: which matters more?
GPU VRAM is 10-20x faster for inference. If your model fits in VRAM, use that. System RAM is the fallback for CPU inference or partial GPU offloading. Ideally, have enough VRAM for your model AND enough system RAM as a buffer (at least model size + 4-8GB for OS).
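When a model doesn't fit entirely in VRAM, llama.cpp-based runners can offload some layers to the GPU and keep the rest in system RAM. Here is a rough sketch of budgeting that split; the layer count, VRAM reserve, and sizes are assumptions, not figures for any specific model.

```python
def estimate_gpu_layers(model_size_gb: float, n_layers: int,
                        vram_gb: float, vram_reserve_gb: float = 1.5) -> int:
    """Rough count of transformer layers that fit in VRAM.

    Assumes layers are roughly equal in size and reserves some VRAM
    for the KV cache and driver overhead. A heuristic, not a guarantee.
    """
    per_layer_gb = model_size_gb / n_layers
    usable_vram = max(vram_gb - vram_reserve_gb, 0.0)
    return min(n_layers, int(usable_vram // per_layer_gb))

# Example: a 13B Q4 model (~10 GB, ~40 layers) on an 8 GB GPU
print(estimate_gpu_layers(10, 40, 8))  # ~26 layers on the GPU, the rest in system RAM
```

In llama.cpp you would pass a number like that via --n-gpu-layers; Ollama performs a similar split automatically based on detected VRAM.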
How much RAM does each model size need?
At Q4 quantization: 7B ≈ 6GB, 13B ≈ 10GB, 30B ≈ 20GB, 70B ≈ 40GB. At Q8, roughly double those figures; at FP16 (full precision), roughly quadruple the weight size. Always add 2-4GB of overhead for the inference engine and KV cache.
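Those figures follow from simple arithmetic: parameter count times bytes per weight, plus an allowance for the engine and KV cache. A sketch of that estimate, assuming ~0.5 bytes per weight at Q4 and a flat 3GB overhead (real overhead grows with context length):

```python
BYTES_PER_PARAM = {"Q4": 0.5, "Q8": 1.0, "FP16": 2.0}  # approximate bytes per weight

def estimate_ram_gb(params_billions: float, quant: str = "Q4",
                    overhead_gb: float = 3.0) -> float:
    """Approximate memory needed to load a model: weights plus fixed overhead."""
    weights_gb = params_billions * BYTES_PER_PARAM[quant]
    return round(weights_gb + overhead_gb, 1)

for size in (7, 13, 30, 70):
    print(f"{size}B -> Q4: ~{estimate_ram_gb(size)} GB, FP16: ~{estimate_ram_gb(size, 'FP16')} GB")
```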
Is DDR5 worth it over DDR4 for AI?
Yes, for CPU inference. DDR5-5600 gives ~50% more bandwidth than DDR4-3200, which directly translates to faster token generation on CPU. If building new, always go DDR5. If upgrading existing DDR4, the difference isn't worth a platform change unless you're also upgrading CPU.
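The reason bandwidth maps so directly to speed: for every generated token the CPU has to stream essentially the whole set of weights from RAM, so memory bandwidth divided by model size gives a hard ceiling on tokens per second. A back-of-the-envelope sketch (real throughput lands below this once compute and cache effects are counted):

```python
def dual_channel_bandwidth_gbps(mt_per_s: int, bus_width_bits: int = 64,
                                channels: int = 2) -> float:
    """Peak bandwidth in GB/s for a standard two-channel desktop kit."""
    return mt_per_s * (bus_width_bits / 8) * channels / 1000

def token_rate_ceiling(model_size_gb: float, bandwidth_gbps: float) -> float:
    """Bandwidth-limited ceiling: each token requires reading all weights once."""
    return bandwidth_gbps / model_size_gb

ddr4 = dual_channel_bandwidth_gbps(3200)  # ~51 GB/s
ddr5 = dual_channel_bandwidth_gbps(5600)  # ~90 GB/s
print(token_rate_ceiling(10, ddr4), token_rate_ceiling(10, ddr5))  # 13B Q4 (~10 GB): ~5 vs ~9 tok/s
```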
Can I mix RAM sizes?
Technically yes, but it may disable dual-channel mode, cutting bandwidth in half. For AI workloads where bandwidth matters, always use matched pairs (2x16GB, 2x32GB, etc.).
Ready to build your AI setup?
Pick your hardware, install Ollama, and start running models in minutes.
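If you go the Ollama route, here is a minimal sketch using its official Python client (pip install ollama); the model tag is just an example, and it assumes the Ollama server is already installed and running locally.

```python
import ollama

# Pull a small model that fits comfortably in 8 GB of VRAM or 16 GB of system RAM
ollama.pull("mistral:7b")

response = ollama.chat(
    model="mistral:7b",
    messages=[{"role": "user", "content": "In one sentence, what is quantization?"}],
)
print(response["message"]["content"])
```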