Cluster II: The Core
Local computation and virtualization runbooks: hypervisor operations, OS isolation, and local language models.
Rig Progression Specs
Workshop Primary — Black & White Build
| Subsystem | Component | Notes |
|---|---|---|
| CPU | Ryzen 9 7950X3D | PBO -25 mV curve, 95°C limit |
| GPU | RTX 4090 24 GB | Reserved for LLM offload |
| RAM | 96 GB DDR5-6000 CL30 | 2x48 GB — EXPO profile 1 |
| Storage (OS) | 2 TB NVMe Gen4 | Boot, hot models |
| Storage (Cold) | 8 TB SATA | GGUF archive, image sets |
| PSU | 1200 W Platinum | Single-rail, fully modular |
Local LLM Runbooks
Quantization Matrix
Practical VRAM footprints for common models by quantization format, at the listed context length. Figures are field-validated but approximate. Rows above 24 GB exceed the RTX 4090 and need partial CPU offload.
| Model | Format | VRAM | Context |
|---|---|---|---|
| Llama-3 8B | GGUF Q5_K_M | ~6.5 GB | 8 K |
| Llama-3 70B | GGUF Q4_K_M | ~42 GB | 8 K |
| Qwen2 14B | EXL2 5.0bpw | ~10 GB | 32 K |
| Mixtral 8x7B | GGUF Q4_K_M | ~26 GB | 32 K |
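As a rough cross-check on the matrix, weight size can be approximated as parameter count times bits per weight, divided by eight. The sketch below assumes about 4.85 bits per weight for Q4_K_M and 5.69 for Q5_K_M (approximate averages, not exact figures), and the helper name `estimate_weights_gb` is hypothetical. It covers weights only; KV cache and runtime buffers add to the total as context grows.

```shell
#!/bin/sh
# Rough weight-only size estimate: params (billions) * bits-per-weight / 8 = GB.
# Excludes KV cache and runtime buffers, so real VRAM use is higher.
estimate_weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

# Assumed average bits per weight: Q4_K_M ~4.85, Q5_K_M ~5.69 (approximations).
estimate_weights_gb 70 4.85   # Llama-3 70B at Q4_K_M
estimate_weights_gb 8 5.69    # Llama-3 8B at Q5_K_M
```

The two calls print 42.4 and 5.7. The first lines up with the ~42 GB row; the gap between 5.7 and the ~6.5 GB row is plausibly context and buffer overhead.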
llama.cpp — Headless Server Launch
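# Serve a quantized model from the local GPU, listening on loopback only.
# The 70B Q4_K_M file (~42 GB) exceeds the 4090's 24 GB, so only part of the
# 81 layers can be offloaded. 38 is a starting estimate; lower it on CUDA OOM.
./llama-server \
  --model ./models/Meta-Llama-3-70B-Instruct.Q4_K_M.gguf \
  --ctx-size 8192 \
  --n-gpu-layers 38 \
  --threads 16 \
  --port 8080 \
  --host 127.0.0.1 \
  --metrics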
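The 70B Q4_K_M file is listed at roughly 42 GB, more than the 24 GB on the RTX 4090, so `--n-gpu-layers` has to be sized to the card. A minimal sketch of that arithmetic, assuming all layers are about the same size and that about 4 GB should stay free for KV cache and buffers (the reserve is a guess, and `fit_gpu_layers` is a hypothetical helper):

```shell
#!/bin/sh
# Estimate how many layers fit on the GPU, assuming layers are equal in size.
# Args: model size (GB), total layers, VRAM (GB), VRAM reserved for KV cache/buffers (GB).
fit_gpu_layers() {
  awk -v m="$1" -v n="$2" -v v="$3" -v r="$4" 'BEGIN {
    per = m / n                    # approximate size of one layer
    l = int((v - r) / per)         # whole layers that fit in the remaining budget
    if (l > n) l = n               # never report more layers than the model has
    if (l < 0) l = 0               # reserve larger than the card: offload nothing
    print l
  }'
}

# 42 GB model, 81 offloadable layers, 24 GB card, 4 GB reserve (assumed).
fit_gpu_layers 42 81 24 4
```

With these inputs the estimate is 38 layers. Treat it as a first guess and adjust against actual memory use once the server is loaded.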
Hypervisor Notes
Workshop hypervisor runs Proxmox VE on the bench rig. GPU passthrough is reserved for the inference VM; build VMs use virtual SR-IOV.
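# Proxmox: pass the NVIDIA GPU through to the LLM VM (VMID 101)
# Assumes IOMMU is already enabled in firmware
# Bind the GPU (10de:2684) and its audio function (10de:22ba) to vfio-pci at boot
echo "options vfio-pci ids=10de:2684,10de:22ba" > /etc/modprobe.d/vfio.conf
# Rebuild the initramfs, then reboot the host so the binding takes effect
update-initramfs -u -k all
# Attach all functions of 01:00 as a PCIe device (pcie=1 needs the q35 machine type)
qm set 101 -hostpci0 01:00,pcie=1,x-vga=1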
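After the host reboots, it is worth confirming that the card is actually held by `vfio-pci` before starting the VM. A minimal sketch, assuming `lspci -nnk` prints a `Kernel driver in use:` line for each bound function; the helper name `bound_to_vfio` is hypothetical.

```shell
#!/bin/sh
# Reads `lspci -nnk` output on stdin and succeeds only if every reported
# driver is vfio-pci. A device with no driver line at all counts as a failure.
bound_to_vfio() {
  awk '/Kernel driver in use:/ { seen = 1; if ($NF != "vfio-pci") bad = 1 }
       END { if (seen && !bad) exit 0; exit 1 }'
}

# Usage on the Proxmox host (not run here):
#   lspci -nnk -s 01:00 | bound_to_vfio && echo "GPU is held by vfio-pci"
```

If the check fails because the host's NVIDIA or nouveau driver claimed the card first, the binding in `/etc/modprobe.d/vfio.conf` did not take effect and the VM will not get the device.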