Cluster IIThe Core

Local computation and virtualization run-books — hypervisor operations, OS isolation, and local language models.

Rig Progression Specs

Workshop Primary — Black & White Build

Last revision: 2026-03-02

Subsystem	Component	Notes
CPU	`Ryzen 9 7950X3D`	PBO -25 mV curve, 95°C limit
GPU	`RTX 4090 24 GB`	Reserved for LLM offload
RAM	`96 GB DDR5-6000 CL30`	2x48 GB — EXPO profile 1
Storage (OS)	`2 TB NVMe Gen4`	Boot, hot models
Storage (Cold)	`8 TB SATA`	GGUF archive, image sets
PSU	`1200 W Platinum`	Single-rail, fully modular

Local LLM Runbooks

Quantization Matrix

Practical VRAM footprints for common model sizes against quantization format. Field-validated; figures are approximate.

Model	Format	VRAM	Context
Llama-3 8B	`GGUF Q5_K_M`	~6.5 GB	8 K
Llama-3 70B	`GGUF Q4_K_M`	~42 GB	8 K
Qwen2 14B	`EXL2 5.0bpw`	~10 GB	32 K
Mixtral 8x7B	`GGUF Q4_K_M`	~26 GB	32 K

llama.cpp — Headless Server Launch

# Run a quantized model with extended context on local GPU
./llama-server \
  --model       ./models/Meta-Llama-3-70B-Instruct.Q4_K_M.gguf \
  --ctx-size    8192 \
  --n-gpu-layers 81 \
  --threads     16 \
  --port        8080 \
  --host        127.0.0.1 \
  --metrics

Hypervisor Notes

Workshop hypervisor runs Proxmox VE on the bench rig. GPU passthrough is reserved for the inference VM; build VMs use virtual SR-IOV.

# Proxmox - pass NVIDIA GPU to LLM VM
echo "options vfio-pci ids=10de:2684,10de:22ba" > /etc/modprobe.d/vfio.conf
update-initramfs -u
qm set 101 -hostpci0 01:00,pcie=1,x-vga=1