Cluster II: The Core

Local computation and virtualization runbooks: hypervisor operations, OS isolation, and local language models.

Rig Progression Specs

Workshop Primary — Black & White Build

Last revision: 2026-03-02

Subsystem       Component               Notes
CPU             Ryzen 9 7950X3D         PBO -25 mV curve, 95°C limit
GPU             RTX 4090 24 GB          Reserved for LLM offload
RAM             96 GB DDR5-6000 CL30    2x48 GB, EXPO profile 1
Storage (OS)    2 TB NVMe Gen4          Boot, hot models
Storage (Cold)  8 TB SATA               GGUF archive, image sets
PSU             1200 W Platinum         Single-rail, fully modular

Local LLM Runbooks

Quantization Matrix

Practical VRAM footprints for common models at a given quantization format and context length. Figures are field-validated but approximate.

Model          Format         VRAM      Context
Llama-3 8B     GGUF Q5_K_M    ~6.5 GB   8 K
Llama-3 70B    GGUF Q4_K_M    ~42 GB    8 K
Qwen2 14B      EXL2 5.0bpw    ~10 GB    32 K
Mixtral 8x7B   GGUF Q4_K_M    ~26 GB    32 K
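
The figures above can be sanity-checked with a common rule of thumb: weight size in GB is roughly the parameter count (in billions) times the bits per weight, divided by 8. The sketch below is illustrative only. The helper name est_weights_gb and the 4.85 bits-per-weight figure used for Q4_K_M are assumptions, not values taken from this runbook, and the result excludes KV cache and runtime buffers.

```shell
# est_weights_gb PARAMS_BILLIONS BITS_PER_WEIGHT
# Rough weight-only footprint in GB: params (billions) * bits per weight / 8.
# Hypothetical helper; does not account for KV cache or compute buffers.
est_weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

# Assumed ~4.85 bits per weight for a 70B model at Q4_K_M.
est_weights_gb 70 4.85   # prints 42.4
```

Actual files can land a few GB either side of this estimate, since the effective bits per weight vary by architecture and tensor mix.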

llama.cpp — Headless Server Launch

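# Run a quantized model at 8 K context with GPU offload.
# --n-gpu-layers 81 requests full offload; the 70B Q4_K_M weights (~42 GB)
# exceed the RTX 4090's 24 GB, so lower this value if loading fails.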
./llama-server \
  --model       ./models/Meta-Llama-3-70B-Instruct.Q4_K_M.gguf \
  --ctx-size    8192 \
  --n-gpu-layers 81 \
  --threads     16 \
  --port        8080 \
  --host        127.0.0.1 \
  --metrics
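
When a model's weights are larger than the card's VRAM, only part of the layer stack can be offloaded. One rough way to pick a starting value for --n-gpu-layers is to divide the usable VRAM by the average size of one layer. The sketch below assumes equally sized layers and a hypothetical 4 GB reserve for KV cache and buffers. The function name layers_that_fit is illustrative, and the result is only a starting point to adjust by trial.

```shell
# layers_that_fit MODEL_GB TOTAL_LAYERS VRAM_GB RESERVE_GB
# Estimates how many layers fit in (VRAM - reserve), assuming every layer
# is the same size. Hypothetical helper; the reserve is a guess, not a measurement.
layers_that_fit() {
  awk -v m="$1" -v n="$2" -v v="$3" -v r="$4" 'BEGIN {
    per_layer = m / n                 # average GB per layer
    fit = int((v - r) / per_layer)    # whole layers that fit in the budget
    if (fit > n) fit = n              # cannot offload more layers than exist
    if (fit < 0) fit = 0
    print fit
  }'
}

# Llama-3 70B Q4_K_M (~42 GB, 81 layers) on a 24 GB card with 4 GB held back.
layers_that_fit 42 81 24 4   # prints 38
```

Layers that are not offloaded stay in system RAM and run on the CPU threads, so throughput drops as the offloaded share shrinks.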

Hypervisor Notes

Workshop hypervisor runs Proxmox VE on the bench rig. GPU passthrough is reserved for the inference VM; build VMs use virtual SR-IOV.

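# Proxmox: pass the NVIDIA GPU (10de:2684) and its audio function (10de:22ba) to the LLM VM.
# Assumes IOMMU is already enabled on the host and VM 101 uses the q35 machine type (needed for pcie=1).
echo "options vfio-pci ids=10de:2684,10de:22ba" > /etc/modprobe.d/vfio.conf
# Rebuild the initramfs so vfio-pci claims both functions at boot, then reboot the host.
update-initramfs -u
# Attach all functions of 01:00 to VM 101 as a PCIe device and mark it as the primary GPU.
qm set 101 -hostpci0 01:00,pcie=1,x-vga=1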
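
After the host reboots, it is worth confirming that both GPU functions are bound to vfio-pci before starting the VM. The sketch below is a hypothetical helper (bound_to_vfio is not a Proxmox tool) that reads `lspci -nnk` output on stdin and succeeds only when every listed device reports vfio-pci as its kernel driver.

```shell
# bound_to_vfio: reads `lspci -nnk` output on stdin.
# Exit status 0 only if at least one device is listed and every device
# shows "Kernel driver in use: vfio-pci". Hypothetical helper name.
bound_to_vfio() {
  awk '
    /^[0-9a-f]/                      { devices++ }  # device lines start with the bus address
    /Kernel driver in use: vfio-pci/ { bound++ }    # detail lines are indented
    END { exit !(devices > 0 && devices == bound) }
  '
}

# Usage on the Proxmox host (10de is the NVIDIA vendor ID):
#   lspci -nnk -d 10de: | bound_to_vfio && echo "GPU ready for passthrough"
```

If the check fails, a host driver such as nouveau or nvidia has probably claimed the card first.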