Author

Brian

Staff Technical Writer

3 posts

Brian writes about using software and tools effectively. His articles are pragmatic and focus on work you can actually do today.

LoRA vs QLoRA vs full fine-tuning compared by VRAM use, quality, and when each method wins

AI & Machine Learning

LoRA vs. QLoRA vs. Full Fine-Tuning: Which Method Should You Use?

Compare LoRA, QLoRA, and full fine-tuning by VRAM, quality, and use case. Learn which LLM fine-tuning method fits your GPU budget.

Brian Jul 6, 2026 15 min read

GGUF, GPTQ, AWQ, EXL2 quantization formats compared: how model weights, runtime overhead, and KV cache stack up in memory

AI & Machine Learning

GGUF, GPTQ, AWQ, EXL2: How LLM Quantization Formats Actually Use Memory

Compare GGUF, GPTQ, AWQ, and EXL2 memory use, from Q4_K_M file size to KV cache growth and runtime overhead.

Brian Jul 2, 2026 12 min read

Unified memory explained: discrete GPU memory requires a copy across PCIe between system RAM and VRAM, while unified memory is one shared pool the CPU and GPU both access directly

AI & Machine Learning

What Is Unified Memory, and Why Does It Let a Mini PC Run a 235B Model?

Unified memory lets a compact AI PC load 235B-class models no single 24-32GB GPU can hold. What it is, why it works, and why bigger doesn't mean faster.

Brian Jul 2, 2026 11 min read