AI & Machine Learning
GGUF, GPTQ, AWQ, EXL2: How LLM Quantization Formats Actually Use Memory
Compare GGUF, GPTQ, AWQ, and EXL2 memory use, from Q4_K_M file size to KV cache growth and runtime overhead.
Brian 12 min read
Author
Staff Technical Writer
Brian writes about using software and tools effectively. His articles are pragmatic and focus on work that can actually be done today.