AI Optimization
SIMD, GPU architecture, and quantization — making AI fast on real hardware.
- SIMD & Vectorization: SSE, AVX-512, NEON — processing multiple data elements per instruction for ML and HPC workloads.
- GPU Architecture: warps, SMs, memory hierarchy — how GPUs achieve massive parallelism for AI workloads.
- Quantization for Inference: INT8, FP8, GPTQ, AWQ — shrinking model weights for faster, cheaper inference.
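To make the SIMD idea concrete, here is a minimal sketch in Python. The scalar loop handles one element per iteration, while the NumPy expression dispatches to vectorized kernels that use SSE/AVX/NEON where the hardware supports them; the function names are illustrative, not from any particular library.

```python
import numpy as np

# Scalar form of SAXPY (a*x + y): one multiply-add per loop iteration.
def saxpy_scalar(a, x, y):
    out = [0.0] * len(x)
    for i in range(len(x)):
        out[i] = a * x[i] + y[i]
    return out

# Vectorized form: NumPy ufuncs process whole arrays, using
# SIMD instructions internally where the CPU provides them.
def saxpy_vectorized(a, x, y):
    return a * x + y

x = np.arange(8, dtype=np.float32)
y = np.ones(8, dtype=np.float32)
assert np.allclose(saxpy_scalar(2.0, x, y), saxpy_vectorized(2.0, x, y))
```

Both forms compute the same result; the payoff of the vectorized version is throughput, since each instruction operates on several lanes of data at once.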
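As a taste of the quantization topic, the sketch below shows symmetric per-tensor INT8 quantization, the simplest scheme underlying INT8 inference: scale float weights into [-127, 127], round to int8, and dequantize by multiplying the scale back. This is an illustrative minimal version, not the GPTQ or AWQ algorithm.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization into [-127, 127]."""
    scale = np.max(np.abs(w)) / 127.0  # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Rounding error per weight is at most half a quantization step (s / 2).
assert np.max(np.abs(w - w_hat)) <= s / 2 + 1e-6
```

Storing `q` instead of `w` cuts weight memory by 4x versus FP32; methods like GPTQ and AWQ refine this basic recipe to keep model accuracy at low bit widths.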