AI Performance Design Guide

Dmitry Volkov
@the-dsvolk the.dsvolk@aiperf.run in/dsvolk
© 2025

A comprehensive collection of system design resources, architectural patterns, and implementation guides.

Performance

Performance analysis, optimization techniques, profiling tools, and fundamental performance principles governing distributed systems.

Model Training and Inference

Machine learning model training workflows, memory management, and inference optimization techniques.

Model Optimization

Quantization, sparsity, and related techniques for accelerating LLM and diffusion model inference with NVIDIA TensorRT-LLM.

AI Infrastructure

High-performance computing, GPU architectures, and ML system infrastructure.

CUDA Programming

CUDA programming concepts, memory hierarchy, execution models, and development resources.

PyTorch

PyTorch framework optimization, memory management, and performance tuning for training and inference.

Profiling

GPU profiling tools, trace capture techniques, and performance analysis workflows for containerized ML workloads.