Inference Optimization
All articles
Quantization
How INT8 and INT4 quantization compress neural network models for faster inference and lower memory usage with …
Pruning
How structured and unstructured pruning reduce neural network size by removing redundant weights, neurons, or …
Knowledge Distillation
How teacher-student training compresses large models into smaller, faster ones while preserving most of the …
Open source projects