GPU-Optimization

1 article
Flash Attention
How Flash Attention makes transformer self-attention memory-efficient by restructuring computation to minimize …