Flash Attention
How Flash Attention makes transformer self-attention memory-efficient by restructuring computation to minimize GPU memory reads and writes.
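
As a concrete illustration of that restructuring, the minimal NumPy sketch below processes the keys and values one tile at a time and carries only running softmax statistics (a per-row max and normalizer), so the full N×N attention matrix is never materialized. The function names, block size, and use of NumPy are assumptions made for clarity; the actual Flash Attention implementation is a fused GPU kernel that applies the same algorithm inside on-chip SRAM.

```python
# A minimal sketch of the tiling + online-softmax idea behind Flash Attention.
# Illustrative only: real Flash Attention fuses these steps into one GPU kernel
# so each tile of K and V is read from high-bandwidth memory exactly once.
import numpy as np

def flash_attention_sketch(Q, K, V, block_size=64):
    """Compute softmax(Q K^T / sqrt(d)) V one block of keys/values at a time,
    keeping running statistics instead of the full N x N attention matrix."""
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)

    out = np.zeros_like(Q)           # running weighted sum of values
    row_max = np.full(N, -np.inf)    # running row-wise max of scores
    row_sum = np.zeros(N)            # running softmax normalizer

    for start in range(0, N, block_size):
        Kb = K[start:start + block_size]      # one tile of keys
        Vb = V[start:start + block_size]      # matching tile of values

        scores = (Q @ Kb.T) * scale           # N x block_size tile of scores

        # Online-softmax correction: rescale what we have accumulated so far
        # to the new running max, then fold in the current tile.
        new_max = np.maximum(row_max, scores.max(axis=1))
        correction = np.exp(row_max - new_max)
        probs = np.exp(scores - new_max[:, None])

        out = out * correction[:, None] + probs @ Vb
        row_sum = row_sum * correction + probs.sum(axis=1)
        row_max = new_max

    return out / row_sum[:, None]

def reference_attention(Q, K, V):
    """Standard attention that materializes the full N x N matrix, for comparison."""
    scores = (Q @ K.T) / np.sqrt(Q.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((256, 32)) for _ in range(3))
    assert np.allclose(flash_attention_sketch(Q, K, V), reference_attention(Q, K, V))
    print("tiled online-softmax attention matches the reference")
```

The outputs match the standard formulation because the running max and normalizer are corrected each time a new tile raises the maximum, which is exactly the trick that lets the kernel avoid writing the attention matrix back to GPU memory.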