Hidden Winning Tickets in Transformer Attention
Ever heard of the “lottery ticket” idea in AI? It says big neural nets hide small subnetworks that can perform just as well. This paper proves a strong version of that for the heart of Transformers: multi-head attention (MHA).

* The big claim: Inside a randomly initialized MHA, there exists a