> And in 2024 we still have no competing NLP architecture.
No, we do. State space models like RWKV and Mamba are both faster and scale just as well.
> Transformers aren't merely some lucky passengers riding the coattails of compute scale.
Err... they are, though. They were simply the architecture the right researchers happened to be using at the time, originally for machine translation between natural languages.