Generative AI and the big buzz about small language models (the-decoder.com)
13 points by milliondreams on March 1, 2024 | 5 comments


As these systems evolve, I have come to believe that specialist small language models combined with an MoE framework are the future of the industry.


Does anyone know if this is using the Mamba architecture[1] instead of transformers? It looks like it uses a state space model (SSM) layer.

[1]: https://arxiv.org/abs/2312.00752


We covered state space models in a blog post here - https://blog.dragonscale.ai/state-space-models/

It gives an overview of Mamba and StripedHyena.


It predates Mamba. It uses Hyena hierarchy blocks, which are considered SSMs but are not the same as Mamba.
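For anyone unfamiliar with what an SSM layer actually computes: at its core it's a linear recurrence over the sequence. Here's a minimal NumPy sketch of the discrete state-space update that these architectures build on (illustrative only; real models like Mamba parameterize A, B, C in structured ways, make them input-dependent, and use hardware-aware parallel scans rather than this naive loop).

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Naive scan of the recurrence x_k = A x_{k-1} + B u_k, y_k = C x_k."""
    d_state = A.shape[0]
    x = np.zeros(d_state)
    ys = []
    for u_k in u:               # sequential scan over the input sequence
        x = A @ x + B * u_k     # state update
        ys.append(C @ x)        # readout
    return np.array(ys)

# Toy example: a length-16 scalar input through a 4-dimensional state.
rng = np.random.default_rng(0)
A = 0.9 * np.eye(4)             # stable diagonal transition matrix
B = rng.standard_normal(4)
C = rng.standard_normal(4)
y = ssm_scan(A, B, C, rng.standard_normal(16))
print(y.shape)  # (16,)
```

Because the recurrence is linear and time-invariant here, the whole thing can equivalently be computed as a long convolution, which is the trick Hyena-style blocks exploit for parallel training.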


This piece has less detail than the source linked from the article: https://www.together.ai/blog/stripedhyena-7b



