Hacker News

"Real-time" is a very vague descriptor. I get 7-8 tok/s running inference on a 70B model on my M1 Mac, which is pretty real-time to me. Even Professor-155b runs "good enough" (~3 tok/s) for what I'd consider real-time chat in English.


