
Could we bank on the Lottery Ticket Hypothesis, distillation, or other model compression algorithms to make these models smaller?


I would guess so, but compressing it by a third of its size (e.g., DistilGPT2) would still leave it quite large. To be fair, I don't know whether distillation scales like that.
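For a rough sense of the arithmetic, here is a back-of-the-envelope sketch. The parameter count is a made-up placeholder, not a figure from this thread, and `compressed_size` is just an illustrative helper that assumes the one-third reduction carries over unchanged to a bigger model, which may well not hold.

```python
def compressed_size(params: float, reduction: float = 1 / 3) -> float:
    """Return the parameter count left after removing `reduction` of the original."""
    return params * (1 - reduction)


# Hypothetical parameter count, chosen only for illustration (an assumption).
hypothetical_params = 100e9

remaining = compressed_size(hypothetical_params)
print(f"{remaining / 1e9:.1f}B parameters remain")
```

Even with a third removed, a model of that hypothetical size would still be far too big for most single machines, which is the point above.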



