Could we bank on the Lottery Ticket Hypothesis, distillation, or other model com... | Hacker News


whymauri on May 29, 2020 | on: GPT-3: Language Models Are Few-Shot Learners

Could we bank on the Lottery Ticket Hypothesis, distillation, or other model compression algorithms to make these models smaller?

aquajet on May 29, 2020

I would guess so, but even compressing it by a third of its size (as DistilGPT does) would still leave it quite large. To be fair, I don't know whether distillation scales like that.
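For a rough sense of "still quite large", here is a back-of-the-envelope sketch. It takes the 175 billion parameter count reported in the GPT-3 paper and assumes a one-third reduction, which is just the figure floated above and not a measured distillation result. The 2 bytes per parameter (fp16 storage) and the helper name `compressed_size` are also assumptions made for illustration.

```python
GPT3_PARAMS = 175e9  # parameter count reported in the GPT-3 paper


def compressed_size(params, reduction=1 / 3, bytes_per_param=2):
    """Return (remaining parameters, memory in GB) after removing
    a fraction `reduction` of the parameters.

    bytes_per_param=2 assumes fp16 weights; this is an assumption,
    not something stated in the thread.
    """
    remaining = params * (1 - reduction)
    return remaining, remaining * bytes_per_param / 1e9


remaining, gigabytes = compressed_size(GPT3_PARAMS)
print(f"{remaining / 1e9:.0f}B parameters, ~{gigabytes:.0f} GB in fp16")
```

Under those assumptions the model still has well over 100 billion parameters, so a DistilGPT-style reduction alone would not bring it into single-GPU territory.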
