whymauri on May 29, 2020 | on: GPT-3: Language Models Are Few-Shot Learners
Could we bank on the Lottery Ticket Hypothesis, distillation, or other model compression algorithms to make these models smaller?
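(For readers unfamiliar with the second technique, here is a minimal sketch of the standard soft-label distillation objective, i.e. Hinton et al.'s temperature-scaled KL divergence between teacher and student; PyTorch is assumed purely for illustration, nothing here is GPT-3-specific:

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        # Soften both distributions with a temperature, then push the student
        # toward the teacher's distribution with a KL divergence.
        soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        log_student = F.log_softmax(student_logits / temperature, dim=-1)
        # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
        return F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2
)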
aquajet on May 29, 2020
I would guess so, but compressing it by a third of its size (i.e., the DistilGPT approach) would still leave it quite large. To be fair, I don't know whether distillation scales like that.
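(A rough back-of-the-envelope check on that claim, assuming GPT-3's published 175B parameter count and fp16 weights:

    # GPT-3 has ~175B parameters (published figure).
    params_gpt3 = 175e9
    params_after = params_gpt3 * (2 / 3)   # drop a third, DistilGPT-style
    bytes_fp16 = params_after * 2          # 2 bytes per parameter at fp16
    print(f"{params_after / 1e9:.0f}B params, ~{bytes_fp16 / 1e9:.0f} GB at fp16")
    # -> 117B params, ~233 GB at fp16: still far beyond a single GPU's memory.
)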