Hacker News | toxy's comments

The GPT paper included a diagram of the variation of the Transformer architecture they used.

The GPT-2 paper outlined the changes made to the model in acceptable, if moderate, detail.

The GPT-3 paper merely references another paper, saying "we use alternating dense and locally banded sparse attention patterns in the layers of the transformer, similar to the Sparse Transformer", with no further detail on the changes they made.

How are you supposed to reproduce these results at all? You could attempt to infer the changes from the Sparse Transformer paper they reference, but you might implement them differently, and then there would be no way to verify the results they reported, since any discrepancy could be blamed on the implementation.
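To make the ambiguity concrete, here is a minimal sketch of what "alternating dense and locally banded sparse attention" could mean: even layers use a standard dense causal mask, odd layers a banded causal mask. The alternation order, the bandwidth, and the exact band shape are all assumptions here; the GPT-3 paper specifies none of them, which is exactly the reproducibility problem.

```python
import numpy as np

def dense_causal_mask(seq_len):
    # Standard dense causal attention: position i attends to all j <= i.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return j <= i

def banded_causal_mask(seq_len, bandwidth):
    # Locally banded causal attention: position i attends only to the
    # last `bandwidth` positions (i - bandwidth < j <= i). This is one
    # plausible reading of the Sparse Transformer's "local" pattern;
    # the actual band shape used in GPT-3 is not specified.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - bandwidth)

# Hypothetical alternation: even layers dense, odd layers banded.
masks = [dense_causal_mask(8) if layer % 2 == 0 else banded_causal_mask(8, 3)
         for layer in range(4)]
```

A different bandwidth, a strided instead of local pattern, or a reversed alternation would all be consistent with the paper's one-sentence description, yet could train to different results.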

A bit disappointing.


The full GPT-2 model is available for inspection and retraining, if you so desire. GPT-3 will likely be released soon as well.


Likely, but a published paper should stand on its own; from a research standpoint it ought to include more detail than this.

