Hacker News | toxy's comments

The GPT paper included a diagram of the variation of the Transformer architecture they used.

The GPT-2 paper outlined the changes made to the model in acceptable, if moderate, detail.

The GPT-3 paper merely references another paper, saying "we use alternating dense and locally banded sparse attention patterns in the layers of the transformer, similar to the Sparse Transformer", with no further detail on the changes they made.

How are you supposed to reproduce these results at all? You could attempt to infer the changes from the Sparse Transformer paper they reference, but you might implement them differently, and then there would be no way to verify the results they reported, since any discrepancy could be blamed on the implementation.
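To make the ambiguity concrete, here is a minimal sketch of what "alternating dense and locally banded sparse attention" could mean: even layers use a standard dense causal mask, odd layers a banded causal mask. The alternation order, the bandwidth, and the exact band shape are all assumptions here; the GPT-3 paper specifies none of them, which is exactly the reproducibility problem.

```python
import numpy as np

def dense_causal_mask(seq_len):
    # Standard dense causal attention: position i attends to all j <= i.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return j <= i

def banded_causal_mask(seq_len, bandwidth):
    # Locally banded causal attention: position i attends only to the
    # last `bandwidth` positions (i - bandwidth < j <= i). This is one
    # plausible reading of the Sparse Transformer's "local" pattern;
    # the actual band shape used in GPT-3 is not specified.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - bandwidth)

# Hypothetical alternation: even layers dense, odd layers banded.
masks = [dense_causal_mask(8) if layer % 2 == 0 else banded_causal_mask(8, 3)
         for layer in range(4)]
```

A different bandwidth, a strided instead of local pattern, or a reversed alternation would all be consistent with the paper's one-sentence description, yet could train to different results.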

A bit disappointing.


The full GPT-2 model is available for inspection and retraining, if you so desire. GPT-3 will likely be released soon as well.


Likely, but a published paper should stand on its own; from a research standpoint it ought to include more detail than this.

