Obviously it's not the best plot to use according to data visualization theory and common practice, but I think it candidly conveys the point anyway.
As someone else points out, the data is the worrying aspect, as it points towards state-of-the-art models not being able to make more than 0 consecutive steps without errors.
I was just thinking "these guys will talk about this graph for the rest of their lives", it's the best graph you could ever hope to put into a paper. Loved it.
In case you want to know what's going on on the left side of that chart, they gave a log-scale version in Appendix A. I was thinking it was silly not to just use that version up top, but I guess log scales make big differences 'feel' smaller.
A log scale is actually appropriate in this context from a first-principles perspective. Per scaling laws (and also the general behavior of a per-step failure probability epsilon compounded over N steps), you would generally expect more vs. less effective techniques to differ multiplicatively in the number of steps until failure, not additively. Figure 1 is comical, but the appendix figure is the more scientifically appropriate one.
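
To make the multiplicative argument above concrete, here is a minimal Python sketch. It assumes each step fails independently with the same probability eps, which is a simplification of mine and not something taken from the paper; the function names are made up for illustration. Under that assumption the chance of N clean steps in a row is (1 - eps)**N, so the run length at which success drops to a given target scales roughly like 1/eps.

```python
import math

def steps_until_success_prob(eps: float, target: float = 0.5) -> float:
    """Run length N at which (1 - eps)**N falls to `target`.

    Assumes independent steps with a constant per-step failure
    probability eps (an illustrative assumption, not the paper's model).
    """
    return math.log(target) / math.log(1.0 - eps)

def horizon_ratio(eps_a: float, eps_b: float, target: float = 0.5) -> float:
    """How many times longer the run length is with eps_b than with eps_a."""
    return steps_until_success_prob(eps_b, target) / steps_until_success_prob(eps_a, target)

if __name__ == "__main__":
    for eps in (0.1, 0.01, 0.001):
        n = steps_until_success_prob(eps)
        print(f"eps={eps}: about {n:.1f} steps until success drops to 50%")
    print(f"ratio for 10x lower eps: {horizon_ratio(0.01, 0.001):.2f}")
```

Cutting eps by 10x stretches the run length by roughly 10x regardless of where you start, so methods end up evenly spaced on a log axis and bunched against zero on a linear one.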