Because the question bank is so small, a model can easily swing from 0% to 100% in a category between versions just by flipping its answer on one or two questions, especially if it refuses to give a yes/no answer to one or more questions in that category.
It's hard to draw much of a conclusion from this without a larger, more diverse question bank.
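To make the arithmetic concrete, here is a rough sketch of how a three-question category could swing like that. Everything in it is an assumption for illustration: the `category_score` helper, the answer key, and the idea that refusals are simply dropped from the denominator are hypothetical, not a description of how the actual benchmark is scored.

```python
from typing import Optional, Sequence


def category_score(answers: Sequence[Optional[str]], key: Sequence[str]) -> Optional[float]:
    """Percent of answered questions that match the key.

    Assumption: a refusal is recorded as None and skipped, so it shrinks
    the denominator instead of counting as a mismatch.
    """
    scored = [(a, k) for a, k in zip(answers, key) if a is not None]
    if not scored:
        return None  # every question was refused, so there is nothing to score
    matches = sum(a == k for a, k in scored)
    return 100.0 * matches / len(scored)


# Hypothetical category with three yes/no questions.
key = ["yes", "yes", "yes"]
old_version = ["no", None, "no"]    # one refusal, two mismatches
new_version = ["yes", None, "yes"]  # same refusal, the other two answers flipped

old_score = category_score(old_version, key)
new_score = category_score(new_version, key)
print(old_score, new_score)  # 0.0 100.0
```

With one refusal, only two questions count, so flipping both moves the category from one extreme to the other. With a few dozen questions per category, the same two flips would shift the score by only a few points.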