I had a similar problem previously and considered automating the check by comparing frames. It isn't easy and would produce obvious false positives and negatives, but YouTube helps by publishing key frames, so a comparison could at least be used to help rank results.
I commented on your parent post before reading this...
Nice catch on the thumbnails that YouTube already captures. When I ran a histogram comparison between the second and third auto-generated thumbnails from the lyrics video, the two were mostly equivalent. That would be a good sign that a video is not the actual music video.
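For anyone who wants to try the histogram idea, here is a rough sketch of what I mean, not the exact comparison I ran. It assumes each thumbnail has already been decoded into a flat list of 0-255 grayscale values (you would need an image library for that step, which is not shown). The function names and the 16-bin default are made up for illustration.

```python
from typing import List, Sequence


def gray_histogram(pixels: Sequence[int], bins: int = 16) -> List[float]:
    """Return a normalized histogram of 0-255 grayscale values."""
    if not pixels:
        raise ValueError("frame has no pixels")
    counts = [0] * bins
    for p in pixels:
        # Map 0..255 onto 0..bins-1; clamp in case of out-of-range input.
        index = min(max(p * bins // 256, 0), bins - 1)
        counts[index] += 1
    total = len(pixels)
    return [c / total for c in counts]


def histogram_intersection(a: Sequence[int], b: Sequence[int],
                           bins: int = 16) -> float:
    """Similarity in [0, 1]; values near 1 mean near-identical histograms."""
    hist_a = gray_histogram(a, bins)
    hist_b = gray_histogram(b, bins)
    return sum(min(x, y) for x, y in zip(hist_a, hist_b))
```

A score close to 1 between two thumbnails hints at a static or lyrics-style video. Keep in mind that histograms ignore where pixels are, so two different scenes with similar overall brightness can still score high.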
Take this example music video:
https://www.youtube.com/watch?v=6vopR3ys8Kw
Frames:
https://img.youtube.com/vi/6vopR3ys8Kw/0.jpg
https://img.youtube.com/vi/6vopR3ys8Kw/1.jpg
https://img.youtube.com/vi/6vopR3ys8Kw/2.jpg
https://img.youtube.com/vi/6vopR3ys8Kw/3.jpg
And this lyrics video version (more interesting because it's SOMEWHAT changing):
https://www.youtube.com/watch?v=FoAqHxm5dpo
Frames:
https://img.youtube.com/vi/FoAqHxm5dpo/0.jpg
https://img.youtube.com/vi/FoAqHxm5dpo/1.jpg
https://img.youtube.com/vi/FoAqHxm5dpo/2.jpg
https://img.youtube.com/vi/FoAqHxm5dpo/3.jpg
Simply measuring the differences between frames would give the first video (the music video) a higher score than the second (the lyrics video).
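A minimal sketch of that kind of score, under the assumption that the frames are already decoded to equal-sized flat lists of 0-255 grayscale values. The names `frame_difference` and `motion_score` are hypothetical, and mean absolute difference is just one simple choice of metric.

```python
from typing import List, Sequence


def frame_difference(a: Sequence[int], b: Sequence[int]) -> float:
    """Mean absolute pixel difference between two frames, scaled to [0, 1]."""
    if not a or len(a) != len(b):
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / (255 * len(a))


def motion_score(frames: List[Sequence[int]]) -> float:
    """Average difference between consecutive frames; higher means more change."""
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    diffs = [frame_difference(p, q) for p, q in zip(frames, frames[1:])]
    return sum(diffs) / len(diffs)


# Toy stand-ins for decoded thumbnails (4 pixels each), not real data.
music_like = [[0, 0, 0, 0], [255, 255, 255, 255], [0, 0, 0, 0]]
lyrics_like = [[100, 100, 100, 100], [100, 100, 100, 100], [110, 110, 110, 110]]
print(motion_score(music_like) > motion_score(lyrics_like))
```

With only a few thumbnails per video the score is noisy, so it is probably better used to rank candidates than to make a hard yes/no decision.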