Once again, it completely fails an extremely simple test: look at a screenshot of sheet music and tell me what the notes are. Producing a MIDI file from it is, unsurprisingly, far beyond its capabilities.
Interpreting sheet music images is very complex, and I’m not surprised general-purpose LLMs totally fail at it. It’s orders of magnitude harder than text OCR because the notation is two-dimensional: pitch is encoded in vertical position on the staff, while timing runs horizontally.
https://chatgpt.com/share/68954c9e-2f70-8000-99b9-b4abd69d1a...
This is nowhere remotely close to general intelligence.