Yes, it's worse for screenreaders, I listed that next to other drawbacks which I acknowledged. I don't intend to apply this method anywhere else due to these drawbacks, because accessibility matters.
It's a proof of concept, and maybe a starting point for somebody else who wants to tackle this problem.
Can LLMs detect and decode the text? Yes, but I'd wager for the case that data cleaning doesn't happen to the extent that it decodes the text after scraping.
It's a proof of concept, and maybe a starting point for somebody else who wants to tackle this problem.
Can LLMs detect and decode the text? Yes, but I'd wager for the case that data cleaning doesn't happen to the extent that it decodes the text after scraping.