
Is there anything on Linux not based on tesseract for the OCR?

It's not very good. I miss being able to copy/paste from blurry or deformed screenshots of YouTube on Windows.



Tesseract might be "not very good" but it is still state of the art, widely available, and supports many languages.

The special sauce - what you need to get a better result - is good adaptive thresholding (something more advanced than the naive binary thresholding you get by feeding raw color/grayscale images to the OCR).

As far as I know, once you get that nailed it doesn't matter that much what OCR you use - as long as it's available and supports your target language.
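
To make "adaptive thresholding" concrete, here is a minimal sketch of one common variant: compare each pixel against the mean of its local neighbourhood instead of against a single global cutoff. The function name `adaptive_threshold` and the `window` and `offset` defaults are my own assumptions for illustration, not something Tesseract or any particular tool ships, and real pipelines would normally use an image library rather than plain Python lists.

```python
def adaptive_threshold(gray, window=15, offset=10):
    """Binarize a grayscale image using a local-mean threshold.

    gray   : list of rows, each a list of ints in 0..255
    window : side length of the square neighbourhood (assumed default)
    offset : how far below the local mean a pixel must be to count as ink
             (assumed default)
    Returns a new image with 0 for ink and 255 for background.
    """
    h, w = len(gray), len(gray[0])

    # Integral image (summed-area table) with a zero row and column of
    # padding, so any rectangle sum costs four lookups.
    integral = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum

    half = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - half), min(h, y + half + 1)
        for x in range(w):
            # Clip the window at the image borders.
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            area = (y1 - y0) * (x1 - x0)
            total = (integral[y1][x1] - integral[y0][x1]
                     - integral[y1][x0] + integral[y0][x0])
            local_mean = total / area
            # Background if the pixel is not clearly darker than its surroundings.
            out[y][x] = 255 if gray[y][x] > local_mean - offset else 0
    return out
```

The point of the local mean is that dark text on a dim region and dark text on a bright region both survive, whereas a single global cutoff would wipe out one of them. Sharp brightness edges in the background can still show up as false ink, so this is a starting point rather than a finished preprocessor.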


As others mentioned, Tesseract is SOTA in FOSS OCR. It is also still being developed, improving slowly but steadily.

The main issue for a use case like NormCap is the trained models: they are optimized for images of _printed_ text and layouts, which differ from on-screen text in many respects. Unfortunately, I don't have the resources to train my own models.

Cuneiform was a longtime competitor, but afaik development there has stalled.


Is there any development on Tesseract, or at least on updating the trained models out there? Just curious.


I was just using tesseract.js and the repo looks active. Tesseract is still crap, but it's the free crap, so I'll just put up with it. Grayscale seems to improve the OCR. I'm sure there are tons of other techniques to improve the result.
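
For reference, the grayscale step can be as small as the sketch below. It uses the standard Rec. 601 luma weights; the function name `to_grayscale` and the list-of-tuples image layout are assumptions for illustration, and this is plain Python rather than anything from the tesseract.js API.

```python
def to_grayscale(rgb):
    """Convert an RGB image to grayscale with Rec. 601 luma weights.

    rgb : list of rows, each a list of (r, g, b) tuples with values in 0..255
    Returns a list of rows of ints in 0..255.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb
    ]
```

Green is weighted most heavily because it contributes most to perceived brightness, which tends to keep colored UI text distinguishable from its background after conversion.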


I can't find anything backing this up at the moment, but I was under the impression that Google had been upstreaming some development to the project. Open Sans recognition in particular got noticeably more reliable sometime in the last few years.


Why not use a proprietary OCR tool like mathpix.com?


Keras-ocr



