

VeejayRampay 3 months ago | on: Unified Line and Paragraph Detection by Graph Conv...

Remotely related, but I have yet to find a reliable solution for table page classification in a document, i.e. a classifier that returns the indices of the pages that contain tables. Solutions using things like img2table or pymupdf are really bad (pymupdf is not even reliable for text PDFs).

djoldman 3 months ago

In my experience, this task is incredibly difficult to solve in the general case.

Handcrafting based on the dataset is the only way to get high performance.
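
To make "handcrafting" concrete, here is a rough sketch of the kind of dataset-specific rule I mean. Everything in it is an assumption for illustration: it supposes you already have layout-preserving text for each page, that cells are separated by tabs or runs of two or more spaces, and that a table is at least three consecutive rows with the same number of cells. The function names and thresholds are hypothetical and would need tuning per dataset; it will miss image-only tables entirely.

```python
import re

# Assumption (not from the thread): cells are separated by tabs or by
# runs of 2+ spaces in layout-preserving text output.
CELL_SEPARATOR = re.compile(r"\t+| {2,}")


def column_count(line: str) -> int:
    """Number of cells in a line after splitting on the assumed separator."""
    stripped = line.strip()
    if not stripped:
        return 0
    return len(CELL_SEPARATOR.split(stripped))


def has_table(page_text: str, min_rows: int = 3, min_cols: int = 3) -> bool:
    """True if min_rows consecutive lines share the same cell count >= min_cols."""
    run_length = 0
    previous_cols = 0
    for line in page_text.splitlines():
        cols = column_count(line)
        if cols >= min_cols and cols == previous_cols:
            run_length += 1   # same shape as the previous row: extend the run
        elif cols >= min_cols:
            run_length = 1    # row-like, but a new shape: start a new run
        else:
            run_length = 0    # prose or blank line: reset
        previous_cols = cols
        if run_length >= min_rows:
            return True
    return False


def pages_with_tables(pages: list[str], **thresholds) -> list[int]:
    """Return the 0-based indices of pages flagged as containing a table."""
    return [i for i, text in enumerate(pages) if has_table(text, **thresholds)]
```

The thresholds (min_rows, min_cols) and the separator pattern are exactly the parts you end up hand-tuning against your own documents, which is why a rule like this does not generalize.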
