only loosely related, but I have yet to find a reliable solution for classifying which pages of a document contain tables, i.e. a classifier that returns the indices of the pages containing tables
solutions built on things like img2table or pymupdf are really bad (pymupdf is not even reliable for text-based PDFs)