This is an automated archive.

The original was posted on /r/mysql by /u/nacquatella on 2023-08-05 17:44:34+00:00.


Hello there!

I’m a project manager currently performing a project post-mortem, reviewing more than 6,000 emails and hundreds of hard-copy documents that have been scanned into PDF.

Looking up information in the emails is easy, as they are managed and integrated in Outlook/Exchange Server, and search queries are mostly successful when looking for information, files, etc.

However, going through the document binders (hard copies of communications, most of them scanned) is tedious work, as the only guidance is the document reference ID, which is usually also the name given to the stored file.

Could any of you point me to an existing solution or platform that allows uploading a scanned file (say, a two-page PDF), extracts the text content via OCR (date, reference title, text body, etc.), and stores that information in a searchable database? The goal is, for example, to give the database a few keywords and a date period and get back the documents that contain that information.
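To make the idea concrete, here is a rough sketch of the storage and search part only. It assumes the OCR step has already been done by a separate tool and has produced plain text for each document. The table name, column names, and function names are all hypothetical, and SQLite (via Python's built-in `sqlite3` module) is used only as a stand-in so the example runs anywhere; in MySQL, a `FULLTEXT` index would presumably replace the simple `LIKE` matching.

```python
import sqlite3


def create_store(path=":memory:"):
    """Open (or create) the document store. The schema is a hypothetical example."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS documents (
               reference_id TEXT PRIMARY KEY,
               doc_date     TEXT NOT NULL,  -- ISO 8601 (YYYY-MM-DD), so text comparison follows date order
               title        TEXT NOT NULL,
               body         TEXT NOT NULL   -- full text as produced by the OCR step
           )"""
    )
    return conn


def add_document(conn, reference_id, doc_date, title, body):
    """Store one document; re-adding the same reference ID overwrites the old entry."""
    conn.execute(
        "INSERT OR REPLACE INTO documents VALUES (?, ?, ?, ?)",
        (reference_id, doc_date, title, body),
    )
    conn.commit()


def search(conn, keywords, date_from, date_to):
    """Return (reference_id, doc_date, title) for documents dated within the
    period whose title or body contains every keyword."""
    sql = (
        "SELECT reference_id, doc_date, title FROM documents "
        "WHERE doc_date BETWEEN ? AND ?"
    )
    params = [date_from, date_to]
    for word in keywords:
        # Substring match; note that % and _ inside a keyword act as wildcards.
        sql += " AND (title LIKE ? OR body LIKE ?)"
        pattern = f"%{word}%"
        params += [pattern, pattern]
    sql += " ORDER BY doc_date"
    return conn.execute(sql, params).fetchall()


# Made-up sample data, purely for illustration.
conn = create_store()
add_document(conn, "REF-001", "2021-03-15", "Notice of delay",
             "The contractor reports a delay in steel delivery.")
add_document(conn, "REF-002", "2022-07-01", "Payment certificate",
             "Certificate for interim payment number four.")

results = search(conn, ["delay", "steel"], "2021-01-01", "2021-12-31")
print(results)
```

With the sample data above, only the first document matches both keywords and falls inside the 2021 date period. Storing dates as ISO 8601 text is what lets the plain `BETWEEN` comparison work as a date filter here.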

I can only imagine there is already an online platform or software application built for this purpose.

Thanks for your comments in advance.

Neil