Hello, I’m starting a new course and the materials are all view-only PDFs. For convenience, I rely a lot on online services that convert images to text (even ChatGPT 4 does it). Does anybody know of some kind of self-hosted OCR converter to turn screenshots into text?

Thanks

  • DaHunni@alien.topB
    11 months ago

    paperless-ngx has built-in OCR, but I don’t think it would fit your needs.

    • t1nk3rz@alien.topOPB
      11 months ago

      Didn’t know that. I use Flameshot for screenshots. I will take a look, thanks.

  • BadGroundbreaking243@alien.topB
    11 months ago

    You could spin up paperless-ngx, or use PDF24 Creator. Beware: when paperless consumes a file, it deletes it from the folder you dropped it into.

    I used paperless-ngx before and it works pretty well.
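
A minimal sketch of one way to live with that deletion warning: copy the file into the consume folder instead of moving it, so the original stays where it was. The function name `queue_for_paperless` and the consume folder path are hypothetical; the real path depends on how your paperless-ngx instance is set up.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def queue_for_paperless(src: Path, consume_dir: Path) -> Path:
    """Copy src into the consume folder and return the path of the copy.

    The copy is what paperless-ngx picks up (and later deletes);
    the original file at src is left untouched.
    """
    if not src.is_file():
        raise FileNotFoundError(f"no such file: {src}")
    if not consume_dir.is_dir():
        raise NotADirectoryError(f"consume folder not found: {consume_dir}")
    target = consume_dir / src.name
    shutil.copy2(src, target)  # copy2 also preserves timestamps
    return target


# Example call (both paths are placeholders, adjust to your setup):
# queue_for_paperless(Path("lecture01.pdf"), Path("/srv/paperless/consume"))
```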

    • t1nk3rz@alien.topOPB
      11 months ago

      I will check it out. I have Stirling-PDF and I see it also has OCR support.

  • henry_tennenbaum@alien.topB
    11 months ago

    I’m not sure I understand you correctly. Do you want to apply OCR to PDFs or to screenshots?

    For PDFs there’s the excellent OCRmyPDF, which paperless-ngx uses under the hood.
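
For the PDF case, here is a rough sketch of calling the `ocrmypdf` command-line tool from Python. It assumes `ocrmypdf` is installed and on your PATH; the helper names and the file names are made up for illustration. The `--sidecar` option asks for the recognized text to be written to a plain text file alongside the searchable PDF.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def build_ocrmypdf_command(
    src: Path, dst: Path, sidecar: Path, language: str = "eng"
) -> list[str]:
    """Build the ocrmypdf command line without running it."""
    return [
        "ocrmypdf",
        "-l", language,             # Tesseract language code, e.g. "eng"
        "--skip-text",              # skip pages that already have a text layer
        "--sidecar", str(sidecar),  # also write the OCR text to this .txt file
        str(src),
        str(dst),
    ]


def ocr_pdf(src: Path, dst: Path, sidecar: Path, language: str = "eng") -> None:
    """Run ocrmypdf on src, writing a searchable PDF and a text sidecar."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf was not found on PATH")
    subprocess.run(build_ocrmypdf_command(src, dst, sidecar, language), check=True)


# Example call (file names are placeholders):
# ocr_pdf(Path("lecture.pdf"), Path("lecture_ocr.pdf"), Path("lecture.txt"))
```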

  • lilolalu@alien.topB
    11 months ago

    Nextcloud AIO (all-in-one) comes with full-text search installed, which brings Tesseract to Nextcloud. So you can let tesseract-ocr run over all your documents, and then they will be searchable with Elasticsearch.
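
For the original screenshot-to-text question, Tesseract can also be run directly without Nextcloud. Below is a small sketch that shells out to the `tesseract` command-line tool, assuming it is installed with the language data you need; the helper names and the screenshot file name are hypothetical. Passing `stdout` as the output name makes Tesseract print the text instead of writing a file.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image: Path, language: str = "eng") -> list[str]:
    """Build a tesseract command that prints recognized text to stdout."""
    return ["tesseract", str(image), "stdout", "-l", language]


def screenshot_to_text(image: Path, language: str = "eng") -> str:
    """Run Tesseract on one image file and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image, language),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


# Example call (file name is a placeholder, e.g. a Flameshot capture):
# print(screenshot_to_text(Path("screenshot.png")))
```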