How do you encode your paper scans?

Atemu@lemmy.ml · 1 year ago

Saigonauticon@voltage.vn · 1 year ago

I use JPEGs in a PDF. They can be mediocre quality. Using an OK scanner makes a big difference. It’s good enough!

I’m required by law to keep physical paper copies for 35 years. So my parallel solution is a cursed filing cabinet, and several crates that describe the content of the filing cabinet. It’s ugly, but saves me tons on data archiving, I guess?

kyle@infosec.pub · edit-2 1 year ago

I’ve never used paperless but just checked it out and it looks pretty neat. My first thought would be to scan documents in a higher resolution, let the OCR happen, then convert the file to a JPEG or something smaller after you’ve extracted the text.

I spent a few minutes looking at their wiki and it looks like it might be possible.

Like I said though, no experience with this software so I’m not sure that’d actually work.

Atemu@lemmy.ml · 1 year ago

Interesting idea, but I think I’d like to retain close-to-original quality in case I want to redo the OCR if/when Paperless’ OCR improves in the future.

surewhynotlem@lemmy.world · 9 months ago

By ‘paperless’, y’all mean this one? https://docs.paperless-ngx.com/

Atemu@lemmy.ml · 9 months ago

Correct. That’s the currently maintained paperless project.

surewhynotlem@lemmy.world · 9 months ago

Thanks! There’s a very interesting trail of dead projects to follow. But I got ngx working and it’s great so far.

Atemu@lemmy.ml · 9 months ago

I for one am still waiting for paperless-ngnxn2-next-3.0_hypr.

lemming007@lemm.ee · 1 year ago

PDF/A

Atemu@lemmy.ml · 1 year ago

And how do you encode the images of the scan contained in the PDF/A? That’s the crux here.

lemming007@lemm.ee · 1 year ago

I’m not sure I understand. I just scan anything and let my software spit out PDF/A.

Atemu@lemmy.ml · 1 year ago

PDF/A is not an image format. As a document, it may contain images.
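To make that distinction concrete: each stream inside a PDF names the filter that encodes it, and the standard filter names reveal what is actually stored (`DCTDecode` is JPEG, `FlateDecode` is the zlib compression PNG also uses, and so on). Below is a rough sketch that counts those names in the raw bytes of a file. The function name and the sample bytes are made up for illustration, and the approach is deliberately crude: it counts every stream (page content and fonts too, not just images) and will miss dictionaries that are themselves stored inside compressed object streams.

```python
import re
from collections import Counter

# Standard PDF stream filter names and the encoding each one indicates.
FILTERS = {
    b"DCTDecode": "JPEG",
    b"JPXDecode": "JPEG 2000",
    b"FlateDecode": "Flate/zlib (lossless)",
    b"JBIG2Decode": "JBIG2 (bilevel)",
    b"CCITTFaxDecode": "CCITT fax (bilevel)",
}


def count_filters(pdf_bytes: bytes) -> Counter:
    """Count occurrences of known filter names in raw PDF bytes.

    Crude byte scan, not a PDF parser: it sees every stream's filter,
    not only those of image objects.
    """
    counts = Counter()
    for match in re.finditer(rb"/(\w+Decode)", pdf_bytes):
        label = FILTERS.get(match.group(1))
        if label:
            counts[label] += 1
    return counts


# Hypothetical fragment standing in for a real file's contents.
sample = (
    b"<< /Type /XObject /Subtype /Image /Filter /DCTDecode >>\n"
    b"<< /Filter /FlateDecode /DecodeParms << /Predictor 15 >> >>\n"
)
print(count_filters(sample))
```

For a real scan you would pass in the file's contents instead, e.g. `count_filters(open("scan.pdf", "rb").read())`, where `scan.pdf` is a placeholder path.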

lemming007@lemm.ee · edit-2 1 year ago

My PDF/A documents contain all kinds of content, including text and images. To me, it doesn’t matter what format the embedded images are encoded in, as long as I can open them 20 years from now. Why would one care one way or another?

Atemu@lemmy.ml · 1 year ago

I care that the text remains readable (both to me and also software) and that I don’t balloon my storage out of control.

JPEG (even at higher quality levels) subjectively degrades text in particular, to a degree that makes me worry about the former, while PNG makes me worry about the latter.

My current plan is to go with PNG, since storage is a relatively cheap issue to fix while data loss is pretty much permanent.
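For a sense of scale on the storage worry, here is a back-of-the-envelope sketch of how large one page is before any compression. The page size (A4, about 8.27 × 11.69 in) and the 300 dpi resolution are assumptions for illustration, not figures from this thread, and row padding is ignored. Lossless formats like PNG start from these numbers and shrink them by an amount that depends heavily on the page content.

```python
def raw_page_bytes(width_in: float, height_in: float,
                   dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size in bytes of one scanned page (no row padding)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


A4 = (8.27, 11.69)  # assumed page size in inches

for label, bpp in [("24-bit colour", 24),
                   ("8-bit greyscale", 8),
                   ("1-bit black and white", 1)]:
    size = raw_page_bytes(*A4, dpi=300, bits_per_pixel=bpp)
    print(f"{label}: {size / 1e6:.1f} MB")
# prints:
# 24-bit colour: 26.1 MB
# 8-bit greyscale: 8.7 MB
# 1-bit black and white: 1.1 MB
```

Under these assumptions, dropping from colour to greyscale already cuts the raw size to a third, which may matter more for text documents than the choice of container.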