The quest to go paperless – workflow – Work in progress

I’ve had a duplex scanner with an automatic document feeder for close to a decade. Originally it was a Fujitsu ScanSnap S1500M, which Fujitsu ended support for a few years ago.

A friend helped me obtain a Panasonic scanner: same form factor, but newer, faster, and network-based.

My original workflow consisted of:

  1. Scan documents, which were automatically converted to OCR-sandwiched PDFs.
  2. Hazel (Noodlesoft) would run some heuristic rules over the documents, then rename the PDFs and move them to their appropriate spots. Unhandled ones were left in a folder for manual processing, so some manual work remained.

Since switching to the Panasonic, I’ve lost the luxury of automatic OCR. ABBYY FineReader does an okay job, but it crashes a lot on bulk jobs. I spent the money on it, and I’m not particularly fond of it. I’m on a Mac, and didn’t want to spin up a Windows machine just to use the software included with the Panasonic.

So I’m in the process of recreating my workflow in a more streamlined manner, with a combination of Linux and Mac software.

  1. The Panasonic scanner can scan documents as multi-page TIFF files and store them on an SFTP server.
  2. On the SFTP server itself, use inotify / fswatch to trigger:
    1. pdfsandwich
    2. move the original to “handled”
    3. move OCRed files to “OCRed”
    4. move files that failed OCR to “Failed”
      1. email me to notify me of the error.
  3. Run heuristic rules against the content of the files.
    1. Search for dates that match any of the patterns
      1. yyyy-mm-dd / mmm d yyyy
      2. “bill” / “invoice” / “printed” date
      3. etc…
    2. Look for key IDs
      1. account #s
      2. customer #s
      3. These IDs would be known in advance; matching files would be renamed to the corresponding format or moved to the corresponding folder.
    3. Look for any dollar amounts
      1. Total due
      2. \$\d+\.\d+ (e.g. $313.23)
      3. \$\d+ (e.g. $123)
      4. May need to look for negative amounts as well.
    4. Update a Google Sheet/CSV to include the important details.
      1. For tax purposes.
        1. cell phone bills, amounts, for
        2. gas receipts
        3. restaurant receipts
        4. etc…
      2. include scan date, filename, and amounts.
    5. If no matching heuristics found – email me.
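
A rough Python sketch of how step 2 (OCR, then sort into folders) could look. This is only an illustration under assumptions, not the script running on my server: `route_scan` and `run_pdfsandwich` are made-up helper names, the folder names mirror the list above, and it assumes `pdfsandwich` is on the PATH and accepts the input file as given (a TIFF-to-PDF conversion may be needed first). The inotify / fswatch trigger and the failure email are left out; the watcher would call `route_scan` once per new file.

```python
import shutil
import subprocess
from pathlib import Path


def run_pdfsandwich(src: Path, dest: Path) -> None:
    """Run pdfsandwich on src, writing the OCRed PDF to dest.

    Assumes pdfsandwich is installed; raises CalledProcessError on failure.
    """
    subprocess.run(["pdfsandwich", "-o", str(dest), str(src)], check=True)


def route_scan(src: Path, root: Path, ocr=run_pdfsandwich) -> Path:
    """OCR one scan and file it; return where the original ended up.

    Success: OCRed PDF goes to root/OCRed, original to root/handled.
    Failure: original goes to root/Failed (the caller can then send the email).
    """
    handled, ocred, failed = (root / n for n in ("handled", "OCRed", "Failed"))
    for folder in (handled, ocred, failed):
        folder.mkdir(parents=True, exist_ok=True)

    out_pdf = ocred / (src.stem + ".pdf")
    try:
        ocr(src, out_pdf)
    except (subprocess.CalledProcessError, OSError):
        out_pdf.unlink(missing_ok=True)  # drop any partial output
        return Path(shutil.move(str(src), str(failed / src.name)))
    return Path(shutil.move(str(src), str(handled / src.name)))
```

Passing the OCR step in as a parameter keeps the folder routing separate from the tool, so swapping pdfsandwich for something else later would not touch the sorting logic.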

Stages 1 and 2 are complete. Stage 3 is more involved: I need to work out some sort of pattern for processing.
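
As a starting point for stage 3, here is a minimal Python sketch of the date, amount, and ID heuristics listed above, run over the text extracted from an OCRed PDF. Everything in it is an assumption for illustration: the function names, the exact regular expressions, and the `known_ids` mapping (account or customer number to folder name) are hypothetical, and month names are parsed assuming an English locale.

```python
import re
from datetime import datetime
from decimal import Decimal

# yyyy-mm-dd, e.g. 2019-03-07
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
# mmm d yyyy, e.g. "Mar 7 2019" or "Mar 7, 2019"
TEXT_DATE = re.compile(r"\b([A-Z][a-z]{2}) (\d{1,2}),? (\d{4})\b")
# $313.23, $123, $1,234.56, and negatives written as -$45.00
AMOUNT = re.compile(r"(-?)\$(\d[\d,]*(?:\.\d{2})?)")


def find_dates(text):
    """Return every valid date found, ISO-style matches first."""
    dates = []
    for y, m, d in ISO_DATE.findall(text):
        try:
            dates.append(datetime(int(y), int(m), int(d)).date())
        except ValueError:
            pass  # looked like a date but is not one, e.g. 2019-13-40
    for mon, d, y in TEXT_DATE.findall(text):
        try:
            dates.append(datetime.strptime(f"{mon} {d} {y}", "%b %d %Y").date())
        except ValueError:
            pass  # three-letter word that is not a month name
    return dates


def find_amounts(text):
    """Return every dollar amount found as a Decimal, keeping the sign."""
    return [
        Decimal(sign + number.replace(",", ""))
        for sign, number in AMOUNT.findall(text)
    ]


def classify(text, known_ids):
    """Return the folder for the first known ID present in text, else None."""
    for ident, folder in known_ids.items():
        if ident in text:
            return folder
    return None  # no heuristic matched: this is the "email me" case
```

The results of `find_dates` and `find_amounts`, together with the filename and scan date, would make up one row for the sheet/CSV; a `None` from `classify` is the signal to fall back to manual handling.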

