I’ve had a duplex, automatic-document-feed scanner for close to a decade. Originally it was a Fujitsu ScanSnap M1500, which Fujitsu ended support for a few years ago.
A friend helped me obtain a Panasonic scanner: same form factor, but newer, faster, and network based.
My original workflow consisted of:
- Scan documents, which were automatically converted to OCR-sandwiched PDFs.
- Hazel (Noodlesoft) would run some heuristic rules over the documents, then rename the PDFs and move them to their appropriate spots, leaving unhandled ones in a folder for manual processing. That still left some manual tasks to handle.
Since switching to the Panasonic, I lost the luxury of automatic OCR. ABBYY FineReader does an okay job, but it crashes a lot on bulk jobs. I spent the money, but I’m not particularly fond of it. I’m on a Mac, and didn’t want to spin up a Windows machine just to do this with the software included with the Panasonic.
So, I’m in the process of recreating my workflow in a more streamlined manner, with a combination of Linux and Mac software.
- The Panasonic scanner can scan documents as multi-image TIFF files and store them on an SFTP server.
- On the SFTP server itself, use inotify / fswatch to trigger:
- pdfsandwich
- move the original to “handled”
- move the OCRed file to “OCRed”
- move files that failed OCR to “Failed”
- email me to notify me of an error.
- Run heuristic rules against the content of the files.
- Search for dates that match any of the patterns
- yyyy-mm-dd / mmm d yyyy,
- “bill”/“invoice”/“printed” date
- etc…
- Look for key IDs
- account #s
- customer #s
- These IDs would be known in advance, and matching files would be renamed to a set format or moved to their corresponding folders.
- look for any dollar amounts
- Total due
- \$\d+\.\d+ (e.g. $313.23)
- \$\d+ (e.g. $123)
- may need to look for negative amounts as well.
- Update any Google Sheet/CSV to include details of importance.
- For tax purposes.
- cell phone bills, amounts, for
- gas receipts
- restaurant receipts
- etc…
- include scan date, filename, and amounts.
- If no matching heuristics found – email me.
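The OCR step in the list above could be sketched roughly like this in Python. Treat it as an illustration under assumptions, not the finished script: as far as I know pdfsandwich expects a PDF as input, so the sketch first wraps the TIFF into an image-only PDF with libtiff's `tiff2pdf`. The function name `ocr_and_file`, the `run` and `notify` parameters, and the temporary `.raw.pdf` name are all made up for the example. `notify` stands in for the email step.

```python
import shutil
import subprocess
from pathlib import Path


def ocr_and_file(tiff, base, run=subprocess.run, notify=print):
    """OCR one scanned TIFF and sort the files into handled/OCRed/Failed.

    Returns the path of the OCRed PDF, or None if any step failed.
    `run` and `notify` are injectable so the flow can be tried without
    the real tools or a mail setup (both are assumptions of this sketch).
    """
    tiff, base = Path(tiff), Path(base)
    handled, ocred, failed = (base / n for n in ("handled", "OCRed", "Failed"))
    for folder in (handled, ocred, failed):
        folder.mkdir(parents=True, exist_ok=True)

    raw_pdf = tiff.with_suffix(".raw.pdf")  # temporary image-only PDF
    out_pdf = ocred / (tiff.stem + ".pdf")
    steps = [
        ["tiff2pdf", "-o", str(raw_pdf), str(tiff)],        # TIFF -> PDF
        ["pdfsandwich", "-o", str(out_pdf), str(raw_pdf)],  # add OCR text layer
    ]
    # all() stops at the first failing command, so pdfsandwich is skipped
    # when the conversion step fails.
    ok = all(run(cmd).returncode == 0 for cmd in steps) and out_pdf.exists()
    raw_pdf.unlink(missing_ok=True)

    # The original scan goes to "handled" on success, "Failed" otherwise.
    shutil.move(str(tiff), str((handled if ok else failed) / tiff.name))
    if not ok:
        notify(f"OCR failed for {tiff.name}")
        return None
    return out_pdf
```

An inotify or fswatch hook would then only need to call `ocr_and_file(path, base)` for each new file that lands in the upload folder.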
Stages 1 and 2 are complete… stage 3 is more involved. I need to work out some sort of pattern for processing.
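One possible starting point for the stage 3 heuristics, sketched in Python. It assumes the text layer has already been pulled out of the OCRed PDF into a string (for example with a tool such as pdftotext). The entries in `KNOWN_IDS`, the function names, and the row layout are placeholders I invented for illustration. Allowing thousands separators in amounts is also my assumption, not something from the list above.

```python
import csv
import os
import re
from datetime import date
from decimal import Decimal

# Hypothetical account/customer numbers mapped to destination folders.
KNOWN_IDS = {"1234567": "phone", "A-99-001": "utilities"}

MONTHS = {m: i for i, m in enumerate(
    "jan feb mar apr may jun jul aug sep oct nov dec".split(), start=1)}
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")              # yyyy-mm-dd
TEXT_DATE = re.compile(r"\b([A-Za-z]{3})[a-z]*\.?\s+(\d{1,2}),?\s+(\d{4})\b")  # mmm d yyyy
AMOUNT = re.compile(r"(-?)\$(\d[\d,]*(?:\.\d+)?)")                 # $313.23, $123, -$5


def find_dates(text):
    """Return dates found in the text: ISO matches first, then 'Mar 9 2019' style."""
    found = [(int(y), int(m), int(d)) for y, m, d in ISO_DATE.findall(text)]
    for mon, d, y in TEXT_DATE.findall(text):
        if mon.lower() in MONTHS:
            found.append((int(y), MONTHS[mon.lower()], int(d)))
    dates = []
    for parts in found:
        try:
            dates.append(date(*parts))
        except ValueError:
            pass  # impossible date, most likely OCR noise
    return dates


def find_amounts(text):
    """Return dollar amounts as Decimals, keeping a leading minus sign."""
    return [Decimal(sign + digits.replace(",", ""))
            for sign, digits in AMOUNT.findall(text)]


def summarize(text, filename, scan_date):
    """Build one spreadsheet row, or None when no heuristic matched."""
    ids = [i for i in KNOWN_IDS if i in text]
    dates, amounts = find_dates(text), find_amounts(text)
    if not (ids or dates or amounts):
        return None  # caller would send the "no match" email here
    return {
        "scan_date": scan_date.isoformat(),
        "filename": filename,
        "folder": KNOWN_IDS[ids[0]] if ids else "",
        "doc_date": dates[0].isoformat() if dates else "",
        "amounts": ";".join(str(a) for a in amounts),
    }


def append_row(csv_path, row):
    """Append the row to a CSV file, writing the header for a new file."""
    new_file = not os.path.exists(csv_path)
    with open(csv_path, "a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(row))
        if new_file:
            writer.writeheader()
        writer.writerow(row)
```

Picking the first date and the first known ID is deliberately naive. A real rule set would probably weight matches near words like “invoice” or “total due”, which is the part that still needs working out.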