Monday, November 12, 2007

Recognize my Optical Character, dammit!

Every now & then, we're faced with a day of grunt work. Today, mine is OCRing a stack of e-docs, double quick, to make them searchable within my online document repositories.

One of the joys of OCRing with Adobe Acrobat 7 reveals itself when a document already contains 'renderable text': Acrobat refuses to complete the OCR process on it. The quickest solution I've found so far, though a highly labour-intensive one, is to convert each document to TIFF to strip out the problem coding, then reassemble it as a PDF for OCRing. It's dirty and time-consuming, but it works. At least I can batch OCR once all the files are reassembled.

Now I have 53 contracts of 50 pages each (some 2,650 pages) waiting to be processed in the aforementioned manner. Time to hire a tech.

A resource I've found useful on this matter is Acrobat for Legal Professionals / Troubleshooting Acrobat OCR. I'll keep the homepage link in my sidebar for future reference.

Life as a librarian: "Bringing easy access to you!"

1 comment:

Anonymous said...

Hey, I just wanted to drop by and say thanks for your tips. They really helped a lot. I was having a lot of trouble finding a solution to the "renderable text" problem. I came across your page two days after you posted the solution. So lucky you're around to help out the uninformed.

Peter Ho Tak Chan
Vancouver, BC, Canada