OCR Test 1: Recognize text in screenshot
A New York Times article served as the benchmark for OCR recognition quality in tests 1-4. The test results were not what you would expect. Six documents that are progressively more difficult to recognize serve as the OCR benchmark: a screenshot, two scans, a mobile phone camera picture and, as highlights, the text of an XKCD comic and the readings from an image of a gas meter. This market overview is all about finding the best online service to convert images to plain text. Usability, speed, formatting, and non-English language support are not rated. "No-name OCR beats Google Docs OCR" is just one of the surprising test results.
What is the best free optical character recognition (OCR) service to convert (text in) images to plain, editable text? This review compares the recognition accuracy of free and commercial cloud OCR offerings.
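The review does not say how recognition accuracy was scored. As a rough illustration only, one common way to compare an OCR result against a known reference text is character-level accuracy based on edit distance. The sketch below assumes that approach; the function names `edit_distance` and `character_accuracy` are made up for this example and do not come from any of the services tested.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions that turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """Assumed scoring rule: 1 - (edit distance / reference length),
    clamped at 0 so very noisy output does not go negative."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))


if __name__ == "__main__":
    # Hypothetical strings, not taken from the benchmark documents.
    truth = "The quick brown fox"
    result = "The qu1ck brown f0x"
    print(f"accuracy: {character_accuracy(truth, result):.2%}")
```

A score like this only measures the plain-text match, which fits the scope stated above: formatting, speed, and usability are left out.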
Oh, did you download my version of your document? If so, I'll be deleting it from my computer's Dropbox.
Update May 1, 2015: (a9t9) launched its very own free and open-source Online OCR service - try it out and let us know how it compares.
It's not as good as SilverFast but for company software, it's not bad at all. It's worth a check to see if they have updated your scanning software for your scanner.
Oh, I share your issues with Epson on legacy scanners but occasionally they do update things. Unfortunately there is no going back on these operations. If others are supplying the PDFs that you are inserting into the master document, please share with them the issue of JPG degradation. I can check on that if you'd like (no time at this moment).
I think you had to work with contrast which is a bit trickier. If my memory serves me well, I think one of the dynamics that VueScan had that I was not pleased with was that you didn't have the Levels adjustment that I prefer when working with printed documents: it's fast and easy. However, that would be more critical if you were doing a lot of image scanning but since your primary focus is on documents, the nuances that SilverFast provides are not that critical.
Also, just about ANY software is better than Apple's "Image Capture." I've been using Macs since '85 so I obviously like them but Image Capture is bad on so many accounts I do not wish to waste any time with it. SilverFast is probably one of the best 3rd party scanning software packages out there but it's not cheap AND it has a bit of a learning curve AND the support documents are not really as good as they should be. It's been a LONG time since I've looked at it as I've used SilverFast for many years. VueScan is good software and for the price it's excellent. Sorry for any delay in getting back to you, I was on a bike ride all morning.
Lastly, here is a blog I wrote on how to get clean scanned documents using Acrobat: If you wish to see your file after I processed it, here it is: Dropbox - 2018_09_29-30_LNC_Minutes-approved2.pdf I would appreciate your letting us know what your scanning-processing approach is and we might be able to help you avoid these issues in the future. Most of the above has nothing to do with your original question but perhaps it might lead to some of the issues you may be having. When I scan my documents and save as TIFs, each page is typically around 8-9 MB but after saving them as PDFs and creating searchable documents, the page sizes are around 40-50 KB. FWIW, the size of the document during scanning will have NO BEARING on the size of the final PDF. But this was heavy heavy compression, perhaps set as low as 30%. In a word, generally never use JPG as a format for saving unless you are using zero compression. This degradation will also degrade the quality of the OCR process as it can confuse the optical reading of the text. That is JPG degradation that is caused by the JPG lossy process. If you look at the screenshot below and look at the black on white text, you'll see a lot of gray splotches all over the place. I am curious as to what your process was for making this document because things you have on the top of a page (e.g., Appendix F Secretary's Report) are obviously digital text, while the text below that is scanned text that is highly JPGed with lots of degradation. It took about 10 minutes but I now have the complete document fully searchable.
So, I opened the document on my Mac and ran Text Recognition from the "Enhanced Scans" tab of tools. Nonetheless, I downloaded the document and verified that yes, not all of the document was converted into searchable text. You do not say how this was scanned, nor using which version of Acrobat on what kind of computer nor with which scanner. First off, thanks for supplying the document, it helped.