![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhBEQYtXlxkUzl7rIXZ22EK-FbdggnsIBrt0oPnantZT63WdRUHWVpSFeCNQ_bsy_WoNy-0Pr-jIth3MIO6NGYsPJfvSAswex4AYwQZm_pfdjgetbNaHbRSzrNNpwWftuKFjgrrucqweiGq/s400/before+free+OCR.jpg)
I have been a fan of optical character recognition since the early days of the technology. I remember about 15 years ago using a hand-held scanner to try to digitize my files of journal articles and magazine clippings. The process required a lot of "cleanup" and ultimately proved to be not worth the time.
So it was with some interest that I saw a notice for "free online OCR" - Whether you have a scanned document or a photo, NewOCR.com can analyze the text in any image file that you upload, and then convert the text from the image into text that you can easily edit on your computer. I bookmarked the site and used it several days later when I encountered the story above in a PDF file. I took a screenshot of the page and uploaded the image to the free online OCR site. This was how the image was rendered...
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEihdhb-XMYGWzLze3bATBkZevJwluxSmN0D_B6WHHUFuT4e4-M5qfZYydCT1Su4gMpFpkwNR9YPHhRszuJ5-1K3t5FZUpCv72U_Ug1FWupJw-f4KmbSL14i7_-7h2b8kXhwDibMs3T24bz0/s400/after+free+OCR.jpg)
Click to enlarge both for comparison, but the rendering needs work, shall we say. I suppose you get what you pay for.
In all fairness there are certain fonts that are intrinsically hard for OCR to interpret, and the test image had poor black/white contrast. It likely will do better with other challenges.
Although I'm no big fan of Microsoft Office, its OneNote program has a pretty good OCR feature. I guess that's not free, or online, but I've had good results scanning faxes and other typed and handwritten materials, so I thought I'd pass it on.
I have a scanner with software that does this (HP printer/scanner) -- it works a LOT better than whatever that online thing is, but it still isn't the best accuracy and doesn't do well with certain characters. If I remember correctly, it sends the text to a separate .txt file.
If you have a print-out of a Word document and lost the file, and it's all written in Times New Roman, that feature is GREAT!!
I've been reading your blog for a few months now. :) I especially loved the new post about the victims of acid attacks -- very touching and sad...