23 November 2009

Free Online OCR

I have been a fan of optical character recognition since the early days of the technology. I remember using a hand-held scanner about 15 years ago to try to digitize my files of journal articles and magazine clippings. The process required a lot of "cleanup" and ultimately proved not to be worth the time.

So it was with some interest that I saw a notice for "free online OCR": "Whether you have a scanned document or a photo, NewOCR.com can analyze the text in any image file that you upload, and then convert the text from the image into text that you can easily edit on your computer." I bookmarked the site and used it several days later when I encountered the story above in a PDF file. I took a screenshot of the page and uploaded the image to the free online OCR site. This was how the image was rendered...

The rendering needs work, shall we say. I suppose you get what you pay for.

In all fairness, there are certain fonts that are intrinsically hard for OCR to interpret, and the test image had poor black/white contrast. The site will likely do better with other material.


  1. Although I'm no big fan of Microsoft Office, its OneNote program has a pretty good OCR feature. I guess that's not free or online, but I've had good results scanning faxes and other typed and handwritten materials, so I thought I'd pass it on.

  2. I have a scanner with software that does this (an HP printer/scanner). It works a LOT better than whatever that online thing is, but its accuracy still isn't the best and it doesn't do well with certain characters. If I remember correctly, it sends the text to a separate .txt file.

    If you have a print-out of a Word document and lost the file, and it's all written in Times New Roman, that feature is GREAT!!

    I've been reading your blog for a few months now. :) I especially loved the new post about the victims of acid attacks -- very touching and sad...

