Using Google for OCR

Amit Agarwal has posted a tip on his blog about using Google to convert PDFs to text. For some reason, he suggests putting all your PDF documents on the web:

Create a folder in your website (say abc.com/pdf) and upload all the PDF images to that folder. Now create a public web page that links to all the PDF files. Wait for the Google bots to spider your stuff.

Once done, type the query “site:abc.com/pdf filetype:pdf” to see the PDF documents as HTML.

Why would you want your documents to be accessible by anyone? Why wait for Google to index your page?

There’s a much easier way that I’ve been using, and one of the commenters on Agarwal’s blog points it out:

You can upload the Scanned PDFs to Gmail and sent it you only. Then Open your Inbox and the mail sent from you, you have an option to View as HTML. That will solve the Hosting problem.
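If you have a pile of scans, the mail-it-to-yourself step can be scripted. Here is a minimal sketch in Python using only the standard library. The address, the file name, and both function names are placeholders I made up for illustration, and I am assuming you send through Gmail's SMTP server (smtp.gmail.com on port 465) with an app password. The script only delivers the message; the "View as HTML" step still happens in the Gmail inbox, as the commenter describes.

```python
import smtplib
from email.message import EmailMessage


def build_self_addressed_message(address, pdf_bytes, filename):
    """Build an email from `address` to itself with the PDF attached."""
    msg = EmailMessage()
    msg["From"] = address
    msg["To"] = address  # sent to yourself only, so nothing is public
    msg["Subject"] = "Scanned PDF: " + filename
    msg.set_content("Scanned PDF attached.")
    msg.add_attachment(
        pdf_bytes,
        maintype="application",
        subtype="pdf",
        filename=filename,
    )
    return msg


def send_via_gmail(msg, address, app_password):
    """Send the message over SSL (assumes a Gmail account and app password)."""
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(address, app_password)
        server.send_message(msg)
```

To use it, read the scan with `open("scan.pdf", "rb").read()`, pass the bytes to `build_self_addressed_message`, and hand the result to `send_via_gmail`. Nothing is sent until that second call, so you can inspect the message first.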
