JavaScript OCR PDF-to-Text


PDF-to-Text is a mobile-ready OCR tool written in pure JavaScript. It uses the tesseract.js API to convert image-based text in a PDF into plain text.
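As a rough illustration of the idea (not this project's actual code), the sketch below runs OCR over a list of page images and joins the recognized text. The function name `pagesToText` and the injected `recognize` callback are hypothetical. The sketch assumes each PDF page has already been rendered to an image, and that the callback wraps something like `Tesseract.recognize(image, 'eng')`, which in tesseract.js v2 and later resolves to an object shaped like `{ data: { text } }`.

```javascript
// Sketch: OCR a list of page images and join the recognized text.
// `recognize` is injected so any OCR backend can be plugged in. With
// tesseract.js v2+ it could be (assumption about the installed version):
//   (image) => Tesseract.recognize(image, 'eng')
async function pagesToText(pageImages, recognize) {
  const texts = [];
  for (const image of pageImages) {
    // Process pages one at a time to keep memory use low on mobile devices.
    const result = await recognize(image);
    // Accept both result shapes: { data: { text } } and { text }.
    const data = result && result.data ? result.data : result;
    const text = (data && data.text) || '';
    texts.push(text.trim());
  }
  // Separate pages with a blank line.
  return texts.join('\n\n');
}
```

Passing the recognizer in as an argument keeps the page loop independent of the OCR library version, and lets it be exercised with a stub instead of a real OCR run.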


PDF-to-Text uses a number of open source projects to work properly:

  • [JavaScript] – awesome!
  • [HTML] – HTML enhanced for web apps!
  • [CSS] – Fence!
  • [Magic] – that's nice!


PDF-to-Text requires Node.js v4+ or any server environment to run.

Install http-server globally, then start it from the project directory:

$ npm install http-server -g
$ cd pdf-to-text-master
$ http-server

Secret of Google Web-Based OCR Service

Introduction to Optical Character Recognition

How to OCR Documents for Free in Google Drive

Google Drive makes it painless to go paperless. Its collaborative documents, spreadsheets, and presentations already help curtail paper usage, but its OCR feature helps curb the paper mess even more.

OCR, or Optical Character Recognition, is the most important tech to help you go paperless. Scanned documents on their own are only glorified pictures of your documents, but let your computer recognize the text and they instantly become a ton more useful. We’ve already looked at how to OCR documents in Adobe Acrobat:

We’ve looked at turning PDF files into documents you can edit in Word as well:

Now if you don’t have a copy of Acrobat or Word, there’s an even better option: Google Drive. It includes a little-known free OCR tool that is a powerful, easy-to-use image-to-text converter.

In this tutorial, we’ll look at what Google Drive’s OCR process is and the simple steps to start working with it. I’ll show you how to use Google Drive to quickly convert your scanned images and PDF documents into editable text files online.