Powerful image optical character recognition (OCR) for over 20 languages and with machine-readable-zone support. Perfect for receipt and invoice scanning as well as general image-based text extraction.
Optical Character Recognition (OCR) is one of the ways to connect the real world and the virtual world. The first OCR system was introduced in the late 1920s. The objective of OCR is to recognise text in an image. However, achieving very high accuracy is challenging because many factors affect the result. In the following story, I will introduce how Google built a solution to this problem, offered as part of the Google Cloud Vision API.
Talking about OCR, Tesseract is one of the famous open source libraries that anyone can use to perform OCR. Tesseract was originally developed at HP, and its development has been sponsored by Google since 2006. The Tesseract 3.x model is the older version, while the 4.x version is built on deep learning (LSTM). If you want to understand the difference between 3.x and 4.x, you can visit the sharing for more detail.
As Tesseract is implemented in C++, we cannot call it the way we call other Python libraries. We can indeed invoke the C API from Python, but it is not very user friendly. Therefore, a Python wrapper, pytesseract, was introduced to make our lives easier.
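To give a feel for the wrapper, here is a minimal sketch of how pytesseract might be used. It assumes the Tesseract binary, pytesseract, and Pillow are installed. The helper names `normalize_ocr_text` and `image_to_text` are my own for illustration and are not part of pytesseract; only `pytesseract.image_to_string` and `PIL.Image.open` are library calls.

```python
def normalize_ocr_text(raw: str) -> str:
    """Collapse runs of whitespace and drop empty lines from raw OCR output."""
    # splitlines() also splits on the form feed that Tesseract may append.
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


def image_to_text(image_path: str, lang: str = "eng") -> str:
    """Run Tesseract on an image file and return the cleaned-up text.

    Hypothetical helper: it only wraps pytesseract.image_to_string.
    """
    # Imported lazily so normalize_ocr_text works without the OCR stack.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        raw = pytesseract.image_to_string(img, lang=lang)
    return normalize_ocr_text(raw)
```

Calling `image_to_text("receipt.png")` (a placeholder path) would return whatever text Tesseract recognises in that image. The quality depends heavily on the image and on the installed language data.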
Google Drive makes it painless to go paperless. Its collaborative documents, spreadsheets, and presentations already help curtail paper usage, but its OCR feature helps curb the paper mess even more.
OCR, or Optical Character Recognition, is the most important tech to help you go paperless. Scanned documents on their own are only glorified pictures of your documents, but let your computer recognize the text and they instantly become a ton more useful. We’ve already looked at how to OCR documents in Adobe Acrobat.
We’ve looked at turning PDF files into documents you can edit in Word as well.
Now if you don’t have a copy of Acrobat or Word, there’s an even better option: Google Drive. It includes a little-known free OCR tool that is a powerful, easy-to-use image-to-text converter.
In this tutorial, we’ll look at what Google Drive’s OCR process is and the simple steps to begin working with it. I’ll show you how to use Google Drive to quickly convert your scanned images and PDF documents into editable text files online.
Learn how to perform optical character recognition (OCR) on Google Cloud Platform. This tutorial demonstrates how to upload image files to Google Cloud Storage, extract text from the images using the Google Cloud Vision API, translate the text using the Google Cloud Translation API, and save your translations back to Cloud Storage. Google Cloud Pub/Sub is used to queue various tasks and trigger the right Cloud Functions to carry them out.
Extract, translate and save text contained in uploaded images.
Visualizing the flow of data
The flow of data in the OCR tutorial application involves several steps:
An image that contains text in any language is uploaded to Cloud Storage.
A Cloud Function is triggered, which uses the Vision API to extract the text and detect the source language.
The text is queued for translation by publishing a message to a Pub/Sub topic. A translation is queued for each target language different from the source language.
If a target language matches the source language, the translation queue is skipped, and text is sent to the result queue, another Pub/Sub topic.
A Cloud Function uses the Translation API to translate the text in the translation queue. The translated result is sent to the result queue.
Another Cloud Function saves the translated text from the result queue to Cloud Storage.
The results are stored in Cloud Storage as .txt files, one for each translation.
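The routing decision in the steps above (translate, or skip straight to the result queue) can be sketched in plain Python. This is an in-memory illustration only: it makes no Google Cloud calls, and the topic names, the `route_text` function, and the message fields are assumptions made for this example rather than names taken from the tutorial.

```python
from typing import Dict, List, Tuple

# Hypothetical topic names; a real deployment would configure its own.
TRANSLATE_TOPIC = "translate-queue"
RESULT_TOPIC = "result-queue"

Message = Tuple[str, Dict[str, str]]


def route_text(text: str, src_lang: str, target_langs: List[str],
               filename: str) -> List[Message]:
    """Return one (topic, payload) pair per target language.

    A target language equal to the detected source language skips
    translation and goes directly to the result queue.
    """
    messages: List[Message] = []
    for lang in target_langs:
        topic = RESULT_TOPIC if lang == src_lang else TRANSLATE_TOPIC
        payload = {
            "text": text,
            "filename": filename,
            "lang": lang,
            "src_lang": src_lang,
        }
        messages.append((topic, payload))
    return messages
```

For a Spanish image with target languages English, Spanish, and French, this would queue two translations and send the Spanish text directly to the result queue. In the real application each pair would be published to Pub/Sub instead of being returned in a list.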