tensorflow/lite/g3doc/examples/optical_character_recognition/overview.md
Optical character recognition (OCR) is the process of recognizing characters from images using computer vision and machine learning techniques. This reference app demonstrates how to use TensorFlow Lite to do OCR. It combines a text detection model and a text recognition model into an OCR pipeline that recognizes text characters.
If you are new to TensorFlow Lite and are working with Android, we recommend exploring the following example application that can help you get started.
<a class="button button-primary" href="https://github.com/tensorflow/examples/tree/master/lite/examples/optical_character_recognition/android">Android example</a>
If you are using a platform other than Android, or you are already familiar with the TensorFlow Lite APIs, you can download the models from TF Hub.
OCR tasks are often broken down into two stages. First, a text detection model detects the bounding boxes around possible text. Second, the processed bounding boxes are fed into a text recognition model to determine the specific characters inside them. Before text recognition, the detections also need non-maximum suppression, perspective transformation, and similar processing. In this case, both models are from TensorFlow Hub, and both are FP16 quantized models.
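The non-maximum suppression step between detection and recognition can be sketched as follows. This is a minimal, simplified illustration and not the reference app's implementation: it assumes axis-aligned boxes given as `(x1, y1, x2, y2)`, whereas a text detector may produce rotated boxes. The helper names `iou` and `non_max_suppression` and the default threshold of 0.4 are hypothetical choices for this example.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.4):
    """Greedy NMS: return indices of kept boxes, highest score first.

    The threshold value is an assumption for illustration only.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```

Here the second box overlaps the first heavily and has a lower score, so only the first and third boxes survive.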
Performance benchmark numbers are generated with the tool described here.
<table>
  <thead>
    <tr>
      <th>Model Name</th>
      <th>Model size</th>
      <th>Device</th>
      <th>CPU</th>
      <th>GPU</th>
    </tr>
  </thead>
  <tr>
    <td>
      <a href="https://tfhub.dev/sayakpaul/lite-model/east-text-detector/fp16/1">Text Detection</a>
    </td>
    <td>45.9 MB</td>
    <td>Pixel 4 (Android 10)</td>
    <td>181.93ms*</td>
    <td>89.77ms*</td>
  </tr>
  <tr>
    <td>
      <a href="https://tfhub.dev/tulasiram58827/lite-model/keras-ocr/float16/2">Text Recognition</a>
    </td>
    <td>16.8 MB</td>
    <td>Pixel 4 (Android 10)</td>
    <td>338.33ms*</td>
    <td>N/A**</td>
  </tr>
</table>

\* 4 threads used.

\*\* This model cannot use the GPU delegate because it needs TensorFlow ops to run.
The text detection model accepts a 4-D float32 Tensor of shape (1, 320, 320, 3)
as input.
The text recognition model accepts a 4-D float32 Tensor of shape (1, 31, 200, 1)
as input.
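As a rough sketch of what these input shapes imply, the snippet below turns an RGB image array into tensors with the two layouts described above. It assumes NumPy is available. The nearest-neighbor resize, the channel-mean grayscale conversion, and all function names are assumptions made for illustration; any value scaling or normalization that the models expect is not covered here and should be taken from the model documentation.

```python
import numpy as np

DETECTION_INPUT_SHAPE = (1, 320, 320, 3)    # shape stated in the text above
RECOGNITION_INPUT_SHAPE = (1, 31, 200, 1)   # shape stated in the text above


def resize_nearest(image, out_h, out_w):
    """Nearest-neighbor resize of an (H, W, C) array; a stand-in for a real resizer."""
    in_h, in_w = image.shape[:2]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return image[rows[:, None], cols]


def to_detection_input(rgb_image):
    """Turn an (H, W, 3) RGB array into a (1, 320, 320, 3) float32 tensor."""
    _, h, w, _ = DETECTION_INPUT_SHAPE
    resized = resize_nearest(rgb_image, h, w)
    return resized.astype(np.float32)[np.newaxis, ...]  # add the batch dimension


def to_recognition_input(rgb_crop):
    """Turn an (H, W, 3) RGB crop into a (1, 31, 200, 1) float32 tensor."""
    _, h, w, _ = RECOGNITION_INPUT_SHAPE
    # Assumed grayscale conversion: plain mean over the color channels.
    gray = rgb_crop.astype(np.float32).mean(axis=2, keepdims=True)
    return resize_nearest(gray, h, w)[np.newaxis, ...]


image = np.zeros((480, 640, 3), dtype=np.uint8)
detection_input = to_detection_input(image)
recognition_input = to_recognition_input(image)
```

Note that the recognition input is 31 pixels high and 200 pixels wide with a single channel, so each detected region has to be cropped and rectified before it is resized to that layout.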
The text detection model returns a 4-D float32 Tensor of shape (1, 80, 80, 5)
as the bounding boxes and a 4-D float32 Tensor of shape (1, 80, 80, 5) as the
detection scores.
The text recognition model returns a 2-D float32 Tensor of shape (1, 48) as
the mapping indices to the alphabet list '0123456789abcdefghijklmnopqrstuvwxyz'.
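Mapping the recognition output back to text can be sketched as below. The alphabet string comes from the description above; the function name `decode_indices` is hypothetical, and treating any index outside the alphabet range as "no character" is an assumption of this sketch, not something this page specifies.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def decode_indices(indices, alphabet=ALPHABET):
    """Map a flat sequence of numeric indices to characters of the alphabet.

    Values outside the alphabet range are skipped (assumed to mean "no character").
    """
    chars = []
    for value in indices:
        index = int(value)  # the model output is float32, so cast first
        if 0 <= index < len(alphabet):
            chars.append(alphabet[index])
    return "".join(chars)


# For a (1, 48) output tensor, pass its first row, e.g. decode_indices(output[0]).
text = decode_indices([29.0, 15.0, -1.0, 36.0])
```

In this example, indices 29 and 15 map to `t` and `f`, while -1 and 36 fall outside the 36-character alphabet and are dropped.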
The current text recognition model is trained using synthetic data with English letters and numbers, so only English is supported.
The models are not general enough for OCR in the wild (for example, random images taken by a smartphone camera in low light).
For that reason, we have chosen only 3 Google product logos to demonstrate how to do OCR with TensorFlow Lite. If you are looking for a ready-to-use, production-grade OCR product, you should consider Google ML Kit. ML Kit, which uses TFLite underneath, should be sufficient for most OCR use cases, but there are some cases where you may want to build your own OCR solution with TFLite. Some examples are: