When to use AI-OCR?

What is OCR?

OCR, or Optical Character Recognition, is one of the traditional computer vision tasks. It allows computers to analyze printed or handwritten documents and convert the text into a digital format that they can understand and process.

Human eyes can recognize various patterns, fonts and styles. To a computer, however, any scanned document is just a graphic file. OCR software needs to localize, detect and recognize the characters in the scanned image and turn them into a computer text file. After these steps, the data becomes meaningful information. Text in machine-readable form can then be used for different purposes: searching for patterns, generating reports, filling Excel spreadsheets, or feeding other software such as ERP or accounting systems.
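
To make the "recognize" step more concrete, here is a toy sketch in Python. It compares a small black-and-white bitmap with a handful of stored character patterns and picks the pattern that differs in the fewest pixels. The 3x5 glyphs and the names KNOWN_GLYPHS, pixel_distance and recognize_glyph are invented for this illustration; real OCR engines are far more sophisticated.

```python
# Toy character recognition: each known character is stored as a 3x5 bitmap.
# All glyphs and names here are hypothetical, made up for illustration only.
KNOWN_GLYPHS = {
    "I": ("###",
          ".#.",
          ".#.",
          ".#.",
          "###"),
    "L": ("#..",
          "#..",
          "#..",
          "#..",
          "###"),
    "T": ("###",
          ".#.",
          ".#.",
          ".#.",
          ".#."),
}

def pixel_distance(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))

def recognize_glyph(bitmap):
    """Return the known character whose stored pattern differs in the fewest pixels."""
    return min(KNOWN_GLYPHS, key=lambda ch: pixel_distance(bitmap, KNOWN_GLYPHS[ch]))

# A scanned "L" with one noisy pixel in the middle row.
noisy_l = ("#..",
           "#..",
           "#.#",
           "#..",
           "###")

print(recognize_glyph(noisy_l))  # prints: L
```

The noisy bitmap is still read as "L" because it is only one pixel away from the stored pattern, while the other patterns differ in many more pixels.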


Traditional OCR

Traditional OCR removed the manual data entry steps required for capturing data. It works very well, with high accuracy, when documents vary little. The way traditional OCR works is similar to a biometric device: photo sensor technology collects the match points of physical attributes and converts them into known data types, which are stored in a mapping database. It therefore requires a setup process to define mapping rules and templates. This approach works well if the text layout within the scanned image matches the layout coded in the template. However, defining the rules and setting up the templates can be tedious and time-consuming.
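
The dependence on a matching layout can be sketched in a few lines of Python. The example assumes the OCR step has already produced words with page coordinates, and that a template maps each field to a fixed rectangular zone. The coordinates, the SUPPLIER_A_TEMPLATE table and the extract_fields function are hypothetical and not taken from any particular OCR product.

```python
from typing import Dict, List, Tuple

Word = Tuple[str, int, int]        # (text, x, y): a hypothetical OCR result
Zone = Tuple[int, int, int, int]   # (left, top, right, bottom) in page units

# Template for one known layout: each field is read from a fixed zone.
SUPPLIER_A_TEMPLATE: Dict[str, Zone] = {
    "invoice_number": (400, 40, 600, 60),
    "total": (400, 700, 600, 720),
}

def extract_fields(words: List[Word], template: Dict[str, Zone]) -> Dict[str, str]:
    """Collect the words that fall inside each field's zone."""
    result = {}
    for field, (left, top, right, bottom) in template.items():
        inside = [text for text, x, y in words
                  if left <= x <= right and top <= y <= bottom]
        result[field] = " ".join(inside)
    return result

# The layout matches the template: both fields are found.
matching_page = [("INV-1001", 420, 50), ("1,250.00", 450, 710)]
# Same data, but the total is printed 100 units higher: the zone misses it.
shifted_page = [("INV-1001", 420, 50), ("1,250.00", 450, 610)]

print(extract_fields(matching_page, SUPPLIER_A_TEMPLATE))
print(extract_fields(shifted_page, SUPPLIER_A_TEMPLATE))
```

On the shifted page the "total" field comes back empty, even though the value is clearly printed on the document. Fixing that means adding or adjusting a template by hand.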


One example of OCR technology in use is the postal system. OCR can be used to mark codes on mail so that high-speed sorting machines in the postal distribution system can sort it more easily. Thanks to the low variability of letters and packages, the process is much easier and the output is highly accurate compared to other applications.


Difficulties with Traditional OCR

Traditional OCR requires rules and templates in order to capture the necessary data. This means a long and expensive setup process, because each individual variation requires a new rule. Errors such as false positives also arise, because the rules have no flexibility with regard to document variability. Traditional OCR cannot be fully automated: more rules will always need to be set up. For instance, in invoice digitization, every field needs an individual rule.
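
A minimal sketch of "one rule per field", assuming the rules are written as regular expressions over the recognized text. The labels, the FIELD_RULES table and the apply_rules function are hypothetical examples, not the rule format of any specific product.

```python
import re

# One hand-written rule (regular expression) per field, tuned to one layout.
# The field names and labels are made up for illustration.
FIELD_RULES = {
    "invoice_number": re.compile(r"Invoice Number:\s*(\S+)"),
    "total": re.compile(r"Total:\s*([\d.,]+)"),
}

def apply_rules(text):
    """Return the captured value for each field, or None when its rule does not match."""
    found = {}
    for field, pattern in FIELD_RULES.items():
        match = pattern.search(text)
        found[field] = match.group(1) if match else None
    return found

known_layout = "Invoice Number: INV-1001\nTotal: 1,250.00"
new_layout = "Inv. No: INV-2002\nAmount due: 980.00"

print(apply_rules(known_layout))  # both fields are captured
print(apply_rules(new_layout))    # both fields come back as None
```

The second invoice carries the same information under different labels, so neither rule matches. Supporting it means writing two more rules, and the same again for every further wording.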

Traditional OCR becomes problematic when dealing with a large number of document layouts, or when the business frequently encounters new types of documents. One example is an invoice processing system that receives new types of invoices from different suppliers. The template approach may work well initially, but as the number of suppliers grows and changes, the number of rules and templates increases rapidly and becomes unmanageable.


What is Artificial Intelligence (AI)?

Artificial intelligence (AI) is a branch of computer science that deals with the creation of intelligent machines that work and react like humans. According to the Merriam-Webster dictionary, AI is “a branch of computer science dealing with the simulation of intelligent behavior in computers” and “the capability of a machine to imitate intelligent human behavior”.

Today, AI is used in almost every aspect of our lives. According to Forbes, 61% of the companies interviewed see developing AI as an urgent issue, while only 50% have implemented some kind of AI. At the same time, 83% of respondents believe that AI is a strategic priority for businesses today.

Well-known examples of AI are autonomous vehicles such as self-driving cars and drones. Other examples are online assistants (such as Siri, Cortana or chatbots), healthcare (medical diagnostics), search (Google), face recognition (Facebook tagging), and data digitization (AI-OCR).

What is AI-OCR?

Applying AI technology can help OCR improve accuracy in many scenarios. AI-OCR does not only look at individual letter contours in an attempt to guess the meaning. After the scanning step, it can also check words against dictionaries and look at the context to make sure that the selected combination matches the surrounding information. Like a human, it can learn: the more data it works on, the smarter the system becomes.
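
The dictionary check can be illustrated with a small Python sketch. It uses simple string similarity from the standard library module difflib as a stand-in; this is not how a trained AI-OCR model works internally, and it ignores the surrounding context. The tiny VOCABULARY list and the correct_word and correct_line functions are hypothetical.

```python
import difflib

# Tiny stand-in vocabulary; a real system would use a much larger dictionary.
VOCABULARY = ["invoice", "number", "total", "amount", "date", "supplier"]

def correct_word(word, cutoff=0.7):
    """Replace a word with its closest vocabulary entry, if one is similar enough.

    Words with no close match (for example identifiers) are returned unchanged.
    Note that a corrected word comes back in lower case, as stored in VOCABULARY.
    """
    matches = difflib.get_close_matches(word.lower(), VOCABULARY, n=1, cutoff=cutoff)
    return matches[0] if matches else word

def correct_line(line):
    """Correct every whitespace-separated word in a line of recognized text."""
    return " ".join(correct_word(w) for w in line.split())

# Typical misreads: "l" for "i" and "0" for "o".
print(correct_line("lnvoice t0tal am0unt"))  # prints: invoice total amount
```

An identifier such as "INV-1001" is left untouched because nothing in the vocabulary is similar enough to it.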

AI-OCR uses a trained neural network, which encodes thousands of rules for determining the meaning of the data. The model is generally trained using a combination of supervised and unsupervised learning methods. A trained model can keep fine-tuning itself as more training data is collected and ingested into the system. The machine learning approach is much more scalable across languages and across different types of documents. It requires significant initial effort to build high-quality training models and entity recognition models but, unlike traditional OCR, once the system is built it scales faster and better.

When to use AI-OCR

When processing a regular mix of documents of different types, such as invoices from different suppliers, AI-OCR is highly accurate and can adapt to a wide range of layouts. Given the rapid development of AI software, both open-source and commercial, software developers can now extract data from documents using AI technology in just a few lines of code. With its learning capability, AI-OCR will continue to advance, and the accuracy of data digitization keeps getting better.

There are some simple cases where traditional OCR may make more sense, specifically when all invoices come in the same format or in a few fixed formats, because setting up the rules is then fairly straightforward. Another example is extracting highly specific information that is always encoded in the same format, such as passports or driver's licenses: training the AI may be challenging, as it does not benefit from data gathered across all users of the solution.