What is OCR?
OCR, or Optical Character Recognition, is one of the traditional computer vision tasks. It allows computers to analyze printed or handwritten documents and convert the text into digital formats that computers can understand and process.
Human eyes can recognize patterns across many fonts and styles, but to a computer, a scanned document is just a graphic file. OCR software must localize, detect, and recognize the characters in the scanned image and turn them into a computer text file. Only after these steps does the data become meaningful information. Text in machine-readable form can then be used for many purposes: searching for patterns, generating reports, feeding Excel spreadsheets, or being sent to other software such as an ERP system.
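The localize-then-recognize pipeline described above can be sketched in a few lines of toy Python. Everything here is invented for illustration: real OCR engines work on grayscale pixels and learned features, not tiny exact-match templates.

```python
# Toy glyph templates: each "character" is a 3x3 binary bitmap,
# written as three rows of '1' (ink) and '.' (background).
GLYPHS = {
    "H": ["1.1",
          "111",
          "1.1"],
    "I": ["111",
          ".1.",
          "111"],
}

def recognize_cell(cell):
    """Recognize a single 3x3 cell by exact template match."""
    for char, glyph in GLYPHS.items():
        if cell == glyph:
            return char
    return "?"  # unrecognized shape

def ocr(page):
    """Localize character cells on a 'scanned' page, then recognize each."""
    width = len(page[0])
    text = ""
    # Localize: step across the page one character cell at a time
    # (3 pixels per glyph plus a 1-pixel gap).
    for col in range(0, width, 4):
        cell = [row[col:col + 3] for row in page]
        text += recognize_cell(cell)
    return text

page = ["1.1 111",
        "111 .1.",
        "1.1 111"]
print(ocr(page))  # -> HI
```

The same three stages (localize, detect, recognize) appear in any real engine; only the matching step is vastly more sophisticated.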
Traditional OCR solutions removed the manual data entry steps required to capture data, and they work with high accuracy when documents have low variability. Traditional OCR works much like a biometric device: photo sensor technology collects the match points of physical attributes and converts them into known data types, which are stored in a mapping database. It therefore requires a setup process to define mapping rules and templates. This approach works well when the text layout in the scanned image matches the layout coded in the template. However, defining the rules and setting up the templates can be tedious and time-consuming.
One example of OCR technology in use is the postal system. OCR can be designed to read and mark codes so that high-speed machines can sort mail in the postal distribution system. Thanks to the low variability of letters and packages, the process is much easier and the output is highly accurate compared to other applications.
Difficulties with Traditional OCR
Traditional OCR requires rules and templates for the technology to capture the necessary data. This means a long and expensive setup process, because each individual alteration requires a new rule. Errors such as false positives also arise, because the templates have zero flexibility with respect to document variability. The technology cannot be fully automated: there will always be more rules to set up. In invoice digitization, for instance, every field needs its own rule.
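The one-rule-per-field approach can be sketched as a handful of regular expressions tied to one invoice layout. The field names and layouts below are invented for illustration, but they show the core weakness: a new supplier's format silently breaks every rule.

```python
import re

# Template rules for ONE known invoice layout: one hand-written
# regex per field, as described above.
RULES = {
    "invoice_no": re.compile(r"Invoice No:\s*(\S+)"),
    "total":      re.compile(r"Total:\s*\$([\d.]+)"),
}

def extract(text):
    """Apply every field rule to the document text."""
    out = {}
    for field, rule in RULES.items():
        m = rule.search(text)
        out[field] = m.group(1) if m else None
    return out

known = "Invoice No: A-123\nTotal: $45.00"
new_layout = "INV# A-123 / AMOUNT DUE 45.00"  # a new supplier's layout

print(extract(known))       # -> {'invoice_no': 'A-123', 'total': '45.00'}
print(extract(new_layout))  # -> every field is None; new rules needed
```

Each new layout means another set of rules like `RULES`, which is exactly the maintenance burden described in this section.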
Traditional OCR becomes problematic when dealing with a large number of document layouts, or when the business frequently encounters new types of documents. One example is an invoice processing system that receives new types of invoices from different suppliers. The template approach may work well initially, but as the number of suppliers grows and changes, the number of rules and templates increases rapidly and becomes unmanageable.
What is Artificial Intelligence (AI)?
Artificial intelligence (AI) is a branch of computer science dealing with the creation of intelligent machines that work and react like humans. According to the Merriam-Webster dictionary, AI is “a branch of computer science dealing with the simulation of intelligent behavior in computers” and “the capability of a machine to imitate intelligent human behavior”.
As of today, AI is used in almost every aspect of our lives. According to Forbes, 61% of companies interviewed see developing AI as an urgent issue, while only 50% have implemented some kind of AI. At the same time, 83% of respondents believe that AI is a strategic priority for their businesses.
Famous examples of AI are autonomous vehicles such as self-driving cars and drones. Other examples are online assistants (such as Siri, Cortana, or chatbots), healthcare (medical diagnostics), search (Google), face recognition (Facebook tagging), and data analysis.
What is AI-OCR?
Applying AI can improve OCR accuracy in many scenarios. AI-OCR does not just look at individual letter contours and attempt to guess their meaning. Instead, after the scanning step, it can check dictionaries for words and look at the context to make sure the selected combination matches the surrounding information. Like humans, it has a learning capability: the more data it works on, the smarter the system becomes.
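The dictionary check described above can be sketched as a small post-processing pass. The word list and confusion pairs below are invented for illustration: raw OCR often confuses similar shapes (0/O, 1/l, 5/S), and trying substitutions until a word appears in a dictionary fixes many of these misreads.

```python
from itertools import product

# Tiny illustrative dictionary and shape-confusion table.
DICTIONARY = {"invoice", "total", "hello"}
CONFUSABLE = {"0": "0o", "1": "1l", "5": "5s"}

def correct(word):
    """Try confusable-character substitutions; keep the first dictionary hit."""
    choices = [CONFUSABLE.get(c, c) for c in word]
    for candidate in product(*choices):
        fixed = "".join(candidate)
        if fixed in DICTIONARY:
            return fixed
    return word  # no dictionary match; leave the raw reading as-is

print(correct("tota1"))  # -> total
print(correct("he11o"))  # -> hello
```

A real system also weighs surrounding context (neighboring words, field types) rather than checking each word in isolation, but the principle is the same.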
AI-OCR uses a trained neural network, which encodes thousands of rules for determining the meaning of the data. The model is generally trained using a combination of supervised learning methods, and a trained model can be fine-tuned as more training data is collected and ingested into the system. The machine learning approach is much more scalable across languages and across different types of documents. It requires significant initial effort to build high-quality training models and entity recognition models, but unlike traditional OCR, once the system is built, it scales faster and better.
When to use AI-OCR
When processing a regular mix of documents of different types, such as invoices from different suppliers, AI-OCR is highly accurate and can adapt to all kinds of layouts. Given the rapid development of AI software, both open-source and commercial, software developers can now extract data from documents using AI technology in just a few lines of code. With its learning capability, AI-OCR will continue to advance and improve, and the accuracy of data digitization keeps getting better.
There are some simple cases where traditional OCR may make more sense, specifically when all invoices come in the same format or in a few fixed formats, because setting up the rules is straightforward. Another example is extracting highly specific information that is always encoded in the same format, such as passports or driver's licenses; training the AI may be challenging here, as it does not benefit from the data gathered across all users of the solution.