Automatic document processing is an obvious source of productivity gains, and a major concern for companies. Document capture technologies have been improving steadily for decades. The latest development, Deep Learning, marks a decisive step forward, particularly in the processing of unstructured documents. This sub-branch of Machine Learning, itself a sub-category of Artificial Intelligence, makes it possible to process previously intractable documents with unrivalled reliability. Here is how.

Document capture: the limits of a template-based approach

Legacy EDM and document processing solutions are based on a template and/or rule-based approach. This means that, for a given document model, an expert configures the machine by giving it instructions: top right is the supplier's name, left is the customer's name, here is the amount before tax, and so on. This approach works well for ultra-standardized, perfectly scanned documents such as Cerfa forms, but it quickly shows its limitations:

In an open world where you can receive an invoice, purchase order or quotation from any service provider, it is impossible to configure a template for every layout that exists.
The slightest change requires days of re-parameterization. If your supplier changes the layout of its invoices, legacy template-based solutions are not robust enough to cope: you have to start from scratch and create a new template.
Annual reports, financial documents, table rows… Some documents simply can't be processed automatically, due to their complexity or variability.
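The brittleness described above can be illustrated with a minimal sketch. It assumes a hypothetical template that maps each field to a fixed zone of the page, with positions normalized between 0 and 1. The names `Word`, `INVOICE_TEMPLATE` and `extract_with_template` are invented for this illustration and do not come from any real product.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # horizontal position, from 0.0 (left) to 1.0 (right)
    y: float  # vertical position, from 0.0 (top) to 1.0 (bottom)


# Hypothetical template: each field is tied to a fixed zone (x_min, y_min, x_max, y_max).
INVOICE_TEMPLATE = {
    "supplier": (0.6, 0.0, 1.0, 0.2),  # top right
    "customer": (0.0, 0.2, 0.4, 0.4),  # left
}


def extract_with_template(words, template):
    """Return, for each field, the text of the words located inside its zone."""
    result = {}
    for field, (x0, y0, x1, y1) in template.items():
        inside = [w.text for w in words if x0 <= w.x <= x1 and y0 <= w.y <= y1]
        result[field] = " ".join(inside)
    return result


# Layout the template was configured for.
original_layout = [Word("Acme", 0.8, 0.1), Word("Corp", 0.9, 0.1), Word("Globex", 0.1, 0.3)]
# Same content, but the supplier has moved its name to the top left.
new_layout = [Word("Acme", 0.1, 0.1), Word("Corp", 0.2, 0.1), Word("Globex", 0.1, 0.3)]

print(extract_with_template(original_layout, INVOICE_TEMPLATE))
print(extract_with_template(new_layout, INVOICE_TEMPLATE))
```

With the original layout both fields are found. With the new layout the supplier field comes back empty, even though the information is still on the page.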

Recent advances in Deep Learning are changing the game by dramatically improving document capture performance.

Deep Learning: definition

Deep Learning is a branch of Machine Learning, which is itself a sub-discipline of Artificial Intelligence. Unlike standard Machine Learning, Deep Learning is based on deep neural networks. This type of algorithm takes the form of a multi-layer network. The first layer ingests the data, the intermediate layers extract increasingly abstract patterns from it, and the last layer assigns a probability to each possible conclusion.
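As a rough illustration of this layered structure, here is a minimal sketch of a forward pass through a small multi-layer network in plain Python. The layer sizes, the random (untrained) weights and the helper names are assumptions made for the example. A real model would learn its weights from data and would normally be built with a dedicated library.

```python
import math
import random


def relu(values):
    """Activation used between layers: negative values are set to zero."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(values)
    exps = [math.exp(v - highest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum of the inputs plus a bias per unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def random_layer(n_in, n_out, rng):
    """Untrained layer with random weights, for illustration only."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(inputs, layers):
    """Pass the data through every layer; the last one outputs probabilities."""
    activations = inputs
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    weights, biases = layers[-1]
    return softmax(dense(activations, weights, biases))


rng = random.Random(0)
# 4 input values, two intermediate layers of 8 units, 3 possible conclusions.
layers = [random_layer(4, 8, rng), random_layer(8, 8, rng), random_layer(8, 3, rng)]
probabilities = forward([0.2, 0.7, 0.1, 0.5], layers)
print(probabilities)
```

The output is a list of three probabilities, one per possible conclusion. Training would consist of adjusting the weights so that these probabilities match the examples.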

So, rather than following hand-written rules, Deep Learning starts with no presuppositions: the machine is given examples in the form of raw data and identifies on its own what is relevant.

Deep Learning vs. Machine Learning: what are the differences?

Thanks to its deep networks, a Deep Learning model automatically determines which attributes to use to obtain a result. Also known as "discriminating elements", attributes are the parameters to be taken into account when solving a problem. For example, if you're trying to predict a hotel's rating on TripAdvisor, the relevant attributes could be the distance from the city center, the room price or the number of historic sites nearby. In standard Machine Learning, a human has to tell the machine to rely on these attributes. In Deep Learning, on the other hand, all you have to do is let the model work on the raw hotel data, and it will identify them on its own. This example-based approach is particularly useful in fields where the discriminating elements are not known in advance, such as vision.
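The standard Machine Learning side of this contrast can be sketched as follows. The hotel record, the field names, the weights and the linear scoring function are all hypothetical values chosen for illustration. The point is only that a human writes `handcrafted_features` and thereby decides what the model is allowed to look at.

```python
# Hypothetical raw hotel record, as it might come out of a database.
hotel = {
    "name": "Hotel Example",
    "description": "Quiet rooms, five minutes from the old town.",
    "distance_to_center_km": 1.5,
    "room_price_eur": 120.0,
    "historic_sites_nearby": 4,
}


def handcrafted_features(record):
    """Standard Machine Learning: a human picks the attributes the model will rely on."""
    return [
        record["distance_to_center_km"],
        record["room_price_eur"],
        record["historic_sites_nearby"],
    ]


def linear_model(features, weights, bias):
    """Very simple model: a weighted sum of the hand-picked attributes."""
    return bias + sum(w * f for w, f in zip(weights, features))


# Illustrative weights only; a real model would learn them from rated hotels.
weights = [-0.2, -0.005, 0.1]
predicted_rating = linear_model(handcrafted_features(hotel), weights, bias=4.0)
print(round(predicted_rating, 2))

# A Deep Learning model would instead receive the raw record (including the free-text
# description) and learn for itself which parts of it matter.
```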

Deep Learning has produced excellent results in computer vision and image recognition, and has also proved its worth on language, a field known as natural language processing (NLP). Trained on huge corpora of texts written by humans, unsupervised Deep Learning algorithms have shown themselves to be context-sensitive, robust to synonyms and multilingual.

How Deep Learning is revolutionizing document capture

OCR, LAD/RAD (automatic document reading and recognition)… While the method used for document capture remains the same, performance varies greatly depending on whether the underlying technology is standard Machine Learning or Deep Learning.

reciTAL, the first LAD/RAD player to be awarded the Deep Tech label, uses Deep Learning algorithms to automate the capture, categorization and search processes for all types of document. The strength of reciTAL lies in its multimodal approach, which combines computer vision and natural language processing (NLP). In other words, it uses both visual and linguistic cues for intelligent document processing. After all, that's what we humans do to understand a document! We read the text, of course, but we also rely on graphic layout elements to help us understand the content and the hierarchy of information.
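To make the idea of combining the two modalities concrete, here is a toy sketch in which each token of a document is described by linguistic cues and visual cues placed side by side. This is not reciTAL's implementation, which the article does not detail. The `Token` class and the hand-written cues are assumptions for illustration, whereas a multimodal Deep Learning model learns such representations from examples.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    x: float          # horizontal position on the page, from 0.0 to 1.0
    y: float          # vertical position on the page, from 0.0 to 1.0
    font_size: float  # in points


def text_features(token):
    """Toy linguistic cues: looks like a number, is written in capitals, length."""
    digits_only = token.text.replace(",", "").replace(".", "")
    return [float(digits_only.isdigit()), float(token.text.isupper()), float(len(token.text))]


def visual_features(token):
    """Toy layout cues: where the token sits on the page and how large it is printed."""
    return [token.x, token.y, token.font_size]


def multimodal_features(token):
    """Concatenate both views so that a model can use text and layout together."""
    return text_features(token) + visual_features(token)


title = Token("INVOICE", 0.5, 0.05, 24.0)
amount = Token("1,250.00", 0.85, 0.9, 10.0)
print(multimodal_features(title))
print(multimodal_features(amount))
```

A large, capitalized word at the top of the page and a small number at the bottom right end up with clearly different descriptions. Neither the text alone nor the layout alone carries that full picture.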

The use of these multimodal Deep Learning algorithms is a radical game-changer for document capture. Now you can:

Process any type of document, regardless of layout.
Train the machine on a small number of examples: you only need to show it around 50 examples of one type of document, annotating the data points of interest, for it to learn how to process all documents of this type.
Improve the model on your own. Unlike with previous black-box solutions, you have full control over reciTAL's pre-trained model. Just give it a few dozen examples and it will be up and running on new documents.
Process complex documents previously inaccessible to automatic processing. It’s now possible to extract financial data, contained in rows of tables, from documents hundreds of pages long, without any effort. Complex extractions and group extractions are now a reality.
Significantly improve reliability scores. Where automation rates used to plateau at 65-75%, reciTAL makes it possible to far exceed this figure.
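As an illustration of what "annotating the data points of interest" might look like, here is a minimal sketch of an annotated training example. The format (character offsets into the document text) and the names `Annotation`, `AnnotatedDocument` and `annotate` are assumptions made for this example, not reciTAL's actual annotation format.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    field: str  # name of the data point, e.g. "supplier"
    start: int  # character offset where the value starts
    end: int    # character offset just after the value


@dataclass
class AnnotatedDocument:
    text: str
    annotations: list

    def values(self):
        """Return the annotated value of each field, read back from the text."""
        return {a.field: self.text[a.start:a.end] for a in self.annotations}


def annotate(text, field, value):
    """Build an annotation by locating the first occurrence of the value in the text."""
    start = text.index(value)
    return Annotation(field, start, start + len(value))


text = "Invoice from Acme Corp. Total before tax: 1250.00 EUR"
doc = AnnotatedDocument(
    text=text,
    annotations=[
        annotate(text, "supplier", "Acme Corp"),
        annotate(text, "amount_before_tax", "1250.00"),
    ],
)
print(doc.values())
```

A training set for one document type would then be a list of around 50 such annotated documents.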

Applied to document capture, recent advances in Deep Learning are paving the way for a whole new era of automatic document processing. No matter how varied or complex the documents, the results of the model developed by reciTAL are simply amazing! There's only one way to see for yourself:
ask for a demo!