Intelligent Document Processing: the dawn of a new era in document capture
Intelligent Document Processing is a relatively new concept that combines three main functions:
- Categorize: recognize a document and assign it to a document category (invoice, contract, report, receipt, etc.).
- Capture: read a document using automatic character recognition.
- Search: find specific information across a wide range of documents.
Until now, the market has relied on traditional LAD/RAD (automatic document reading and recognition) and OCR technologies to carry out these three operations required for enterprise document processing. Without going into too much technical detail, getting reasonably satisfactory results with this method meant writing a set of rules that describe the internal logic of the document. For example: the name of the issuer of an invoice is usually at the top left; the amount before tax is usually on the right, on the same line as “TOTAL before tax”; and so on. The challenge was to use recurring patterns to derive general rules.
Of course, the limitations of such a model quickly become apparent: as soon as the document type changes, the rules have to be rewritten from scratch. The work is therefore time-consuming and inefficient.
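To make the limitation concrete, here is a minimal Python sketch of a rule-based extractor of the kind described above. It is an illustration only, not the code of any particular LAD/RAD product: the function name, the regular expression and the two sample invoices are all hypothetical.

```python
import re

# Hypothetical hand-written rule: the pre-tax amount is on the same line
# as the label "TOTAL before tax".
TOTAL_RULE = re.compile(r"TOTAL before tax\s*:?\s*([\d.,]+)")


def extract_total_before_tax(text):
    """Return the pre-tax amount as a string, or None if the rule finds nothing."""
    for line in text.splitlines():
        match = TOTAL_RULE.search(line)
        if match:
            return match.group(1)
    return None


# Invented sample documents, already converted to text by OCR.
invoice_a = "ACME Corp\nItem 1   100.00\nTOTAL before tax   100.00\n"
invoice_b = "ACME Corp\nItem 1   100.00\nSubtotal (excl. VAT)   100.00\n"

print(extract_total_before_tax(invoice_a))  # 100.00
print(extract_total_before_tax(invoice_b))  # None
```

The second invoice carries the same information under a different label, and the rule silently returns nothing. Every new layout or wording needs a new rule.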
Document capture: no longer reasoning with rules, but with examples
Using the latest AI technologies, reciTAL does not require any rules to be defined in order to read and understand a document. The approach consists of providing the machine with a limited number of examples, from which it learns to capture information from new documents (short, long, unstructured). Where “standard” Machine Learning required training on document corpora of several thousand examples, Deep Learning lets reciTAL drastically reduce the entry cost: only a hundred examples are needed to train the machine.
Financial reports, invoice lines, pay slips, performance tables… This new approach makes it possible to extract information from any document. This is a real plus for both software publishers and DMS integrators, who are constantly on the lookout for better document capture components to enhance their solutions. This hitherto unrivalled level of performance is also of great interest to analysts, who have to sift through hundreds of pages to extract a precise piece of information.
“The extraordinary progress made in NLP (Natural Language Processing) over the last three years has opened up a new chapter, enabling machines to understand and read documents of all kinds. reciTAL was born with this revolution, and we have chosen to accompany it, by offering a platform that adapts and makes available to everyone the state of the art in Intelligent Document Processing. We believe that we are experiencing a key stage in the digital transformation of companies, comparable in impact to what ERP or CRM brought 20 years ago,” says Frédéric Allary, co-founder of reciTAL.
With reciTAL, capturing any document is just a click away
Invoices, quotes, expense claims, ID cards, passports, driving licenses… To meet the most common document capture requirements, reciTAL offers its customers a library of pre-trained templates. In this case, all the pre-configuration work has already been done, and you have a turnkey solution. CRM, EDM, messaging, RPA tools… A complete set of APIs enables you to connect to all your solutions.
If your business requires the ability to handle specific types of documents, for example in insurance or finance, don’t worry: reciTAL offers tailor-made templates. In this case, all you have to do is give the machine a hundred examples and annotate them on the reciTAL platform. More concretely, during the annotation phase, you define the data points that interest you and show the machine 100 times where the information you want to extract is located. Once this work has been done by hand, the machine is familiar with the type of document submitted and will automatically extract the data in the future.
Good to know: On average, it takes one day to annotate, train and test a custom model. You can do the annotation work yourself, or entrust it to one of our annotation partners.
reciTAL: performance like no other!
In December 2021, reciTAL carried out a benchmark of the main competing solutions. The result? With 82% reliability, reciTAL offers the best performance on the market, ahead of 5 of its competitors. Thanks to our table extraction and paragraph extraction plug-ins, our solution also delivers excellent performance on complex elements.
The little extra that makes all the difference? For each extraction performed, reciTAL provides a confidence score. This makes it possible to automate at scale and check only when necessary. You can set the desired level of automation for each type of document: above a chosen threshold, extractions are processed automatically; below it, they are sent to the video-coding (manual review) interface.
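The threshold logic described above can be sketched in a few lines of Python. This is a hypothetical illustration and not the reciTAL API: the `Extraction` class, the `route` function and the threshold values are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Extraction:
    field: str         # name of the extracted data point
    value: str         # extracted text
    confidence: float  # score between 0.0 and 1.0


# Hypothetical thresholds, one per document type.
THRESHOLDS = {"invoice": 0.90, "passport": 0.98}
DEFAULT_THRESHOLD = 0.95


def route(doc_type, extractions):
    """Split extractions into (automated, to_review) using the type's threshold."""
    threshold = THRESHOLDS.get(doc_type, DEFAULT_THRESHOLD)
    automated = [e for e in extractions if e.confidence >= threshold]
    to_review = [e for e in extractions if e.confidence < threshold]
    return automated, to_review


# Invented example: one confident extraction, one doubtful one.
results = [
    Extraction("total_before_tax", "100.00", 0.97),
    Extraction("issuer", "ACME Corp", 0.62),
]
automated, to_review = route("invoice", results)
print([e.field for e in automated])  # ['total_before_tax']
print([e.field for e in to_review])  # ['issuer']
```

Keeping one threshold per document type means a sensitive document such as a passport can demand a higher score than a routine invoice before it skips manual review.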
Easy to implement and use, reciTAL gives you access to unprecedented processing power. Thanks to its internationally recognized R&D, reciTAL has become a benchmark in the world of automatic document processing. Already in daily use by over 80,000 users, our solution opens up a whole new world of document capture possibilities. Ready to jump on the bandwagon? Contact us and ask for a demo!