EMNLP, an internationally renowned scientific event
Founded in 1996, EMNLP is a leading international scientific conference in the field of natural language processing (NLP) and artificial intelligence (AI). Together with the annual meeting of the Association for Computational Linguistics (ACL), it is one of the two main conferences in the field worldwide. Researchers from around the world submit their work, which is then peer-reviewed and selected.
reciTAL, a player at the cutting edge of NLP
Since its creation in 2017, reciTAL has invested heavily in R&D and contributed to advancing the state of the art in NLP. The company now counts six PhDs and three PhD students in AI.
Their work focuses in particular on techniques for improving Intelligent Document Processing systems, including multimodal language models and text generation.
This year, five papers written and/or co-authored by members of the reciTAL research team were accepted at EMNLP.
“It’s a great honor to have our research team’s work published at this internationally renowned conference. It confirms reciTAL’s leadership in the NLP field,” says Thomas Scialom, Research Scientist at reciTAL.
Papers accepted at EMNLP 2021
Synthetic Data Augmentation for Zero-Shot Cross-Lingual Question Answering
A Riabi, T Scialom, R Keraron, B Sagot, D Seddah, J Staiano
This publication addresses multilingual Question Answering. Deep learning has enabled major advances in natural language processing, and machines are now capable of reading a text and answering questions about it. However, model performance is significantly lower in languages other than English. To improve multilingual question answering, our researchers, in collaboration with INRIA, propose a question-generation-based method for producing synthetic multilingual training data. Models trained on this data significantly outperform models trained only on English data.
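To make the approach concrete, here is a minimal sketch of the synthetic-data step: a question-generation model turns (passage, answer) pairs into new QA training examples. The checkpoint name and prompt format below are illustrative placeholders, and the paper’s full pipeline also covers answer extraction and generation across languages.

```python
# Minimal sketch: synthesize QA pairs with a seq2seq question-generation
# model (the checkpoint name below is a hypothetical placeholder).
from transformers import pipeline

qg = pipeline("text2text-generation", model="my-org/mt5-question-generation")

def synthesize_qa_pairs(passage: str, answer_spans: list[str]) -> list[dict]:
    """Turn (passage, answer) pairs into synthetic QA training examples."""
    examples = []
    for answer in answer_spans:
        prompt = f"answer: {answer} context: {passage}"  # assumed prompt format
        question = qg(prompt, max_new_tokens=48)[0]["generated_text"]
        examples.append({"context": passage, "question": question, "answer": answer})
    return examples

# Synthetic examples in the target languages are then mixed into the
# training data of a multilingual QA model.
```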
Skim-Attention: Learning to Focus via Document Layout
L Nguyen, T Scialom, J Staiano, B Piwowarski
Transformer models have proved effective on a number of document understanding tasks. Despite this success, they are computationally and memory-intensive. Motivated by human reading strategies, this publication introduces Skim-Attention, a new attention mechanism that takes advantage of document structure and layout: it attends only to the spatial positions of words in a document. Our researchers’ experiments show that Skim-Attention improves on the state of the art while being more efficient in terms of resources (both memory and computation), and that it can be combined with Transformers for efficient processing of long documents.
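As a rough illustration of the core idea, attention scores computed from layout alone, here is a minimal PyTorch sketch. The bounding-box embedding and dimensions are our own simplifications for illustration, not the paper’s architecture.

```python
import torch
import torch.nn as nn

class LayoutOnlyAttention(nn.Module):
    """Attention whose scores depend only on word bounding boxes."""
    def __init__(self, d_model: int = 64):
        super().__init__()
        self.box_proj = nn.Linear(4, d_model)  # embed (x0, y0, x1, y1) boxes
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.scale = d_model ** -0.5

    def forward(self, boxes: torch.Tensor, values: torch.Tensor) -> torch.Tensor:
        # boxes: (batch, seq, 4) word positions; values: (batch, seq, d_model)
        pos = self.box_proj(boxes)
        scores = self.q(pos) @ self.k(pos).transpose(-2, -1) * self.scale
        attn = scores.softmax(dim=-1)  # layout alone decides where to attend
        return attn @ values           # ...while content is what gets mixed

out = LayoutOnlyAttention()(torch.rand(1, 10, 4), torch.rand(1, 10, 64))
```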
QuestEval: Summarization Asks for Fact-based Evaluation
T Scialom, PA Dray, P Gallinari, S Lamprier, B Piwowarski, J Staiano
Summarization evaluation is an open research problem: current metrics such as ROUGE are known to be limited and to correlate poorly with human judgment. To address this, recent work has proposed evaluation measures based on Question Answering models, which check whether a summary contains all the relevant information in its source document. Although promising, these approaches had so far failed to correlate better with human judgment than ROUGE. In this article, written in collaboration with New York University, the researchers extend previous approaches and propose a unified framework named QuestEval. Unlike established metrics such as ROUGE or BERTScore, QuestEval requires no ground-truth reference, while significantly improving correlation with human judgments.
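Schematically, the QA-based evaluation loop looks like the sketch below: generate questions grounded in the source, answer them on the summary, and average an answer-overlap score. This is a conceptual simplification; `generate_questions` and `answer_question` stand in for trained QG and QA models, and the actual QuestEval metric combines more components than shown here.

```python
def token_f1(pred: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and an expected answer."""
    pred_tokens, gold_tokens = pred.lower().split(), gold.lower().split()
    common = set(pred_tokens) & set(gold_tokens)
    if not common:
        return 0.0
    precision = len(common) / len(pred_tokens)
    recall = len(common) / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def qa_consistency(source: str, summary: str,
                   generate_questions, answer_question) -> float:
    """Average F1 between answers found in the source and in the summary."""
    scores = [
        token_f1(answer_question(question, context=summary), gold_answer)
        for question, gold_answer in generate_questions(source)
    ]
    return sum(scores) / max(len(scores), 1)
```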
Data-QuestEval: A Referenceless Metric for Data-to-Text Semantic Evaluation
C Rebuffel, T Scialom, L Soulier, B Piwowarski, S Lamprier, J Staiano, …
In this fourth paper, written in collaboration with CRITEO and BNP, the authors explore how QuestEval, a Text-to-Text metric, can be adapted to the evaluation of Data-to-Text generation systems. QuestEval is a reference-free metric that compares predictions directly to the structured input data (e.g. tables) by automatically asking and answering questions. Adapting it to Data-to-Text requires multimodal Question Generation and Question Answering systems, so the authors propose building a synthetic multimodal corpus for training multimodal QG/QA models. The resulting reference-free, multimodal metric achieves state-of-the-art correlations with human judgment on the E2E and WebNLG benchmarks.
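A small sketch of the key adaptation step: before questions can be asked of structured data, a table is linearized into text that QG/QA models can consume. The separator format below is an assumption for illustration, not the paper’s exact scheme.

```python
def linearize_table(table: dict[str, str]) -> str:
    """Flatten attribute-value pairs into a pseudo-sentence string."""
    return " ; ".join(f"{key} : {value}" for key, value in table.items())

# An E2E-style record becomes text that a QG/QA model can process:
record = {"name": "The Eagle", "eatType": "restaurant", "food": "French"}
print(linearize_table(record))
# name : The Eagle ; eatType : restaurant ; food : French
```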
QACE: Asking Questions to Evaluate an Image Caption
H Lee, T Scialom, S Yoon, F Dernoncourt, K Jung
In this paper, written in collaboration with Adobe Research and Seoul National University, the researchers propose QACE (Question Answering for Caption Evaluation), a new question-answering-based metric for evaluating captions (of images, figures, etc.). QACE generates questions about the caption being evaluated and verifies its content by asking those questions of the reference caption or of the source image. The researchers first develop QACE-Ref, which compares the answers obtained from the evaluated caption with those from its reference, and report competitive results against existing metrics. They then propose QACE-Img, which asks questions directly of the image rather than of the reference text.
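In pseudocode, QACE-Img amounts to the loop below, where `generate_questions` and `vqa` are hypothetical stand-ins for the question-generation and visual-question-answering models; the paper’s scoring of answer agreement is more elaborate than the exact match used here.

```python
def qace_img(candidate_caption: str, image, generate_questions, vqa) -> float:
    """Fraction of caption-derived questions the image answers consistently."""
    scores = [
        float(vqa(image, question) == expected_answer)
        for question, expected_answer in generate_questions(candidate_caption)
    ]
    return sum(scores) / max(len(scores), 1)
```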