The reciTAL research team honored at EMNLP’21 with five accepted papers

reciTAL is proud to announce that five of its scientific publications have been accepted at EMNLP’21 (Empirical Methods in Natural Language Processing), one of the most prestigious AI/NLP conferences in the world. reciTAL thus confirms its position as a technological leader in the field.

EMNLP, a renowned international scientific event

Established in 1996, EMNLP is a leading international scientific event in the field of natural language processing (NLP) and artificial intelligence (AI). EMNLP is one of the two top conferences in the field, along with the annual meeting of the Association for Computational Linguistics (ACL). Researchers from all over the world submit their work, which is reviewed and selected by their peers.

reciTAL, a state-of-the-art NLP company

Since its creation in 2017, reciTAL has invested heavily in R&D and has contributed to advancing the state of the art in NLP. The company currently counts 6 Ph.D. holders and 3 doctoral students in AI.

Their work focuses in particular on techniques for improving automatic document processing systems (Intelligent Document Processing). Among these techniques, multimodal language models and text generation occupy a central place.

This year, five papers written or co-written by members of the reciTAL research team were accepted at EMNLP.

“It’s a great honor that the work of our research team has been published at this internationally renowned conference. This confirms reciTAL’s leadership in the field of NLP,” says Thomas Scialom, Research Scientist at reciTAL.

The papers accepted at EMNLP’21:

Synthetic Data Augmentation for Zero-Shot Cross-Lingual Question Answering
A Riabi, T Scialom, R Keraron, B Sagot, D Seddah, J Staiano

Coupled with the availability of large-scale datasets, deep learning architectures have enabled rapid progress on the Question Answering task. However, most of those datasets are in English, and the performance of state-of-the-art multilingual models is significantly lower when evaluated on non-English data. Due to high data collection costs, it is not realistic to obtain annotated data for each language one desires to support. The authors propose a method to improve Cross-lingual Question Answering performance without requiring additional annotated data, leveraging Question Generation models to produce synthetic samples in a cross-lingual fashion. They show that the proposed method significantly outperforms baselines trained on English data only, and they report a new state of the art on four multilingual datasets: MLQA, XQuAD, SQuAD-it and PIAF (fr).
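To give an intuition of the approach, here is a minimal sketch of question-generation-based augmentation: a multilingual QG model is prompted with a passage and a candidate answer span, and the generated question yields a synthetic training triple for a target language. The checkpoint name and the input format are illustrative assumptions, not the authors' released models.

```python
# A minimal sketch of QG-based data augmentation for cross-lingual QA.
# The checkpoint name below is a placeholder assumption, not the paper's model.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

QG_CHECKPOINT = "your-org/multilingual-question-generation"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(QG_CHECKPOINT)
qg_model = AutoModelForSeq2SeqLM.from_pretrained(QG_CHECKPOINT)

def generate_synthetic_qa(context: str, answer_span: str, max_length: int = 64) -> dict:
    """Generate a synthetic question for a (context, answer) pair.

    The resulting (context, question, answer) triple can be added to the QA
    training set of a target language without manual annotation.
    """
    # A common QG input format: expose the answer alongside the context.
    prompt = f"answer: {answer_span}  context: {context}"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    output_ids = qg_model.generate(**inputs, max_length=max_length, num_beams=4)
    question = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return {"context": context, "question": question, "answer": answer_span}

# Example: augmenting a French QA training set with one synthetic sample.
sample = generate_synthetic_qa(
    context="reciTAL a été fondée en 2017 à Paris.",
    answer_span="2017",
)
```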

Skim-Attention: Learning to Focus via Document Layout
L Nguyen, T Scialom, J Staiano, B Piwowarski

Transformer-based pre-training techniques of text and layout have proven effective in a number of document understanding tasks. Despite this success, multimodal pre-training models suffer from very high computational and memory costs. Motivated by human reading strategies, this paper presents Skim-Attention, a new attention mechanism that takes advantage of the structure of the document and its layout. Skim-Attention only attends to the 2-dimensional positions of the words in a document. The experiments show that Skim-Attention obtains a lower perplexity than prior works while being more computationally efficient. Skim-Attention can further be combined with long-range Transformers to efficiently process long documents. The authors also show how Skim-Attention can be used off the shelf as a mask for any pre-trained language model, improving performance while restricting attention. Finally, they show the emergence of a document structure representation in Skim-Attention.
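For intuition, the sketch below implements a simplified version of the core idea: attention weights depend only on the embedded 2-D bounding boxes of the words, and are then used to mix the token representations. The dimensions and the bounding-box projection are illustrative choices, not the authors' exact architecture.

```python
# Simplified sketch of the Skim-Attention idea: attention weights are computed
# from 2-D layout positions only, then applied to the token representations.
import torch
import torch.nn as nn

class SkimAttention(nn.Module):
    def __init__(self, hidden_size: int = 256, layout_size: int = 64):
        super().__init__()
        # Embed (x0, y0, x1, y1) word bounding boxes into a small layout space.
        self.box_proj = nn.Linear(4, layout_size)
        self.query = nn.Linear(layout_size, layout_size)
        self.key = nn.Linear(layout_size, layout_size)
        self.scale = layout_size ** -0.5

    def forward(self, token_states: torch.Tensor, boxes: torch.Tensor) -> torch.Tensor:
        # token_states: (batch, seq_len, hidden_size) contextual token embeddings
        # boxes:        (batch, seq_len, 4) normalized word bounding boxes
        layout = self.box_proj(boxes)
        scores = self.query(layout) @ self.key(layout).transpose(-1, -2) * self.scale
        attn = scores.softmax(dim=-1)   # depends only on layout, not on text content
        return attn @ token_states      # layout-guided mixing of token states

# Example with random inputs: 2 documents, 128 words each.
attn_layer = SkimAttention()
out = attn_layer(torch.randn(2, 128, 256), torch.rand(2, 128, 4))
```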

QuestEval: Summarization Asks for Fact-based Evaluation
T Scialom, PA Dray, P Gallinari, S Lamprier, B Piwowarski, J Staiano

Summarization evaluation remains an open research problem: current metrics such as ROUGE are known to be limited and to correlate poorly with human judgments. To alleviate this issue, recent work has proposed evaluation metrics which rely on question answering models to assess whether a summary contains all the relevant information in its source document. Though promising, the proposed approaches have so far failed to correlate better than ROUGE with human judgments. In this paper, the authors extend previous approaches and propose a unified framework, named QuestEval. In contrast to established metrics such as ROUGE or BERTScore, QuestEval does not require any ground-truth reference. Nonetheless, QuestEval substantially improves the correlation with human judgments over four evaluation dimensions (consistency, coherence, fluency, and relevance), as shown in the extensive experiments reported in the paper.
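The sketch below captures the principle in a few lines: generate questions from the summary, answer them on both the summary and the source document, and score their agreement. The helpers `generate_questions` and `answer_question` stand in for trained QG and QA models and are hypothetical; the actual metric also asks questions in the reverse direction and compares answers more softly than exact match.

```python
# Conceptual sketch of a QuestEval-style, reference-free consistency score.
from typing import Callable, List

def questeval_like_score(
    summary: str,
    source: str,
    generate_questions: Callable[[str], List[str]],   # hypothetical QG model
    answer_question: Callable[[str, str], str],        # hypothetical QA model
) -> float:
    """Fraction of questions asked on the summary that the source answers the same way."""
    questions = generate_questions(summary)
    if not questions:
        return 0.0
    agreements = 0
    for question in questions:
        from_summary = answer_question(question, summary)
        from_source = answer_question(question, source)
        agreements += int(from_summary.strip().lower() == from_source.strip().lower())
    return agreements / len(questions)
```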

Data-QuestEval: A Referenceless Metric for Data-to-Text Semantic Evaluation
C Rebuffel, T Scialom, L Soulier, B Piwowarski, S Lamprier, J Staiano, …

In this paper, the authors explore how QuestEval, a Text-vs-Text metric, can be adapted for the evaluation of Data-to-Text Generation systems. QuestEval is a reference-less metric that compares the predictions directly to the structured input data by automatically asking and answering questions. Its adaptation to Data-to-Text is not straightforward, as it requires multimodal Question Generation and Answering (QG & QA) systems. To this purpose, the authors propose to build synthetic multimodal corpora that enable the training of multimodal QG/QA models. The resulting metric is reference-less and multimodal; it obtains state-of-the-art correlations with human judgment on the E2E and WebNLG benchmarks.

QACE: Asking Questions to Evaluate an Image Caption
H Lee, T Scialom, S Yoon, F Dernoncourt, K Jung

In this paper, the authors propose QACE, a new metric based on Question Answering for Caption Evaluation. QACE generates questions on the evaluated caption and checks its content by asking the questions on either the reference caption or the source image. The authors first develop QACE-Ref, which compares the answers of the evaluated caption to its reference, and report competitive results with state-of-the-art metrics. To go further, they propose QACE-Img, which asks the questions directly on the image instead of the reference. A Visual-QA system is necessary for QACE-Img. Unfortunately, standard VQA models are framed as a classification among only a few thousand categories. Instead, the authors propose Visual-T5, an abstractive VQA system. The resulting metric, QACE-Img, is multimodal, reference-less, and explainable. The experiments show that QACE-Img compares favorably with other reference-less metrics. The authors will release the pre-trained models to compute QACE.
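As a rough illustration of the abstractive VQA component, the sketch below projects pre-extracted visual region features into the embedding space of a T5 model and prepends them to the question tokens, so the answer is generated as free-form text rather than picked from a fixed label set. The feature dimension, projection, and checkpoint are assumptions made for illustration; this is not the authors' Visual-T5 implementation.

```python
# Illustrative sketch of an abstractive VQA model in the spirit of Visual-T5.
# Visual region features (e.g. from an off-the-shelf detector) are projected
# into the T5 embedding space and prepended to the question tokens; the answer
# is then decoded as free text. All sizes and names are assumptions.
import torch
import torch.nn as nn
from transformers import T5ForConditionalGeneration, T5Tokenizer

class AbstractiveVQA(nn.Module):
    def __init__(self, visual_dim: int = 2048, t5_name: str = "t5-small"):
        super().__init__()
        self.tokenizer = T5Tokenizer.from_pretrained(t5_name)
        self.t5 = T5ForConditionalGeneration.from_pretrained(t5_name)
        self.visual_proj = nn.Linear(visual_dim, self.t5.config.d_model)

    @torch.no_grad()
    def generate_answer(self, question: str, region_feats: torch.Tensor) -> str:
        # region_feats: (num_regions, visual_dim) pre-extracted image features.
        tokens = self.tokenizer(question, return_tensors="pt")
        text_embeds = self.t5.get_input_embeddings()(tokens.input_ids)   # (1, L, d)
        visual_embeds = self.visual_proj(region_feats).unsqueeze(0)      # (1, R, d)
        inputs_embeds = torch.cat([visual_embeds, text_embeds], dim=1)
        attention_mask = torch.ones(inputs_embeds.shape[:2], dtype=torch.long)
        # Encode the mixed visual/textual sequence, then decode the answer.
        encoder_outputs = self.t5.encoder(
            inputs_embeds=inputs_embeds, attention_mask=attention_mask
        )
        output_ids = self.t5.generate(
            encoder_outputs=encoder_outputs,
            attention_mask=attention_mask,
            max_length=16,
        )
        return self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
```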