MRL 2024 Shared Task on Multi-lingual Multi-task Information Retrieval


Description and Objectives


A primary goal of MRL is to promote the development of more linguistically comprehensive multilingual representation models. The 2022 and 2023 editions of MRL organized new shared tasks that provided a means of evaluating large multilingual models on contemporary tasks relevant to current research in computational linguistics, focusing on the learning and generalization of morphosyntactic structures across languages in 2022 and on natural language understanding in 2023, and resulting in two new multilingual evaluation benchmarks. In 2024, we continue this series of shared tasks so that additional capabilities of large multilingual models can be analyzed in new settings, in order to better understand their limitations and applicability across different languages.

Tasks and Evaluation


With language models now accessing and processing vast amounts of information in different formats and languages, it has become increasingly important to assess their ability to access and provide the right information to different audiences. In this shared task, we provide a multi-task evaluation format that assesses the information retrieval capabilities of language models through two subtasks: named entity recognition and question answering.


Named Entity Recognition (NER) is a classification task that identifies phrases in a text referring to entities of predefined categories (such as dates, person, organization, and location names). It is an important capability for information access systems that perform entity look-ups for knowledge verification, spell-checking, or localization applications. The objective of the system is to tag the named entities in a given text as a person (PER), organization (ORG), or location (LOC); our tag set uses $$ as the delimiter. A purely hypothetical tagged example is shown below.
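
To make the expected output more concrete, the following sketch shows one possible tagged sentence. The sentence, the placement of the $$ delimiters, and the exact label format are our own illustration; the XTREME-UP data remains the authoritative reference for the real format.

    # Hypothetical illustration only; consult the XTREME-UP repository for the authoritative format.
    text = "Ngozi Okonjo-Iweala visited Jakarta with a delegation from the World Bank."
    # Entity spans tagged as PER / ORG / LOC, with "$$" used as the delimiter:
    tagged = "$$ Ngozi Okonjo-Iweala $$ PER visited $$ Jakarta $$ LOC with a delegation from the $$ World Bank $$ ORG ."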


Question answering (QA) is an important capability that enables responding to natural language questions with answers found in text. Here we focus on the information-seeking scenario, in which questions are asked without knowing the answer; it is the system's job to locate a suitable answer passage (if any). Because information-seeking question-answer pairs are written separately, they tend to exhibit less lexical and morphosyntactic overlap between the question and the answer, which makes this a more suitable setting for evaluating typologically diverse languages. The system is given a question, a title, and a passage, and must pick the right answer from a list of four candidate options.
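
As a rough illustration (the field names below are assumptions, not the official data schema), a multiple-choice QA instance and a trivial lexical-overlap baseline might look as follows in Python:

    # Hypothetical QA instance; the actual XTREME-UP fields and format may differ.
    example = {
        "question": "Which river flows through the city?",
        "title": "Geography",
        "passage": "The city lies on the banks of the Benue, which floods every rainy season.",
        "options": ["Niger", "Benue", "Volta", "Zambezi"],  # four candidate answers
    }

    def pick_answer(instance):
        # Trivial baseline: choose the first option that appears verbatim in the passage.
        passage = instance["passage"].lower()
        for i, option in enumerate(instance["options"]):
            if option.lower() in passage:
                return i
        return 0  # fall back to the first option

    print(pick_answer(example))  # -> 1 ("Benue")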


Evaluation is measured in terms of accuracy on the multiple-choice QA task and F1 score on the NER task. We obtain the final score by averaging the QA and NER scores.
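
For clarity, a minimal sketch of this scoring scheme is given below. This is not the official evaluation script; the input representations (answer indices, entity-span tuples) are assumptions made for illustration.

    # Minimal scoring sketch, not the official scorer.
    def qa_accuracy(gold, predicted):
        # gold / predicted: lists of answer indices (0-3), one per question
        return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

    def ner_f1(gold_spans, predicted_spans):
        # gold_spans / predicted_spans: sets of (example_id, start, end, label) tuples
        tp = len(gold_spans & predicted_spans)
        precision = tp / len(predicted_spans) if predicted_spans else 0.0
        recall = tp / len(gold_spans) if gold_spans else 0.0
        return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    def final_score(qa_acc, ner_f1_score):
        # Final shared-task score: the average of the two subtask scores.
        return (qa_acc + ner_f1_score) / 2.0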

Data and Languages


The training and validation datasets that can be used for building multi-task information retrieval systems are directly accessible in the XTREME-UP repository. The test sets for the official evaluation will be released ten days before the submission date and will cover the following languages: Igbo, Indonesian, Swiss German, Turkish, Uzbek, and Yoruba. We also anticipate one or two surprise languages in the final test sets.


Participation


Interested parties are invited to contact mrlworkshop2024@gmail.com or join the Google group mrl-shared-task-2023@googlegroups.com to take part in the competition. All participating systems will be evaluated together with our baselines against the same held-out test set, to be released shortly before the evaluation. Submitted systems can compete in some or all sub-tasks. Participating teams will be invited to submit a short paper describing their work to the MRL workshop and to present it in a special session at the workshop. Paper submissions must follow the EMNLP paper format and be sent via the Softconf conference link of MRL 2023 before the paper submission deadline.

Important dates


June 1, 2024: Release of validation data
August 1, 2024: Release of testing data
August 20, 2024: Deadline to declare external data and resources used in systems
September 1, 2024: Deadline for submission of system outputs
September 20, 2024: Release of rankings and results
September 30, 2024: Deadline for submitting system description papers
November 15-16, 2024: Workshop

Ranking


Systems will be evaluated based on a global ranking across all benchmark languages. Participants may also submit language-specific (monolingual) systems, which will be evaluated as partial submissions for the specific language the system is trained on.

Additional resources


The shared task allows participants to use external resources or tools as long as they are openly available and can, in principle, be used by other participants for research purposes. Participants who plan to use external resources or data in their systems should contact the organizers to confirm that the specific resources are permitted; in such cases, information on the resources used and how they can be obtained should be shared via email by August 20, 2024.

Organizers


David Adelani, UCL and Google DeepMind
Duygu Ataman, New York University
Mammad Hajili, Microsoft
Inder Khatri, New York University
Francesco Tinner, University of Amsterdam

Shared Task Prize


The winning team will receive an award of 500 USD and will be invited to give a presentation during the workshop.

Sponsors



Interested in being a Sponsor? Contact us!