MRL 2024 Shared Task on Multi-lingual Multi-task Information Retrieval


Description and Objectives


A primary goal of MRL is to promote the development of more linguistically comprehensive multilingual representation models. The 2022 and 2023 editions of MRL organized new shared tasks that provided a means of evaluating large multilingual models on contemporary tasks relevant to current research in computational linguistics, focusing on the learning and generalization of morphosyntactic structures across languages in 2022 and on natural language understanding in 2023, and resulting in two new multilingual evaluation benchmarks. In 2024, we continue this series of shared tasks so that additional capabilities of large multilingual models can be analyzed in new settings, in order to better understand their limitations and applicability across different languages.

Tasks and Evaluation


With language models now accessing and processing vast amounts of information in different formats and languages, it has become increasingly important to assess their ability to access and provide the right information to different audiences. In this shared task, we provide a multi-task evaluation format that assesses the information retrieval capabilities of language models through two subtasks: named entity recognition and question answering.


Named Entity Recognition (NER) is a classification task that identifies phrases in a text referring to entities of predefined categories (such as dates, person, organization, and location names). It is an important capability for information access systems that perform entity look-ups for knowledge verification, spell-checking, or localization applications. The objective of the system is to tag the named entities in a given text as a person (PER), organization (ORG), or location (LOC); our tag set uses $$ as the delimiter. A purely hypothetical tagged example is shown below.
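
To make the expected output more concrete, the following sketch shows one possible tagged sentence. The sentence, the placement of the $$ delimiters, and the exact label format are our own illustration; the XTREME-UP data remains the authoritative reference for the real format.

    # Hypothetical illustration only; consult the XTREME-UP repository for the authoritative format.
    text = "Ngozi Okonjo-Iweala visited Jakarta with a delegation from the World Bank."
    # Entity spans tagged as PER / ORG / LOC, with "$$" used as the delimiter:
    tagged = "$$ Ngozi Okonjo-Iweala $$ PER visited $$ Jakarta $$ LOC with a delegation from the $$ World Bank $$ ORG ."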


Question answering (QA) is an important capability that enables responding to natural language questions with answers found in text. Here we focus on the information-seeking scenario, in which questions are asked without knowing the answer; it is the system's job to locate a suitable answer passage (if any). Because information-seeking question-answer pairs are written separately, they tend to exhibit less lexical and morphosyntactic overlap between the question and the answer, which makes this a more suitable setting for evaluating typologically diverse languages. The system is given a question, a title, and a passage, and must pick the right answer from a list of four candidate options.
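
As a rough illustration (the field names below are assumptions, not the official data schema), a multiple-choice QA instance and a trivial lexical-overlap baseline might look as follows in Python:

    # Hypothetical QA instance; the actual XTREME-UP fields and format may differ.
    example = {
        "question": "Which river flows through the city?",
        "title": "Geography",
        "passage": "The city lies on the banks of the Benue, which floods every rainy season.",
        "options": ["Niger", "Benue", "Volta", "Zambezi"],  # four candidate answers
    }

    def pick_answer(instance):
        # Trivial baseline: choose the first option that appears verbatim in the passage.
        passage = instance["passage"].lower()
        for i, option in enumerate(instance["options"]):
            if option.lower() in passage:
                return i
        return 0  # fall back to the first option

    print(pick_answer(example))  # -> 1 ("Benue")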


Evaluation is measured in terms of accuracy on the multiple-choice QA task and F1 score on the NER task. We obtain the final score by averaging the QA and NER scores.
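
For clarity, a minimal sketch of this scoring scheme is given below. This is not the official evaluation script; the input representations (answer indices, entity-span tuples) are assumptions made for illustration.

    # Minimal scoring sketch, not the official scorer.
    def qa_accuracy(gold, predicted):
        # gold / predicted: lists of answer indices (0-3), one per question
        return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

    def ner_f1(gold_spans, predicted_spans):
        # gold_spans / predicted_spans: sets of (example_id, start, end, label) tuples
        tp = len(gold_spans & predicted_spans)
        precision = tp / len(predicted_spans) if predicted_spans else 0.0
        recall = tp / len(gold_spans) if gold_spans else 0.0
        return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    def final_score(qa_acc, ner_f1_score):
        # Final shared-task score: the average of the two subtask scores.
        return (qa_acc + ner_f1_score) / 2.0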

Data and Languages


The training and validation datasets that can be used for building multi-task information retrieval systems are directly accessible in the XTREME-UP repository. The test sets for the official evaluation will be released ten days before the submission date and will cover the following languages: Igbo, Indonesian, Swiss German, Turkish, Uzbek, and Yoruba. We also anticipate one or two surprise languages in the final test sets.


Participation


Interested parties are invited to contact mrlworkshop2024@gmail.com or join the Google group mrl-shared-task-2023@googlegroups.com to take part in the competition. All participating systems will be evaluated together with our baselines against the same held-out test set, to be released shortly before the evaluation. Submitted systems can compete in some or all sub-tasks. Participating teams will be invited to submit a short paper describing their work to the MRL workshop and to present it in a special session at the workshop. Paper submissions must follow the EMNLP paper format and be sent via the Softconf conference link of MRL 2023 before the paper submission deadline.

Important dates


June 1, 2024: Release of validation data
August 1, 2024: Release of testing data
August 20, 2024: Deadline to declare external data and resources used in systems
September 1, 2024: Deadline for submission of system outputs
September 20, 2024: Release of rankings and results
September 30, 2024: Deadline for submitting system description papers
November 15-16, 2024: Workshop

Ranking


Systems will be evaluated based on a global ranking across all benchmark languages. Participants may also submit language-specific (monolingual) systems, which will be evaluated as partial submissions for the specific language the system is trained on.

Additional resources


The shared task allows participants to use external resources or tools as long as they are openly available and can, in principle, be used by other participants for research purposes. Participants who plan to use external resources or data in their systems should contact the organizers to confirm that the specific resources are permitted; in such cases, information on the resources used and how they can be obtained should be shared via email by August 20, 2024.

Organizers


David Adelani, UCL and Google DeepMind
Duygu Ataman, New York University
Mammad Hajili, Microsoft
Inder Khatri, New York University
Francesco Tinner, University of Amsterdam

Shared Task Prize


The winning team will receive an award of 500 USD and will be invited to give a presentation during the workshop.

Sponsors



Interested in being a Sponsor? Contact us!