4TH MULTILINGUAL REPRESENTATION LEARNING (MRL) WORKSHOP

CO-LOCATED WITH EMNLP IN MIAMI, NOVEMBER 16, 2024



Keynote Speakers

   • Karen Livescu
   • Sebastian Ruder
   • Hila Gonen

Workshop Schedule

09:00 - 09:10  Opening remarks
09:10 - 09:50  Invited talk by Karen Livescu
09:50 - 10:30  Invited talk by Sebastian Ruder
10:30 - 11:00  Coffee Break
11:00 - 12:30  Poster Session
12:30 - 14:00  Lunch Break
14:00 - 14:30  Shared Task Session
  • Findings Paper
  • Winning team presentation
14:30 - 15:30  Best Paper Session
  • Best Paper
  • Honorable Mentions
15:30 - 16:00  Coffee Break
16:00 - 16:50  Invited talk by Hila Gonen
16:50 - 17:00  Closing remarks

Workshop Description

Multilingual representation learning methods have recently proven highly effective at learning features that transfer across languages, showing strong potential for adapting natural language processing (NLP) models to languages or tasks with little to no training data. At the same time, many aspects of these models warrant further development and analysis to establish their applicability in different contexts. These contexts include a variety of NLP tasks as well as understudied language families, where significant obstacles remain to practical advances that could improve the state of the art in NLP for low-resource or underrepresented languages.

This workshop aims to bring together researchers studying different aspects of multilingual representation learning, currently the most promising approach to improving NLP in low-resource or underrepresented languages, and to give the rapidly growing community working on this topic a means of communication and an opportunity to present their work and exchange ideas. The main objectives of the workshop are:

   • To survey and present a wide array of multilingual representation learning methods, including their theoretical formulation and analysis, as well as practical aspects such as applying current state-of-the-art transfer learning approaches to different tasks or adapting them to previously understudied contexts;
   • To provide a better understanding of how language typology may affect the applicability of these methods, and to motivate the development of novel methods that are more generic or more competitive across different languages;
   • To promote collaboration on novel software libraries and benchmarks for implementing and evaluating multilingual models, accelerating progress in the field.


By providing a means of communication for research groups working on machine learning, linguistic typology, and real-world NLP applications in various languages to share and discuss their recent findings, our ultimate goal is to support the rapid development of NLP methods and tools that are applicable to a wider range of languages.




Accepted Papers

Congratulations to the authors of all accepted papers:

  • SambaLingo: Teaching Large Language Models New Languages
    Zoltan Csaki, Bo Li, Jonathan Lingjie Li, Qiantong Xu, Pian Pawakapan, Leon Zhang, Yun Du, Hengyu Zhao, Changran Hu and Urmish Thakker

  • What an Elegant Bridge: Multilingual LLMs are Biased Similarly in Different Languages
    Viktor Mihaylov and Aleksandar Shtedritski

  • Adapting Open-Source Generative Large Language Models for Low-Resource Languages: A Case Study for Turkish
    Cagri Toraman

  • An Efficient Approach for Studying Cross-Lingual Transfer in Multilingual Language Models
    Fahim Faisal and Antonios Anastasopoulos

  • Are You Sure? Rank Them Again: Repeated Ranking For Better Preference Datasets
    Peter Devine

  • Tagengo: A Multilingual Chat Dataset
    Peter Devine

  • Language and Task Arithmetic with Parameter-Efficient Layers for Zero-Shot Summarization
    Alexandra Chronopoulou, Jonas Pfeiffer, Joshua Maynez, Xinyi Wang, Sebastian Ruder and Priyanka Agrawal

  • Modeling Bilingual Sentence Processing: Evaluating RNN and Transformer Architectures for Cross-Language Structural Priming
    Demi Zhang, Bushi Xiao, Chao Gao, Sangpil Youm and Bonnie J Dorr

  • Recipe for Zero-shot POS Tagging: Is It Useful in Realistic Scenarios?
    Zeno Vandenbulcke, Lukas Vermeire and Miryam De Lhoneux

  • Gender-specific Machine Translation with Large Language Models
    Eduardo Sánchez, Pierre Andrews, Pontus Stenetorp, Mikel Artetxe and Marta R. Costa-jussà

  • Jina-ColBERT-v2: A General-Purpose Multilingual Late Interaction Retriever
    Han Xiao, Bo Wang and Rohan Jha

  • Cross-Lingual Named Entity Recognition for Low-Resource Languages: A Hindi-Nepali Case Study Using Multilingual BERT Models
    Dipendra Yadav, Sumaiya Suravee, Tobias Strauß and Kristina Yordanova

  • Parameter-efficient Adaptation of Multilingual Multimodal Models for Low-resource ASR
    Abhishek Gupta, Amruta Parulekar, Sameep Chattopadhyay and Preethi Jyothi

  • Towards Cross-Linguistic Semantic Grounding using Dictionary Graph Analysis
    Ethan Eschrich and Zoey Liu

  • Vikhr: Constructing a State-of-the-art Bilingual Open-Source Instruction-Following Large Language Model for Russian
    Aleksandr Nikolich, Konstantin Korolev, Sergei Bratchikov, Igor Kiselev and Artem Shelmanov

  • Mitigating the Linguistic Gap with Phonemic Representations for Robust Cross-lingual Transfer
    Haeji Jung, Changdae Oh, Jooeon Kang, Jimin Sohn, Kyungwoo Song, Jinkyu Kim and David R Mortensen

  • Leveraging Adapters for Improved Cross-lingual Transfer for Low-Resource Creole MT
    Marcell Richard Fekete, Ernests Lavrinovics, Nathaniel Romney Robinson, Heather Lent, Raj Dabre and Johannes Bjerva

  • Evaluating Multilingual Long-Context Models for Retrieval and Reasoning
    Ameeta Agrawal, Andy Dang, Sina Bagheri Nezhad, Rhitabrat Pokharel and Russell Scheinberg

  • Community OSCAR: A Community Effort for Multilingual Web Data
    Manuel Brack, Malte Ostendorff, Pedro Ortiz Suarez, José Javier Saiz, Iñaki Lacunza Castilla, Jorge Palomar-Giner, Alexander Shvets, Patrick Schramowski, Georg Rehm, Marta Villegas and Kristian Kersting

  • Leveraging LLMs for Translating and Classifying Mental Health Data
    Konstantinos Skianis, A. Seza Doğruöz and John Pavlopoulos

  • Bridging the Bosphorus: Advancing Turkish Large Language Models through Strategies for Low-Resource Language Adaptation and Benchmarking
    Emre Can Acikgoz, Mete Erdogan and Deniz Yuret

  • Unsupervised Text Representation Learning via Instruction-Tuning for Zero-Shot Dense Retrieval
    Qiuhai Zeng, Zimeng Qiu, Dae Yon Hwang, Xin He and William M. Campbell

  • Language Bias in Multilingual Information Retrieval: The Nature of the Beast and Mitigation Methods
    Jinrui Yang, Fan Jiang and Timothy Baldwin

  • Representational Isomorphism and Alignment of Multilingual Large Language Models
    Di Wu, Yibin Lei, Andrew Yates and Christof Monz

  • Generalization Measures for Zero-Shot Cross-Lingual Transfer
    Saksham Bassi, Duygu Ataman and Kyunghyun Cho

  • Detecting and Translating Language Ambiguity with Multilingual LLMs
    Behrang Mehrparvar and Sandro Pezzelle

  • MLT-DR: Multi-Lingual/Task Demonstration Retrieval - An Attempt towards Generalized Retriever for In-Context Learning
    Kazuma Hashimoto, Arjun Reddy Akula, Karthik Raman and Michael Bendersky


Main Topics

Topics of interest include, but are not limited to:

   • Understanding the learning dynamics of multi-lingual representation learning methods

   • Multilingual pretraining for discriminative and generative downstream tasks

   • Probing and analysis of multilingual representations

   • New methods for multi-lingual representation learning

   • New approaches to language adaptation of NLP systems

   • Zero-shot and few-shot learning for multilingual NLP

   • Investigating and understanding transfer learning methods for adaptation of NLP systems into previously under-studied languages, such as morphologically-rich languages

   • Data sets, benchmarks or libraries for implementing and evaluating multi-lingual models




Submissions

Research papers: We invite all potential participants to submit their novel research contributions as long papers following the EMNLP 2024 long paper format (anonymized, 8 pages excluding references, with one additional page allowed for the camera-ready versions of accepted papers). All accepted research papers will be published in the workshop proceedings and presented as either oral or poster presentations. The research paper track accepts submissions through our own submission system, available at MRL 2024 OpenReview Submission.


Extended abstracts: In addition to long paper submissions, we also invite previously published work as well as ongoing or incomplete research contributions to our non-archival extended abstract track. Extended abstracts should use the same EMNLP template, with a 2-page limit excluding the bibliography.





Shared Task

MRL 2024 continues with the second edition of the shared task on Multi-task Multi-lingual Information Retrieval, which provides a new multilingual evaluation benchmark for assessing large-scale representation learning models on a diverse set of underrepresented languages across a range of predictive and generative tasks.




Important Dates

   • Sep. 1, 2024: Paper submission deadline

   • Oct. 3, 2024: Notification of acceptance

   • Oct. 10, 2024: Camera-ready papers due

   • Nov. 12-14, 2024: Main conference

   • Nov. 15-16, 2024: Workshop

(All deadlines are 11:59 pm UTC-12, "anywhere on Earth".)




Organizers

   • David Ifeoluwa Adelani, McGill University and Mila

   • Duygu Ataman, New York University

   • Mammad Hajili, Microsoft

   • Abraham Owodunni, OSU and Masakhane

   • Jonne Sälevä, Brandeis University

   • David Stap, University of Amsterdam

   • Francesco Tinner, University of Amsterdam




Sponsors

Interested in being a Sponsor? Contact us!