4TH MULTILINGUAL REPRESENTATION LEARNING (MRL) WORKSHOP

CO-LOCATED WITH EMNLP IN MIAMI, NOVEMBER 16, 2024



Keynote Speakers

   • Karen Livescu
   • Sebastian Ruder
   • Hila Gonen

Workshop Schedule

09:00 - 09:10  Opening remarks
09:10 - 09:50  Invited talk by Karen Livescu
09:50 - 10:30  Invited talk by Sebastian Ruder
10:30 - 11:00  Coffee Break
11:00 - 12:30  Poster Session
12:30 - 14:00  Lunch Break
14:00 - 14:30  Shared Task Session
  • Findings Paper
  • Winning team presentation
14:30 - 15:30  Best Paper Session
  • Best Paper
  • Honorable Mentions
15:30 - 16:00  Coffee Break
16:00 - 16:50  Invited talk by Hila Gonen
16:50 - 17:00  Closing remarks

Workshop Description

Multilingual representation learning methods have recently proven highly effective at learning features that transfer across languages, showing strong potential for adapting natural language processing (NLP) models to languages or tasks with little to no training data. At the same time, many aspects of these models warrant further development and analysis to establish their applicability in different contexts. These contexts include a variety of NLP tasks as well as understudied language families, where significant obstacles remain to practical advances that could improve the state of the art in NLP for low-resource or underrepresented languages.

This workshop aims to bring together researchers studying different aspects of multilingual representation learning, currently the most promising approach to improving NLP in low-resource or underrepresented languages, and to give the rapidly growing community working on this topic a means of communication and an opportunity to present their work and exchange ideas. The main objectives of the workshop are:

   • To survey and present a wide array of multilingual representation learning methods, including their theoretical formulation and analysis, as well as practical aspects such as applying current state-of-the-art transfer learning approaches to different tasks or adapting them to previously understudied contexts;
   • To provide a better understanding of how language typology may affect the applicability of these methods, and to motivate the development of novel methods that are more generic or more competitive across different languages;
   • To promote collaboration on novel software libraries and benchmarks for implementing and evaluating multilingual models, accelerating progress in the field.


By providing a means of communication for research groups working on machine learning, linguistic typology, and real-world NLP applications in various languages to share and discuss their recent findings, our ultimate goal is to support the rapid development of NLP methods and tools that are applicable to a wider range of languages.




Accepted Papers

Congratulations to the authors of all accepted papers:

  • SambaLingo: Teaching Large Language Models New Languages
    Zoltan Csaki, Bo Li, Jonathan Lingjie Li, Qiantong Xu, Pian Pawakapan, Leon Zhang, Yun Du, Hengyu Zhao, Changran Hu and Urmish Thakker

  • What an Elegant Bridge: Multilingual LLMs are Biased Similarly in Different Languages
    Viktor Mihaylov and Aleksandar Shtedritski

  • Adapting Open-Source Generative Large Language Models for Low-Resource Languages: A Case Study for Turkish
    Cagri Toraman

  • An Efficient Approach for Studying Cross-Lingual Transfer in Multilingual Language Models
    Fahim Faisal and Antonios Anastasopoulos

  • Are You Sure? Rank Them Again: Repeated Ranking For Better Preference Datasets
    Peter Devine

  • Tagengo: A Multilingual Chat Dataset
    Peter Devine

  • Language and Task Arithmetic with Parameter-Efficient Layers for Zero-Shot Summarization
    Alexandra Chronopoulou, Jonas Pfeiffer, Joshua Maynez, Xinyi Wang, Sebastian Ruder and Priyanka Agrawal

  • Modeling Bilingual Sentence Processing: Evaluating RNN and Transformer Architectures for Cross-Language Structural Priming
    Demi Zhang, Bushi Xiao, Chao Gao, Sangpil Youm and Bonnie J Dorr

  • Recipe for Zero-shot POS Tagging: Is It Useful in Realistic Scenarios?
    Zeno Vandenbulcke, Lukas Vermeire and Miryam De Lhoneux

  • Gender-specific Machine Translation with Large Language Models
    Eduardo Sánchez, Pierre Andrews, Pontus Stenetorp, Mikel Artetxe and Marta R. Costa-jussà

  • Jina-ColBERT-v2: A General-Purpose Multilingual Late Interaction Retriever
    Han Xiao, Bo Wang and Rohan Jha

  • Cross-Lingual Named Entity Recognition for Low-Resource Languages: A Hindi-Nepali Case Study Using Multilingual BERT Models
    Dipendra Yadav, Sumaiya Suravee, Tobias Strauß and Kristina Yordanova

  • Parameter-efficient Adaptation of Multilingual Multimodal Models for Low-resource ASR
    Abhishek Gupta, Amruta Parulekar, Sameep Chattopadhyay and Preethi Jyothi

  • Towards Cross-Linguistic Semantic Grounding using Dictionary Graph Analysis
    Ethan Eschrich and Zoey Liu

  • Vikhr: Constructing a State-of-the-art Bilingual Open-Source Instruction-Following Large Language Model for Russian
    Aleksandr Nikolich, Konstantin Korolev, Sergei Bratchikov, Igor Kiselev and Artem Shelmanov

  • Mitigating the Linguistic Gap with Phonemic Representations for Robust Cross-lingual Transfer
    Haeji Jung, Changdae Oh, Jooeon Kang, Jimin Sohn, Kyungwoo Song, Jinkyu Kim and David R Mortensen

  • Leveraging Adapters for Improved Cross-lingual Transfer for Low-Resource Creole MT
    Marcell Richard Fekete, Ernests Lavrinovics, Nathaniel Romney Robinson, Heather Lent, Raj Dabre and Johannes Bjerva

  • Evaluating Multilingual Long-Context Models for Retrieval and Reasoning
    Ameeta Agrawal, Andy Dang, Sina Bagheri Nezhad, Rhitabrat Pokharel and Russell Scheinberg

  • Community OSCAR: A Community Effort for Multilingual Web Data
    Manuel Brack, Malte Ostendorff, Pedro Ortiz Suarez, José Javier Saiz, Iñaki Lacunza Castilla, Jorge Palomar-Giner, Alexander Shvets, Patrick Schramowski, Georg Rehm, Marta Villegas and Kristian Kersting

  • Leveraging LLMs for Translating and Classifying Mental Health Data
    Konstantinos Skianis, A. Seza Doğruöz and John Pavlopoulos

  • Bridging the Bosphorus: Advancing Turkish Large Language Models through Strategies for Low-Resource Language Adaptation and Benchmarking
    Emre Can Acikgoz, Mete Erdogan and Deniz Yuret

  • Unsupervised Text Representation Learning via Instruction-Tuning for Zero-Shot Dense Retrieval
    Qiuhai Zeng, Zimeng Qiu, Dae Yon Hwang, Xin He and William M. Campbell

  • Language Bias in Multilingual Information Retrieval: The Nature of the Beast and Mitigation Methods
    Jinrui Yang, Fan Jiang and Timothy Baldwin

  • Representational Isomorphism and Alignment of Multilingual Large Language Models
    Di Wu, Yibin Lei, Andrew Yates and Christof Monz

  • Generalization Measures for Zero-Shot Cross-Lingual Transfer
    Saksham Bassi, Duygu Ataman and Kyunghyun Cho

  • Detecting and Translating Language Ambiguity with Multilingual LLMs
    Behrang Mehrparvar and Sandro Pezzelle

  • MLT-DR: Multi-Lingual/Task Demonstration Retrieval - An Attempt towards Generalized Retriever for In-Context Learning
    Kazuma Hashimoto, Arjun Reddy Akula, Karthik Raman and Michael Bendersky


Main Topics

Topics of interest include, but are not limited to:

   • Understanding the learning dynamics of multi-lingual representation learning methods

   • Multilingual pretraining for discriminative and generative downstream tasks

   • Probing and analysis of multilingual representations

   • New methods for multi-lingual representation learning

   • New approaches to language adaptation of NLP systems

   • Zero-shot and few-shot learning for multilingual NLP

   • Investigating and understanding transfer learning methods for adaptation of NLP systems into previously under-studied languages, such as morphologically-rich languages

   • Data sets, benchmarks or libraries for implementing and evaluating multi-lingual models




Submissions

Research papers: We invite all potential participants to submit their novel research contributions as long papers following the EMNLP 2024 long paper format (anonymized, 8 pages excluding references, with one additional page allowed for the camera-ready versions of accepted papers). All accepted research papers will be published in the workshop proceedings and presented as either oral or poster presentations. The research paper track accepts submissions through our own submission system, available at MRL 2024 OpenReview Submission.


Extended abstracts: In addition to long paper submissions, we also invite previously published work as well as ongoing or incomplete research contributions to our non-archival extended abstract track. Extended abstracts should use the same EMNLP template, with a 2-page limit excluding the bibliography.





Shared Task

MRL 2024 continues with the second edition of the shared task on Multi-task Multi-lingual Information Retrieval, which provides a new multilingual evaluation benchmark for assessing large-scale representation learning models on a diverse set of underrepresented languages across a range of predictive and generative tasks.




Important Dates

   • Sep. 1, 2024: Paper submission deadline

   • Oct. 3, 2024: Notification of acceptance

   • Oct. 10, 2024: Camera-ready papers due

   • Nov. 12-14, 2024: Main conference

   • Nov. 15-16, 2024: Workshop

(All deadlines are 11:59 pm UTC-12, "anywhere on Earth".)




Organizers

   • David Ifeoluwa Adelani, McGill University and Mila

   • Duygu Ataman, New York University

   • Mammad Hajili, Microsoft

   • Abraham Owodunni, OSU and Masakhane

   • Jonne Sälevä, Brandeis University

   • David Stap, University of Amsterdam

   • Francesco Tinner, University of Amsterdam




Sponsors

Interested in being a Sponsor? Contact us!