3RD MULTILINGUAL REPRESENTATION LEARNING (MRL) WORKSHOP

CO-LOCATED WITH EMNLP IN SINGAPORE, DEC 7 2023



Keynote Speakers

  • Sunayana Sitaram
  • Katharina Kann
  • Orhan Firat


Workshop Schedule

Workshop Location: L1 in Leo 3&4

Poster Session Location: East Foyer located on B2


09:00 - 09:10  Opening remarks
09:10 - 09:50  Invited talk by Sunayana Sitaram
09:50 - 10:30  Invited talk by Katharina Kann
10:30 - 11:00  Coffee Break
11:00 - 12:30  Poster Session
12:30 - 14:00  Lunch Break
14:00 - 14:30  Shared Task Session
  • Findings Paper
  • Winning team presentation
14:30 - 15:30  Best Paper Session
  • Best Paper
  • Honorable Mentions
15:30 - 16:00  Coffee Break
16:00 - 16:50  Invited talk by Orhan Firat
16:50 - 17:00  Closing remarks

Best Paper Award

• Embedding Structure Matters: Comparing Methods to Adapt Multilingual Vocabularies to New Languages
C.M. Downey, Terra Blevins, Nora Goldfine and Shane Steinert-Threlkeld

Honorable Mentions

• Meta-learning For Vision-and-language Cross-lingual Transfer
Hanxu Hu and Frank Keller
• Adapt and Prune Strategy for Multilingual Speech Foundational Model on Low-resourced Languages
Hyeon Soo Kim, Chung Hyeon Cho, Hyejin Won and Kyung Ho Park



Workshop Description

Multilingual representation learning methods have recently been found to be extremely efficient at learning features useful for transfer learning between languages, and they show potential for adapting natural language processing (NLP) models to languages or tasks with little to no training resources. However, many aspects of these models still need further development and analysis before their applicability in different contexts can be established. These contexts include different NLP tasks and understudied language families, which face significant obstacles to the practical advances that could improve the state of the art in NLP for low-resource or underrepresented languages.

This workshop aims to bring together the research community studying different aspects of multilingual representation learning, currently the most promising approach to improving NLP for low-resource or underrepresented languages, and to give the rapidly growing number of researchers working on the topic a means of communication and an opportunity to present their work and exchange ideas. The main objectives of the workshop are:

  • To construct and present a wide array of multilingual representation learning methods, including their theoretical formulation and analysis, and practical aspects such as applying current state-of-the-art transfer learning approaches to different tasks or studying adaptation to previously under-studied contexts;
  • To provide a better understanding of how language typology may affect the applicability of these methods, and to motivate the development of novel methods that are more generic or competitive across different languages;
  • To promote collaborations on novel software libraries or benchmarks for implementing or evaluating multilingual models, which would accelerate progress in the field.


Our ultimate goal is to support rapid development of NLP methods and tools that are applicable to a wider range of languages, by giving research groups working on machine learning, linguistic typology, or real-life applications of NLP tasks in various languages a channel to share and discuss their recent findings.




Accepted Papers

Congratulations to the authors of all accepted papers:

  • UniBriVL: Robust Audio Representation and Generation of Audio Driven Diffusion Models
    Sen Fang, Bowen Gao, Yangjian Wu and TeikToe Teoh

  • Meta-learning For Vision-and-language Cross-lingual Transfer
    Hanxu Hu and Frank Keller

  • Counterfactually Probing Language Identity in Multilingual Models
    Anirudh Srinivasan, Venkata Subrahmanyan Govindarajan and Kyle Mahowald

  • A General-Purpose Multilingual Document Encoder
    Onur Galoğlu, Robert Litschko and Goran Glavaš

  • Zero-Shot Cross-Lingual Sentiment Classification under Distribution Shift: an Exploratory Study
    Maarten De Raedt, Semere Kiros Bitew, Fréderic Godin, Thomas Demeester and Chris Develder

  • To token or not to token: A Comparative Study of Text Representations for Cross-Lingual Transfer
    Md Mushfiqur Rahman, Fardin Ahsan Sakib, Fahim Faisal and Antonios Anastasopoulos

  • Adapt and Prune Strategy for Multilingual Speech Foundational Model on Low-resourced Languages
    Hyeon Soo Kim, Chung Hyeon Cho, Hyejin Won and Kyung Ho Park

  • Multilingual Word Embeddings for Low-Resource Languages using Anchors and a Chain of Related Languages
    Viktor Hangya, Silvia Severini, Radoslav Ralev, Alexander Fraser and Hinrich Schütze

  • Explicit Representation Alignment Enables Cross-Lingual Transfer
    Tom Sherborne, Tom Hosking and Mirella Lapata

  • TalaMT: Multilingual Machine Translation for Cabécar-Bribri-Spanish
    Alex Jones, Rolando Coto-Solano and Guillermo González Campos

  • Mergen: The First Manchu-Korean Machine Translation Model Trained on Augmented Data
    Jean Seo, Sungjoo Byun, Minha Kang and Sangah Lee

  • Improving Cross-Lingual Transfer for Open Information Extraction with Linguistic Feature Projection
    Youmi Ma, Bhushan Kotnis, Carolin Lawrence, Goran Glavaš and Naoaki Okazaki

  • Geographic and Geopolitical Biases of Language Models
    Fahim Faisal and Antonios Anastasopoulos

  • Task-Based MoE for Multitask Multilingual Machine Translation
    Hai Pham, Young Jin Kim, Subhabrata Mukherjee, David P. Woodruff, Barnabas Poczos and Hany Hassan

  • Empowering Cross-lingual Abilities of Instruction-tuned Large Language Models by Translation-following demonstrations
    Leonardo Ranaldi and Giulia Pucci

  • CAPIVARA: Cost-Efficient Approach for Improving Multilanguage CLIP Performance on Low-Resource Languages
    Gabriel Oliveira dos Santos, Diego Alysson Moreira, Alef Iury Ferreira, Jhessica Silva, Luiz Pereira, Pedro Bueno, Thiago Sousa, Helena Maia, Nádia Da Silva, Esther Colombini, Helio Pedrini and Sandra Avila

  • Code-switching as a cross-lingual Training Signal: an Example with Unsupervised Bilingual Embedding
    Felix Gaschi, Ilias El-Baamrani, Barbara Gendron, Parisa Rastin and Yannick Toussaint

  • Learning to translate by learning to communicate
    C.M. Downey, Xuhui Zhou, Zeyu Liu and Shane Steinert-Threlkeld

  • Contrastive Learning for Universal Zero-Shot NLI with Cross-Lingual Sentence Embeddings
    Md Kowsher, Md. Shohanur Islam Sobuj, Nusrat Jahan Prottasha, Mohammad Shamsul Arefin and Yasuhiko Morimoto

  • UD-MULTIGENRE – a UD-Based Dataset Enriched with Instance-Level Genre Annotations
    Vera Danilova and Sara Stymne

  • Comparing Styles across Languages
    Shreya Havaldar, Matthew Pressimone, Eric Wong and Lyle Ungar

  • Crosslingual Structural Priming and the Pre-Training Dynamics of Bilingual Language Models
    Catherine Arnett, Tyler Chang, James Michaelov and Benjamin Bergen

  • Embedding structure matters: Comparing methods to adapt multilingual vocabularies to new languages
    C.M. Downey, Terra Blevins, Nora Goldfine and Shane Steinert-Threlkeld

  • Multi-EuP: The Multilingual European Parliament Dataset for Analysis of Bias in Information Retrieval
    Jinrui Yang, Timothy Baldwin and Trevor Cohn

  • Generating Continuations in Multilingual Idiomatic Contexts
    Rhitabrat Pokharel and Ameeta Agrawal



Findings Papers

  • mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences
    David Uthus, Santiago Ontanon, Joshua Ainslie, Mandy Guo

  • ICU: Conquering Language Barriers in Vision-and-Language Modeling by Dividing the Tasks into Image Captioning and Language Understanding
    Guojun Wu

  • TRIP: Accelerating Document-level Multilingual Pre-training via Triangular Document-level Pre-training on Parallel Data Triplets
    Hongyuan Lu, Haoyang Huang, Shuming Ma, Dongdong Zhang, Wai Lam, Zhaochuan Gao, Anthony Aue, Arul Menezes, Furu Wei

  • mmT5: Modular Multilingual Pre-Training Solves Source Language Hallucinations
    Jonas Pfeiffer, Francesco Piccinno, Massimo Nicosia, Xinyi Wang, Machel Reid, Sebastian Ruder

  • Diversifying language models for lesser-studied languages and language-usage contexts: A case of second language Korean
    Hakyung Sung, Gyu-Ho Shin

  • TaTA: A Multilingual Table-to-Text Dataset for African Languages
    Sebastian Gehrmann, Sebastian Ruder, Vitaly Nikolaev, Jan A. Botha, Michael Chavinda, Ankur P Parikh, Clara E. Rivera

  • Romanization-based Large-scale Adaptation of Multilingual Language Models
    Sukannya Purkayastha, Sebastian Ruder, Jonas Pfeiffer, Iryna Gurevych, Ivan Vulić

  • Exploring the Effectiveness of Multi-Lingual Commonsense Knowledge-Aware Open-Domain Dialogue Response Generation
    Sixing Wu, Jiong Yu, Tianshi Che, Yang Zhou, Wei Zhou

  • PR-MCS: Perturbation Robust Metric for MultiLingual Image Captioning
    Yongil Kim, Yerin Hwang, Hyeongu Yun, Seunghyun Yoon, Trung Bui, Kyomin Jung

  • Crosslingual Transfer Learning for Low-Resource Languages Based on Multilingual Colexification Graphs
    Yihong Liu, Haotian Ye, Leonie Weissweiler, Renhao Pei, Hinrich Schuetze

  • A Joint Matrix Factorization Analysis of Multilingual Representations
    Zheng Zhao, Yftah Ziser, Bonnie L. Webber, Shay B Cohen

  • MEEP: Is this Engaging? Prompting Large Language Models for Dialogue Evaluation in Multilingual Settings
    Amila Ferron, Amber Shore, Ekata Mitra, Ameeta Agrawal

  • X-SNS: Cross-Lingual Transfer Prediction through Sub-Network Similarity
    Taejun Yun, Jinhyeon Kim, Deokyeong Kang, Seonghoon Lim, Jihoon Kim, Taeuk Kim

  • KBioXLM: A Knowledge-anchored Biomedical Multilingual Pretrained Language Model
    Lei Geng, Xu Yan, Ziqiang Cao, Juntao Li, Wenjie Li, Sujian Li, Xinjie Zhou, Yang Yang, Jun Zhang

  • Good Meta-tasks Make A Better Cross-lingual Meta-transfer Learning for Low-resource Languages
    Linjuan Wu, Zongyi Guo, Baoliang Cui, Haihong Tang, Weiming Lu

  • Multilingual Generation and Answering of Questions from Texts and Knowledge Graphs
    Kelvin Han, Claire Gardent

  • A Multi-Modal Multilingual Benchmark for Document Image Classification
    Yoshinari Fujinuma, Siddharth Varia, Nishant Sankaran, Srikar Appalaraju, Bonan Min, Yogarshi Vyas

  • CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation
    Weixiang Yan, Yuchen Tian, Yunzhe Li, Qian Chen, Wen Wang

  • Extrapolating Multilingual Understanding Models as Multilingual Generators
    Bohong Wu, Fei Yuan, Hai Zhao, Lei Li, Jingjing Xu

  • GlotLID: Language Identification for Low-Resource Languages
    Amir Hossein Kargaran, Ayyoob Imani, François Yvon, Hinrich Schuetze

  • Take a Closer Look at Multilinguality! Improve Multilingual Pre-Training Using Monolingual Corpora Only
    Jinliang Lu, Yu Lu, Jiajun Zhang

  • Is Robustness Transferable across Languages in Multilingual Neural Machine Translation?
    Leiyu Pan, Deyi Xiong

  • Dialect-to-Standard Normalization: A Large-Scale Multilingual Evaluation
    Olli Kuparinen, Aleksandra Miletić, Yves Scherrer

  • Not All Languages Are Created Equal in LLMs: Improving Multilingual Capability by Cross-Lingual-Thought Prompting
    Haoyang Huang, Tianyi Tang, Dongdong Zhang, Xin Zhao, Ting Song, Yan Xia, Furu Wei

  • Towards a Deep Understanding of Multilingual End-to-End Speech Translation
    Haoran Sun, Xiaohu Zhao, Yikun Lei, Shaolin Zhu, Deyi Xiong

  • Towards Multilingual Interlinear Morphological Glossing
    Shu Okabe, François Yvon

  • T-Projection: High Quality Annotation Projection for Sequence Labeling Tasks
    Iker García-Ferrero, Rodrigo Agerri, German Rigau

  • Breaking the Language Barrier: Improving Cross-Lingual Reasoning with Structured Self-Attention
    Negar Foroutan, Mohammadreza Banaei, Karl Aberer, Antoine Bosselut

  • Unleashing the Multilingual Encoder Potential: Boosting Zero-Shot Performance via Probability Calibration
    Ercong Nie, Helmut Schmid, Hinrich Schuetze

  • Interpreting Indirect Answers to Yes-No Questions in Multiple Languages
    Zijie Wang, Md Mosharaf Hossain, Shivam Mathur, Terry Cruz Melo, Kadir Bulut Ozler, Keun Hee Park, Jacob Quintero, MohammadHossein Rezaei, Shreya Nupur Shakya, Md Nayem Uddin, Eduardo Blanco

  • BERTwich: Extending BERT’s Capabilities to Model Dialectal and Noisy Text
    Aarohi Srivastava, David Chiang

  • Isotropic Representation Can Improve Zero-Shot Cross-Lingual Transfer on Multilingual Language Models
    Yixin Ji, Jikai Wang, Juntao Li, Hai Ye, Min Zhang

  • One For All & All For One: Bypassing Hyperparameter Tuning with Model Averaging for Cross-Lingual Transfer
    Fabian David Schmidt, Ivan Vulić, Goran Glavaš

  • Multilingual Lottery Tickets to Pretrain Language Models
    Jaeseong Lee, Seung-won Hwang

  • In What Languages are Generative Language Models the Most Formal? Analyzing Formality Distribution across Languages
    Asım Ersoy, Gerson Vizcarra, Tahsin Mayeesha, Benjamin Muller



Shared Task

MRL 2023 featured a new shared task on Multi-task Multi-lingual Information Retrieval, which provides a new multilingual evaluation benchmark for assessing large-scale representation learning models on a diverse set of under-represented languages across a range of predictive and generative tasks. The results of the shared task will be available very soon.




Important Dates

   • Sep. 8, 2023: Paper Due Date

   • Oct. 6, 2023: Notification of Acceptance

   • Oct. 18, 2023: Camera-ready papers due

   • Dec. 7, 2023: Workshop

   • Dec. 8-10, 2023: Main conference

(All deadlines are 11:59 pm UTC-12, "anywhere on Earth".)




Organizers

  • David Ifeoluwa Adelani, Google DeepMind and UCL
  • Duygu Ataman, NYU
  • Chris Emezue, TU Munich and Mila
  • Omer Goldman, Bar-Ilan University
  • Hila Gonen, UW and Meta AI
  • Mammad Hajili, Microsoft
  • Benjamin Muller, Meta
  • Sebastian Ruder, Google
  • Gözde Gül Şahin, Koç University
  • Francesco Tinner, University of Amsterdam
  • Genta Indra Winata, Bloomberg




Sponsors

Google

Bloomberg

Interested in being a Sponsor? Contact us!