3RD MULTILINGUAL REPRESENTATION LEARNING (MRL) WORKSHOP @EMNLP 2023

CO-LOCATED WITH EMNLP IN SINGAPORE, DEC 7 2023

Keynote Speakers

Orhan Firat, Google DeepMind

Katharina Kann, UC Boulder and JGU Mainz

Sunayana Sitaram, Microsoft Research

Workshop Schedule

Workshop Location: L1 in Leo 3&4

Poster Session Location: East Foyer located on B2

09:00 - 09:10	Opening remarks
09:10 - 09:50	Invited talk by Sunayana Sitaram
09:50 - 10:30	Invited talk by Katharina Kann
10:30 - 11:00	Coffee Break
11:00 - 12:30	Poster Session
12:30 - 14:00	Lunch Break
14:00 - 14:30	Shared Task Session • Findings Paper • Winning team presentation
14:30 - 15:30	Best Paper Session • Best Paper • Honorable Mentions
15:30 - 16:00	Coffee Break
16:00 - 16:50	Invited talk by Orhan Firat
16:50 - 17:00	Closing remarks

Best Paper Award

• Embedding Structure Matters: Comparing Methods to Adapt Multilingual Vocabularies to New Languages
C.M. Downey, Terra Blevins, Nora Goldfine and Shane Steinert-Threlkeld

Honorable Mentions

• Meta-learning For Vision-and-language Cross-lingual Transfer
Hanxu Hu and Frank Keller
• Adapt and Prune Strategy for Multilingual Speech Foundational Model on Low-resourced Languages
Hyeon Soo Kim, Chung Hyeon Cho, Hyejin Won and Kyung Ho Park

Workshop Description

Multilingual representation learning methods have recently proven very effective at learning features useful for transfer learning between languages, and they show potential for adapting natural language processing (NLP) models to languages or tasks with little to no training resources. Still, many aspects of these models need further development and analysis before their applicability in various contexts can be established. These contexts include different NLP tasks as well as understudied language families, which face significant obstacles to practical advances that could improve the state of the art in NLP for low-resource or underrepresented languages.

This workshop aims to bring together researchers studying different aspects of multilingual representation learning, currently the most promising approach to improving NLP for low-resource or underrepresented languages. It provides the rapidly growing number of researchers working on the topic with a means of communication and an opportunity to present their work and exchange ideas. The main objectives of the workshop are:

• To construct and present a wide array of multilingual representation learning methods, including their theoretical formulation and analysis, and practical aspects such as applying current state-of-the-art transfer learning approaches to different tasks or adapting them to previously understudied contexts;
• To provide a better understanding of how language typology may affect the applicability of these methods, and to motivate the development of novel methods that are more generic or competitive across different languages;
• To promote collaboration on novel software libraries and benchmarks for implementing or evaluating multilingual models that would accelerate progress in the field.

By giving research groups working on machine learning, linguistic typology, or real-life applications of NLP in various languages a venue to share and discuss their recent findings, we ultimately aim to support the rapid development of NLP methods and tools that are applicable to a wider range of languages.

Accepted Papers

Congratulations to the authors of all accepted papers:

• UniBriVL: Robust Audio Representation and Generation of Audio Driven Diffusion Models
Sen Fang, Bowen Gao, Yangjian Wu and TeikToe Teoh

• Meta-learning For Vision-and-language Cross-lingual Transfer
Hanxu Hu and Frank Keller

• Counterfactually Probing Language Identity in Multilingual Models
Anirudh Srinivasan, Venkata Subrahmanyan Govindarajan and Kyle Mahowald

• A General-Purpose Multilingual Document Encoder
Onur Galoğlu, Robert Litschko and Goran Glavaš

• Zero-Shot Cross-Lingual Sentiment Classification under Distribution Shift: an Exploratory Study
Maarten De Raedt, Semere Kiros Bitew, Fréderic Godin, Thomas Demeester and Chris Develder

• To token or not to token: A Comparative Study of Text Representations for Cross-Lingual Transfer
Md Mushfiqur Rahman, Fardin Ahsan Sakib, Fahim Faisal and Antonios Anastasopoulos

• Adapt and Prune Strategy for Multilingual Speech Foundational Model on Low-resourced Languages
Hyeon Soo Kim, Chung Hyeon Cho, Hyejin Won and Kyung Ho Park

• Multilingual Word Embeddings for Low-Resource Languages using Anchors and a Chain of Related Languages
Viktor Hangya, Silvia Severini, Radoslav Ralev, Alexander Fraser and Hinrich Schütze

• Explicit Representation Alignment Enables Cross-Lingual Transfer
Tom Sherborne, Tom Hosking and Mirella Lapata

• TalaMT: Multilingual Machine Translation for Cabécar-Bribri-Spanish
Alex Jones, Rolando Coto-Solano and Guillermo González Campos

• Mergen: The First Manchu-Korean Machine Translation Model Trained on Augmented Data
Jean Seo, Sungjoo Byun, Minha Kang and Sangah Lee

• Improving Cross-Lingual Transfer for Open Information Extraction with Linguistic Feature Projection
Youmi Ma, Bhushan Kotnis, Carolin Lawrence, Goran Glavaš and Naoaki Okazaki

• Geographic and Geopolitical Biases of Language Models
Fahim Faisal and Antonios Anastasopoulos

• Task-Based MoE for Multitask Multilingual Machine Translation
Hai Pham, Young Jin Kim, Subhabrata Mukherjee, David P. Woodruff, Barnabas Poczos and Hany Hassan

• Empowering Cross-lingual Abilities of Instruction-tuned Large Language Models by Translation-following demonstrations
Leonardo Ranaldi and Giulia Pucci

• CAPIVARA: Cost-Efficient Approach for Improving Multilanguage CLIP Performance on Low-Resource Languages
Gabriel Oliveira dos Santos, Diego Alysson Moreira, Alef Iury Ferreira, Jhessica Silva, Luiz Pereira, Pedro Bueno, Thiago Sousa, Helena Maia, Nádia Da Silva, Esther Colombini, Helio Pedrini and Sandra Avila

• Code-switching as a cross-lingual Training Signal: an Example with Unsupervised Bilingual Embedding
Felix Gaschi, Ilias El-Baamrani, Barbara Gendron, Parisa Rastin and Yannick Toussaint

• Learning to translate by learning to communicate
C.M. Downey, Xuhui Zhou, Zeyu Liu and Shane Steinert-Threlkeld

• Contrastive Learning for Universal Zero-Shot NLI with Cross-Lingual Sentence Embeddings
Md Kowsher, Md. Shohanur Islam Sobuj, Nusrat Jahan Prottasha, Mohammad Shamsul Arefin and Yasuhiko Morimoto

• UD-MULTIGENRE – a UD-Based Dataset Enriched with Instance-Level Genre Annotations
Vera Danilova and Sara Stymne

• Comparing Styles across Languages
Shreya Havaldar, Matthew Pressimone, Eric Wong and Lyle Ungar

• Crosslingual Structural Priming and the Pre-Training Dynamics of Bilingual Language Models
Catherine Arnett, Tyler Chang, James Michaelov and Benjamin Bergen

• Embedding Structure Matters: Comparing Methods to Adapt Multilingual Vocabularies to New Languages
C.M. Downey, Terra Blevins, Nora Goldfine and Shane Steinert-Threlkeld

• Multi-EuP: The Multilingual European Parliament Dataset for Analysis of Bias in Information Retrieval
Jinrui Yang, Timothy Baldwin and Trevor Cohn

• Generating Continuations in Multilingual Idiomatic Contexts
Rhitabrat Pokharel and Ameeta Agrawal

Findings Papers

• mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences
David Uthus, Santiago Ontanon, Joshua Ainslie, Mandy Guo

• ICU: Conquering Language Barriers in Vision-and-Language Modeling by Dividing the Tasks into Image Captioning and Language Understanding
Guojun Wu

• TRIP: Accelerating Document-level Multilingual Pre-training via Triangular Document-level Pre-training on Parallel Data Triplets
Hongyuan Lu, Haoyang Huang, Shuming Ma, Dongdong Zhang, Wai Lam, Zhaochuan Gao, Anthony Aue, Arul Menezes, Furu Wei

• mmT5: Modular Multilingual Pre-Training Solves Source Language Hallucinations
Jonas Pfeiffer, Francesco Piccinno, Massimo Nicosia, Xinyi Wang, Machel Reid, Sebastian Ruder

• Diversifying language models for lesser-studied languages and language-usage contexts: A case of second language Korean
Hakyung Sung, Gyu-Ho Shin

• TaTA: A Multilingual Table-to-Text Dataset for African Languages
Sebastian Gehrmann, Sebastian Ruder, Vitaly Nikolaev, Jan A. Botha, Michael Chavinda, Ankur P Parikh, Clara E. Rivera

• Romanization-based Large-scale Adaptation of Multilingual Language Models
Sukannya Purkayastha, Sebastian Ruder, Jonas Pfeiffer, Iryna Gurevych, Ivan Vulić

• Exploring the Effectiveness of Multi-Lingual Commonsense Knowledge-Aware Open-Domain Dialogue Response Generation
Sixing Wu, Jiong Yu, Tianshi Che, Yang Zhou, Wei Zhou

• PR-MCS: Perturbation Robust Metric for MultiLingual Image Captioning
Yongil Kim, Yerin Hwang, Hyeongu Yun, Seunghyun Yoon, Trung Bui, Kyomin Jung

• Crosslingual Transfer Learning for Low-Resource Languages Based on Multilingual Colexification Graphs
Yihong Liu, Haotian Ye, Leonie Weissweiler, Renhao Pei, Hinrich Schuetze

• A Joint Matrix Factorization Analysis of Multilingual Representations
Zheng Zhao, Yftah Ziser, Bonnie L. Webber, Shay B Cohen

• MEEP: Is this Engaging? Prompting Large Language Models for Dialogue Evaluation in Multilingual Settings
Amila Ferron, Amber Shore, Ekata Mitra, Ameeta Agrawal

• X-SNS: Cross-Lingual Transfer Prediction through Sub-Network Similarity
Taejun Yun, Jinhyeon Kim, Deokyeong Kang, Seonghoon Lim, Jihoon Kim, Taeuk Kim

• KBioXLM: A Knowledge-anchored Biomedical Multilingual Pretrained Language Model
Lei Geng, Xu Yan, Ziqiang Cao, Juntao Li, Wenjie Li, Sujian Li, Xinjie Zhou, Yang Yang, Jun Zhang

• Good Meta-tasks Make A Better Cross-lingual Meta-transfer Learning for Low-resource Languages
Linjuan Wu, Zongyi Guo, Baoliang Cui, Haihong Tang, Weiming Lu

• Multilingual Generation and Answering of Questions from Texts and Knowledge Graphs
Kelvin Han, Claire Gardent

• A Multi-Modal Multilingual Benchmark for Document Image Classification
Yoshinari Fujinuma, Siddharth Varia, Nishant Sankaran, Srikar Appalaraju, Bonan Min, Yogarshi Vyas

• CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation
Weixiang Yan, Yuchen Tian, Yunzhe Li, Qian Chen, Wen Wang

• Extrapolating Multilingual Understanding Models as Multilingual Generators
Bohong Wu, Fei Yuan, Hai Zhao, Lei Li, Jingjing Xu

• GlotLID: Language Identification for Low-Resource Languages
Amir Hossein Kargaran, Ayyoob Imani, François Yvon, Hinrich Schuetze

• Take a Closer Look at Multilinguality! Improve Multilingual Pre-Training Using Monolingual Corpora Only
Jinliang Lu, Yu Lu, Jiajun Zhang

• Is Robustness Transferable across Languages in Multilingual Neural Machine Translation?
Leiyu Pan, Deyi Xiong

• Dialect-to-Standard Normalization: A Large-Scale Multilingual Evaluation
Olli Kuparinen, Aleksandra Miletić, Yves Scherrer

• Not All Languages Are Created Equal in LLMs: Improving Multilingual Capability by Cross-Lingual-Thought Prompting
Haoyang Huang, Tianyi Tang, Dongdong Zhang, Xin Zhao, Ting Song, Yan Xia, Furu Wei

• Towards a Deep Understanding of Multilingual End-to-End Speech Translation
Haoran Sun, Xiaohu Zhao, Yikun Lei, Shaolin Zhu, Deyi Xiong

• Towards Multilingual Interlinear Morphological Glossing
Shu Okabe, François Yvon

• T-Projection: High Quality Annotation Projection for Sequence Labeling Tasks
Iker García-Ferrero, Rodrigo Agerri, German Rigau

• Breaking the Language Barrier: Improving Cross-Lingual Reasoning with Structured Self-Attention
Negar Foroutan, Mohammadreza Banaei, Karl Aberer, Antoine Bosselut

• Unleashing the Multilingual Encoder Potential: Boosting Zero-Shot Performance via Probability Calibration
Ercong Nie, Helmut Schmid, Hinrich Schuetze

• Interpreting Indirect Answers to Yes-No Questions in Multiple Languages
Zijie Wang, Md Mosharaf Hossain, Shivam Mathur, Terry Cruz Melo, Kadir Bulut Ozler, Keun Hee Park, Jacob Quintero, MohammadHossein Rezaei, Shreya Nupur Shakya, Md Nayem Uddin, Eduardo Blanco

• BERTwich: Extending BERT’s Capabilities to Model Dialectal and Noisy Text
Aarohi Srivastava, David Chiang

• Isotropic Representation Can Improve Zero-Shot Cross-Lingual Transfer on Multilingual Language Models
Yixin Ji, Jikai Wang, Juntao Li, Hai Ye, Min Zhang

• One For All & All For One: Bypassing Hyperparameter Tuning with Model Averaging for Cross-Lingual Transfer
Fabian David Schmidt, Ivan Vulić, Goran Glavaš

• Multilingual Lottery Tickets to Pretrain Language Models
Jaeseong Lee, Seung-won Hwang

• In What Languages are Generative Language Models the Most Formal? Analyzing Formality Distribution across Languages
Asım Ersoy, Gerson Vizcarra, Tahsin Mayeesha, Benjamin Muller

Shared Task

MRL 2023 featured a new shared task on Multi-task Multi-lingual Information Retrieval, which provides a new multilingual evaluation benchmark for assessing large-scale representation learning models on a diverse set of underrepresented languages across a range of predictive and generative tasks. The results of the shared task will be available very soon.

Important Dates

• ~~Sep. 8, 2023: Paper Due Date~~

• ~~Oct. 6, 2023: Notification of Acceptance~~

• ~~Oct. 18, 2023: Camera-ready papers due~~

• Dec. 7, 2023: Workshop

• Dec. 8-10, 2023: Main conference

(All deadlines are 11:59 pm UTC-12, “anywhere on Earth”.)

Organizers

David Ifeoluwa Adelani, Google DeepMind and UCL

Duygu Ataman, NYU

Chris Emezue, TU Munich and Mila

Omer Goldman, Bar-Ilan University

Hila Gonen, UW and Meta AI

Mammad Hajili, Microsoft

Benjamin Muller, Meta

Sebastian Ruder, Google

Gözde Gül Şahin, Koç University

Francesco Tinner, University of Amsterdam

Genta Indra Winata, Bloomberg