SIGTYP 2024 — March 22nd — Malta/Hybrid
We kindly invite everyone to join the virtual part of SIGTYP 2024! Below you can explore papers, slides, and recorded talks. On the Google Group you will also find the single Zoom link that will be used throughout the day of the workshop.
The SIGTYP 2024 proceedings are now available here.
Our colleagues Oksana Dereza, Frederick Riemenschneider, An Tuan Dao, Ercong Nie, and Jessica Nieder have kindly agreed to help with the in-person part and will be assisting you during the workshop!
Time zone: Malta
By the SIGTYP 2024 Organizing Committee
Opening remarks: the SIGTYP 2024 workshop, SIGTYP development, and MRL! Slides are available here.
✻ Keynote Talk ✻
There are few - if any - universals which hold across all known languages. Promising candidates are quantitative laws such as Zipf’s law of word frequencies and Zipf’s law of abbreviation. This talk will review some of the current research into these laws from a cross-linguistic perspective. This includes a discussion of the methodological challenges of working with diverse languages, modalities, and writing systems, as well as the controversial question of how “meaningful” the laws are given random baselines. Finally, an avenue for further research is explored: the challenge of defining a statistical fingerprint for human languages.
Bio: Christian Bentz is currently an Assistant Professor at the Department of General Linguistics, University of Tübingen. He received his PhD in Computation, Cognition, and Language from the University of Cambridge. His research interests include information theory, quantitative linguistics, language typology, and language evolution.
✻ Low-Resource NLP ✻
By Jonathan Janetzki, Gerard De Melo, Joshua Nemecek and Daniel Lee Whitenack
Over 7,000 of the world’s 7,168 living languages are still low-resourced. This paper aims to narrow the language documentation gap by creating multiparallel dictionaries, clustered by SIL’s semantic domains. This task is new for machine learning and has previously been done manually by native speakers. We propose GUIDE, a language-agnostic tool that uses a GNN to create and populate semantic domain dictionaries, using seed dictionaries and Bible translations as a parallel text corpus. Our work sets a new benchmark, achieving an exemplary average precision of 60% in eight zero-shot evaluation languages and predicting an average of 2,400 dictionary entries. We share the code, model, multilingual evaluation data, and new dictionaries with the research community.
By Nay San, Georgios Paraskevopoulos, Aryaman Arora, Xiluo He, Prabhjot Kaur, Oliver Adams and Dan Jurafsky
While massively multilingual speech models like wav2vec 2.0 XLSR-128 can be directly fine-tuned for automatic speech recognition (ASR), downstream performance can still be relatively poor on languages that are underrepresented in the pre-training data. Continued pre-training on 70–200 hours of untranscribed speech in these languages can help — but what about languages without that much recorded data? For such cases, we show that supplementing the target language with data from a similar, higher-resource ‘donor’ language can help. For example, continued pre-training on only 10 hours of low-resource Punjabi supplemented with 60 hours of donor Hindi is almost as good as continued pre-training on 70 hours of Punjabi. By contrast, sourcing data from less similar donors like Bengali does not improve ASR performance. To inform donor language selection, we propose a novel similarity metric based on the sequence distribution of induced acoustic units: the Acoustic Token Distribution Similarity (ATDS). Across a set of typologically different target languages (Punjabi, Galician, Iban, Setswana), we show that the ATDS between the target language and its candidate donors precisely predicts target language ASR performance.
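As a rough illustration of the idea behind a distributional similarity over induced acoustic units, the sketch below compares n-gram frequency distributions of discretized speech using cosine similarity. This is a hedged approximation, not the paper's formulation of ATDS: the token induction step (e.g., clustering wav2vec 2.0 features) is assumed to have happened upstream, and all names and data here are illustrative.

```python
# Sketch of an ATDS-like similarity between acoustic token sequences.
# Tokens are integer IDs standing in for induced acoustic units.
from collections import Counter
from math import sqrt

def ngram_distribution(tokens, n=3):
    """Relative frequencies of acoustic-token n-grams."""
    counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}

def atds_like_similarity(tokens_a, tokens_b, n=3):
    """Cosine similarity between the two n-gram distributions."""
    p, q = ngram_distribution(tokens_a, n), ngram_distribution(tokens_b, n)
    dot = sum(p[g] * q.get(g, 0.0) for g in p)
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

# Toy usage: two hypothetical unit sequences from a target and a donor language.
target_units = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
donor_units  = [3, 1, 4, 1, 5, 9, 2, 7, 5, 3, 5]
print(atds_like_similarity(target_units, donor_units))
```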
Coffee break, chats, linguistic trivia
✻ Typology and Language Comparison ✻
By Ho Wang Matthew Sung, Jelena Prokic and Yiya Chen
Traditional dialectology, or dialect geography, is the study of geographical variation in language. Originating in Europe and pioneered in Germany and France, this field has predominantly focused on sounds, more specifically on segments. Similarly, quantitative approaches to language variation concerned with the phonetic level in most cases focus on segments as well. However, more than half of the world’s languages include lexical tones (Yip 2002). Despite this, tones are still underexplored in quantitative language comparison, partly due to the low accessibility of suitable data. This paper introduces a newly digitised dataset from the Yue- and Pinghua-speaking areas of Southern China, covering over 100 dialects. The dataset consists of two parts: tones and segments. In this paper, we illustrate how tones can be modelled computationally in order to explore linguistic variation. We have applied a tone distance metric to our data and found that 1) dialects also form a continuum on the tonal level and 2) beyond tonemic (inventory) and tonetic differences, dialects can also differ in the lexical distribution of tones. The availability of this dataset will hopefully enable further exploration of the role of tones in quantitative typology and NLP research.
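For readers unfamiliar with tone-level comparison, a minimal sketch follows. It is not the metric from the paper: it assumes tones are transcribed as Chao tone-letter strings (e.g., '55', '213') and simply averages a normalized edit distance over a word list shared by two dialects; all names and data in the example are made up.

```python
# Toy tone-distance between two dialects, given tone transcriptions per word.

def levenshtein(a: str, b: str) -> int:
    """Standard edit distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def tone_distance(dialect_a: dict, dialect_b: dict) -> float:
    """Mean normalized edit distance over words attested in both dialects."""
    shared = dialect_a.keys() & dialect_b.keys()
    dists = [levenshtein(dialect_a[w], dialect_b[w]) / max(len(dialect_a[w]), len(dialect_b[w]))
             for w in shared]
    return sum(dists) / len(dists)

# Hypothetical tone transcriptions (Chao tone letters) for two dialects.
dialect_one = {'poem': '55', 'city': '13', 'matter': '22'}
dialect_two = {'poem': '55', 'city': '24', 'matter': '21'}
print(tone_distance(dialect_one, dialect_two))
```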
By Kanji Kato, So Miyagawa and Natsuko Nakagawa
LAJaR (Language Atlas of Japanese and Ryukyuan) is a linguistic typology database focusing on micro-variation in the Japonic languages. This paper reports the design and progress of this ongoing database project. Finally, we also present a case study that uses the database to examine zero copulas across the Japonic languages.
By Wessel Poelman, Esther Ploeger, Miryam De Lhoneux and Johannes Bjerva
In order to draw generalizable conclusions about the performance of multilingual models across languages, it is important to evaluate on a set of languages that captures linguistic diversity. Linguistic typology is increasingly used to justify language selection, inspired by language sampling in linguistics (e.g., Rijkhoff and Bakker, 1998). In other words, more and more papers suggest generalizability by evaluating on ‘typologically diverse languages’ (see Figure 1). However, justifications for ‘typological diversity’ exhibit great variation, as there seems to be no set definition, methodology, or consistent link to linguistic typology. In this work, we provide systematic insight into how previous work in the ACL Anthology uses the term ‘typological diversity’. Our two main findings are: (1) what is meant by typologically diverse language selection is not consistent; (2) the actual typological diversity of the language sets in these papers varies greatly. We argue that claims about ‘typological diversity’ should include an operationalization of the term. A systematic approach that quantifies this claim, also with respect to the number of languages used, would be even better.
By Luise Häuser, Gerhard Jäger, Johann-Mattis List, Taraka Rama and Alexandros Stamatakis
In traditional studies on language evolution, scholars often emphasize the importance of sound laws and sound correspondences for phylogenetic inference of language family trees. However, to date, computational approaches have typically not taken this potential into account. Most computational studies still rely on lexical cognates as the major data source for phylogenetic reconstruction in linguistics, although a few studies exist in which authors praise the benefits of comparing words at the level of sound sequences. Building on (a) ten diverse datasets from different language families, and (b) state-of-the-art methods for automated cognate and sound correspondence detection, we test, for the first time, the performance of sound-based versus cognate-based approaches to phylogenetic reconstruction. Our results show that phylogenies reconstructed from lexical cognates are topologically closer to the gold standard phylogenies than phylogenies reconstructed from sound correspondences, by approximately one third on average with respect to the generalized quartet distance.
By Damir Cavar, Ludovic Mompelat and Muhammad S. Abdo
Ellipsis constructions are challenging for state-of-the-art (SotA) Natural Language Processing (NLP) technologies. Although theoretically well-documented and understood, there is a lack of sufficient cross-linguistic language resources to document, study, and ultimately engineer NLP solutions that can adequately analyse ellipsis constructions. This article describes the typological data set on ellipsis that we created, currently covering seventeen languages. We demonstrate how SotA parsers based on a variety of syntactic frameworks fail to parse sentences with ellipsis, and that probabilistic, neural, and Large Language Models (LLMs) fail as well. We discuss experiments that focus on detecting sentences with ellipsis, predicting the position of elided elements, and predicting elided surface forms in the appropriate positions. We show that cross-linguistic variation of ellipsis-related phenomena has different consequences for the architecture of NLP systems.
Guess the Language from Songs. Linguistic quizzes (from SIGTYP 2022).
You may also use the initial slides.
✻ Keynote Talk ✻
During this presentation, I will elaborate on the importance of capturing the immense diversity inherent in natural languages. This extends beyond advancing language technologies; it also serves to answer interdisciplinary research questions and to enrich the exploration of linguistic typology through computational lenses. By harnessing textual data and unsupervised NLP techniques, we can induce typological knowledge, thereby expanding existing typological databases and enabling more comprehensive language comparisons for various NLP applications.
I will illustrate these concepts through a case study that demonstrates how simple techniques such as subword tokenization and the analysis of multilingual text corpora enable the study of the morphological typology of languages and the complexity of their morphological systems. We will also examine the implications and constraints associated with these methodologies.
Bio: Ximena Gutierrez-Vasques is a computational linguist with an interdisciplinary focus on deepening the study of human language. Her lines of research cover multilingual NLP, computational morphology, and NLP for under-resourced languages of the Americas. She was a postdoctoral researcher at the University of Zürich, where she specialized in approaches to modeling linguistic complexity and typology using text corpora, inspired by information theory. She recently joined an interdisciplinary research center in Mexico (CEIICH, UNAM), where she works at the interface between the humanities and the field of AI.
✻ Multilinguality ✻
By Damiaan J W Reijnaers and Charlotte Pouw
This paper lays the groundwork for research into Source Language Identification: the task of identifying the original language of a machine-translated text. We contribute a carefully crafted dataset of translations from a typologically diverse spectrum of languages into English and use it to set initial baselines for this novel task. The dataset is publicly available on our GitHub repository: damiaanr/gtnc.
By Nathan Andrew Chi, Teodor Malchev, Riley Kong, Ryan Andrew Chi, Lucas Huang, Ethan A Chi, R. Thomas McCoy and Dragomir Radev
Towards understanding the capability of models to perform multilingual few-shot reasoning, we propose MODELING, a benchmark of Rosetta stone puzzles (Bozhanov and Derzhanski, 2013). This type of puzzle, originating from competitions called Linguistics Olympiads, contains a small number of sentences in a target language not previously known to the solver. Each sentence is translated into the solver’s language such that the provided sentence pairs uniquely specify a single most reasonable underlying set of rules; solving requires applying these rules to translate new expressions.
By Emil Svoboda and Magda Ševčíková
In Universal Dependencies, compounds, which we understand as words containing two or more roots, are represented according to tokenization, which reflects the orthographic conventions of the language. A closed compound corresponds to a single word in Universal Dependencies (e.g. waterfall), while a hyphenated compound (father-in-law) and an open compound (apple pie) correspond to multiple words. The aim of this paper is to open a discussion on how to move towards a more consistent annotation of compounds. The solution we argue for is to represent the internal structure of all compound types analogously to syntactic phrases, which would not only increase the comparability of compounding within and across languages, but also allow comparisons of compounds and syntactic phrases.
By Kushal Tatariya, Heather Lent, Johannes Bjerva and Miryam De Lhoneux
Emotion classification is a challenging task in NLP due to the inherent idiosyncratic and subjective nature of linguistic expression, especially with code-mixed data. Pre-trained language models (PLMs) have achieved high performance for many tasks and languages, but it remains to be seen whether these models learn and are robust to the differences in emotional expression across languages. Sociolinguistic studies have shown that Hinglish speakers switch to Hindi when expressing negative emotions and to English when expressing positive emotions. To understand if language models can learn these associations, we study the effect of language on emotion prediction across 3 PLMs on a Hinglish emotion classification dataset. Using LIME (Ribeiro et al., 2016) and token-level language ID, we find that models do learn these associations between language choice and emotional expression. Moreover, having code-mixed data present in the pre-training can augment that learning when task-specific data is scarce. We also conclude from the misclassifications that the models may over-generalise this heuristic to other infrequent examples where this sociolinguistic phenomenon does not apply.
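To make the analysis concrete, here is a minimal sketch of the aggregation step one might use: averaging per-token attribution scores (e.g., obtained with LIME) by token-level language ID, to see which language carries more weight for a given prediction. The helper names and toy data are hypothetical, not taken from the paper.

```python
# Average attribution score per language in a code-mixed sentence,
# assuming per-token attributions and a language-ID function exist upstream.
def mean_attribution_by_language(tokens, attributions, lang_of):
    totals, counts = {}, {}
    for tok, score in zip(tokens, attributions):
        lang = lang_of(tok)
        totals[lang] = totals.get(lang, 0.0) + score
        counts[lang] = counts.get(lang, 0) + 1
    return {lang: totals[lang] / counts[lang] for lang in totals}

# Toy usage: do Hindi tokens carry more weight for a 'negative' prediction?
tokens = ['yaar', 'this', 'movie', 'bakwaas', 'thi']
attributions = [0.10, 0.02, 0.01, 0.55, 0.12]  # hypothetical LIME scores
lang_of = lambda t: 'hi' if t in {'yaar', 'bakwaas', 'thi'} else 'en'
print(mean_attribution_by_language(tokens, attributions, lang_of))
```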
Coffee, discussions
✻ Shared Task Session ✻
By Oksana Dereza, Adrian Doyle, Priya Rani, Atul Ojha, Pádraic Moran and John McCrae
This paper discusses the organisation and findings of the SIGTYP 2024 Shared Task on Word Embedding Evaluation for Ancient and Historical Languages. The shared task was split into constrained and unconstrained tracks and involved solving either three or five problems for 12+ ancient and historical languages belonging to four language families and making use of six different scripts. There were 14 registrations in total, of which three teams participated in each track. Out of these six submissions, two systems were successful in the constrained setting and another two in the unconstrained setting, and four system description papers were submitted by different teams. The best average results for POS-tagging, lemmatisation, and morphological feature prediction were 96.09%, 94.88%, and 96.68% respectively. In the mask-filling problem, the winning team’s average score across all 16 languages did not exceed 5.95% at the word level, which demonstrates the difficulty of this problem. At the character level, the best average result over the 16 languages was 55.62%.
By Aleksei Dorkin and Kairit Sirts
We present our submission to the unconstrained subtask of the SIGTYP 2024 Shared Task on Word Embedding Evaluation for Ancient and Historical Languages for morphological annotation, POS-tagging, lemmatization, and character- and word-level gap-filling. We developed a simple, uniform, and computationally lightweight approach based on the adapters framework using parameter-efficient fine-tuning. We applied the same adapter-based approach uniformly to all tasks and 16 languages by fine-tuning stacked language- and task-specific adapters. Our submission obtained an overall second place out of three submissions, with first place in word-level gap-filling. Our results show the feasibility of adapting language models pre-trained on modern languages to historical and ancient languages via adapter training.
By Frederick Riemenschneider and Kevin Krahn
Historical languages present unique challenges to the NLP community, with one prominent hurdle being the limited resources available in their closed corpora. This work describes our submission to the constrained subtask of the SIGTYP 2024 shared task, focusing on PoS tagging, morphological tagging, and lemmatization for 13 historical languages. For PoS and morphological tagging we adapt a hierarchical tokenization method from Sun et al. (2023) and combine it with the advantages of the DeBERTa-V3 architecture, enabling our models to efficiently learn from every character in the training data. We also demonstrate the effectiveness of character-level T5 models on the lemmatization task. Pre-trained from scratch with limited data, our models achieved first place in the constrained subtask, nearly reaching the performance levels of the unconstrained task’s winner. Our code is available at https://github.com/bowphs/SIGTYP-2024-hierarchical-transformers.
By Johannes Heinecke
SIGTYP’s Shared Task on Word Embedding Evaluation for Ancient and Historical Languages was proposed in two variants, constrained and unconstrained. Whereas the constrained variant disallowed any data for training embeddings or models beyond the data provided, the unconstrained variant had no such limits. We participated in the five tasks of the unconstrained variant and came out first. The tasks were the prediction of part-of-speech tags, lemmas, and morphological features, as well as filling masked words and masked characters, for 16 historical languages. We decided to use a dependency parser, trained on the provided data on top of an underlying pretrained transformer model, to predict part-of-speech tags, lemmas, and morphological features. For predicting masked words, we used multilingual distilBERT (with rather bad results). For predicting masked characters, our language model is extremely small: a model of 5-gram frequencies, obtained by reading the available training data.
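The character-level model lends itself to a compact sketch. The following is a guess at the general shape of such a 5-gram frequency model, predicting a masked character from the two characters of context on each side; the authors' exact context windows, backoff, and smoothing are not specified in the abstract, so treat the details as assumptions.

```python
# Tiny character 5-gram model: count which character appears between
# each (left bigram, right bigram) context in the training data, then
# predict the most frequent one for a masked position.
from collections import Counter, defaultdict

class CharFiveGram:
    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, text: str):
        padded = '##' + text + '##'  # pad so edge characters get contexts
        for i in range(2, len(padded) - 2):
            context = (padded[i - 2:i], padded[i + 1:i + 3])
            self.counts[context][padded[i]] += 1

    def predict(self, left: str, right: str) -> str:
        """Most frequent character seen between `left` and `right`."""
        dist = self.counts.get((left[-2:], right[:2]))
        return dist.most_common(1)[0][0] if dist else '?'

model = CharFiveGram()
model.train('thequickbrownfoxjumpsoverthelazydog')
print(model.predict('th', 'qu'))  # -> 'e'
```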
By Lester James V. Miranda
In this paper, we describe Allen AI’s submission to the constrained track of the SIGTYP 2024 Shared Task. Using only the data provided by the organizers, we pretrained a transformer-based multilingual model, then finetuned it on the Universal Dependencies (UD) annotations of a given language for a downstream task. Our systems achieved decent performance on the test set, beating the baseline in most language-task pairs, yet struggle with subtoken tags in multiword expressions, as seen in Coptic and Ancient Hebrew. On the validation set, we obtained ≥70% F1-score on most language-task pairs. In addition, we also explored the cross-lingual capability of our trained models. This paper highlights our pretraining and finetuning process and presents findings from our internal evaluations.
✻ Typology and Human Language Processing ✻
By Weijie Xu and Richard Futrell
Human processing of nonlocal syntactic dependencies requires the engagement of limited working memory for encoding, maintenance, and retrieval. This process creates an evolutionary pressure for language to be structured in a way that keeps the subparts of a dependency closer to each other, an efficiency principle termed dependency locality. The current study proposes that this dependency locality pressure can be modulated by the surprisal of the antecedent, defined as the first part of a dependency, due to strategic allocation of working memory. In particular, antecedents with novel and unpredictable information are prioritized for memory encoding, receiving more robust representation against memory interference and decay, and thus are more capable of supporting longer dependencies. We examine this claim by analyzing dependency corpora of 11 languages, with word surprisal generated from the GPT-3 language model. In support of our hypothesis, we find evidence for a positive correlation between dependency length and antecedent surprisal in most of the languages in our analyses. A closer look at dependencies with core arguments shows that this correlation consistently holds for subject relations but not for object relations.
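The core analysis is a correlational test, which can be sketched compactly. The snippet below assumes per-dependency tuples of head position, dependent position, and antecedent surprisal have already been extracted from a dependency corpus; it illustrates the kind of test described, not the paper's exact pipeline, and uses a rank correlation as one defensible choice.

```python
# Test for a positive association between antecedent surprisal and
# (linear) dependency length, per language.
from scipy.stats import spearmanr

def locality_surprisal_correlation(dependencies):
    """`dependencies`: iterable of (head_index, dep_index, antecedent_surprisal)."""
    lengths = [abs(h - d) for h, d, _ in dependencies]
    surprisals = [s for _, _, s in dependencies]
    rho, p = spearmanr(surprisals, lengths)
    return rho, p

# Toy data: higher antecedent surprisal co-occurring with longer dependencies.
toy = [(1, 3, 5.2), (2, 4, 6.1), (5, 6, 2.3), (7, 12, 9.8), (8, 9, 3.0)]
print(locality_surprisal_correlation(toy))
```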
By Jessica Nieder and Johann-Mattis List
Closely related languages show linguistic similarities that allow speakers of one language to understand speakers of another language without having actively learned it. Mutual intelligibility varies in degree and is typically tested in psycholinguistic experiments. To study mutual intelligibility computationally, we propose a computer-assisted method using the Linear Discriminative Learner, a computational model developed to approximate the cognitive processes by which humans learn languages, which we expand with multilingual semantic vectors and multilingual sound classes. We test the model on cognate data from German, Dutch, and English, three closely related Germanic languages. We find that our model’s comprehension accuracy depends on 1) the automatic trimming of inflections and 2) the language pair for which comprehension is tested. Our multilingual modelling approach not only offers new methodological findings for the automatic testing of mutual intelligibility across languages but also extends the use of Linear Discriminative Learning to multilingual settings.
By Li Junlin, Yu-Yin Hsu, Emmanuele Chersoni and Bo Peng
Our study presents highlights on how language models account for multilinguality and how they facilitate in-depth investigation of closely related languages in comprehension. Specifically, Mandarin and Cantonese exhibit different advantages in terms of cognitive effort: word-visual processing for Mandarin and sentence contextual processing for Cantonese. The results of our study will further inform the performance of different types of language models and the usefulness of metrics in predicting eye-movement patterns when reading Mandarin and Cantonese texts.
✻ Findings ✻
By Christian Khairallah, Reham Marzouk, Salam Khalifa, Mayar Nassar and Nizar Habash
Modern Standard Arabic (MSA) nominals present many morphological and lexical modeling challenges that have not been consistently addressed previously. This paper attempts to define the space of such challenges, and leverage a recently proposed morphological framework to build a comprehensive and extensible model for MSA nominals. Our model design addresses the nominals’ intricate morphotactics, as well as their paradigmatic irregularities. Our implementation showcases enhanced accuracy and consistency compared to a commonly used MSA morphological analyzer and generator. We make our models publicly available.
By the SIGTYP 2024 Organizing Committee
Stay with us for SIGTYP 2025!
THANK YOU ALL!