The 1st Shared Task on Multilingual Clause-level Morphology
Description and Objectives
Morphology has been widely studied as a word-level task, although in many languages it has complex hierarchical relationships with different layers of language, such as phonetic, syntactic or semantic representations of phrase or sentence-level utterances. The extent of this relationship as well as its complexity, however, still remain unknown. The new shared task on multilingual clause-level morphology aims to investigate methods for morphological analysis or generation of different forms in languages with varying typology, where the modeling and alignment of morphosyntactic structure is accomplished at the level of clauses.
The shared task aims to provide a new benchmark that can help bring novel understandings in:
- The relationship between morphology and syntax in different languages
- How morphosyntactic structure aligns across languages with varying typology
- The performance of conventional statistical methods for language modeling or representation learning in learning formal and semantic features that can generalize across languages, scripts and syntactic structures
- The limitations of conventional methods for morphological or syntactic modeling as well as the specifications required for developing more comprehensive and theoretically complete models of language
Languages
The shared task will initially include six languages from different language families and with varying morphological characteristics: English, French, German, Hebrew, Russian and Turkish. We anticipate the extension of the benchmark to include more languages as time and resources become available.
Tasks
The shared task consists of three sub-tasks.
Task 1: Inflection
In this task the input is a verbal lemma (the form given as a lexicon entry) and a specific set of inflectional features. The task requires generating the output clause that manifests those features.
Examples
Languages | Input | Output |
---|---|---|
English | give IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM) | I will give him to her |
German | geben IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM) | Ich werde ihn ihr geben |
Turkish | vermek IND;FUT;NOM(1,SG);ACC(3,SG);DAT(3,SG) | Onu ona vereceğim |
Hebrew | נתן IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM) | אתן אותו לה |
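
The feature bundles in the examples above follow a visible pattern: tags separated by `;`, where some tags (such as `NOM`, `ACC`, `DAT`) carry comma-separated arguments in parentheses. The following is a minimal Python sketch of how such an input could be parsed, assuming each input is a single line with the lemma followed by one space and the bundle, as shown in the table. The function names `parse_features` and `parse_inflection_input` are hypothetical, and the released data files may use a different layout.

```python
from __future__ import annotations

import re

# Matches a bare tag such as "IND" or a tag with arguments such as "NOM(1,SG)".
_FEATURE = re.compile(r"^([A-Z0-9]+)(?:\(([^()]*)\))?$")


def parse_features(bundle: str) -> dict[str, tuple[str, ...]]:
    """Split a ';'-separated feature bundle into {tag: (arguments...)}."""
    parsed: dict[str, tuple[str, ...]] = {}
    for item in bundle.split(";"):
        match = _FEATURE.match(item.strip())
        if match is None:
            raise ValueError(f"unrecognised feature: {item!r}")
        tag, args = match.groups()
        # Tags without parentheses get an empty argument tuple.
        parsed[tag] = tuple(a.strip() for a in args.split(",")) if args else ()
    return parsed


def parse_inflection_input(line: str) -> tuple[str, dict[str, tuple[str, ...]]]:
    """Split 'lemma BUNDLE' on the last space (assumed layout, see above)."""
    lemma, bundle = line.rsplit(" ", 1)
    return lemma, parse_features(bundle)


lemma, features = parse_inflection_input(
    "give IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM)"
)
print(lemma, features["DAT"])
```

Splitting on the last space keeps the sketch independent of the lemma's script, so the same call works for the Hebrew and Turkish rows.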
Task 2: Reinflection
In this task the input is an inflected clause, accompanied by its features, and a new set of features representing the desired form. The task is to generate the output clause that manifests the new features.
Examples
Languages | Input | Output |
---|---|---|
English | I will give him to her IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM) IND;PRS;NOM(1,PL);ACC(2);DAT(3,PL);NEG | We don't give you to them |
German | Ich werde ihn ihr geben IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM) IND;PRS;NOM(1,PL);ACC(2,SG);DAT(3,PL);NEG | Wir geben dich ihnen nicht |
Turkish | Onu ona vereceğim IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM) IND;PRS;PROG;NOM(1,PL);ACC(2,SG);DAT(3,PL);NEG | Seni onlara vermiyoruz |
Hebrew | אתן אותו לה IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM) IND;PRS;NOM(1,PL,MASC);ACC(2,SG,MASC);DAT(3,PL,FEM);NEG | אנחנו לא נותנים אותך להן |
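
To see what a reinflection system has to change, it can help to compare the source and target feature bundles directly. The sketch below is only an illustration under the assumption that bundles are `;`-separated strings as in the examples above. The helper name `feature_changes` is hypothetical and is not part of any released tooling.

```python
def feature_changes(source: str, target: str) -> dict[str, list[str]]:
    """Compare two ';'-separated feature bundles item by item.

    Items are compared as whole strings, so "NOM(1,SG)" and "NOM(1,PL)"
    count as one dropped and one added feature.
    """
    src = source.split(";")
    tgt = target.split(";")
    return {
        "kept": [f for f in src if f in tgt],
        "dropped": [f for f in src if f not in tgt],
        "added": [f for f in tgt if f not in src],
    }


changes = feature_changes(
    "IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM)",
    "IND;PRS;NOM(1,PL);ACC(2);DAT(3,PL);NEG",
)
print(changes["added"])
```

For the English example only `IND` is kept: tense, all three arguments and polarity differ between the input and the desired output.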
Task 3: Analysis
This task is the inverse of Task 1: a system is required to analyze a given clause and generate the lemma and features underlying it.
Examples
Languages | Input | Output |
---|---|---|
English | I will give him to her | give IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM) |
German | Ich werde ihn ihr geben | geben IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM) |
Turkish | Onu ona vereceğim | vermek IND;FUT;NOM(1,SG);ACC(3,SG);DAT(3,SG) |
Hebrew | אתן אותו לה | נתן IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM) |
Participation
Interested parties are invited to join the mailing list at participants-mcmsharedtask-2022@googlegroups.com to be involved in the competition.
All participating systems will be evaluated together with our baselines against the same held-out test set, to be released shortly before evaluation. Submitted systems can compete in some or all sub-tasks. The submission system is now open!
Participating teams will be invited to submit a short paper describing their work to the MRL workshop and to present it in a special session at the workshop. Paper submissions must follow the EMNLP paper format and be sent via the Softconf conference link of MRL 2022 before the paper submission deadline.
Data
Training and development data for building systems for the above three tasks in six languages can be found in our GitHub repository.
Important dates
May 16, 2022: Release of training and development data
August 7, 2022 (originally July 20, 2022): Release of testing data (including surprise languages)
August 14, 2022: Deadline to release external data and resources used in systems
August 22, 2022 (originally July 30, 2022): Deadline for submission of systems
August 25, 2022: Release of rankings and results
September 15, 2022: Deadline for submitting system description papers
October 10, 2022: Paper notifications
November 9, 2022: Camera-ready papers and posters due
December 8, 2022: Workshop
Evaluation
System outputs will be evaluated using standard metrics from morphological analysis and inflection, including exact-match accuracy, precision, recall and F1, as well as metrics for generated text such as edit distance.
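
As a rough illustration of the metrics named above, the following Python sketch computes exact-match accuracy and Levenshtein edit distance over generated clauses, and precision, recall and F1 over predicted feature items. It is not the official scorer: the function names are hypothetical, and the choice to compute precision and recall over `;`-separated feature items is an assumption, since the exact definitions are not given here.

```python
from __future__ import annotations


def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions identical to their reference (after stripping)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references differ in length")
    if not references:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance using two rolling rows."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def feature_prf(predicted: str, gold: str) -> tuple[float, float, float]:
    """Precision, recall and F1 over ';'-separated feature items (assumed unit)."""
    pred, ref = set(predicted.split(";")), set(gold.split(";"))
    overlap = len(pred & ref)
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


print(exact_match_accuracy(["Onu ona vereceğim"], ["Onu ona vereceğim"]))
print(edit_distance("kitten", "sitting"))
```

Exact match suits the generation tasks, where a clause is either right or wrong, while the feature-level scores give partial credit to an analysis that gets most of the bundle right.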
Ranking
The systems will be evaluated based on the global ranking on all benchmark languages. Participants may also submit language-specific (monolingual) systems, which will be evaluated as partial submissions for the language they were trained on. We anticipate awarding a prize to the winning team.
Additional resources
The shared task allows participants to use external resources or tools as long as they are openly available and can, in principle, be used by other participants for research purposes. Participants who decide to use external resources or data in their system should contact the organizers to confirm that the specific resources are permitted. In such cases, specific information on the resources used and how they can be obtained should be shared on the participants' mailing list by August 20th, 2022 at the latest.
Organizers
Omer Goldman, Bar Ilan University
Reut Tsarfaty, Bar Ilan University
Djame Seddah, INRIA Paris
Benjamin Muller, INRIA Paris and Sorbonne University
Benoît Sagot, INRIA Paris
Hila Gonen, University of Washington and Meta AI
Jamshidbek Mirzakhalov, Salesforce
Kelechi Ogueji, University of Waterloo
Francesco Tinner, University of Zurich
Duygu Ataman, New York University
Contact
Feel free to join the mailing list at participants-mcmsharedtask-2022@googlegroups.com or contact mrlw2022@gmail.com if you have any questions about the workshop.