The 1st Shared Task on Multilingual Clause-level Morphology


Description and Objectives


Morphology has been widely studied as a word-level task, although in many languages it has complex hierarchical relationships with other layers of language, such as phonetic, syntactic or semantic representations of phrase- or sentence-level utterances. The extent and complexity of this relationship, however, remain unknown. The new shared task on multilingual clause-level morphology aims to investigate methods for morphological analysis or generation of different forms in languages with varying typology, where the modeling and alignment of morphosyntactic structure is carried out at the level of clauses.

The shared task aims to provide a new benchmark that can yield new insights into:

  • The relationship between morphology and syntax in different languages
  • How morphosyntactic structure aligns across languages with varying typology
  • The performance of conventional statistical methods for language modeling or representation learning in learning formal and semantic features that can generalize across languages, scripts and syntactic structures
  • The limitations of conventional methods for morphological or syntactic modeling as well as the specifications required for developing more comprehensive and theoretically complete models of language

Languages


The shared task will initially include six languages from different language families and with varying morphological characteristics: English, French, German, Hebrew, Russian and Turkish. We anticipate the extension of the benchmark to include more languages as time and resources become available.


Tasks


The shared task consists of three sub-tasks.

Task 1: Inflection

In this task the input is a verbal lemma (the form given as a lexicon entry) and a specific set of inflectional features. The task is to generate the output clause that manifests those features.


Examples

Language | Input (lemma and features) | Output
English | give IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM) | I will give him to her
German | geben IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM) | Ich werde ihn ihr geben
Turkish | vermek IND;FUT;NOM(1,SG);ACC(3,SG);DAT(3,SG) | Onu ona vereceğim
Hebrew | נתן IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM) | אתן אותו לה
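The feature bundles in the examples above are semicolon-separated: clause-level tags such as IND or FUT appear bare, while argument features are written as a case label followed by a parenthesized list, e.g. NOM(1,SG). The following is a minimal sketch of how such a bundle could be split into its parts. The function name parse_features and the returned structure are assumptions made for illustration and are not part of the official task tooling.

```python
import re
from typing import Dict, List, Tuple

# Matches an argument-style feature such as "NOM(1,SG)" or "ACC(3,SG,MASC)".
_ARGUMENT = re.compile(r"^([A-Z]+)\(([^()]*)\)$")


def parse_features(bundle: str) -> Tuple[List[str], Dict[str, Tuple[str, ...]]]:
    """Split a feature bundle into clause-level tags and per-argument features.

    Hypothetical helper: the layout is inferred from the examples on this page.
    """
    tags: List[str] = []
    arguments: Dict[str, Tuple[str, ...]] = {}
    for item in bundle.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate stray or trailing semicolons
        match = _ARGUMENT.match(item)
        if match:
            case, values = match.groups()
            arguments[case] = tuple(v.strip() for v in values.split(","))
        else:
            tags.append(item)  # bare tag such as IND, FUT or NEG
    return tags, arguments


tags, arguments = parse_features("IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM)")
print(tags)       # ['IND', 'FUT']
print(arguments)  # {'NOM': ('1', 'SG'), 'ACC': ('3', 'SG', 'MASC'), 'DAT': ('3', 'SG', 'FEM')}
```

Keeping the bare tags in a list preserves their order in the bundle, while the dictionary makes it easy to compare the features of a single argument across languages.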



Task 2: Reinflection

In this task the input is an inflected clause, accompanied by its features, and a new set of features representing the desired form. The task is to generate the output clause that manifests the new features.


Examples

Language | Input clause | Input features | Desired features | Output
English | I will give him to her | IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM) | IND;PRS;NOM(1,PL);ACC(2);DAT(3,PL);NEG | We don't give you to them
German | Ich werde ihn ihr geben | IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM) | IND;PRS;NOM(1,PL);ACC(2,SG);DAT(3,PL);NEG | Wir geben dich ihnen nicht
Turkish | Onu ona vereceğim | IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM) | IND;PRS;PROG;NOM(1,PL);ACC(2,SG);DAT(3,PL);NEG | Seni onlara vermiyoruz
Hebrew | אתן אותו לה | IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM) | IND;PRS;NOM(1,PL,MASC);ACC(2,SG,MASC);DAT(3,PL,FEM);NEG | אנחנו לא נותנים אותך להן



Task 3: Analysis

This task is the inverse of Task 1: a system is required to analyze a given clause and generate the lemma and features underlying it.


Examples

Language | Input | Output (lemma and features)
English | I will give him to her | give IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM)
German | Ich werde ihn ihr geben | geben IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM)
Turkish | Onu ona vereceğim | vermek IND;FUT;NOM(1,SG);ACC(3,SG);DAT(3,SG)
Hebrew | אתן אותו לה | נתן IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM)

Participation


Interested parties are invited to join the mailing list at participants-mcmsharedtask-2022@googlegroups.com to take part in the competition. All participating systems will be evaluated together with our baselines against the same held-out test set, to be released shortly before evaluation. Submitted systems can compete in some or all sub-tasks. The submission system is now open! Participating teams will be invited to submit a short paper describing their work to the MRL workshop and to present it in a special session at the workshop. Paper submissions must follow the EMNLP paper format and be sent through the Softconf conference link of MRL 2022 before the paper submission deadline.

Data


Training and development data for building systems for the above three tasks in six languages can be found in our GitHub repository.

Important dates


May 16, 2022: Release of training and development data
August 7, 2022 (originally July 20, 2022): Release of test data (including surprise languages)
August 14, 2022: Deadline to release external data and resources used in systems
August 22, 2022 (originally July 30, 2022): Deadline for submission of systems
August 25, 2022: Release of rankings and results
September 15, 2022: Deadline for submitting system description papers
October 10, 2022: Paper notifications
November 9, 2022: Camera-ready papers and posters due
December 8, 2022: Workshop

Evaluation


System outputs will be evaluated using standard metrics from morphological analysis and inflection, including exact-match measures (precision, recall and F1), as well as metrics for generated text, such as edit distance.
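As an illustration of the kinds of metrics named above, here is a minimal sketch of exact-match accuracy, character-level edit distance (Levenshtein), and precision, recall and F1 over predicted feature sets. All function names are hypothetical, and the exact definitions used by the organizers' evaluation script may differ; this is not the official scorer.

```python
from typing import Sequence, Set, Tuple


def exact_match(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions identical to their reference string."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    return sum(p == r for p, r in zip(predictions, references)) / len(references)


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution or match
            ))
        previous = current
    return previous[-1]


def feature_prf(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of a predicted feature set against the gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


print(exact_match(["Ich werde ihn ihr geben"], ["Ich werde ihn ihr geben"]))  # 1.0
print(edit_distance("Ich werde ihm ihr geben", "Ich werde ihn ihr geben"))    # 1
print(feature_prf({"IND", "FUT"}, {"IND", "PRS"}))                            # (0.5, 0.5, 0.5)
```

Exact match and edit distance apply naturally to the generated clauses of Tasks 1 and 2, while set-based precision, recall and F1 are one plausible way to score the feature bundles predicted in Task 3.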

Ranking


Systems will be ranked globally across all benchmark languages. Participants may also submit language-specific (monolingual) systems; these will be evaluated as partial submissions for the language they were trained on. We anticipate awarding a prize to the winning team.

Additional resources


The shared task allows participants to use external resources or tools as long as they are openly available and can, in principle, be used by other participants for research purposes. Participants who decide to use external resources or data in their systems should contact the organizers to confirm that the specific resources are permitted. In such cases, information on the resources used and how they can be obtained should be shared on the participants' mailing list by August 20, 2022 at the latest.

Organizers


Omer Goldman, Bar Ilan University
Reut Tsarfaty, Bar Ilan University
Djame Seddah, INRIA Paris
Benjamin Muller, INRIA Paris and Sorbonne University
Benoît Sagot, INRIA Paris
Hila Gonen, University of Washington and Meta AI
Jamshidbek Mirzakhalov, Salesforce
Kelechi Ogueji, University of Waterloo
Francesco Tinner, University of Zurich
Duygu Ataman, New York University

Contact

    Feel free to join the mailing list at participants-mcmsharedtask-2022@googlegroups.com or contact mrlw2022@gmail.com if you have any questions about the workshop.