The 1st Shared Task on Multilingual Clause-level Morphology

Description and Objectives

Morphology has been widely studied as a word-level task, although in many languages it has complex hierarchical relationships with different layers of language, such as phonetic, syntactic or semantic representations of phrase or sentence-level utterances. The extent of this relationship as well as its complexity, however, still remain unknown. The new shared task on multilingual clause-level morphology aims to investigate methods for morphological analysis or generation of different forms in languages with varying typology, where the modeling and alignment of morphosyntactic structure is accomplished at the level of clauses.

The shared task aims to provide a new benchmark that can help bring novel understandings in:

• The relationship between morphology and syntax in different languages
• How morphosyntactic structure aligns across languages with varying typology
• The performance of conventional statistical methods for language modeling or representation learning in learning formal and semantic features that can generalize across languages, scripts and syntactic structures
• The limitations of conventional methods for morphological or syntactic modeling as well as the specifications required for developing more comprehensive and theoretically complete models of language

Languages

The shared task will initially include six languages from different language families and with varying morphological characteristics: English, French, German, Hebrew, Russian and Turkish. We anticipate the extension of the benchmark to include more languages as time and resources become available.

Tasks

The shared task consists of three sub-tasks.

Task 1: Inflection

In this task, the input is a verbal lemma (the form given as a lexicon entry) and a specific set of inflectional features. The task is to generate the output clause that realizes these features.

Examples

Languages	Input	Output
English	give IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM)	I will give him to her
German	geben IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM)	Ich werde ihn ihr geben
Turkish	vermek IND;FUT;NOM(1,SG);ACC(3,SG);DAT(3,SG)	Onu ona vereceğim
Hebrew	נתן IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM)	אתן אותו לה
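
As an illustration of the input format shown in the examples above, the following minimal Python sketch splits a Task 1 input into its lemma and features. It assumes that the feature bundle is the last whitespace-separated token, that features are separated by semicolons, and that argument features such as NOM(1,SG) carry comma-separated values in parentheses. The function names are hypothetical and are not part of any official tooling for the shared task.

```python
import re

# Matches a feature that carries arguments, e.g. "ACC(3,SG,MASC)".
_ARG_FEATURE = re.compile(r"^(\w+)\((.*)\)$")


def parse_features(bundle):
    """Split a feature bundle into flat features and argument features."""
    flat = []       # features without arguments, e.g. ["IND", "FUT"]
    arguments = {}  # features with arguments, e.g. {"NOM": ("1", "SG")}
    for item in bundle.split(";"):
        match = _ARG_FEATURE.match(item)
        if match:
            arguments[match.group(1)] = tuple(match.group(2).split(","))
        else:
            flat.append(item)
    return flat, arguments


def parse_inflection_input(text):
    """Split a Task 1 input into (lemma, flat features, argument features)."""
    # Assumption: the bundle contains no whitespace, so it is the last token.
    lemma, bundle = text.rsplit(None, 1)
    flat, arguments = parse_features(bundle)
    return lemma, flat, arguments


lemma, flat, arguments = parse_inflection_input(
    "give IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM)"
)
print(lemma, flat, arguments)
```

Keeping the argument features in a dictionary keyed by case label makes it easy to compare, for example, the subject features of two clauses without re-parsing the string.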

Task 2: Reinflection

In this task, the input is an inflected clause accompanied by its features, together with a new set of features representing the desired form. The task is to generate the output clause that realizes the new features.

Examples

Languages	Input	Output
English	I will give him to her IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM) IND;PRS;NOM(1,PL);ACC(2);DAT(3,PL);NEG	We don't give you to them
German	Ich werde ihn ihr geben IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM) IND;PRS;NOM(1,PL);ACC(2,SG);DAT(3,PL);NEG	Wir geben dich ihnen nicht
Turkish	Onu ona vereceğim IND;FUT;NOM(1,SG);ACC(3,SG);DAT(3,SG) IND;PRS;PROG;NOM(1,PL);ACC(2,SG);DAT(3,PL);NEG	Seni onlara vermiyoruz
Hebrew	אתן אותו לה IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM) IND;PRS;NOM(1,PL,MASC);ACC(2,SG,MASC);DAT(3,PL,FEM);NEG	אנחנו לא נותנים אותך להן
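
The reinflection input in the examples above combines three pieces: the source clause, its feature bundle, and the target feature bundle. The sketch below shows one way to separate them and to list which features change between source and target. It assumes that feature bundles never contain whitespace, so the last two whitespace-separated tokens are the bundles. The released data files may use a different layout, and the function names are hypothetical.

```python
def parse_reinflection_input(text):
    """Split a Task 2 input into (clause, source bundle, target bundle)."""
    # Assumption: bundles contain no whitespace, so the last two tokens
    # are the bundles and everything before them is the clause.
    clause, source, target = text.rsplit(None, 2)
    return clause, source, target


def feature_changes(source, target):
    """Return (removed, added) features between two bundles, sorted."""
    src = set(source.split(";"))
    tgt = set(target.split(";"))
    return sorted(src - tgt), sorted(tgt - src)


clause, source, target = parse_reinflection_input(
    "I will give him to her "
    "IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM) "
    "IND;PRS;NOM(1,PL);ACC(2);DAT(3,PL);NEG"
)
removed, added = feature_changes(source, target)
print(clause)
print("removed:", removed)
print("added:", added)
```

Listing the changed features is only a convenience for inspecting examples; a system is still expected to produce the full output clause.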

Task 3: Analysis

This task is the inverse of Task 1: a system is required to analyze a given clause and generate the lemma and features underlying it.

Examples

Languages	Input	Output
English	I will give him to her	give IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM)
German	Ich werde ihn ihr geben	geben IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM)
Turkish	Onu ona vereceğim	vermek IND;FUT;NOM(1,SG);ACC(3,SG);DAT(3,SG)
Hebrew	אתן אותו לה	נתן IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM)
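
Because the analysis examples above mirror the inflection examples, Task 3 pairs can be derived from Task 1 pairs by swapping input and output. The following sketch illustrates this on two rows from the tables, and splits an analysis back into lemma and features. It assumes the same whitespace and semicolon conventions as the examples; the helper names are hypothetical.

```python
# Task 1 rows taken from the examples: (input, output).
task1_rows = [
    ("give IND;FUT;NOM(1,SG);ACC(3,SG,MASC);DAT(3,SG,FEM)",
     "I will give him to her"),
    ("vermek IND;FUT;NOM(1,SG);ACC(3,SG);DAT(3,SG)",
     "Onu ona vereceğim"),
]


def to_analysis_rows(rows):
    """Swap each Task 1 (input, output) pair into a Task 3 (clause, analysis) pair."""
    return [(clause, analysis) for analysis, clause in rows]


def split_analysis(analysis):
    """Split a Task 3 output into (lemma, list of features)."""
    # Assumption: the feature bundle is the last whitespace-separated token.
    lemma, bundle = analysis.rsplit(None, 1)
    return lemma, bundle.split(";")


for clause, analysis in to_analysis_rows(task1_rows):
    print(clause, "->", split_analysis(analysis))
```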

Participation

Interested parties are invited to join the mailing list at participants-mcmsharedtask-2022@googlegroups.com to take part in the competition. All participating systems will be evaluated together with our baselines against the same held-out test set, to be released shortly before evaluation. Submitted systems can compete in some or all sub-tasks. The submission system is now open! Participating teams will be invited to submit a short paper describing their work to the MRL workshop and to present it in a special session at the workshop. Paper submissions must follow the EMNLP paper format and be sent through the Softconf conference link of MRL 2022 before the paper submission deadline.

Data

Training and development data for building systems for the above three tasks in six languages can be found in our GitHub repository.

Important dates

May 16, 2022: Release of training and development data
~~July 20, 2022~~ August 7, 2022: Release of testing data (including surprise languages)
August 14, 2022: Deadline to release external data and resources used in systems
~~July 30, 2022~~ August 22, 2022: Deadline for submission of systems
August 25, 2022: Release of rankings and results
September 15, 2022: Deadline for submitting system description papers
October 10, 2022: Paper notifications
November 9, 2022: Camera-ready papers and posters due
December 8, 2022: Workshop

Evaluation

System outputs will be evaluated using standard metrics for morphological analysis and inflection, including exact-match scores (precision, recall and F1) as well as metrics for generated text, such as edit distance.
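
For readers unfamiliar with these metrics, the sketch below shows one plausible way to compute them in Python: exact-match accuracy over whole outputs, Levenshtein edit distance between a predicted and a reference clause, and precision, recall and F1 over the features of a predicted analysis. This is an assumption about how the metrics could be implemented, not the official evaluation script, and the function names are hypothetical.

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predictions that are identical to their reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references differ in length")
    if not references:
        return 0.0
    hits = sum(p == r for p, r in zip(predictions, references))
    return hits / len(references)


def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def feature_prf(predicted, gold):
    """Precision, recall and F1 over the features of two bundles."""
    # Assumption: feature order does not matter, so bundles are sets.
    pred = set(predicted.split(";"))
    ref = set(gold.split(";"))
    overlap = len(pred & ref)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


print(exact_match_accuracy(["Onu ona vereceğim"], ["Onu ona vereceğim"]))
print(edit_distance("Ich werde ihn ihr geben", "Ich werde ihr ihn geben"))
print(feature_prf("IND;FUT;NOM(1,SG)", "IND;PRS;NOM(1,SG)"))
```

Exact match is the stricter measure, while edit distance and feature-level scores give partial credit to outputs that are close to the reference.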

Ranking

The systems will be ranked globally across all benchmark languages. Participants may also submit language-specific (monolingual) systems; these will be evaluated as partial submissions for the languages they were trained on. We anticipate awarding a prize to the winning team.

Additional resources

The shared task allows participants to use external resources or tools as long as they are openly available and can, in principle, be used by other participants for research purposes. Participants who decide to use external resources or data in their systems should contact the organizers to confirm that the specific resources are permitted. In such cases, specific information on the resources used and how they can be obtained should be shared on the participants' mailing list by August 20th, 2022 at the latest.

Organizers

Omer Goldman, Bar Ilan University
Reut Tsarfaty, Bar Ilan University
Djame Seddah, INRIA Paris
Benjamin Muller, INRIA Paris and Sorbonne University
Benoît Sagot, INRIA Paris
Hila Gonen, University of Washington and Meta AI
Jamshidbek Mirzakhalov, Salesforce
Kelechi Ogueji, University of Waterloo
Francesco Tinner, University of Zurich
Duygu Ataman, New York University

Contact

Feel free to join the mailing list at participants-mcmsharedtask-2022@googlegroups.com or contact mrlw2022@gmail.com if you have any questions about the workshop.
