Data augmentation for machine translation of Sign Language of the Netherlands and Flemish Sign Language

Maud Goddefroy and Tim Van de Cruys

The current state of the art in machine translation consists of deep neural networks, in particular transformer models (Vaswani et al., 2017), but training them requires a sizeable parallel corpus, typically consisting of written data. When either the source (sign2text) or the target language (text2sign) is a sign language, new challenges emerge. These include the new elements that signers constantly introduce, known as the productive lexicon (e.g., classifier constructions), and the fact that signers have multiple articulators that they can and will use simultaneously (Baker et al., 2016).

Another challenge is the lack of a generally accepted written form for sign languages (Baker et al., 2016). An alternative is to use gloss annotations, although glosses are merely labels assigned to individual lexical signs; they capture neither the productive lexicon nor the information conveyed by non-manual articulators (Baker et al., 2008; Yin & Read, 2020). Despite these shortcomings, gloss annotations have proven valuable in sign2text machine translation research: models that incorporate glosses as an intermediate representation, so-called sign2gloss2text models, achieve higher BLEU scores than models that do not use glosses at all (Camgöz et al., 2018, 2020; De Coster et al., 2021; Yin & Read, 2020).
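To make the contrast concrete, the following minimal Python sketch shows the two-stage decomposition that sign2gloss2text models embody. The two stage functions are hypothetical stubs standing in for trained networks, not an implementation from any of the cited systems.

```python
from typing import List

# Hypothetical placeholder models; in practice these would be trained
# neural networks (e.g., a recognition model and a translation model).
def recognize_glosses(video_features: List[float]) -> List[str]:
    return ["VROUW", "BOEK", "LEZEN"]  # stub output for illustration

def translate_glosses(glosses: List[str]) -> str:
    return "De vrouw leest een boek."  # stub output for illustration

def sign2gloss2text(video_features: List[float]) -> str:
    # Stage 1: continuous sign recognition (video -> gloss sequence).
    glosses = recognize_glosses(video_features)
    # Stage 2: machine translation (gloss sequence -> written sentence).
    return translate_glosses(glosses)

print(sign2gloss2text([0.0]))  # -> "De vrouw leest een boek."
```

A direct sign2text model, by contrast, maps video features to a written sentence in a single end-to-end step, without the intermediate gloss sequence.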

Annotating sign language videos with glosses is a time-consuming, manual task. For this reason, no datasets exist to train sign2gloss2text models for most sign languages. For Flemish Sign Language (VGT) and Sign Language of the Netherlands (NGT), the VGT corpus (Van Herreweghe & Vermeerbergen, 2015) and the Corpus NGT (Crasborn et al., 2008; Crasborn & Zwitserlood, 2008) are available, but both are only partly annotated with glosses. We propose to bridge this gap with a rule-based system that generates synthetic glosses from Dutch textual data. The system is based on linguistic knowledge of NGT (i.a. Baker et al., 2016; Klomp, 2021) and VGT (i.a. Baker et al., 2008; Heyerick et al., 2011; Van Herreweghe, 1970; Vermeerbergen, 2010) and includes a large set of rules at the word and clause level. The goal is to augment parallel data consisting of only a sign video and a translation with synthetic glosses; such an augmented corpus should be sizeable enough to train a sign2gloss2text model. Moryossef et al. (2021) showed that this technique has a significant impact on a gloss2text translation task for American Sign Language (ASL) and German Sign Language (DGS), even though they used only a few linguistically motivated rules.
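As an illustration of what a rule-based synthetic gloss generator might look like, the sketch below derives glosses from Dutch text using spaCy's Dutch pipeline. The specific rules shown (dropping function words, uppercasing lemmas, placing verbs clause-finally) are simplified assumptions for exposition, not our actual rule set.

```python
# Minimal sketch of a rule-based Dutch-to-synthetic-gloss generator.
# Requires: pip install spacy && python -m spacy download nl_core_news_sm
import spacy

nlp = spacy.load("nl_core_news_sm")  # Dutch tagging/lemmatisation pipeline

# Illustrative assumption: these word classes are typically not realised
# as separate lexical signs and are therefore dropped.
DROP_POS = {"DET", "AUX", "ADP", "CCONJ", "SCONJ", "PUNCT"}

def dutch_to_synthetic_gloss(sentence: str) -> str:
    doc = nlp(sentence)
    kept, verbs = [], []
    for token in doc:
        if token.pos_ in DROP_POS:
            continue
        lemma = token.lemma_.upper()  # glosses are conventionally uppercase lemmas
        if token.pos_ == "VERB":
            verbs.append(lemma)  # illustrative clause-level rule: verbs go last
        else:
            kept.append(lemma)
    return " ".join(kept + verbs)

print(dutch_to_synthetic_gloss("De vrouw heeft een boek gelezen."))
# e.g. -> "VROUW BOEK LEZEN" (exact output depends on the tagger/lemmatiser)
```

A full system would layer many more word- and clause-level rules on top of this skeleton, e.g., for negation, question marking, or constituent order.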

We will set up multiple experiments to determine the value of the augmented data. As a first step, we used the translated and annotated parts of the Corpus NGT and compared our synthetic glosses with the true glosses. This analysis shows that the linguistically informed synthetic glosses approximate the true glosses significantly better than synthetic glosses generated by a system equivalent to the one proposed by Moryossef et al. (2021). In future work, we will further explore the value of our synthetic glosses in sign2gloss2text models.
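One simple way to quantify how closely synthetic glosses approximate true glosses is a word-level edit distance (word error rate) over gloss tokens. The exact comparison metric is not specified above, so the following sketch is an illustrative choice rather than our evaluation procedure.

```python
# Word error rate between a reference gloss sequence and a synthetic one,
# computed with standard dynamic-programming Levenshtein distance.
def word_error_rate(reference: list[str], hypothesis: list[str]) -> float:
    d = [[0] * (len(hypothesis) + 1) for _ in range(len(reference) + 1)]
    for i in range(len(reference) + 1):
        d[i][0] = i
    for j in range(len(hypothesis) + 1):
        d[0][j] = j
    for i in range(1, len(reference) + 1):
        for j in range(1, len(hypothesis) + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / max(len(reference), 1)

true_gloss = "VROUW BOEK LEZEN".split()
synthetic  = "VROUW BOEK HEBBEN LEZEN".split()
print(f"WER: {word_error_rate(true_gloss, synthetic):.2f}")  # -> WER: 0.33
```

Averaging such a score over all annotated sentences allows the linguistically informed generator to be compared against a simpler baseline on the same data.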