The Corpora Games: All-inclusive games machine translation

Sarah Theroine, María Isabel Rivas Ginel, Eva Vanmassenhove, Éric Poirier, Loïc Barrault, Felix Hoberg, Izabella Thomas, Lucie Bernard, Oliver Czulo and Sebastian Vincent

The All-inGMT project, funded by the ISITE BFC and the «Investissement d’Avenir» programmes, was created in response to the needs of contemporary society and sits at the intersection of personal identity, new technologies, video games, and current practices in the translation market. There is a growing effort within the community to move language away from the gender binary and towards inclusivity. In the field of translation, however, the increasing adoption of Machine Translation (MT) systems threatens these efforts because of the gender bias these systems generate and their inability to detect and reproduce inclusive language. Moreover, the particularities of video game localisation (lack of visual context, non-linear text, use of variables, and constant changes to the source text) further complicate the handling of gender when combined with MT systems. As a result, professional localisers struggle to identify the gender of a character or of the addressee in a conversation and are prone to mistranslation (Theroine et al., 2021).

Numerous papers have studied the causes and occurrences of gender bias in MT, and scholars have focused on reducing its impact (Stanovsky et al., 2019; Saunders et al., 2020; Savoldi et al., 2021) as well as on tackling inclusivity, neutral language, and neutralisation (Vanmassenhove et al., 2021). All-inGMT aims to apply these endeavours to video game localisation by developing a neural MT (NMT) system specifically trained to handle non-binary characters when translating from English into French. The one-year project is based on a corpus compiled from human-translated video games that contain at least one non-binary character and whose translations use neutralisation, reformulation, and newly created pronouns. This paper presents the current status of the project, the methods used to improve the quality of the collected data, the preliminary results, and the techniques used to increase the quantity of material. This augmentation has mostly consisted of rewriting the scripts to create a second version of the data (by paraphrasing, summarising, and artificially increasing the number of non-binary characters) and of adding texts from other sources.

Savoldi, Beatrice; Gaido, Marco; Bentivogli, Luisa; Negri, Matteo; Turchi, Marco. (2021) “Gender Bias in Machine Translation”. Transactions of the Association for Computational Linguistics, 9:845–874.

Saunders, Danielle; Sallis, Rosie; Byrne, Bill. (2020) “Neural Machine Translation Doesn’t Translate Gender Coreference Right Unless You Make It”.

Stanovsky, Gabriel; Smith, Noah A.; Zettlemoyer, Luke. (2019) “Evaluating Gender Bias in Machine Translation”. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1679–1684, Florence, Italy. Association for Computational Linguistics.

Theroine, Sarah; Rivas Ginel, María Isabel; Perrin, Aurélie. (2021) “Au Cœur de La Terminologie Du Jeu Vidéo. L’absence de Ressources, Frein Majeur Pour Les Traducteurs”. Traduire (244):27–40.

Vanmassenhove, Eva; Emmery, Chris; Shterionov, Dimitar. (2021) “NeuTral Rewriter: A Rule-Based and Neural Approach to Automatic Rewriting into Gender-Neutral Alternatives”.