The widespread use of language technologies has motivated growing interest in their social impact (Hovy and Spruit, 2016), with gender bias representing a major cause of concern (Sun et al., 2019). As regards translation tools, focused evaluations have exposed that both speech translation (ST) and machine translation (MT) models overproduce masculine references in their outputs (Bentivogli et al., 2020), except for feminine associations that perpetuate traditional gender roles and stereotypes (Stanovsky et al., 2019).
Since data have been identified as the primary source of gender asymmetries, most works in this context have highlighted the misrepresentation of gender groups in datasets (Vanmassenhove et al., 2018), and focused on the development of data-centred mitigation techniques (Zmigrod et al., 2019; Saunders and Byrne, 2020). Although data are not the only factor contributing to bias (Shah et al., 2020; Vanmassenhove et al., 2019), only a few inquiries have paid attention to other technical components that can exacerbate the problem. From an algorithmic perspective, Roberts et al. (2020) expose how "taken-for-granted" approaches may yield high overall translation quality in terms of BLEU scores, while actually being detrimental when it comes to gender bias. Along this line, we focus on ST systems and inspect a core aspect of neural models: word segmentation. Byte-Pair Encoding (BPE) (Sennrich et al., 2016) represents the de facto standard, as it yields better results than character-based segmentation in ST (Di Gangi et al., 2020). But does this hold true for gender translation as well?
Languages like French and Italian often exhibit comparatively complex feminine forms, derived from the masculine ones by means of an additional suffix (e.g. en: professor, fr: professeur M vs. professeure F). Also, women and feminine expressions of gender are under-represented in existing corpora. Accordingly, purely statistical segmentation methods could be unfavourable for gender translation: since BPE merges the character sequences that co-occur most frequently, rarer or more complex feminine-marked words may result in less compact sequences of tokens (e.g. en: described, it: des@@critto M vs. des@@crit@@ta F). Given such typological and distributional conditions, could certain splitting methods hinder the prediction of feminine forms?
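To make the mechanism concrete, the following is a minimal sketch of how BPE applies its learned merge rules at segmentation time. The merge list here is hypothetical, chosen to mimic a vocabulary learned from a corpus where the masculine form "descritto" is frequent: the masculine form is covered by the merges and stays compact, while the rarer feminine "descritta" falls apart into more tokens.

```python
def bpe_segment(word, merges):
    """Segment a word by applying BPE merge rules in their learned order.

    Each rule (a, b) replaces every adjacent pair of tokens (a, b)
    with the single merged token a + b, in one left-to-right pass.
    """
    tokens = list(word)
    for a, b in merges:
        merged = []
        i = 0
        while i < len(tokens):
            if i < len(tokens) - 1 and tokens[i] == a and tokens[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens

# Hypothetical merge list, skewed toward the frequent masculine form:
merges = [("d", "e"), ("de", "s"), ("c", "r"), ("i", "t"),
          ("cr", "it"), ("t", "o"), ("t", "a"), ("crit", "to")]

print(bpe_segment("descritto", merges))  # ['des', 'critto']      (M, 2 tokens)
print(bpe_segment("descritta", merges))  # ['des', 'crit', 'ta']  (F, 3 tokens)
```

Under this toy vocabulary, the feminine form requires one extra decoding step, and its gender-marking suffix only surfaces in the final, low-frequency token; this is the intuition behind the concern raised above, not the actual merge tables used in the paper's experiments.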
In our work (Gaido & Savoldi et al., 2021) we address such questions by implementing three families of segmentation approaches (character, morphological, statistical) on the decoder side of ST models built on the same training data. We compare the resulting models both in terms of overall translation quality and gender accuracy. For two language pairs (en-fr and en-it) we find that the target segmentation method is indeed an important factor in models' gender bias. Our experiments consistently show that BPE's highest BLEU scores come at the cost of higher gender bias, whereas character-based models are the best at translating gender. Finally, preliminary automatic and manual analyses suggest that the isolation of the morphemes encoding gender may be a key factor in gender translation. Overall, our work contributes to research on gender bias by unveiling the less accounted-for role that algorithmic decisions play in its amplification.