Gender-Based Stylometry Detection for Prominent Women on Twitter Using Machine Learning

Monique Megens and Eva Vanmassenhove

The “broken rung”, the inequality in the path to leadership between men and women, is still a significant issue today. Between 2015 and 2020, there was only limited evidence of progress in the representation of women in the corporate pipeline: over this period, the share of women in (senior) vice-president positions increased only from 23% to 28% (Coury et al., 2020). The lack of gender diversity in executive positions can be attributed to the persistence of a stereotypically male/masculine image of leadership (Petit, 2014).

In line with this stereotypical image of leadership, both recent and earlier literature has established that men are more likely than women to be perceived by others as prominent or leader-like (Eagly & Karau, 2002; Massengill & Di Marco, 1979; Schein, 1973). This has been attributed to men's more assertive nature and their more direct style of written and spoken language (Badura, Grijalva, Newman, Yan, & Jeon, 2018). Their assertiveness often leads men to lobby for more, whereas women, owing to their more communal and nurturing nature, tend not to advocate for themselves as much (Sandberg, 2016).

Language is a widely studied and ever-changing phenomenon. One's style of written and spoken language changes not only in response to external factors, such as societal and technological developments, but just as much in response to one's internal motivations (Tahmasebi, Borin, Jatowt, Xu, & Hengchen, 2021). The way we use language conveys a great deal about ourselves, our audiences and the situations we are in. The words and sentences we (un)consciously choose and combine help us understand ourselves and the culture we live in (Shashevick, 2019). Women who aspire to claim their position in the corporate pipeline, driven by such an internal motivation, could therefore (un)consciously adjust their language to make it more male-like, since male-like language features have typically been associated with leader-like qualities (Badura et al., 2018). We aim to empirically explore whether, and to what extent, women alter their written style to embody more male-like characteristics as they become more prominent. We investigate such gender-based stylometric changes using a selection of Natural Language Processing (NLP) techniques, Machine Learning (ML) algorithms and publicly available Twitter data.

An assessment of how male- or female-like one's language style is can only be effective if the language of the two genders differs in such a way that stylistic differences can be identified empirically (Eckert & McConnell-Ginet, 2013; Kocher & Savoy, 2017). Therefore, we first verify whether ML algorithms can correctly distinguish binary gender from plain written text, i.e., Tweets. Thereafter, the optimized ML algorithms are applied to a sample of prominent individuals to test whether they generalize, that is, whether a similar distinction can be made for this group. Finally, using Change Point Detection (CPD), we perform a statistical analysis of the chronological evolution of Tweets written by prominent men and women (i.e., gender-based stylometry detection) and aggregate the results per gender, allowing a relative comparison of language evolution across genders.
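This section does not state which CPD algorithm is used, so the following is only a minimal sketch of the idea under stated assumptions. It assumes that each prominent author's Tweets have already been scored in chronological order by a trained gender classifier (for instance, the predicted probability that a Tweet reads as "male"). A single mean-shift change point is then located by trying every split of that series and keeping the one with the lowest total within-segment squared error; the per-author shifts are finally averaged per gender. The function names `single_change_point` and `mean_shift_by_gender`, the `min_size` parameter and the score values are hypothetical illustrations, not the authors' implementation.

```python
from typing import Dict, List, Tuple


def _sse(values: List[float]) -> float:
    """Sum of squared deviations from the segment mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def single_change_point(scores: List[float], min_size: int = 2) -> Tuple[int, float, float]:
    """Hypothetical helper: find the split scores[:k] | scores[k:] that minimises
    the total within-segment squared error (a least-squares mean-shift model).

    Returns (k, mean_before, mean_after). Each segment holds at least min_size points.
    """
    n = len(scores)
    if n < 2 * min_size:
        raise ValueError("series too short for the requested min_size")
    best_k, best_cost = min_size, float("inf")
    for k in range(min_size, n - min_size + 1):
        cost = _sse(scores[:k]) + _sse(scores[k:])
        if cost < best_cost:  # strict '<' keeps the earliest split on ties
            best_k, best_cost = k, cost
    before, after = scores[:best_k], scores[best_k:]
    return best_k, sum(before) / len(before), sum(after) / len(after)


def mean_shift_by_gender(series_by_author: Dict[str, List[float]],
                         gender_of: Dict[str, str]) -> Dict[str, float]:
    """Hypothetical aggregation: average (mean_after - mean_before) per gender.

    A positive value means the hypothetical 'male-likeness' score rose after the change point.
    """
    shifts: Dict[str, List[float]] = {}
    for author, scores in series_by_author.items():
        _, before, after = single_change_point(scores)
        shifts.setdefault(gender_of[author], []).append(after - before)
    return {gender: sum(vals) / len(vals) for gender, vals in shifts.items()}


if __name__ == "__main__":
    # Made-up scores for illustration only; not data from the study.
    timeline = [0.30, 0.32, 0.29, 0.31, 0.55, 0.58, 0.57, 0.56]
    print(single_change_point(timeline))  # split index 4, means near 0.305 and 0.565
```

The exhaustive search recomputes each segment's error from scratch and is therefore quadratic in the length of the timeline; for long timelines or multiple change points, dedicated methods such as binary segmentation or PELT would be more suitable.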