Mapping n-deletion in Dutch using Twitter location data

Tommy Pieterse, Hans van Halteren and Roeland van Hout

In Dutch, n-deletion is, stated succinctly, the deletion of /n/ following /ə/ at the end of a morpheme. Whether and where n-deletion occurs in the speech of different speakers of Dutch has been the subject of several linguistic studies. Van de Velde and Van Hout (2003) distinguished two categories of factors influencing n-deletion: external and internal. External factors included region, age, and sex, while internal factors included word class, whether the /n/ was part of a suffix, and prosodic boundary and focus. Their study indicated that these factors all appear to have differing effects on the occurrence of n-deletion. However, the exact nature of the interplay between these factors remained somewhat uncertain.

In recent years, dialectologists have attempted to use Twitter location data to map linguistic features. Gonçalves and Sánchez (2014), for example, used this relatively new source of data to map lexical variation across the Spanish-speaking world, leading them to the conclusion that Spanish spoken across the world consists of two “superdialects.” Jones (2015) mapped phonological variation in African-American English by determining the geographical distribution of certain types of non-standard spellings that are linked to non-standard pronunciations. Van Halteren et al. (2018) mapped isoglosses relating to Limburgian dialect features in the Netherlands, producing maps that tended to line up with existing knowledge on these isoglosses, albeit with more diffuse boundaries. Lastly, Willis (2020) sought to map morphosyntactic variation in Welsh. Although such studies are still few in number, their results all indicate that tweets can form a viable data source for mapping dialect features.

The main purpose of this study is to map the occurrence of n-deletion in tweets from the Netherlands, taking into account both the previously described contexts of occurrence and the geographical provenance of the tweets. A secondary aim is to assess how useful Twitter data are for mapping this type of phonological feature in general.

Our dataset consisted of data from the as yet unpublished TwiNT corpus compiled by Stef Grondelaers, supplemented with additional data from the more expansive TwiNL corpus (Tjong Kim Sang & Van den Bosch, 2013). To measure the occurrence of n-deletion, we constructed search patterns to match relevant word forms. We selected a subset of users with a seemingly persistent location and searched their tweets for words allowing n-deletion, in either their full or their n-deleted form. The occurrence of n-deletion in the previously mentioned contexts was then mapped geographically.
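
As a rough illustration of the kind of search-and-count procedure described above, the following Python sketch matches a handful of word forms with and without final n and computes the share of n-deleted forms per region. It is a minimal sketch under stated assumptions, not the patterns or pipeline used in the study: the stem list, the region labels "A" and "B", the example tweets, and the function name `deletion_rates` are all hypothetical, and a real pattern set would need to filter out false positives (for instance, "make" is also an English word).

```python
import re
from collections import defaultdict

# Hypothetical stems for illustration only (lopen, maken, eten);
# the study's actual search patterns are not reproduced here.
TARGET_STEMS = ["lop", "mak", "et"]

# Stem + "e" + optional final "n". The named group "n" is empty
# when the word is written in its n-deleted form.
PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, TARGET_STEMS)) + r")e(?P<n>n?)\b",
    re.IGNORECASE,
)


def deletion_rates(tweets):
    """Return {region: share of matched forms written without final n}.

    `tweets` is an iterable of (region, text) pairs; regions with no
    matched word form do not appear in the result.
    """
    counts = defaultdict(lambda: [0, 0])  # region -> [n-deleted, total]
    for region, text in tweets:
        for match in PATTERN.finditer(text):
            counts[region][1] += 1
            if match.group("n") == "":
                counts[region][0] += 1
    return {region: deleted / total for region, (deleted, total) in counts.items()}


# Made-up example tweets with placeholder region labels.
sample = [
    ("A", "ik ga lope"),
    ("A", "we gaan eten"),
    ("B", "wij maken en lopen"),
]
print(deletion_rates(sample))  # {'A': 0.5, 'B': 0.0}
```

Per-region shares of this kind could then be broken down further by context (for example, word class) before being plotted on a map.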