Sarah A. M. Bosscha, Tessel E. Haagen, Richard A. Koetschruyter, Loïs M. Dona and Beatriz Zamith Castro
Previous research has set out to quantify the syntactic capacity of BERT for phenomena such as control verb nesting and verb raising in Dutch (Kogkalidis & Wijnholds, 2022). A similarly complex phenomenon is ellipsis, in which information that can be recovered from context is omitted from a sentence. Ellipsis resolution using BERT (Devlin, Chang, Lee, & Toutanova, 2018) has been attempted before (Aralikatte, Lamm, Hardt, & Søgaard, 2019; Lin, Huang, & Wu, 2019), but with little success. Like verb raising and control verb nesting, ellipsis can be used to quantify BERT’s linguistic capacity, since the omitted words must be recovered through further processing. In this work, we outline an approach to identifying noun-verb dependencies in Dutch sentences with verb phrase ellipsis using BERTje (de Vries et al., 2019). We train a probe model to identify these dependencies on data from Lassy (van Noord et al., 2013) and then test it on elliptical sentences generated by a Multiple Context-Free Grammar (MCFG) (Pollard, 1984) designed for this purpose. The results will inform us about BERT’s syntactic capacities and can facilitate future attempts at ellipsis resolution.