Modelling Language Acquisition through Syntactico-Semantic Pattern Finding on Semantically Annotated Corpora

Jonas Doumen, Katrien Beuls and Paul Van Eecke

The constructivist acquisition of language has been elaborately documented by researchers in psycholinguistics and cognitive science (Pine & Lieven 1997; Tomasello 2003). However, the syntactico-semantic pattern finding mechanisms through which children learn constructions and grammatical categories have so far been translated into computational models only to a limited extent, and no faithful operationalisations of these mechanisms exist to date.

The research on which we report here aims to fill this void by introducing a mechanistic model of language acquisition through syntactico-semantic pattern finding, which models the co-emergence of constructions and grammatical categories based on semantically annotated corpora. Concretely, we present a methodology for learning computational construction grammars and a network of emergent grammatical categories by generalising over similarities and differences in the form and meaning of linguistic observations alone. The resulting grammars consist of bidirectional form-meaning mappings (i.e. constructions) of varying degrees of abstraction, which can be used for language comprehension (mapping from form to meaning) and production (mapping from meaning to form). We have implemented our methodology in the form of a set of learning operators for Fluid Construction Grammar (Steels 2011) and have validated it on the CLEVR benchmark dataset (Johnson et al. 2017). The model achieves 100% accuracy on mapping between CLEVR questions and their meaning representations after having processed 1000 observations, and already achieves 90% accuracy after having processed 500 observations. The results show that our approach allows for online, incremental, data-efficient, transparent, and effective learning.
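
To make the idea of generalising over similarities and differences in form and meaning concrete, the following is a minimal sketch under strong simplifying assumptions. It is not the Fluid Construction Grammar learning operators themselves. The names `Observation`, `Generalisation`, `generalise` and `SLOT` are hypothetical, meanings are reduced to flat sets of invented predicate labels rather than actual CLEVR meaning representations, and only a single differing span between two utterances is handled.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

SLOT = "?X"  # hypothetical marker for an open slot in a pattern


@dataclass(frozen=True)
class Observation:
    form: Tuple[str, ...]    # tokenised utterance
    meaning: FrozenSet[str]  # simplified: a flat set of predicate labels


@dataclass(frozen=True)
class Generalisation:
    pattern: Observation   # shared form with one slot, plus shared meaning
    filler_a: Observation  # what is specific to the first observation
    filler_b: Observation  # what is specific to the second observation


def generalise(a: Observation, b: Observation) -> Optional[Generalisation]:
    """Split two observations into a slotted pattern and two slot fillers."""
    shortest = min(len(a.form), len(b.form))

    # Length of the common prefix of the two token sequences.
    prefix = 0
    while prefix < shortest and a.form[prefix] == b.form[prefix]:
        prefix += 1

    # Length of the common suffix, not overlapping with the prefix.
    suffix = 0
    while (suffix < shortest - prefix
           and a.form[-1 - suffix] == b.form[-1 - suffix]):
        suffix += 1

    rest_a = a.form[prefix:len(a.form) - suffix]
    rest_b = b.form[prefix:len(b.form) - suffix]

    # No generalisation if nothing is shared or one side has no filler.
    if prefix + suffix == 0 or not rest_a or not rest_b:
        return None

    shared = a.meaning & b.meaning
    pattern_form = a.form[:prefix] + (SLOT,) + a.form[len(a.form) - suffix:]
    return Generalisation(
        pattern=Observation(pattern_form, shared),
        filler_a=Observation(rest_a, a.meaning - shared),
        filler_b=Observation(rest_b, b.meaning - shared),
    )


# Two invented observations with made-up predicate labels.
obs_1 = Observation(("how", "many", "cubes", "are", "there"),
                    frozenset({"count", "shape:cube"}))
obs_2 = Observation(("how", "many", "spheres", "are", "there"),
                    frozenset({"count", "shape:sphere"}))
result = generalise(obs_1, obs_2)
```

In this toy setting, the shared material yields a slotted pattern pairing a partial form with the shared meaning, while each differing span is paired with the meaning unique to its observation. Fillers that can occupy the same slot are the kind of evidence from which a grammatical category could be induced.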

The research that we present here has both theoretical and practical implications. From a theoretical perspective, we provide computational evidence for the cognitive plausibility of usage-based constructionist theories of language acquisition by means of a precise mechanistic model of how a fully operational construction grammar consisting of constructions of varying degrees of abstraction can be bootstrapped from raw observations.
From a practical point of view, the techniques that we introduce here pave the way for learning computationally tractable, usage-based construction grammars that can be used for both language comprehension and production. Such systems are valuable for a large range of application domains, including intelligent conversational agents, the semantic analysis of discourse, intelligent tutoring systems and question answering systems.


Johnson, Justin, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick & Ross Girshick. 2017. CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2901-2910.

Pine, Julian & Elena Lieven. 1997. Slot and frame patterns and the development of the determiner category. Applied Psycholinguistics, 18(2), 123-138.

Steels, Luc (ed.). 2011. Design patterns in Fluid Construction Grammar. Amsterdam: John Benjamins.

Tomasello, Michael. 2003. Constructing a language: A usage-based theory of language acquisition. Harvard University Press.