Jens Nevens, Jonas Doumen, Paul Van Eecke and Katrien Beuls
A crucial step towards truly intelligent agents in AI is the development of communication systems that offer the robustness, flexibility and adaptivity found in human languages. While the processes through which children acquire language are by now relatively well understood, a faithful computational operationalisation of the underlying mechanisms is still lacking. Two main cognitive processes are involved in child language acquisition: intention reading and pattern finding (Tomasello, 2003). The former is used to understand the communicative intention of the interlocutor, thereby reconstructing the meaning underlying an observed utterance, while the latter is used to abstract over concrete utterances and meaning representations and to construct more general, productive schemata.
Here, we introduce a mechanistic model of the intention reading capacity and its integration with pattern finding. Through situated, task-oriented interactions, an agent learns a grammar that enables it to ask and answer questions about scenes of geometrical objects from the CLEVR dataset (Johnson et al., 2017). Intention reading consists in a query composition process based on the question, the answer, and the current scene. Specifically, a limited inventory of atomic operations can be combined into compositional queries that capture the meaning of a question. The search space faced by this query composition process is exponentially large, because many possible queries can lead to the same answer in any given scene and because primitive operations can be combined in many different ways. Initially, the agent does not know which part of a question corresponds to which part of the composed query and therefore stores this mapping as a holistic form-meaning pattern. The pattern finding process generalises over recurring form-meaning patterns, thereby capturing the compositional structure of questions through constructions of varying degrees of abstraction. The key to successfully tackling this learning task is the integration of intention reading and pattern finding. Intention reading facilitates pattern finding by providing meaning hypotheses. In turn, pattern finding drastically reduces the search space involved in intention reading by providing partial analyses. Over many interactions, the agent incrementally acquires a grammar that can be used for both language comprehension (mapping questions to queries) and production (mapping queries to questions) without ever having observed the queries that represent the meaning of the observed questions.
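The query composition process described above can be illustrated with a minimal sketch. It is not the model presented in this work: the toy scene, the three kinds of primitive operations (filter, count/exist, attribute query) and all function names are assumptions chosen for illustration. The sketch enumerates small compositions of primitives and keeps those that yield the observed answer in the scene. The hypothetical `known_filters` argument stands in for a partial analysis supplied by pattern finding.

```python
from itertools import combinations

# Toy CLEVR-like scene (hypothetical data, not taken from the dataset).
SCENE = [
    {"shape": "cube", "color": "red", "size": "large"},
    {"shape": "sphere", "color": "red", "size": "small"},
    {"shape": "cube", "color": "blue", "size": "small"},
]
ATTRIBUTES = ("shape", "color", "size")


def filter_ops(scene):
    """All (attribute, value) filters that occur in the scene."""
    return sorted({(a, obj[a]) for obj in scene for a in ATTRIBUTES})


def run_query(query, scene):
    """Execute a query: a tuple of filters followed by one terminal operation."""
    filters, terminal = query
    objs = [o for o in scene if all(o[a] == v for a, v in filters)]
    if terminal == "count":
        return len(objs)
    if terminal == "exist":
        return "yes" if objs else "no"
    # ("query", attribute): only defined when exactly one object remains.
    _, attr = terminal
    return objs[0][attr] if len(objs) == 1 else None


def compose_queries(scene, answer, max_filters=2, known_filters=()):
    """Return every small query whose result in the scene equals the answer.

    known_filters is an assumed stand-in for a partial analysis: only
    candidates containing these filters are considered.
    """
    terminals = ["count", "exist"] + [("query", a) for a in ATTRIBUTES]
    candidates = []
    for n in range(max_filters + 1):
        for filters in combinations(filter_ops(scene), n):
            if len({a for a, _ in filters}) < n:
                continue  # skip two filters on the same attribute
            if not set(known_filters) <= set(filters):
                continue  # prune candidates that contradict the partial analysis
            for terminal in terminals:
                query = (filters, terminal)
                if run_query(query, scene) == answer:
                    candidates.append(query)
    return candidates


# "How many cubes are there?" with observed answer 2.
ambiguous = compose_queries(SCENE, 2)
pruned = compose_queries(SCENE, 2, known_filters=(("shape", "cube"),))
```

In this toy scene, the answer 2 is produced by three different counting queries (cubes, red objects, small objects), so the answer alone does not identify the intended meaning. Once the filter for "cube" is assumed to be known, a single candidate remains. This mirrors, on a very small scale, how partial analyses can narrow the search.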
The acquired grammar and the accompanying queries effectively solve CLEVR’s visual question answering task. This work provides computational evidence for the cognitive plausibility of usage-based theories of language acquisition. It also introduces a powerful new methodology that allows autonomous agents to acquire an effective communication system with human-like properties through task-oriented interactions.
Michael Tomasello. Constructing a Language: A Usage-Based Theory of Language Acquisition. Harvard University Press, Cambridge, MA, 2003.
Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, and Ross Girshick. CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning. In IEEE CVPR, pages 2901–2910, 2017.