Towards Semantic Memory in Transformer-Based Language Models

Helena Balabin, Vladimir Araujo, Rik Vandenberghe and Marie-Francine Moens

Despite the success of Transformers on several Natural Language Processing tasks, pre-trained Transformer-based language models often fail to incorporate world knowledge adequately. Several approaches have been proposed to address this issue, typically by infusing structured knowledge in the form of Knowledge Graphs (KGs) into pre-trained language models. However, previous work primarily focuses on a static integration of knowledge through a fixed feature fusion strategy rather than a dynamic knowledge retrieval process, as in the human brain. In this work, we draw inspiration from declarative memory in the human brain, which encompasses episodic and semantic memory. More specifically, we extend a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model with episodic and semantic memory components. We implement episodic memory using sparse experience replay and semantic memory with a RotatE KG embedding (KGE) model trained on a subset of the Wikidata-based WD50K dataset. Next, we construct a semantic search index for the KG by encoding the textual description of each triple with a Sentence-BERT model. Then, we mimic dynamic retrieval from semantic memory with a similarity-based gating mechanism that decides when to integrate KG embeddings: the KGE of a triple is incorporated only if the textual description of that triple has a sufficiently high cosine similarity to the text input of the model. We evaluate our model on the DBpedia14 text classification dataset to showcase the effectiveness of the proposed architecture.
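The similarity-gated retrieval described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: random vectors stand in for the Sentence-BERT text embeddings and RotatE KGEs, and the threshold value is hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def gate_kg_embeddings(input_emb, triple_text_embs, triple_kges, threshold=0.8):
    """Return the KGEs of triples whose textual-description embedding is
    sufficiently similar to the model's text-input embedding."""
    selected = []
    for text_emb, kge in zip(triple_text_embs, triple_kges):
        if cosine_similarity(input_emb, text_emb) >= threshold:
            selected.append(kge)
    return selected

# Toy example: random vectors as stand-ins for Sentence-BERT (text) and
# RotatE (KG) embeddings; dimensions are illustrative only.
rng = np.random.default_rng(0)
input_emb = rng.normal(size=384)
triple_text_embs = [
    input_emb + rng.normal(scale=0.1, size=384),  # near-duplicate: passes the gate
    rng.normal(size=384),                         # unrelated: filtered out
]
triple_kges = [rng.normal(size=200), rng.normal(size=200)]

selected = gate_kg_embeddings(input_emb, triple_text_embs, triple_kges, threshold=0.8)
print(len(selected))  # only the similar triple's KGE is integrated
```

In the actual model, the selected KGEs would be fused into the BERT representation; here the gate simply returns them to isolate the retrieval decision.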