CLS-concat: combining document-level and sentence-level representations for more effective stance detection

Xiao Zhang and Suzan Verberne

Stance detection (SD) determines whether a piece of text is in favor of, against, neutral towards, or unrelated to a specific target. Applying SD in the news domain helps increase the diversity of the news that is recommended to the public. Despite significant progress on SD, previous work has mainly focused on a single model and a single task. For example, BertEmb, the state-of-the-art model for the STANDER dataset [1], uses a single neural model for the news SD task. However, the performance achieved by this model is still far from the upper bound. In this paper we propose multi-task and multi-model approaches to improve the performance.

Our first model is CLS-transfer BERT, inspired by Stance-BERT [2], which reuses the CLS token of a BERT model fine-tuned on another task. We design two new tasks to obtain transferable CLS tokens. The first task is to predict the label combination (F-F, F-A, F-N, F-U, A-A, A-N, A-U, N-N, N-U or U-U) for an input pair of news articles. The second task is to predict the stance that one news article takes towards another. We replace the initial CLS token of the BERT model with each of these two CLS tokens in turn. As a control, we also use a randomly generated CLS token.
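
The label combinations of the first auxiliary task can be enumerated mechanically. The following is a minimal sketch, assuming that the four stance labels are abbreviated F, A, N and U and that the order of the two articles in a pair does not matter; the function names are hypothetical and are not taken from the authors' implementation.

```python
from itertools import combinations_with_replacement

# Stance labels abbreviated as in the text: Favor, Against, Neutral, Unrelated.
LABELS = ["F", "A", "N", "U"]


def label_combinations(labels=LABELS):
    """Return every unordered pair of labels, including identical pairs."""
    return [f"{a}-{b}" for a, b in combinations_with_replacement(labels, 2)]


def pair_label(stance_a, stance_b, labels=LABELS):
    """Map the stances of two news articles to one order-independent class name."""
    first, second = sorted((stance_a, stance_b), key=labels.index)
    return f"{first}-{second}"


if __name__ == "__main__":
    print(label_combinations())
    print(pair_label("U", "F"))
```

Four labels yield ten unordered pairs, so under these assumptions the auxiliary task is a ten-way classification problem.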

Our second model is CLS-concat BERT, which combines BertEmb and BERT by concatenating the embedding of the CLS token of BERT with the embeddings of the sentences and the target. In this way, the concatenated embedding contains both sentence-level and document-level information. The concatenated embedding is then processed by TwoWingOS and linear layers.
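
The concatenation step can be illustrated with a small sketch. The abstract does not specify the exact layout, so the per-sentence ordering [CLS ; sentence ; target], the toy dimensions, and the function name `concat_features` are all assumptions made for illustration; plain Python lists stand in for tensors, and the downstream TwoWingOS and linear layers are not shown.

```python
def concat_features(cls_embedding, sentence_embeddings, target_embedding):
    """Build one feature vector per sentence.

    Assumed layout: [document-level CLS ; sentence ; target].
    """
    return [
        list(cls_embedding) + list(sentence) + list(target_embedding)
        for sentence in sentence_embeddings
    ]


if __name__ == "__main__":
    # Toy vectors; real embeddings would have hundreds of dimensions.
    cls_vec = [0.1, 0.2, 0.3]                # document-level CLS embedding
    sentences = [[1.0, 1.1], [2.0, 2.1]]     # one embedding per sentence
    target = [9.0, 9.1]                      # embedding of the target

    features = concat_features(cls_vec, sentences, target)
    for row in features:
        print(len(row), row)
```

In this layout every sentence vector carries the same document-level CLS embedding, which is one simple way for sentence-level and document-level information to reach the later layers together.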

For the evaluation, we use STANDER, an expert-annotated dataset for news stance detection. It contains news about four mergers involving six companies in the healthcare industry, and comprises 3,291 news articles together with their stances (Favor, Against, Neutral or Unrelated) towards four targets.

We take BertEmb as the first baseline. BertEmb uses Sentence-BERT to obtain the embeddings of the sentences in each news article and of the corresponding target, and uses the Two-Wing Optimization Strategy (TwoWingOS) to calculate the probability that a sentence is evidence for the target. We use BERT as the second baseline.

As for the performance of our two models, the F1-score averaged over multiple runs shows that CLS-transfer BERT does not improve on BERT. Moreover, the randomly generated CLS token achieves similar results, which suggests that the CLS token does not contain sufficient stance information for further fine-tuning. In contrast, CLS-concat BERT outperforms the BertEmb model by 12.7 percentage points and the BERT model by 1.4 percentage points, demonstrating that combining information at different levels is useful for stance detection.

[1] Costanza C. et al. “STANDER: An Expert-Annotated Dataset for News Stance Detection and Evidence Retrieval”. In: Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 4086–4101.
[2] Lin T. et al. “Early Detection of Rumours on Twitter via Stance Transfer Learning”. In: Advances in Information Retrieval 12035 (2020), pp. 575–588.