Interpretable Topic Modeling for Text Classification with FLSA-W

Emil Rijcken, Floortje Scheepers, Pablo Mosteiro, Kalliopi Zervanou, Marco Spruit and Uzay Kaymak

Topic models are popular unsupervised statistical methods that extract the hidden topics underlying a collection of documents. FLSA-W [1] was proposed to make text classification more interpretable. It is a topic modeling algorithm based on fuzzy clustering and inspired by Fuzzy Latent Semantic Analysis (FLSA) [2]. In a recent experiment, FLSA-W outperformed other topic modeling algorithms on several open datasets (20Newsgroup, BBC-News, M10 and DBLP) in terms of interpretability, coherence and diversity. When comparing FLSA-W with other topic models for text classification, we find no correlation between predictive performance and topic quality.
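
To make the diversity criterion concrete, the sketch below computes one common definition of topic diversity: the fraction of unique words among the top words of all topics. This is an illustrative assumption and not necessarily the exact measure used in [1]; the function name `topic_diversity` and the toy topics are hypothetical.

```python
def topic_diversity(topics):
    """Return the fraction of unique words among all top words of all topics.

    `topics` is a list of topics, each given as a list of its top words.
    A score of 1.0 means no word is shared between topics; lower scores
    mean the topics overlap more.
    """
    all_words = [word for topic in topics for word in topic]
    if not all_words:
        raise ValueError("topics must contain at least one word")
    return len(set(all_words)) / len(all_words)


# Hypothetical toy topics (not taken from any of the datasets above).
toy_topics = [
    ["patient", "treatment", "hospital"],
    ["market", "stock", "hospital"],
]

# "hospital" appears in both topics, so 5 of the 6 top words are unique.
score = topic_diversity(toy_topics)
print(f"topic diversity: {score:.3f}")
```

A score close to 1 suggests that the topics cover distinct vocabulary, while a low score suggests redundant topics.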

The Python package ‘FuzzyTM’ features three fuzzy topic models, including FLSA-W and FLSA.

[1] Rijcken, E., Scheepers, F., Mosteiro, P., Zervanou, K., Spruit, M., & Kaymak, U. (2021, December). A Comparative Study of Fuzzy Topic Models and LDA in terms of Interpretability. In 2021 IEEE Symposium Series on Computational Intelligence (SSCI) (pp. 1-8). IEEE.

[2] Karami, A., Gangopadhyay, A., Zhou, B., & Kharrazi, H. (2018). Fuzzy approach topic discovery in health and medical corpora. International Journal of Fuzzy Systems, 20(4), 1334-1345.