Jens Lemmens and Walter Daelemans
We present preliminary work on two NLP pipelines under development in the CLARIAH-VL project (https://clariahvl.hypotheses.org/). The project will make NLP tools available through an easy-to-use interface for humanities researchers who have no programming or NLP expertise. The first pipeline focuses on stylometry: it computes a variety of well-known stylometric features for the input texts and will be extended with more advanced tools (e.g., non-literal language detection). It also predicts several author attributes (including the author’s age, gender, education level, and personality) as well as the genre of the text. The second pipeline focuses on topic modeling: it uses standard algorithms to group the input texts into topic clusters. To give insight into the topic each cluster represents, it also reports the most important keywords or key phrases for each cluster, together with their importance weights.
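To make "well-known stylometric features" concrete, the following is a minimal sketch of what such a feature extractor might compute. The abstract does not list the features the pipeline uses, so the selection here (type-token ratio, average word length, average sentence length, and relative function-word frequencies), the function name `stylometric_features`, and the short English function-word list are all illustrative assumptions, not the project's implementation.

```python
import re
from collections import Counter

# Small illustrative function-word list (an assumption, not the project's list).
FUNCTION_WORDS = ("the", "a", "of", "and", "in", "to")


def stylometric_features(text: str) -> dict:
    """Compute a few classic surface-level stylometric features for one text."""
    # Split on sentence-final punctuation; crude, but dependency-free.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased word tokens: runs of letters, optionally with one apostrophe.
    tokens = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", text.lower())
    if not tokens:
        return {}

    counts = Counter(tokens)
    features = {
        "n_tokens": len(tokens),
        # Vocabulary richness: distinct word forms divided by total tokens.
        "type_token_ratio": len(counts) / len(tokens),
        "avg_word_length": sum(len(t) for t in tokens) / len(tokens),
        # Sentence length measured in tokens.
        "avg_sentence_length": len(tokens) / len(sentences),
    }
    # Relative frequency of each function word.
    for word in FUNCTION_WORDS:
        features[f"freq_{word}"] = counts[word] / len(tokens)
    return features


if __name__ == "__main__":
    for name, value in stylometric_features("The cat sat. The dog ran!").items():
        print(name, value)
```

Feature vectors of this kind are a common input to authorship and author-profiling classifiers; a production pipeline would presumably rely on a proper tokenizer and language-specific resources rather than the regular expressions used here.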