The EXALT Project – Creating a Multilingual Entity-Based Search Engine for Archaeological Reports

Alex Brandsen, Suzan Verberne, Karsten Lambers and Milco Wansleeben

Archaeology, like many other domains, struggles with an overload of published literature, and finding information relevant to specific research questions is difficult and time-consuming with standard tools based on metadata search. Archaeological texts also contain a great deal of uncertainty, noise, synonymy and polysemy, and have a strong geographical element. As a result, standard full-text search solutions are not ideal in this domain. To overcome these problems, we propose entity-based search combined with free-text and geographic search (filtering on location by a selected map area).
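The combination of entity, free-text and geographic search can be illustrated with a minimal in-memory sketch. It assumes each report has been reduced to its text, a set of (entity type, value) pairs produced by the NER step, and a single point location. The names `Report`, `BBox` and `search`, the entity labels and the coordinates are hypothetical; a real system would use a search index rather than a linear scan over all reports.

```python
from dataclasses import dataclass


@dataclass
class Report:
    """Hypothetical indexed document: full text, NER output and a point location."""
    doc_id: str
    text: str
    entities: set[tuple[str, str]]  # (entity type, normalised value) pairs
    lat: float
    lon: float


@dataclass
class BBox:
    """Hypothetical map selection, as a latitude/longitude bounding box."""
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


def search(reports, terms=(), entities=(), bbox=None):
    """Return ids of reports that contain every free-text term, carry every
    requested entity and (if a bbox is given) lie inside the selected area."""
    hits = []
    for r in reports:
        words = set(r.text.lower().split())
        if not all(t.lower() in words for t in terms):
            continue  # free-text filter
        if not all(e in r.entities for e in entities):
            continue  # entity filter
        if bbox is not None and not bbox.contains(r.lat, r.lon):
            continue  # geographic filter
        hits.append(r.doc_id)
    return hits


reports = [
    Report("r1", "Cremation burial found near the church",
           {("CONTEXT", "cremation"), ("PERIOD", "early medieval")}, 52.16, 4.49),
    Report("r2", "Cremation burial in an urn",
           {("CONTEXT", "cremation"), ("PERIOD", "bronze age")}, 51.44, 5.48),
]
leiden_area = BBox(52.0, 4.3, 52.3, 4.7)

print(search(reports, terms=["burial"],
             entities=[("PERIOD", "early medieval")], bbox=leiden_area))  # ['r1']
```

Filtering on the `PERIOD` entity rather than on the literal words lets a report be found even when its text phrases the period differently, provided the NER and normalisation step maps it to the same value.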

In a previous project, we created the AGNES search engine, which uses a custom-trained BERT model (named ArcheoBERTje) for Named Entity Recognition (NER) in Dutch excavation reports from the DANS archive. This model achieved a micro F1 score of 0.76. The detected entities are indexed together with the free text and geographical information, making complex, archaeology-specific queries possible.
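A micro F1 score pools true positives, false positives and false negatives over all entity types and documents before computing precision and recall, so frequent entity types weigh more heavily than rare ones. The sketch below assumes exact-match evaluation on (start, end, label) spans; the abstract does not state the matching scheme used for ArcheoBERTje, and the labels (`PER` for period, `ART` for artefact, and so on) are hypothetical.

```python
def micro_f1(gold_docs, pred_docs):
    """Micro-averaged precision, recall and F1 over entity spans.

    Each document is a set of (start, end, label) tuples; a prediction counts
    as a true positive only if both span and label match exactly.
    """
    if len(gold_docs) != len(pred_docs):
        raise ValueError("gold and predicted document lists differ in length")
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        tp += len(gold & pred)   # predicted and correct
        fp += len(pred - gold)   # predicted but wrong
        fn += len(gold - pred)   # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [{(0, 1, "PER"), (3, 5, "ART"), (7, 8, "MAT")}, {(2, 4, "CON")}]
pred = [{(0, 1, "PER"), (3, 5, "ART"), (9, 10, "LOC")}, {(2, 4, "CON"), (5, 6, "MAT")}]
p, r, f = micro_f1(gold, pred)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=0.60 R=0.75 F1=0.67
```

Here the counts sum to 3 true positives, 2 false positives and 1 false negative, giving precision 3/5 and recall 3/4.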

After receiving funding from the NWO ‘Archeologie Telt’ grant, we started the EXALT project, in which we aim to build further on AGNES and create a one-stop shop for searching texts about archaeology in the Netherlands and surrounding areas. We expand on AGNES on the following fronts:

(1) We are incorporating more types of documents: not only excavation reports, but also theses, papers, books and annals.

(2) We are adding more languages: English and German are also used in literature about Dutch archaeology and need to be incorporated. This means developing NER methods for all three languages.

(3) We aim to create a multilingual, hierarchical ontology for archaeology, linking concepts between languages.

(4) We will link the detected entities to the ontology, enabling multilingual search and query broadening or narrowing by adding synonyms and similar terms in other languages to a query.
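Points (3) and (4) can be illustrated with a small sketch of ontology-based query expansion. It assumes a hypothetical ontology in which each concept has labels per language and at most one broader concept, with no cycles; the concept ids, labels and the functions `find_concept`, `descendants` and `expand_query` are illustrative only, not the EXALT ontology or code. Expanding a query here means adding all labels of the matched concept in every language and, optionally, the labels of all its narrower concepts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Concept:
    labels: dict[str, list[str]]   # language code -> synonyms in that language
    broader: Optional[str] = None  # id of the parent concept, if any


# Hypothetical three-concept fragment of a multilingual, hierarchical ontology.
ONTOLOGY = {
    "burial": Concept({"en": ["burial", "grave"], "nl": ["graf", "begraving"],
                       "de": ["Grab", "Bestattung"]}),
    "cremation": Concept({"en": ["cremation", "cremation burial"],
                          "nl": ["crematie", "crematiegraf"],
                          "de": ["Brandbestattung", "Brandgrab"]}, broader="burial"),
    "inhumation": Concept({"en": ["inhumation"], "nl": ["inhumatie"],
                           "de": ["Körperbestattung"]}, broader="burial"),
}


def find_concept(ontology, term):
    """Return the id of the first concept with a label matching term (case-insensitive)."""
    term = term.casefold()
    for cid, concept in ontology.items():
        for labels in concept.labels.values():
            if term in (label.casefold() for label in labels):
                return cid
    return None


def descendants(ontology, cid):
    """All concepts below cid in the hierarchy (assumes the hierarchy is acyclic)."""
    found = []
    for child_id, concept in ontology.items():
        if concept.broader == cid:
            found.append(child_id)
            found.extend(descendants(ontology, child_id))
    return found


def expand_query(ontology, term, include_narrower=False):
    """Replace a query term by all its labels in all languages, optionally
    adding the labels of narrower concepts."""
    cid = find_concept(ontology, term)
    if cid is None:
        return [term]  # unknown term: fall back to plain free-text search
    concept_ids = [cid] + (descendants(ontology, cid) if include_narrower else [])
    terms = []
    for c in concept_ids:
        for labels in ontology[c].labels.values():
            terms.extend(labels)
    return terms


print(expand_query(ONTOLOGY, "crematiegraf"))
# ['cremation', 'cremation burial', 'crematie', 'crematiegraf', 'Brandbestattung', 'Brandgrab']
```

A Dutch query term thus also retrieves English and German texts, and `include_narrower=True` on a general term such as "Grab" adds the labels for cremation and inhumation as well; how EXALT will realise broadening and narrowing is not specified in the abstract.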

Initial results from AGNES have already shown the value of such a search system: users are positive about AGNES, and a case study on Early Medieval cremations increased the number of cremations known to experts by 30%. We will evaluate the EXALT project with three more case studies in various subfields of archaeology, which we hope will lead to further valuable insights into the past.

In our poster, we will present the challenges of the EXALT project and the solutions we are working on, together with some initial results.