Shantanu Nath, Gosse Minnema and Tommaso Caselli
People talk about the same events in different ways. Narrations of the same facts can vary according to who is telling them and the perspective they take on them. In this contribution, we extract entity-centric event sequences from reports of football matches to identify and highlight the presence of perspectives. We argue that football is particularly suited for this purpose because of three defining characteristics: (i) football is a clearly defined semantic domain; (ii) news reports are usually short and simple to process; (iii) it is straightforward to define perspectives, as there are always exactly two teams competing against each other, which may be reported on more or less favorably by different sources.
We model events by means of Fillmorean semantic frames (Fillmore, 2006). Entity-centric event sequences are based on Chambers and Jurafsky (2008, 2009) and represent coherent sequences of sets of events sharing a single protagonist. Each team in a match is a protagonist of an event sequence. For instance, given the text (1a), we can extract the event sequence (1b) for one of the teams:
a) [The Flying Dutchmen] will *advance* to the semi-finals. [Frank de Boer] is in *possession* of the ball, [he] *passes* the ball to [Dennis Bergkamp], [Bergkamp] *picks up* the ball and *scores*.
b) PROGRESSION (advance) => POSSESSION (possession) => PASS (passes) => CONTROL (picks up) => SCORE_GOAL (scores)
To obtain the event sequence in (1b), we first extract events using an end-to-end frame semantic parser (Xia et al., 2021; Minnema, 2021) trained on Kicktionary (Schmidt, 2008), a multilingual (German, French, English) FrameNet-style resource for the football domain. Minnema (2021) showed that, despite limited training material, the model performs well on a held-out test set (0.83 F1-score on frame detection, and 0.81 F1-score on semantic role detection given gold predicates). Since the model is based on XLM-R (Conneau et al., 2020), it can also be applied in a zero-shot cross-lingual transfer setting.
Once frames and roles have been identified, protagonists must be aggregated at the team level. In our case, all mentions of individuals (“Frank de Boer”, “he”, “Dennis Bergkamp”, “Bergkamp”) as well as mentions of the targeted team (“The Flying Dutchmen”) are abstracted to a single protagonist representing the team.
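As a minimal sketch of this aggregation step, assume that the parser output is already available as (frame, trigger, mention) triples and that a mention-to-team lookup is given. The names `Event`, `MENTION_TO_TEAM`, `team_sequence`, and the team label `"NED"` are hypothetical and serve only for illustration; they are not part of the parser or of Kicktionary, and the toy lookup stands in for real coreference resolution and player-to-team linking.

```python
from dataclasses import dataclass

# Hypothetical lookup standing in for coreference resolution and
# player-to-team linking; in practice "he" must be resolved in context.
MENTION_TO_TEAM = {
    "The Flying Dutchmen": "NED",
    "Frank de Boer": "NED",
    "he": "NED",
    "Dennis Bergkamp": "NED",
    "Bergkamp": "NED",
}


@dataclass(frozen=True)
class Event:
    frame: str    # frame label, e.g. "PASS"
    trigger: str  # surface form that evoked the frame
    mention: str  # mention filling the protagonist role


def team_sequence(events, team):
    """Return the frames, in order of mention, whose protagonist maps to `team`."""
    return [e.frame for e in events if MENTION_TO_TEAM.get(e.mention) == team]


# The events of example (1a), in order of mention.
EXAMPLE_EVENTS = [
    Event("PROGRESSION", "advance", "The Flying Dutchmen"),
    Event("POSSESSION", "possession", "Frank de Boer"),
    Event("PASS", "passes", "he"),
    Event("CONTROL", "picks up", "Bergkamp"),
    Event("SCORE_GOAL", "scores", "Bergkamp"),
]

print(" => ".join(team_sequence(EXAMPLE_EVENTS, "NED")))
```

Run on the events of (1a), the sketch reproduces the frame sequence in (1b); mentions that map to no team, or to the other team, are simply left out of the sequence.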
At this stage, the temporal order of events is assumed to be iconic with the order of mention in the text, although this may not always be the case. By comparing the event sequences of the two teams, we can highlight the different perspectives that different texts take on the same match. For example, in (1b), the perspective is clearly on the Dutch team: we have a contiguous sequence of five events in which only the Dutch team is mentioned, and always in an Agent or Beneficiary role. We are currently applying this approach to a large multilingual corpus (Dutch, German, and English) of football match reports, composed of 312 documents from different sources, which allows us to investigate extensively the perspectives associated with entity-centric event sequences.
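The observation about contiguous sequences in (1b) can be made concrete with a simple count. The sketch below is one illustrative proxy, not a measure defined in this work: it assumes each event has already been assigned a team label in order of mention, and it reports the longest uninterrupted run of events per team. The function name `longest_run` and the team labels are hypothetical.

```python
from itertools import groupby


def longest_run(protagonists):
    """Map each team to the length of its longest contiguous run of events."""
    best = {}
    # groupby yields one group per maximal run of identical adjacent labels.
    for team, group in groupby(protagonists):
        length = sum(1 for _ in group)
        best[team] = max(best.get(team, 0), length)
    return best


# Team labels for the five events of (1b), in order of mention.
print(longest_run(["NED"] * 5))
```

A text that alternates between the two teams would yield short runs for both, whereas a text such as (1a) yields a single long run for one team.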