InDeep x NMT: Empowering Human Translators via Interpretable Neural Machine Translation

Gabriele Sarti and Arianna Bisazza

As part of the broader consortium “InDeep: Interpreting Deep Learning Models for Text and Sound”, funded by the Dutch Research Council (NWO), we aim to build upon the latest advances in interpretability research to empower end-users of neural machine translation (NMT). Central to our project is improving the subjective post-editing experience of human professionals: by employing interactive and intelligible computational practices, we promote a shift from a passive proofreading routine to an active role in the translation process, driving further gains in the quality and efficiency of post-editing in real-world scenarios. On the methodological side, this entails developing and adapting tools and methods for prediction attribution, error analysis, and controllable generation in NMT systems. We will evaluate our approaches using automatic metrics (Rei et al. 2020) and via a field study surveying professionals, conducted in collaboration with GlobalTextware.

The first part of the project will focus on identifying approaches that generalize to conditional text generation tasks, such as the black-box methods proposed by Alvarez-Melis and Jaakkola (2017) and He et al. (2019). Feature and instance attribution methods are especially interesting given their practical applicability in standard translation workflows. In particular, we find it essential to assess the relationship between the importance scores produced by these methods and different categories of translation errors. Evaluating the faithfulness of model attributions, i.e., whether they are causally linked to the system’s outputs, is another fundamental component of our investigation and will be pursued with a mix of existing and new techniques (DeYoung et al. 2020).
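As a concrete illustration, faithfulness can be probed with deletion-based metrics in the spirit of ERASER’s comprehensiveness: remove the source tokens that an attribution method ranks highest and measure the resulting drop in a model-derived score. The sketch below is only illustrative; the scoring function and attribution values are hypothetical stand-ins for a real NMT model (e.g., the log-probability of the reference translation) and a real attribution method.

```python
import numpy as np

def comprehensiveness(score_fn, tokens, attributions, k=2):
    """Deletion-based faithfulness check: drop the k most-attributed
    source tokens and measure how much the output score decreases.
    A larger drop suggests the attributions point at tokens the
    model genuinely relies on."""
    base = score_fn(tokens)
    top_k = set(np.argsort(attributions)[-k:])
    reduced = [t for i, t in enumerate(tokens) if i not in top_k]
    return base - score_fn(reduced)

# Toy stand-in "model": scores a sentence by keyword presence.
def toy_score(tokens):
    return sum(1.0 for t in tokens if t in {"neural", "translation"})

tokens = ["interpretable", "neural", "machine", "translation"]
attr = np.array([0.1, 0.8, 0.2, 0.9])  # hypothetical attribution scores

print(comprehensiveness(toy_score, tokens, attr, k=2))  # → 2.0
```

Here the attributions correctly single out the two tokens the toy scorer depends on, so deleting them removes the entire score; uninformative attributions would yield a drop near zero.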

The second part of the project will involve a field study combining behavioral and subjective quality metrics to empirically estimate the effectiveness of our methods in real-world scenarios. For the behavioral part, we intend to use a combination of keylogging and possibly eye-tracking to collect granular information about the post-editing process. Our analysis will benefit from insights from recent interactive MT studies (Santy et al. 2019; Vandeghinste et al. 2019) to present translators with useful information while avoiding visual clutter. Our preliminary inquiry involving professionals highlighted sentence-level quality estimation and adaptive style/terminology constraints as promising directions to increase post-editing productivity and enjoyability, supporting the potential of combining interpretable and interactive modules for NMT.

Alvarez-Melis, David, and Tommi Jaakkola. 2017. A Causal Framework for Explaining the Predictions of Black-Box Sequence-to-Sequence Models. In Proceedings of EMNLP 2017.

He, Shilin et al. 2019. Towards Understanding Neural Machine Translation with Word Importance. In Proceedings of EMNLP-IJCNLP 2019.

Rei, Ricardo et al. 2020. COMET: A Neural Framework for MT Evaluation. In Proceedings of EMNLP 2020.

DeYoung, Jay et al. 2020. ERASER: A Benchmark to Evaluate Rationalized NLP Models. In Proceedings of ACL 2020.

Santy, Sebastin et al. 2019. INMT: Interactive Neural Machine Translation Prediction. In Proceedings of EMNLP 2019.

Vandeghinste, Vincent et al. 2019. Improving the Translation Environment for Professional Translators. Informatics 6 (2).