Chunliu Wang and Johan Bos
The neural turn in computational linguistics has made it relatively easy to build systems for natural language generation, as long as suitable annotated corpora are available. But can such systems deliver the goods? Using Dutch data from the Parallel Meaning Bank, a corpus of (mostly short) texts annotated with language-neutral meaning representations, we investigate what challenges arise and what choices can be made when implementing sequence-to-sequence or graph-to-sequence transformer models for generating Dutch texts from formal meaning representations. The challenges include dealing with unknown concepts in the input, handling domain constants (proper names, dates, clock times) in a systematic way, and suppressing hallucination in the output.
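One common way to handle domain constants in neural generation, sketched below, is delexicalization: each constant is replaced by a placeholder token before the model sees the input, and the original surface form is restored in the generated output. This is a minimal illustrative sketch of that general technique, not a description of the system evaluated here; the function names and placeholder format are assumptions.

```python
def delexicalize(mr_constants):
    """Map each domain constant to a numbered placeholder like @NAME0@.

    `mr_constants` is a list of (kind, value) pairs extracted from the
    meaning representation, e.g. ("name", "Utrecht").
    """
    mapping = {}
    counters = {}
    for kind, value in mr_constants:
        idx = counters.get(kind, 0)
        counters[kind] = idx + 1
        mapping[f"@{kind.upper()}{idx}@"] = value
    return mapping

def relexicalize(generated_text, mapping):
    """Substitute placeholders in the model output with the original constants."""
    for placeholder, value in mapping.items():
        generated_text = generated_text.replace(placeholder, value)
    return generated_text

constants = [("name", "Utrecht"), ("date", "12-03-2020"), ("time", "14:30")]
mapping = delexicalize(constants)
# A seq2seq model would be trained on placeholder-bearing text; here we fake
# its output to show the restoration step.
model_output = "@NAME0@ op @DATE0@ om @TIME0@"
print(relexicalize(model_output, mapping))  # Utrecht op 12-03-2020 om 14:30
```

Because placeholders are copied through the model rather than generated from an open vocabulary, this also reduces one source of hallucination: the model cannot invent a new name or date where a constant is expected.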