Exploring automating accuracy assessment of children’s oral reading

Bo Molenaar, Cristian Tejedor-García, Catia Cucchiarini and Helmer Strik

Learning to read is a vital skill for functioning in society. Unfortunately, a growing proportion of Dutch adolescents have such poor reading skills that they risk becoming functionally illiterate as adults (Gubbels et al., 2019). The issue of improving reading education should therefore be addressed as early in children's lives as possible. Personalised reading instruction can tackle this issue by supporting children in developing their reading skills at an early stage.

A crucial prerequisite for personalised reading education is detailed assessment of the reader’s strengths and weaknesses. However, objective testing of individual children’s reading skills is a laborious and time-consuming endeavour for teachers. Within the ASTLA project (Advanced Speech Technology and Learning Analytics for child personalised reading education, http://hstrik.ruhosting.nl/ASTLA/), we study methods to automate the assessment of children’s reading skills. This project brings together knowledge on automatic speech recognition (ASR), speech diagnostics and learning analytics from Radboud Universiteit, Universiteit Twente, Cito and Expertisecentrum Nederlands.

One of the research projects within ASTLA focuses on automating accuracy assessment of children's oral reading to provide information on reading strengths and weaknesses. By automatic accuracy assessment, we refer to automatically checking the words actually spoken against the reading prompt and judging whether each word was read correctly. Accuracy feedback provided by ASR has been used successfully to help children improve their reading in recent work (Bai et al., 2020).
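To illustrate the idea, the following is a minimal sketch of word-level accuracy assessment: aligning an ASR transcription of read speech against the reading prompt and labelling each word as correct, substituted, deleted, or inserted. This is an illustrative example using Python's standard-library `difflib` alignment, not the ADAPT algorithm or the actual ASTLA implementation; the function name and the Dutch example sentence are our own.

```python
# Sketch: word-level accuracy assessment by aligning an ASR transcription
# of read speech against the reading prompt (illustrative, not the ASTLA system).
from difflib import SequenceMatcher

def assess_accuracy(prompt: str, transcription: str):
    """Return (label, prompt_word, spoken_word) tuples from a word-level alignment."""
    ref = prompt.lower().split()   # words the child was asked to read
    hyp = transcription.lower().split()  # words the ASR system recognised
    results = []
    for op, i1, i2, j1, j2 in SequenceMatcher(a=ref, b=hyp).get_opcodes():
        if op == "equal":
            results += [("correct", w, w) for w in ref[i1:i2]]
        elif op == "replace":
            # Pair up substituted words; unmatched leftovers become deletions/insertions.
            for k in range(max(i2 - i1, j2 - j1)):
                r = ref[i1 + k] if i1 + k < i2 else None
                h = hyp[j1 + k] if j1 + k < j2 else None
                label = "substitution" if r and h else ("deletion" if r else "insertion")
                results.append((label, r, h))
        elif op == "delete":
            results += [("deletion", w, None) for w in ref[i1:i2]]
        elif op == "insert":
            results += [("insertion", None, w) for w in hyp[j1:j2]]
    return results

# Example: the reader substitutes "zit" with "zat" and skips the second "de".
labels = assess_accuracy("de kat zit op de mat", "de kat zat op mat")
```

In practice the alignment would operate on phonetic transcriptions rather than orthographic word strings, and would need to handle hesitations, restarts, and ASR errors, which is precisely what motivates the more sophisticated methods investigated in this project.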

In this project, we investigate several methods of automatic accuracy assessment. Current experiments use existing corpora of child speech such as JASMIN (Cucchiarini et al., 2008) to investigate methods including ADAPT-based distance scoring (Elffers et al., 2005) between reading prompts and ASR transcriptions of read speech, and forced decoding of the read speech (with different alternatives for the language model, i.e. general-purpose or specific-purpose, at different levels). Furthermore, we aim to achieve higher accuracy by training a dedicated ASR system on new child speech data for this purpose. In this talk we intend to present the preliminary results of the above-mentioned experiments and to discuss possible alternative approaches for future research.

Bai, Y., Hubers, F., Cucchiarini, C., & Strik, H. (2020). ASR-Based Evaluation and Feedback for Individualized Reading Practice. Interspeech 2020, 3870–3874. https://doi.org/10.21437/Interspeech.2020-2842
Cucchiarini, C., Driesen, J., Van Hamme, H., & Sanders, E. (2008). Recording Speech of Children, Non-Natives and Elderly People for HLT Applications: The JASMIN-CGN Corpus. Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08). http://www.lrec-conf.org/proceedings/lrec2008/pdf/366_paper.pdf
Elffers, B., Van Bael, C., & Strik, H. (2005). Algorithm for Dynamic Alignment of Phonetic Transcriptions. Radboud University.
Gubbels, J., van Langen, A., Maassen, N., & Meelissen, M. (2019). Resultaten PISA-2018 in vogelvlucht. Universiteit Twente. https://doi.org/10.3990/1.9789036549226