Denise de Waard, Jelte van Waterschoot and Mariët Theune
In spoken dialogue systems such as conversational agents and virtual assistants, the agent or assistant sometimes fails to fully understand the user. Grounding is the process of reaching mutual understanding of what is being said, and is necessary for resolving such mistakes. The mistakes studied in this research are speech recognition errors made by the agent.
For our experiment we implemented a part of the grounding process in a turn-based spoken dialogue system for repairing misunderstandings. In particular, we implemented two types of grounding: conditional and unconditional grounding. Conditional grounding is a repair process where the agent repeats the question if it thinks it misunderstood the user. Unconditional grounding is a form of implicit verification in which the agent always repeats part of the user’s previous response while asking its next question, giving the user the opportunity to correct the agent if it has misunderstood anything.
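The two strategies can be illustrated with a minimal sketch. It is not the implementation used in the experiment: the `Recognition` type, the function names, the confidence score and the threshold of 0.5 are all hypothetical, chosen only to make the contrast concrete. In particular, the abstract does not specify how the agent decides that it has misunderstood the user, so a recognition confidence threshold is assumed here.

```python
from dataclasses import dataclass


@dataclass
class Recognition:
    """What the agent believes the user said (hypothetical structure)."""
    text: str          # recognized words, possibly wrong
    confidence: float  # assumed score in [0, 1]; not taken from the paper


def conditional_turn(current_question: str, next_question: str,
                     heard: Recognition, threshold: float = 0.5) -> str:
    """Conditional grounding: repeat the question only when the agent
    suspects that it misunderstood the user."""
    if heard.confidence < threshold:
        return current_question
    return next_question


def unconditional_turn(next_question: str, heard: Recognition) -> str:
    """Unconditional grounding: always echo what was heard while asking
    the next question, so the user can correct the agent."""
    return f'You said "{heard.text}". {next_question}'


if __name__ == "__main__":
    heard = Recognition(text="pizzo", confidence=0.3)
    print(conditional_turn("What is your favourite food?",
                           "Where do you like to eat it?", heard))
    print(unconditional_turn("Where do you like to eat it?", heard))
```

In the conditional case the misrecognized answer leads the agent to repeat its question, whereas in the unconditional case the agent moves on but exposes what it heard, leaving the correction to the user.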
In our experiment, twenty users interacted with both a conditional and an unconditional prototype, to determine which type of grounding users prefer in spoken dialogue systems. The system design was semi-autonomous: a wizard controlled the misunderstandings and, acting as a human speech recognizer, typed in grammar mistakes to make sure that the unconditional grounding used words that were similar but not identical to what the user said. In the conditional prototype, no intervention by the wizard was necessary.
After participants had interacted with both systems, we asked them which one they preferred, and most preferred the conditional prototype. A closer look at the results suggested a possible recency effect: most participants preferred the prototype they had spoken with most recently. Additionally, we asked how people liked the tempo of the conversation, the intelligence and empathy of the agent, and the certainty of understanding. Participants found the conditional prototype to be more intelligent, but less empathetic, than the unconditional prototype. They felt that the tempo of the conditional prototype was faster, most likely because the wizard's intervention caused a delay in the unconditional prototype. Participants had different opinions on which prototype offered the most certainty of understanding. Some preferred to hear their answer repeated back as confirmation (unconditional), while others preferred the conditional prototype because the agent repeating its question is a clear indication of misunderstanding.