D-Score: Holistic Dialogue Evaluation Without Reference

Title:
D-Score: Holistic Dialogue Evaluation Without Reference
Journal Title:
IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date:
21 April 2021
Citation:
Zhang, C., Lee, G., D’Haro, L. F., & Li, H. (2021). D-Score: Holistic Dialogue Evaluation Without Reference. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, 2502–2516. https://doi.org/10.1109/taslp.2021.3074012
Abstract:
In artistic gymnastics, the difficulty score, or D-score, is used for judging performance. Starting from zero, an athlete earns points across different aspects, such as composition requirements, difficulty, and connections between moves. The final score is a composition of the quality of various performance indicators. Similarly, when evaluating dialogue responses, human judges generally follow a number of criteria, chief among which are language fluency, context coherence, logical consistency, and semantic appropriateness. In this paper, we propose an automatic dialogue evaluation framework called D-score that resembles the way gymnastics is judged. Following the four human judging criteria above, we devise a range of evaluation tasks and model them under a multi-task learning framework. The proposed framework, without relying on any human-written reference, learns to appreciate the overall quality of human-human conversations through a representation that is shared by all tasks without over-fitting to any individual task domain. We evaluate D-score by performing comprehensive correlation analyses with human judgement on three dialogue evaluation datasets, two of which are from past DSTC series, and benchmark it against state-of-the-art baselines. D-score not only outperforms the best baseline by a large margin in terms of system-level Spearman correlation but also represents an important step towards explainable dialogue scoring.
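
As a plain illustration of the scoring analogy and the evaluation protocol described in the abstract, here is a minimal Python sketch; it is not the paper's implementation. The criterion names mirror the four judging criteria, but the uniform weights, the d_score and system_level_spearman helpers, and the toy data are all assumptions for illustration: the actual framework learns a shared representation under multi-task training rather than a fixed weighted sum. The system-level Spearman correlation is computed with scipy.stats.spearmanr.

# Hypothetical sketch: aggregating per-criterion scores into a holistic
# dialogue score and correlating system averages with human judgement.
# Criterion names, weights, and toy data are illustrative assumptions.
from scipy.stats import spearmanr

CRITERIA = ("fluency", "coherence", "consistency", "appropriateness")

def d_score(sub_scores, weights=None):
    """Combine per-criterion scores into one holistic score.

    Uniform weighting is assumed here; the paper instead trains the
    criteria jointly under a multi-task objective.
    """
    if weights is None:
        weights = {c: 1.0 / len(CRITERIA) for c in CRITERIA}
    return sum(weights[c] * sub_scores[c] for c in CRITERIA)

def system_level_spearman(metric_scores, human_scores):
    """System-level correlation: average the metric per system, then
    rank-correlate the system averages with human system ratings."""
    systems = sorted(metric_scores)
    metric_avg = [sum(metric_scores[s]) / len(metric_scores[s]) for s in systems]
    human = [human_scores[s] for s in systems]
    rho, _ = spearmanr(metric_avg, human)
    return rho

if __name__ == "__main__":
    # Toy responses from three hypothetical dialogue systems.
    responses = {
        "system_a": [
            {"fluency": 0.9, "coherence": 0.8, "consistency": 0.7, "appropriateness": 0.85},
            {"fluency": 0.8, "coherence": 0.75, "consistency": 0.7, "appropriateness": 0.8},
        ],
        "system_b": [
            {"fluency": 0.6, "coherence": 0.5, "consistency": 0.55, "appropriateness": 0.6},
        ],
        "system_c": [
            {"fluency": 0.7, "coherence": 0.65, "consistency": 0.6, "appropriateness": 0.7},
        ],
    }
    scores = {name: [d_score(r) for r in turns] for name, turns in responses.items()}
    human = {"system_a": 4.2, "system_b": 3.1, "system_c": 3.6}  # assumed human ratings
    print(f"system-level Spearman rho = {system_level_spearman(scores, human):.2f}")

In this toy example the averaged holistic scores rank the three systems the same way as the assumed human ratings, so the printed correlation is 1.00.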
License type:
Attribution 4.0 International (CC BY 4.0)
Funding Info:
This research/project is supported by the National Research Foundation Singapore - AI Singapore Programme
Grant Reference no. : AISG-GC-2019-002

This research/project is supported by the National Research Foundation Singapore - Human-Robot Interaction Phase 1
Grant Reference no. : 192 25 00054

This research/project is supported by the National Research Foundation Singapore - Human Robot Collaborative AI for AME
Grant Reference no. : A18A2b0046

This research is supported in part by the National Research Foundation (NRF) Singapore under the National Robotics Programme.

This work is supported in part by Robert Bosch (SEA) Pte Ltd under EDB's Industrial Postgraduate Programme II (EDB-IPP), Project Title: Applied Natural Language Processing.

This work is supported in part by the Spanish projects AMIC (MINECO, TIN2017-85854-C4-4-R) and CAVIAR (MINECO, TEC2017-84593-C2-1-R), partially funded by the European Union.
ISSN:
2329-9290 (print)
2329-9304 (electronic)