Exploring machine understanding of story dialogue via new tasks and dataset, improving coherence and speaker recognition in storytelling AI.
Authors:
(1) Jianzhu Yao, The CoAI Group, Department of Computer Science and Technology, Tsinghua University, Beijing, China; Beijing National Research Center for Information Science and Technology;
(2) Ziqi Liu, The CoAI Group, Department of Computer Science and Technology, Tsinghua University, Beijing, China; Beijing National Research Center for Information Science and Technology;
(3) Jian Guan, The CoAI Group, Department of Computer Science and Technology, Tsinghua University, Beijing, China; Beijing National Research Center for Information Science and Technology;
(4) Minlie Huang, The CoAI Group, Department of Computer Science and Technology, Tsinghua University, Beijing, China; Beijing National Research Center for Information Science and Technology.
Many classical fairy tales, works of fiction, and screenplays leverage dialogue to advance story plots and establish characters. We present the first study to explore whether machines can understand and generate dialogue in stories, which requires capturing the traits of different characters and the relationships between them. To this end, we propose two new tasks: Masked Dialogue Generation and Dialogue Speaker Recognition, i.e., generating missing dialogue turns and predicting speakers for specified dialogue turns, respectively. To support evaluation, we build a new dataset, DIALSTORY, consisting of 105k Chinese stories with a large amount of dialogue woven into the plots. We demonstrate the difficulty of the proposed tasks by testing existing models on DIALSTORY with automatic and manual evaluation. Furthermore, we propose learning explicit character representations to improve performance on these tasks. Extensive experiments and case studies show that our approach generates more coherent and informative dialogue and achieves higher speaker recognition accuracy than strong baselines.
1 Introduction
Dialogue plays an important role in various types of literary works such as short stories, novels, and screenplays by advancing plots, establishing characters, and providing exposition in natural, lifelike words (Kennedy et al., 1983). Compared to dialogue in conversational scenarios such as chitchat bots (Shang et al., 2015) or task-oriented dialogue systems (Deng et al., 2012), dialogue in stories is mainly used to exhibit the emotions, motivations, or personalities of characters following the authors’ design, which in turn contributes to the coherence, informativeness, engagingness, and plot development of whole stories. Dialogue has also been shown to be essential for user-agent interaction in many text adventure games (Xi et al., 2021; Li et al., 2022). Despite broad recognition of the importance of this ability, machine understanding and generation of dialogue in stories has not been widely explored.
To spur research in this field, we present a new story dataset named DIALSTORY, which consists of 105k Chinese short stories with automatic annotations of dialogue turns and their corresponding speakers. Furthermore, we formulate two new tasks: (1) Masked Dialogue Generation (DialGen), completing a story in which several dialogue turns are masked with placeholders; and (2) Dialogue Speaker Recognition (DialSpk), choosing the correct speakers from given candidates for several specified dialogue turns. We construct standardized datasets for these tasks through automatic or manual annotation based on DIALSTORY. As exemplified in Figure 1, these tasks comprehensively investigate the ability to capture characters’ emotions (e.g., the mother squirrel worries that her baby will catch a cold), motivations (e.g., the mother squirrel intends to call her baby back home or to get medicine), and relationships with one another (e.g., the mother rabbit, a childhood friend of the mother squirrel, knows the kind of flower).
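To make the two task formulations concrete, the sketch below models their input/output structure as Python dataclasses. The field names and the `[MASK_0]` placeholder convention are illustrative assumptions, not the actual DIALSTORY file format.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DialGenExample:
    """Masked Dialogue Generation: generate the masked dialogue turns."""
    story: str               # story text with placeholders such as "[MASK_0]"
    masked_turns: List[str]  # gold dialogue turns, one per placeholder

@dataclass
class DialSpkExample:
    """Dialogue Speaker Recognition: pick a speaker for each specified turn."""
    story: str                    # full story text
    dialogue_turn_ids: List[int]  # indices of the turns to be attributed
    candidates: List[str]         # candidate character names
    speakers: List[int]           # gold index into `candidates` per turn

# Toy DialGen instance: one masked turn to be generated from context.
ex = DialGenExample(
    story='The mother squirrel shouted, "[MASK_0]" The baby ran home.',
    masked_turns=["Come back inside before you catch a cold!"],
)
assert ex.story.count("[MASK_0]") == len(ex.masked_turns)
```

In this framing, DialGen is a conditional generation task (each placeholder is filled from the surrounding story), while DialSpk is a multiple-choice classification task over the candidate list.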
Furthermore, across a large number of stories we found that dialogue is strongly tied to individual characters, as reflected in their emotional states, speaking styles, and roles in plot development. To provide more insight into tackling these tasks, we propose learning character representations to explicitly model the dependencies between characters and dialogue. Extensive experiments and case studies show that our approach generates more coherent and informative dialogue and achieves higher speaker recognition accuracy than strong baselines.
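The core idea of an explicit character representation can be sketched in a few lines: maintain one learned vector per character and fuse it with the representation of each dialogue turn, so a downstream model can condition on who is speaking. This is only a minimal illustration of the idea; the function names, the lazy allocation scheme, and the element-wise-sum fusion are assumptions, not the paper's actual architecture.

```python
import random

DIM = 8
random.seed(0)

character_table = {}  # character name -> learned representation vector

def char_rep(name):
    # Lazily allocate a vector for each character on first mention;
    # in a real model this would be a trainable embedding row.
    if name not in character_table:
        character_table[name] = [random.uniform(-0.1, 0.1) for _ in range(DIM)]
    return character_table[name]

def fuse(turn_vec, speaker):
    # Element-wise sum: the model sees both what is said and who says it.
    c = char_rep(speaker)
    return [t + ci for t, ci in zip(turn_vec, c)]

turn = [0.5] * DIM                         # stand-in for an encoded turn
fused = fuse(turn, "mother squirrel")
assert len(fused) == DIM
assert "mother squirrel" in character_table
```

The same table serves both tasks: for DialGen the character vector conditions generation of a masked turn, and for DialSpk a turn's representation can be scored against each candidate's vector.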