AIP490 Project
GradBot - A unified dialogue state tracking and dialogue response model for Task Oriented Dialogues (TOD) and Open Domain Dialogues (ODD)
by Sinh Nguyen, Truc Nguyen
THE FPT UNIVERSITY HO CHI MINH CITY
ITS Faculty
FPT University HCMC
Final Capstone Project 2 of 75

Supervisor: Dr. Nguyen Quoc Trung, Dr. Truong Hoang Vinh
A final year capstone project submitted in partial fulfillment of the requirement for the Degree of Bachelor of Artificial Intelligence in Computer Science
DEPARTMENT OF ITS
THE FPT UNIVERSITY HO CHI MINH CITY
April 2024

ACKNOWLEDGMENTS
The project we are working on is owned by Gradients Technologies. During our internship with the company, the Director granted us permission to use this project for our academic requirements, specifically for our graduation project.
The conceptualization and execution of the project plan were primarily our contributions. Before we started the research, our mentors and project managers equipped us with fundamental knowledge. They also gave us substantial support in refining our methodologies to optimize results, and they reviewed our code after each task.
We trained the models for all modules of this system on the company's GPU resources. After our internship ended, we, along with our instructors, were given the opportunity to continue researching and developing the project. This professional experience has been instrumental in our academic and career growth.

AUTHOR CONTRIBUTIONS
Conceptualization, Sinh Nguyen, Truc Nguyen and Gradients Technologies; methodology, Sinh Nguyen, Truc Nguyen and Gradients Technologies; software, Sinh Nguyen and Truc Nguyen; validation, Sinh Nguyen and Gradients Technologies; formal analysis, Sinh Nguyen and Truc Nguyen; investigation, Gradients Technologies; resources, Gradients Technologies; data curation, Sinh Nguyen and Truc Nguyen; writing (original draft preparation), Sinh Nguyen and Truc Nguyen; writing (review and editing), Sinh Nguyen and Truc Nguyen; visualization, Sinh Nguyen and Truc Nguyen; supervision, Dr. Nguyen Quoc Trung and Dr. Truong Hoang Vinh; project administration, Gradients Technologies; funding acquisition, Sinh Nguyen and Truc Nguyen. All authors have read and agreed to the Final Capstone Project document.

ABSTRACT
A task-oriented dialogue system must classify the user's intent and respond toward a specific goal domain. Within the task-oriented sub-module, the Dialogue State Tracker (DST) is a well-known component for tracking the dialogue state across a variety of inputs.
Nonetheless, current DST models tend to focus solely on task-oriented domains (TOD), which constrains their performance when deployed in varied scenarios. In addition, the dialogue response models of previous studies achieved poor results because their responses were not natural or fluent. In this work, we propose GradBot, a unified system comprising a DST model and a response model that handles both types of tasks: task-oriented dialogue (TOD) and open domain dialogue (ODD). Our model leverages recent advances in prompt engineering and conditional generation to perform zero-shot learning.
In our experiments, GradBot achieves an 88.5% score on the Joint Goal Accuracy (JGA) metric when evaluated on the Schema-Guided Dialogue (SGD) and FusedChat test sets, demonstrating its ability to adapt to multiple domains.
Keywords: Task Oriented Dialogue, Open Domain Dialogue, Dialogue State Tracking, Conversational AI

CONTENTS
ACKNOWLEDGMENTS
AUTHOR CONTRIBUTIONS
ABSTRACT
CONTENTS
List of Figures
List of Tables
1.1 Dialogue State Tracking
2.2 Enhance Reading Comprehension
2. PROJECT MANAGEMENT PLAN
3.1 Overall Project Objective
3.1 Set up the system’s architecture
4.1 Consistency and Compatibility Between Modules
4.2 Flexibility to Change and Upgrade the System
4.3 Optimal training time
4.4 Suitability for Research or Production Needs
4.3 Optimization in training
4.3 Combine Model Parallel and Data Parallel
4.4 Choosing checkpoint model
4.1 Multitask pre-training
4.2 Instruction training and Chain of Thought training
4.5 Complexity of building Schema-guided Definition
4.6 Predefined structure model
4.1 Schema-guided representation
4.2 Dialogue context representation
4.3 Action enhances constraint
4.7 DST as guided Reading Conversation
5.2 Schema-Guided Dialogue (SGD)
5.1 Overview Schema-Guided
5.2 Schema-Guided Approach
5.4 Comparison With Other Datasets
5.1 Overview
5.2 Annotation Error Types
6.1 Joint Goal Accuracy
6. RESULTS AND DISCUSSION
7.1 Performance on FusedChat
7.2 Performance on SGD
7.3 Performance on MultiWoz 2.4
CONCLUSIONS AND PERSPECTIVES
9. APPENDIX

List of Figures
INTRODUCTION
Figure 1. An example of the Attraction domain in the FusedChat dataset. The conversation is built from MultiWoz2.4 by rewriting the existing task-oriented domain turns and adding new open-domain dialogue turns.
PyTorch Fully Sharded Data Parallel (FSDP).
Decomposing All-Reduce Operations in Distributed Data Parallel Training: A Path to Full Parameter Sharding.
Efficient Machine Learning with Hybrid Parallelism: the diagram shows a hybrid approach that combines model and data parallelism for efficient machine learning. It optimizes resource utilization and comes close to the speed of pure data parallelism, which makes it well suited to smaller research teams.
Our fine-tuning data comprises 473 datasets and 1,836 total tasks.
Results of Flan-T5 and T5 on MMLU, BBH, TyDiQA and MGSM.
Comparison of training with and without chain-of-thought (CoT) data.
Comparison of training with and without instruction data.
Overview of the GradBot approach for schema-guided multi-domain dialogues. The bottom figure includes specific examples of the dialogue context, user action, ontology and current query, while the top figure illustrates the predictions.
Their dialogue system allows a user and a digital assistant to switch between TOD and ODD modes. The example includes a query about college fees (TOD) and a chat about personal growth and finance (ODD).
A TOD + ODD instance from FusedChat.
An ODD + TOD instance from FusedChat.
Example schema for a digital wallet service.
Dialogue state tracking labels applied after each user utterance in dialogues involving two distinct flight services. With the schema-guided method, these annotations depend on the service’s schema, shown at the far left/right.
Examples of each error type.
Overview of simultaneously enhancing the constructed meaning of the input and target values.

List of Tables
Our main task assignments.
Source data.
Table 3. Statistics for SGD, FusedChat, and MultiWoz2.4, computed across the train, validation, and test sets. FusedChat incorporates MultiWoz2.4 and adds ODD turns to its TOD part. In SGD, “unique” slots are shown in italics, and the number of slot values includes those for “categorical” slots.
Experimental results on the FusedChat test set in terms of Joint Goal Accuracy (JGA), Slot Accuracy (SA) and F1-score. Two models from FusedChat [23] are cited for comparison on the JGA and SA metrics; their parameter counts are not reported in the original paper, so we omit them. An additional F1 column is reported to check that the dialogue type (ODD or TOD) is tracked correctly. Our GradTOD model’s performance is written in italics.
Experimental results on the SGD test set: Joint Goal Accuracy (JGA) on seen and unseen domains. Values for Large Language Models (more than 1B parameters) are written in bold and those of our GradTOD model in italics.
Comparison of performance with state-of-the-art research on MultiWoz 2.4. The result of SOM-DST on MultiWoz 2.4 is taken from [34]. The highest scores among encoder-only models and among Large Language Models (Seq2Seq) are written in bold, while our GradTOD model is written in italics.
Example of slots missing from metadata.
Appendix Table 8. Example of slots missing from dialog acts.
Appendix Table 9. Example of slots missing from both metadata and dialog acts.
Appendix Table 10. Example of slots and values missing from dialog acts.
Appendix Table 11. Example of inconsistent values.

1. INTRODUCTION
The task-oriented domain has attracted considerable attention in both academia and industry. Its objective is to accomplish specific goals, such as providing information or performing an action that satisfies the user’s request.
Specifically, a task-oriented system can take over much of the work of product consultants, reservation staff and customer service staff. Such a system interacts with users and lets them carry out intentions such as buying products, searching for or booking hotel rooms and restaurant tables, making medical appointments, buying music tickets, buying flight and train tickets, and booking taxis. The corresponding user actions include providing the characteristics of the hotel or restaurant the user wants to search for or book, asking the system about a place (Does it have internet? How much does it cost?), accepting orders, confirming orders, etc. After receiving the user’s actions and intentions, the system responds with system actions such as providing information about the hotels and restaurants the user requests, offering hotels and restaurants at the user’s destination, and performing orders.
One of the crucial components of the task-oriented domain is Dialogue State Tracking (DST), which tries to predict the appropriate actions to resolve the user’s goals. At every turn, DST must examine the dialogue history (either in full or through a sliding window) up to the current user query to determine the user’s intentions and actions, along with the specific values for the slots in the slot list [1, 2]. In our observation, there are two kinds of DST design:
● The traditional method uses an encoder module with multi-head layers that act as classifiers for intent prediction, slot prediction, and slot filling [1, 3];
● The Seq2Seq method uses prompting to expose the semantic relations between dialogue turns and the ontology throughout the conversation in order to predict the required values [4–6].
In industrial applications, DST must adapt flexibly to new domains (services) without prior training on those specific tasks.
For this reason, zero-shot prediction on unseen domains plays an important role in DST. Some previous work [4–6] uses the schema as a guiding description that relates the semantics of each slot to the input sentence (the user query). With recent advances in pre-trained language models [7–9], augmented language techniques are gaining increasing attention. These methods have demonstrated substantial improvements and zero-shot adaptability [3, 10, 11].
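The per-turn DST input discussed in this section (a window of dialogue history, the current user query, and a schema description for each slot) can be sketched in Python. This is a minimal, model-agnostic sketch under our own assumptions, not the exact GradBot input format: the function `build_dst_prompts`, the `SCHEMA` slot names and descriptions, the `[user]`/`[system]` speaker tags and the prompt template are all hypothetical. It keeps the last `window` history turns, appends the current query, and builds one prompt per slot that carries the slot's natural-language description.

```python
from typing import Dict, List, Tuple

# Hypothetical schema: slot name -> natural-language description.
# Slot names and descriptions are illustrative, not copied from SGD or FusedChat.
SCHEMA: Dict[str, str] = {
    "attraction-name": "name of the attraction the user is looking for",
    "attraction-area": "part of town where the attraction is located",
}


def build_dst_prompts(
    history: List[Tuple[str, str]],
    user_query: str,
    schema: Dict[str, str],
    window: int = 2,
) -> Dict[str, str]:
    """Build one schema-guided prompt per slot for a Seq2Seq DST model.

    `history` holds (speaker, utterance) pairs; only the last `window`
    pairs are kept (sliding window). window=0 drops the history entirely.
    """
    # history[-0:] would return the whole list, so window=0 is handled explicitly.
    recent = history[-window:] if window > 0 else []
    turns = [f"[{speaker}] {utterance}" for speaker, utterance in recent]
    turns.append(f"[user] {user_query}")
    context = " ".join(turns)

    prompts: Dict[str, str] = {}
    for slot, description in schema.items():
        # The description tells the model what the slot means, which is what
        # allows slots from unseen services to be queried with the same template.
        prompts[slot] = f"context: {context} slot: {slot} description: {description} value:"
    return prompts


if __name__ == "__main__":
    history = [
        ("user", "I am looking for a museum."),
        ("system", "There are several museums. Any area in mind?"),
    ]
    for slot, prompt in build_dst_prompts(history, "The centre, please.", SCHEMA, window=1).items():
        print(slot, "->", prompt)
```

Each prompt would then be passed to a Seq2Seq model (not shown) to generate one value per slot. Supporting a new service in this sketch only requires new schema descriptions, which is the property that zero-shot DST relies on.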
Moreover, the in-context learning (ICL) framework enables efficient DST without a re-training stage by combining a prompt with a few examples of the task (few-shot) [12, 13]. Other studies often focus only on predicting user actions and intentions with the DST model, without investigating how to respond to users appropriately and fluently. Therefore, our system incorporates a Dialogue Response model that interacts with users based on the predictions made by the DST model (including the user’s actions and intentions). Historically, dialogue response models were developed primarily for conversational AI systems, with a particular focus on question answering systems.
The model input mainly consists of the dialogue history and the current user query. When the system must respond to queries it was not trained on, documents retrieved from the internet or a database are added to the input to provide knowledge that helps the model answer. However, to satisfy the user’s task-oriented requests, the Dialogue Response model needs one more main component, called the ‘system action’. This component represents the action that the system takes in response to the user after receiving the user’s actions and intentions.
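A hypothetical sketch of how the response model's inputs described above might be assembled into a single source string for a Seq2Seq generator is shown below. The `ResponseInput` fields, the `serialize` helper and the `history:`/`action:`/`knowledge:` markers are assumptions for illustration, not GradBot's actual format. The sketch only captures the idea that a system action is attached to TOD turns, while retrieved documents are optional.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ResponseInput:
    """Inputs to a hypothetical dialogue response model (field names are illustrative)."""

    history: List[str]
    user_query: str
    dialogue_type: str  # "TOD" or "ODD", assumed to come from the DST model
    system_action: Optional[str] = None  # e.g. "inform(phone)", used for TOD turns only
    documents: List[str] = field(default_factory=list)  # optional retrieved knowledge


def serialize(inp: ResponseInput, max_history: int = 3) -> str:
    """Flatten the inputs into one source string for a Seq2Seq generator."""
    # Keep only the most recent turns; max_history=0 drops the history.
    recent = inp.history[-max_history:] if max_history > 0 else []
    parts = [
        "history: " + (" | ".join(recent) or "<none>"),
        "user: " + inp.user_query,
        "type: " + inp.dialogue_type,
    ]
    if inp.dialogue_type == "TOD" and inp.system_action:
        # The system action tells the generator what to do in this turn.
        parts.append("action: " + inp.system_action)
    if inp.documents:
        # Retrieved documents supply knowledge the model was not trained on.
        parts.append("knowledge: " + " ".join(inp.documents))
    return " ; ".join(parts)


if __name__ == "__main__":
    turn = ResponseInput(
        history=["I am looking for a museum.", "There are several museums in the centre."],
        user_query="What is the phone number?",
        dialogue_type="TOD",
        system_action="inform(phone)",
    )
    print(serialize(turn))
```

Dropping the system action for ODD turns in this sketch reflects the point made above that the system action serves task-oriented requests; a real system could equally encode ODD turns with a dedicated action.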
Fig 1. An example of the Attraction domain in the FusedChat dataset. The conversation is built from MultiWoz2.4 by rewriting the existing task-oriented domain turns and adding new open-domain dialogue turns.
More specifically, Figure 1 shows an example conversation with the associated dialogue state of the attraction domain.
The user wants to find information about an attraction with a specified name and requests more details about its phone number, address, and area. At the same time, the user asks whether going to the museum is useful, which is an open (general) domain question.
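The dialogue state described for Figure 1 can be illustrated with a minimal sketch. The slot name `attraction-name`, the placeholder value, the `track` helper and the rule that an ODD turn leaves the accumulated state unchanged are our assumptions for illustration; the actual FusedChat annotation format may differ. In this sketch, informed slot values persist across turns, while requested slots (phone, address, area) are recorded only for the turn in which they are asked.

```python
from typing import Dict, List, Set, Tuple

State = Dict[str, str]
Turn = Tuple[str, State, Set[str]]  # (dialogue type, informed slots, requested slots)


def track(turns: List[Turn]) -> List[Tuple[State, Set[str]]]:
    """Return the (accumulated state, requested slots) pair after each user turn."""
    state: State = {}
    trace: List[Tuple[State, Set[str]]] = []
    for dialogue_type, informed, requested in turns:
        if dialogue_type == "TOD":
            # Informed values persist and overwrite earlier values for the same slot.
            state.update(informed)
            trace.append((dict(state), set(requested)))
        else:
            # Assumption: an ODD turn carries the TOD state forward unchanged.
            trace.append((dict(state), set()))
    return trace


if __name__ == "__main__":
    figure1_turns: List[Turn] = [
        # Hypothetical placeholder; the attraction's real name appears in the figure.
        ("TOD", {"attraction-name": "<attraction name>"}, {"phone", "address", "area"}),
        ("ODD", {}, set()),  # the question about whether going to the museum is useful
    ]
    for state, requested in track(figure1_turns):
        print(state, sorted(requested))
```

Copying the state at each turn keeps every snapshot independent, so a per-turn metric such as Joint Goal Accuracy could compare each snapshot against its gold label.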