The dataset for DBDC3 is provided below. It includes both development and evaluation data. After the DBDC3 workshop, we revised some of the annotations; the dataset below includes the revised annotations.
This dataset was created by the DBDC3 Task Organizers. The data can be used for both commercial and non-commercial purposes under the MIT license. The data contain English and Japanese dialogues, with dialogue breakdown annotations for each system utterance.
The following four datasets were used as development data in DBDC3.
The following four datasets were used as evaluation data in DBDC3.
Each dialogue in CIC_115 and CIC_50 was collected by showing a context represented by a short paragraph to a user before the dialogue.
The following three datasets were used as evaluation data in DBDC3.
Additional Japanese datasets used in DBDC1 and DBDC2 are also available.
There are two folders, “dbdc3” and “dbdc3_revised”, in the data folder. “dbdc3” is the one used for the DBDC3 workshop and “dbdc3_revised” is the one we revised after the workshop.
Four datasets, CIC_115 and YI_100 in dbdc3/en/dev/ and CIC_50 and YI_50 in dbdc3/en/test/, were re-annotated and are stored under the dbdc3_revised folder. In the original data, an annotator was allowed to annotate only part of a dialogue; in the revised data, each annotator was required to annotate all utterances of a dialogue in sequence. This revision slightly increased the inter-annotator agreement.
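The folder layout described above can be handled with a small helper that picks the revised copy of a dataset when one exists. This is only a sketch: the function name `resolve_dataset_dir` and its parameters are hypothetical, and it assumes that dbdc3_revised mirrors the `<language>/<split>/<dataset>` layout of dbdc3, which the text does not state explicitly.

```python
from pathlib import Path

# Datasets named in the text as re-annotated, keyed by (split, dataset name).
REVISED = {
    ("dev", "CIC_115"),
    ("dev", "YI_100"),
    ("test", "CIC_50"),
    ("test", "YI_50"),
}


def resolve_dataset_dir(data_root, split, name, lang="en", prefer_revised=True):
    """Return the directory of a dataset, preferring the revised copy if present."""
    root = Path(data_root)
    original = root / "dbdc3" / lang / split / name
    # Assumption: dbdc3_revised uses the same sub-folder layout as dbdc3.
    revised = root / "dbdc3_revised" / lang / split / name
    if prefer_revised and (split, name) in REVISED and revised.is_dir():
        return revised
    # Fall back to the data used at the DBDC3 workshop.
    return original
```

For example, `resolve_dataset_dir("data", "dev", "CIC_115")` returns the path under dbdc3_revised if that directory exists, and the original dbdc3 path otherwise. Passing `prefer_revised=False` always selects the workshop version, which may be useful for reproducing results reported at the workshop.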
This dataset was created by the DBDC3 Task Organizers using the Stanford Question Answering Dataset (SQuAD). The data can be used for both commercial and non-commercial purposes under the CC BY-SA 4.0 license. The data contain the short paragraphs used as context in DBDC3.
We gratefully acknowledge the generous support provided by the following sponsors:
If you have any questions or comments, please contact us at firstname.lastname@example.org.