dbdc3

Dataset for DBDC3

This page provides the dataset for DBDC3, which includes both development and evaluation data. After the DBDC3 workshop, we revised some of the annotations. The dataset below includes the revised annotations.

Download

Main data (DBDC3.zip)

This dataset was created by the DBDC3 Task Organizers. The data can be used for both commercial and non-commercial purposes under the MIT license. The data contain English and Japanese dialogues, with dialogue breakdown annotations for each system utterance.

English data

The following four datasets were used as development data in DBDC3.

The following four datasets were used as evaluation data in DBDC3.

For each dialogue in CIC_115 and CIC_50, the user was shown a context, given as a short paragraph, before the dialogue began.

Japanese data

The following three datasets were used as evaluation data in DBDC3.

See here for additional Japanese datasets used in DBDC1 and DBDC2.

Revision after DBDC3

There are two folders, “dbdc3” and “dbdc3_revised”, in the data folder. “dbdc3” is the one used for the DBDC3 workshop and “dbdc3_revised” is the one we revised after the workshop.

Four datasets were re-annotated and are stored under the dbdc3_revised folder: CIC_115 and YI_100 in dbdc3/en/dev/, and CIC_50 and YI_50 in dbdc3/en/test/. In the original data, each annotator was allowed to annotate only part of a dialogue. In the revised data, each annotator was required to annotate all utterances of a dialogue in sequence. This revision slightly increased the inter-annotator agreement.
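The folder layout described above can be turned into a small path helper that picks the revised copy of a dataset when one exists. This is only a sketch: the function name dataset_dir and the root argument are hypothetical, and it assumes that dbdc3_revised mirrors the language/split/dataset layout of dbdc3 and that each dataset is a subfolder named after it. Neither assumption is stated on this page, so check the extracted archive before relying on it.

```python
from pathlib import Path

# Datasets that the text says were re-annotated, keyed by (language, split).
REVISED = {
    ("en", "dev"): {"CIC_115", "YI_100"},
    ("en", "test"): {"CIC_50", "YI_50"},
}


def dataset_dir(root, lang, split, name, prefer_revised=True):
    """Return the folder to read for one dataset (hypothetical helper).

    ASSUMPTION: dbdc3_revised mirrors the <lang>/<split>/<name> layout
    of dbdc3. If the revised folder is not found, fall back to dbdc3.
    """
    root = Path(root)
    if prefer_revised and name in REVISED.get((lang, split), set()):
        candidate = root / "dbdc3_revised" / lang / split / name
        if candidate.is_dir():
            return candidate
    return root / "dbdc3" / lang / split / name
```

Calling dataset_dir(root, "en", "dev", "CIC_115") would then return the revised folder if it is present under root, and the original workshop folder otherwise. Passing prefer_revised=False always returns the workshop version, which may be useful when reproducing results reported at the workshop.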

Context data (DBDC3_context.zip)

This dataset was created by the DBDC3 Task Organizers using the Stanford Question Answering Dataset (SQuAD). The data can be used for both commercial and non-commercial purposes under the CC BY-SA 4.0 license. The data contain the short paragraphs used as context for DBDC3.

Sponsors

We gratefully acknowledge the generous support provided by the following sponsors:

This track is endorsed by the Special Interest Group on Spoken Language Understanding and Dialogue Processing (SIG-SLUD) of the Japanese Society for Artificial Intelligence (JSAI).

Contact

If you have any questions or comments, please contact us at dbdc3-organizers@googlegroups.com.