This repo contains the testing code for our paper DiPlomat: A Dialogue Dataset for Situated Pragmatic Reasoning (paper link). A website (website link) is also available for more information. If you have any questions or suggestions, please email me at [email protected].
The figures above show examples from our dataset. Here we provide code to reproduce the results reported in our paper.
Create a new environment with Anaconda (for more details on Anaconda, please refer to this link) and install the required libraries.
conda create -n PragBot python=3.10
conda activate PragBot
conda install -c conda-forge datasets transformers wandb evaluate tqdm
conda install -c anaconda numpy
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia  # for CUDA
# We recommend using a GPU, but the code can also run on CPU.
# Uncomment the following line to install the CPU-only build instead:
# conda install pytorch torchvision torchaudio cpuonly -c pytorch
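After installation, a quick sanity check (a minimal sketch that assumes only the libraries installed above) confirms that the core packages import and that a GPU is visible:

# sanity_check.py -- verify the environment before running any experiments
import torch
import transformers, datasets, evaluate

print("torch", torch.__version__, "| transformers", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())  # False is fine for CPU-only runs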
Feel free to contact us at [email protected] if there is any problem.
Due to GitHub's repository size limits, our dataset is hosted on Google Drive; please download it through this link. Once all datasets are downloaded, please put them in the dataset folder.
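A quick way to confirm the data landed in the right place (a minimal sketch; the individual filenames are not listed here, so it only checks the folder):

# check_data.py -- verify the dataset folder exists and list its contents
from pathlib import Path

data_dir = Path("dataset")
assert data_dir.is_dir(), "Download the data from Google Drive and unpack it into ./dataset"
print(sorted(p.name for p in data_dir.iterdir()))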
Please refer to our paper for more detail. In this task, models are provided with dialogues and are required to identify turns whose actual meanings deviate from their literal interpretations, commonly referred to as pragmatic turns. If their selections are accurate, a set of rationales is presented, and they are expected to choose the most plausible reason for each pragmatic turn.
Here C denotes the Conversation, P the pragmatic turns, and R the rationales.
- Subtask 1: C → P
cd PIR_subtask1 # move to the folder of subtask 1
# run the code with the bert-base model
python main_test.py --model_checkpoint bert-base-uncased --seed 42 --batch_size 24
# run the code with gpt2
python main_test.py --model_checkpoint gpt2 --seed 42 --batch_size 24
# run the code with DialoGPT
python main_test.py --model_checkpoint microsoft/DialoGPT-medium --seed 42 --batch_size 24
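For intuition, here is a minimal sketch of how C → P can be framed as pair classification with a BERT-style checkpoint. This is not the repo's main_test.py; the toy dialogue is hypothetical, and the untrained classification head will give arbitrary labels until fine-tuned:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Score one candidate turn against its dialogue context (label 1 = pragmatic, by assumption).
context = "A: Lovely weather today. B: Sure, if you enjoy getting soaked."
turn = "Sure, if you enjoy getting soaked."
inputs = tok(context, turn, return_tensors="pt", truncation=True)
with torch.no_grad():
    pred = model(**inputs).logits.argmax(-1).item()
print("pragmatic" if pred == 1 else "literal")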
- Subtask 2: CP → R
cd PIR_subtask2 # move to the folder of subtask 2
# PS: run BERT-like models with the following code
# run the code with the bert-base model
python test_subtask2.py --model_checkpoint bert-base-uncased --seed 42 --batch_size 16
# PS: run GPT-like models with the following code
# run the code with the gpt2 model
python gpt_test_pytorch.py --model_checkpoint gpt2 --seed 42 --batch_size 8
# run the code with DialoGPT
python gpt_test_pytorch.py --model_checkpoint microsoft/DialoGPT-medium --seed 42 --batch_size 8
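Likewise, a minimal sketch of CP → R as multiple choice (not the repo's test_subtask2.py; the dialogue, turn, and candidate rationales are hypothetical):

import torch
from transformers import AutoTokenizer, AutoModelForMultipleChoice

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMultipleChoice.from_pretrained("bert-base-uncased")

# Pair the conversation + pragmatic turn with each candidate rationale.
prompt = "A: Lovely weather today. B: Sure, if you enjoy getting soaked."
rationales = ["B is being ironic: it is raining.", "B genuinely enjoys the weather."]
enc = tok([prompt] * len(rationales), rationales, return_tensors="pt", padding=True, truncation=True)
enc = {k: v.unsqueeze(0) for k, v in enc.items()}  # shape: (batch=1, num_choices, seq_len)
with torch.no_grad():
    best = model(**enc).logits.argmax(-1).item()
print("most plausible rationale:", rationales[best])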
- Subtask 3: C → PR
- Note that Subtask 3 does not involve training, so please provide two checkpoints for this subtask.
- The two checkpoints can be obtained by running the code for Subtask 1 and Subtask 2.
- pi_checkpoint: the checkpoint for C → P (Subtask 1)
- r_checkpoint: the checkpoint for CP → R (Subtask 2)
cd PIR_subtask3 # move to the folder of subtask 3
# PS: run BERT-like models with the following code
python C2IR.py --pi_checkpoint {pi_checkpoint} --r_checkpoint {r_checkpoint}
# PS: run GPT-like models with the following code
python C2IR_gpt.py --pi_checkpoint {pi_checkpoint} --r_checkpoint {r_checkpoint}
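Conceptually, Subtask 3 chains the two checkpoints: first flag pragmatic turns (C → P), then select a rationale for each flagged turn (CP → R). Below is a minimal sketch of that composition, not the repo's actual C2IR.py; the checkpoint paths are placeholders, and the pragmatic label index is an assumption:

import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          AutoModelForMultipleChoice)

# Placeholder paths: point these at the checkpoints saved by Subtasks 1 and 2.
pi_tok = AutoTokenizer.from_pretrained("path/to/pi_checkpoint")
pi_model = AutoModelForSequenceClassification.from_pretrained("path/to/pi_checkpoint")
r_tok = AutoTokenizer.from_pretrained("path/to/r_checkpoint")
r_model = AutoModelForMultipleChoice.from_pretrained("path/to/r_checkpoint")

def c2pr(context, turns, rationales_per_turn):
    """First flag pragmatic turns (C -> P), then pick a rationale for each (CP -> R)."""
    results = []
    for turn, rationales in zip(turns, rationales_per_turn):
        enc = pi_tok(context, turn, return_tensors="pt", truncation=True)
        with torch.no_grad():
            if pi_model(**enc).logits.argmax(-1).item() == 1:  # assumed: label 1 = pragmatic
                mc = r_tok([context + " " + turn] * len(rationales), rationales,
                           return_tensors="pt", padding=True, truncation=True)
                mc = {k: v.unsqueeze(0) for k, v in mc.items()}
                best = r_model(**mc).logits.argmax(-1).item()
                results.append((turn, rationales[best]))
    return results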
Notice: when testing without gold, the Rationale is withheld from the model; when testing with gold, the Rationale is provided as input.
# Test without gold
# run the code with t5-small
python run_seq2seq_qa.py --model_checkpoint t5-small --seed 42 --batch_size 24
# Test with gold
# run the code with t5-small
python run_seq2seq_qa_with_gold.py --model_checkpoint t5-small --seed 42 --batch_size 24
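The difference between the two settings is whether the gold rationale is appended to the model input. Here is a minimal sketch of that difference with t5-small; the question, context, and rationale below are hypothetical, and the exact prompt format used by the repo's scripts may differ:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

source = ("question: What does B imply? "
          "context: A: Lovely weather today. B: Sure, if you enjoy getting soaked.")
gold = " rationale: B is being ironic because it is raining."

for text in (source, source + gold):  # without gold vs. with gold
    ids = tok(text, return_tensors="pt", truncation=True).input_ids
    out = model.generate(ids, max_new_tokens=32)
    print(tok.decode(out[0], skip_special_tokens=True))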
Feel free to contact us by email at [email protected] or raise a GitHub issue; we will address the problem as soon as possible.
@inproceedings{li2023diplomat,
title={DiPlomat: A Dialogue Dataset for Situated Pragmatic Reasoning},
author={Li, Hengli and Zhu, Song-Chun and Zheng, Zilong},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year={2023}
}