Chatbot evaluation is hard: there is no accepted standard, and ChatEval is our attempt to address at least some parts of this problem.
Right now we use ParlAI as our framework for data as well as experiments, and we used OpenNMT-py for training models. All of our checkpoints will be made publicly available, including all configurations. See this link for checkpoints from the paper.
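If you want a quick look at the conversational data ParlAI provides, the sketch below follows ParlAI's standard display-data example. The task name `dailydialog` is only an illustrative choice, and the module paths may differ across ParlAI versions.

```python
# Minimal sketch: inspect a conversational dataset through ParlAI.
# Module paths and the example task ("dailydialog") are assumptions and
# may need adjusting for your ParlAI version.
from parlai.core.params import ParlaiParser
from parlai.core.worlds import create_task
from parlai.agents.repeat_label.repeat_label import RepeatLabelAgent

parser = ParlaiParser()
opt = parser.parse_args(['--task', 'dailydialog'])

agent = RepeatLabelAgent(opt)    # trivial agent that just echoes the label
world = create_task(opt, agent)  # pairs the agent with the dataset teacher

for _ in range(5):               # print a few dialogue turns
    world.parley()
    print(world.display())
```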
Submit your model! Please take a look at our submission form.
Amazon Mechanical Turk is not free, so we are actively looking for funding.
Please find our paper here.
What does ChatEval solve?
- Shared and publicly available model code and checkpoints.
- Standard evaluation datasets.
- Standard human annotator framework (currently using Amazon Mechanical Turk).
- Pairwise comparisons of the performance of Model A vs. Model B, with both a summary and the complete underlying data available.