Public evaluation tool for non-task-driven, neural, open-domain chatbots

jsedoc/ChatEval

Chatbot evaluation is really hard. There is no standard methodology, and ChatEval is our attempt to address at least a small part of this problem.

Right now we use ParlAI as our framework for data as well as experiments, and OpenNMT-py for training models. All of our checkpoints, including all configurations, will be made publicly available. See this link for checkpoints from the paper.

Submit your model! Please take a look at our submission form.

Amazon Mechanical Turk is not free... so we are actively looking for funding.

Please find our paper here.

What does ChatEval solve?

  1. Shared and publicly available model code and checkpoints.
  2. Standard evaluation datasets.
  3. Standard human annotator framework (currently using Amazon Mechanical Turk).
  4. Head-to-head comparisons of Model A vs. Model B, with both a summary and the full annotation data available.
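The A-vs-B comparison in item 4 boils down to tallying per-prompt human judgments into win rates. The sketch below is purely illustrative and is not ChatEval's actual implementation; the judgment format (one of `"A"`, `"B"`, or `"tie"` per annotated prompt) is an assumption for the example.

```python
from collections import Counter

def summarize_pairwise(judgments):
    """Summarize human A-vs-B judgments as win rates.

    `judgments` is an assumed, hypothetical format: a list with one entry
    per annotated prompt, each "A", "B", or "tie".
    Returns the fraction of prompts won by each outcome.
    """
    counts = Counter(judgments)
    total = len(judgments)
    return {outcome: counts[outcome] / total for outcome in ("A", "B", "tie")}

# Example: 5 annotated prompts, Model A preferred on 3 of them
summary = summarize_pairwise(["A", "A", "B", "tie", "A"])
print(summary)  # {'A': 0.6, 'B': 0.2, 'tie': 0.2}
```

A real comparison would additionally report the raw per-prompt data alongside this summary, which is what the point above promises.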
