Display: For code generation view, add "correct/incorrect" labels (and potentially execution outputs) #813

neubig · 2023-05-31T10:48:08Z

The code generation view currently displays only the output code and the expected code.

However, most code generation datasets, such as HumanEval or Odex, are evaluated by executing the code and checking whether its output is correct. Based on this:

At the very least, it should be possible to view whether the generated code was judged as correct by showing a "correct/incorrect" label.
Even better would be the functionality to view:
1. Expected code
2. Predicted code
3. Output of expected code
4. Output of predicted code
5. Correctness/incorrectness value

This probably requires that the values in the code output_column and data_column no longer be a plain str, but a structure that includes the code, its output, and the correctness value.
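As a rough illustration of what such a structure could look like, here is a minimal sketch. The class and field names (CodeExecution, CodeGenerationEntry, label) are hypothetical and chosen only for this example; they are not taken from the existing codebase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CodeExecution:
    """Hypothetical record: one piece of code plus what running it produced."""

    code: str
    output: Optional[str] = None  # captured execution output; None if never run


@dataclass
class CodeGenerationEntry:
    """Hypothetical replacement for a plain-str column value."""

    expected: CodeExecution
    predicted: CodeExecution
    correct: Optional[bool] = None  # None means "not evaluated yet"

    def label(self) -> str:
        """Return the text the view could show next to the code."""
        if self.correct is None:
            return "unknown"
        return "correct" if self.correct else "incorrect"


entry = CodeGenerationEntry(
    expected=CodeExecution(code="sorted(xs)", output="[1, 2, 3]"),
    predicted=CodeExecution(code="xs.sort()", output="None"),
    correct=False,
)
print(entry.label())
```

Keeping the output next to each piece of code, and the correctness value on the pair, covers all five items in the list above without changing how the code itself is stored.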


neubig · 2023-06-21T04:02:56Z

Here is an example of what the outputs look like now:

It would be nice if the correctness/incorrectness value and the error message could be displayed as well.
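One possible way to produce a label together with an error message is sketched below. The helper names (run_snippet, display_label) are hypothetical, and comparing printed output is an assumption made for the example; real datasets such as HumanEval use their own test harnesses. The use of exec is for illustration only, since generated code should normally run in a sandbox.

```python
import contextlib
import io
from typing import Optional, Tuple


def run_snippet(code: str) -> Tuple[str, Optional[str]]:
    """Run code and return (captured stdout, error message or None).

    Hypothetical helper: exec is unsafe for untrusted code and is used
    here only to keep the sketch short.
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})
    except Exception as exc:
        # Keep whatever was printed before the failure, plus a short message.
        return buffer.getvalue(), f"{type(exc).__name__}: {exc}"
    return buffer.getvalue(), None


def display_label(expected_output: str, code: str) -> str:
    """Build the label text a view could show for one predicted snippet."""
    output, error = run_snippet(code)
    if error is not None:
        return f"incorrect ({error})"
    return "correct" if output == expected_output else "incorrect"


print(display_label("3\n", "print(1 + 2)"))
print(display_label("3\n", "print(1 / 0)"))
```

Here a snippet that raises is always labelled incorrect, with the exception type and message attached so the reason is visible in the view.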

neubig changed the title ~~Display: For code generation view, add "correct/incorrect" labels (and potentially model outputs)~~ Display: For code generation view, add "correct/incorrect" labels (and potentially execution outputs) May 31, 2023
