🐛 bug: typos in initial prompt templates #213

Closed

Conversation

azigler (Contributor) commented Jun 27, 2024

Closes #212

Description

I cleaned up some typos and punctuation, and standardized some of the ways we request things from the LLM.

I also think that in standard_personality_without_locale.tmpl we should consider pulling the bot's name out into a template variable instead of hardcoding it as "Copilot".
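
For illustration, a rough sketch of that change, assuming the .tmpl files use Go text/template syntax and that the render context could expose a hypothetical `BotName` field (the surrounding sentence is illustrative, not quoted from the file):

```tmpl
{{/* Before: the assistant's name is hardcoded. */}}
Your name is Copilot.

{{/* After: the name comes from the render context; .BotName is a hypothetical field. */}}
Your name is {{.BotName}}.
```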

azigler requested a review from crspeller June 27, 2024 00:13
azigler self-assigned this Jun 27, 2024
crspeller (Member) commented

@azigler Merge conflicts from the other PR.
Also, have you done any testing to see whether these new prompts are better? The prompts are surprisingly finicky.
I am wary of merging so many assorted fixes at once without a good sense that they will actually improve things.

azigler (Contributor, Author) commented Jul 8, 2024

@crspeller I agree, and I wanted to get your opinion on how best to A/B test these. Since each model could handle a given prompt differently, I'm not sure what the best benchmark to measure against is. Should we test the prompts against a few models to get a sense of the variety? I've seen a lot of things like LLM "tournaments," but those compare two models using an identical prompt. In this case we want to compare multiple models with multiple prompts, so it's more like a matrix.
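
For concreteness, a minimal sketch of that matrix-style evaluation; the model names are hypothetical, and `generate`/`score` are placeholders for a real model API call and a real metric:

```python
from itertools import product

# Hypothetical model identifiers and the prompt variants under test.
MODELS = ["model-a", "model-b", "model-c"]
PROMPTS = {
    "current": "You are a helpful assistant.",
    "revised": "You are a helpful assistant. Answer concisely.",
}

def generate(model: str, prompt: str) -> str:
    """Placeholder for a real API call to the given model."""
    return f"<{model} completion under prompt {prompt!r}>"

def score(output: str) -> float:
    """Placeholder for a real metric (rubric, judge model, human rating)."""
    return float(len(output))

# Fill every (model, prompt-variant) cell: a matrix comparison rather
# than a head-to-head of two models on one identical prompt.
matrix = {
    (model, name): score(generate(model, prompt))
    for model, (name, prompt) in product(MODELS, PROMPTS.items())
}

for (model, name), value in sorted(matrix.items()):
    print(f"{model:10s} {name:8s} {value:6.2f}")
```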

crspeller (Member) commented

@azigler Ideally yes, but we can probably start by evaluating against just one model. So far all my evaluations have been manual, but it would be great if we could set up a framework like https://github.com/confident-ai/deepeval or something of the sort to test prompt changes.
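
As a rough sketch of what such a check could look like with deepeval (its built-in metrics use an LLM judge, so an evaluation model, e.g. an OpenAI API key, has to be configured first; the strings here are placeholders, not real plugin output):

```python
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def test_prompt_change():
    # Placeholder I/O: in practice actual_output would come from running
    # the candidate prompt template against a real model.
    test_case = LLMTestCase(
        input="Summarize this channel's recent activity for me.",
        actual_output="Here is a summary of the recent activity: ...",
    )
    # The metric scores relevance with an LLM judge and fails the test
    # if the score drops below the threshold.
    assert_test(test_case, [AnswerRelevancyMetric(threshold=0.7)])
```

A file like this can be run with pytest or with deepeval's own runner (`deepeval test run`), which would let prompt regressions fail in CI instead of relying on manual spot checks.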

At the bare minimum we should manually test prompt changes to make sure they are improvements.

azigler (Contributor, Author) commented Aug 5, 2024

Going to close this PR for now and revisit the testing request later, since these prompts were updated in a parallel PR.

azigler closed this Aug 5, 2024
azigler removed their assignment Aug 5, 2024