Evaluate CodeLLMs easily with fast inference settings
CodeLLM Evaluator provides fast and efficient evaluation of code generation tasks. Inspired by lm-evaluation-harness and bigcode-eval-harness, we designed the framework to support multiple use cases and to make it easy to add new metrics and custom tasks.
Features:
- Implemented HumanEval and MBPP benchmarks for coding LLMs.
- Support for models loaded via transformers and DeepSpeed.
- Support for evaluating adapters (e.g., LoRA) from HuggingFace's PEFT library.
- Support for distributed inference with native transformers or fast inference with the vLLM backend.
- Easy support for custom prompts, tasks, and metrics.
Install the code-eval package from the GitHub repository via pip:
$ git clone https://github.com/FSoft-AI4Code/code-llm-evaluator.git
$ cd code-llm-evaluator
$ pip install -e .
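To check that the editable install worked, you can import the package or invoke the CLI entry point used throughout this guide (assuming it registers the usual --help flag):

$ python -c "import code_eval"
$ code-eval --help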
To evaluate a supported task in Python, load our :py:func:`code_eval.Evaluator` to generate answers and compute evaluation metrics in a single run.
from code_eval import Evaluator
from code_eval.task import HumanEval
task = HumanEval()
evaluator = Evaluator(task=task)
output = evaluator.generate(num_return_sequences=3,
                            batch_size=16,
                            temperature=0.9)
result = evaluator.evaluate(output)
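The exact contents of result depend on the task's metrics; for HumanEval they are pass@k-style scores. A minimal sketch for inspecting and saving them, assuming result is a plain JSON-serializable dict (check the returned object in your version):

import json

print(result)  # e.g. pass@k scores for HumanEval

# Assumption: result is a plain dict of metric names to values.
with open("humaneval_results.json", "w") as f:
    json.dump(result, f, indent=2)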
Load a model and generate answers with native transformers (hf) by passing a local model path or a HuggingFace Hub name. transformers is the default backend, but you can pass --backend hf to select it explicitly:
$ code-eval --model_name microsoft/phi-1 \
      --task humaneval \
      --batch_size 8 \
      --backend hf
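The same selection can be made from Python. A minimal sketch, assuming model_name and backend are accepted by the Evaluator constructor (they may instead belong on generate(); the CLI flags above are the documented path):

from code_eval import Evaluator
from code_eval.task import HumanEval

# Assumption: model_name and backend mirror the CLI flags --model_name and --backend.
evaluator = Evaluator(task=HumanEval(),
                      model_name="microsoft/phi-1",
                      backend="hf")
output = evaluator.generate(batch_size=8)
result = evaluator.evaluate(output)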
Tip
Load LoRA adapters by adding the --peft_model argument. The --model_name must point to the full base model architecture.
$ code-eval --model_name microsoft/phi-1 \
      --peft_model <adapters-name> \
      --task humaneval \
      --batch_size 8 \
      --backend hf
We recommend using the vLLM engine for fast inference. vLLM supports tensor parallelism, data parallelism, or a combination of both. Refer to the vLLM documentation for more detail.
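For example, tensor parallelism is configured on the vLLM engine itself. A minimal vLLM-only sketch (tensor_parallel_size is vLLM's own argument; how code-eval forwards engine options is not covered here, so treat this as an illustration of the engine, not of code-eval flags):

from vllm import LLM, SamplingParams

# Shard the model across 2 GPUs with vLLM tensor parallelism.
llm = LLM(model="microsoft/phi-1", tensor_parallel_size=2)
params = SamplingParams(temperature=0.9, max_tokens=128)
outputs = llm.generate(["def fibonacci(n):"], params)
print(outputs[0].outputs[0].text)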
To use code-eval with the vLLM engine, please refer to the vLLM documentation to install it.
Note
You can install vLLM using pip:
$ pip install vllm
With a model supported by vLLM (see the vLLM supported models list), run:
$ code-eval --model_name microsoft/phi-1 \
      --task humaneval \
      --batch_size 8 \
      --backend vllm
Tip
You can use LoRA with similar syntax.
$ code-eval --model_name microsoft/phi-1 \
      --peft_model <adapters-name> \
      --task humaneval \
      --batch_size 8 \
      --backend vllm
@misc{code-eval,
  author    = {Dung Nguyen Manh},
  title     = {A framework for easily evaluation code generation model},
  month     = 3,
  year      = 2024,
  publisher = {github},
  version   = {v0.0.1},
  url       = {https://github.com/FSoft-AI4Code/code-llm-evaluator}
}