QCB is a framework for quantitatively certifying bias in large language models (LLMs).
To set up QCB, create a conda environment with the following command:

```bash
conda env create -f environment.yml
```

Then, activate the environment:

```bash
conda activate qcb
```
Add the API keys for the closed-source models in the file `api_keys.py`.
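A minimal sketch of what `api_keys.py` could contain is shown below. The variable names here are assumptions for illustration only; use whatever names the certification scripts actually import from this module.

```python
# api_keys.py -- illustrative sketch; the variable names are assumptions,
# so align them with what the certification scripts import from this module.
GPT_API_KEY = "sk-..."         # key for the GPT models (OpenAI)
CLAUDE_API_KEY = "sk-ant-..."  # key for the Claude models (Anthropic)
GEMINI_API_KEY = "..."         # key for Gemini-Pro (Google)
```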
Make a folder to store the certification results:

```bash
mkdir results/
```
We currently support the following models and are working towards extending support to more models.
- Open-source models (from Huggingface):
  - Vicuna
  - Llama-2
  - Mistral
- Closed-source models (with API access):
  - Gemini-Pro
  - GPT
  - Claude
To certify an open-source model on the BOLD dataset, run the following command, replacing the placeholders with the appropriate values:

```bash
python certification/main_hf_llms_bold.py <expt_name> <expt_mode> <model_name>
```
The arguments to the above command are described next:

- `<expt_name>`: Name of the experiment (used to name the result files appropriately).
- `<expt_mode>`: Indicates the prefix distribution with respect to which certification is to be done. Possible values are `common jb`, `unknown jb`, or `soft jb` (`soft jb` is only for the open-source models).
- `<model_name>`: Name of the model to be certified. Use the official names of the models, as given on the Huggingface model hub (for open-source models) or on the websites of the API models, so that they can be queried.
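For example, a concrete invocation could look like the following. The experiment name is arbitrary, `lmsys/vicuna-7b-v1.5` is one Vicuna checkpoint on the Huggingface hub, and the mode string should be spelled exactly as the script expects:

```bash
python certification/main_hf_llms_bold.py vicuna7b_bold_run1 "common jb" lmsys/vicuna-7b-v1.5
```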
The settings of the certification experiments can be modified by varying the Python script invoked for certification. The scripts are named `main_<type of LLM>_<dataset>.py`. The `<type of LLM>` is `hf_llms` for the open-source, Huggingface models and `api_llms` for the closed-source models. The `<dataset>` can be `bold` for the BOLD dataset or `dt` for DecodingTrust's stereotype dataset. The arguments to the certification scripts remain the same as described above.
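For instance, following the naming scheme above, certifying a closed-source GPT model on the DecodingTrust stereotype dataset would use a script named `main_api_llms_dt.py` (here assumed to live in `certification/` like the BOLD script above); the experiment name, mode string, and model name below are illustrative, and the corresponding API key must be set in `api_keys.py` first:

```bash
python certification/main_api_llms_dt.py gpt_dt_run1 "unknown jb" gpt-3.5-turbo
```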
We validate our bias detector for the specifications from BOLD with a human study on Amazon Mechanical Turk. `mturk_expt_files/` contains the files for the experiment, including the HTML file that renders the bias-annotation instructions shown to the human evaluators on Mechanical Turk.