In this paper, we critically examine the evaluation of counterfactual explainers through consistency and explanation sparsity as key principles of effective explanation. Through extensive experiments, we assess how incorporating Top-k recommendations affects the consistency of existing evaluation metrics, and we analyze the impact of explanation size on an explainer's performance, highlighting it as a key determinant of explanation quality.
This repository contains the code for the paper "A Closer Look at Counterfactual Explanation Metrics for Recommender Systems". We evaluate our claims on three publicly available benchmarks, MovieLens1M, a subset of the Yahoo!Music dataset, and a subset of the Pinterest dataset, using two different recommenders, Matrix Factorization (MF) and a Variational Autoencoder (VAE).
- Experiments Results: contains all recommender results used for the tables and figures in the paper, as well as the other configurations discussed in the paper.
- code: contains several code files:
- data_processing - preprocessing code that prepares the data for our models.
- recommenders_architecture - specifies the architectures of the recommenders used in the paper (MF, VAE).
- recommenders_training - code for training the VAE and MLP recommenders.
- LXR_training - code for training the LXR model to explain a specified recommender (this is the only explainer that requires training).
- metrics - code for evaluating the explanation methods using the baseline evaluation approach.
- metricsTopK.py - code for evaluating explainers based on the K-th item of the recommendation list, to address consistency.
- metricsXpSize.py - code for evaluating the methods across different explanation sizes (explanation sparsity metric).
- help_functions - helper functions used across all of the above scripts.
- checkpoints: the designated location for saving and loading trained model checkpoints.
- Python 3.10
- PyTorch 1.13
- wandb 0.16.3 (the package we used for monitoring the training process)
First, create a virtual environment for the project:
python3 -m venv .venv
source .venv/bin/activate
Then install the latest version of PyTorch from the official site. Finally, run the following:
pip install -r requirements.txt
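If you keep the wandb logging enabled, you may also need to authenticate once (for example, by running wandb login) or disable logging, depending on your setup.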
To use this code, follow these steps:
- Create data to work with by running the data_processing code.
- In every script, set the "data_name" variable to 'ML1M'/'Yahoo'/'Pinterest' and the "recommender_name" variable to 'MLP'/'VAE', or pass them through the "recommender" and "data" arguments.
- After the preprocessing step, run recommenders_training.py with "data_name" set to 'ML1M'/'Yahoo'/'Pinterest' and "recommender_name" set to 'MLP'/'VAE'.
- From the output checkpoints, choose which recommenders you want to explain. Then set the checkpoint file name in LXR_training.py, or pass it as an argument via --directory, and run the script to train the explainers.
- Finally, to run the other explainers and evaluate LXR, run the metrics.py file. This prints all of the reported numbers; the same outputs are also available in the "Experiments Results" folder. A minimal end-to-end command sequence is sketched below.
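As a rough end-to-end sketch, a full run might look like the following. This assumes the scripts are invoked from the code folder with a .py extension and accept "data" and "recommender" as command-line arguments as described above; apart from --directory, the exact argument names are illustrative and may differ in your checkout, and <recommender_checkpoint_file> is a placeholder for the checkpoint you picked.

python data_processing.py
python recommenders_training.py --data ML1M --recommender VAE
python LXR_training.py --data ML1M --recommender VAE --directory <recommender_checkpoint_file>
python metrics.py --data ML1M --recommender VAE
python metricsTopK.py --data ML1M --recommender VAE
python metricsXpSize.py --data ML1M --recommender VAE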
Comparison of CE methods based on POS@5 (lower is better) across four performance levels of the VAE recommender on the ML-1M dataset. The figure shows the impact of going beyond Top-1 (a) and considering Top-k (b-d) recommendations on improving consistency when evaluating CE models. To facilitate clearer comparisons, the values are normalized using min-max normalization, and shading represents the variance in the results.
Performance of CE methods across the three datasets based on the explanation sparsity metric, using the VAE recommender. The evaluation is conducted over eight explanation sizes, providing a comparative analysis of the methods. To facilitate clearer comparisons, the values are normalized using min-max normalization. The results highlight dataset-specific performance variations, reflecting the effectiveness of each CE method at specific sparsity levels.
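For reference, the min-max normalization used in both figures rescales each method's values to the [0, 1] range: x_norm = (x - min) / (max - min), where min and max are taken over the set of values being compared.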
Thanks to [LXR] for making their code public.
If you find the code helpful, please cite this work: