This repository contains the implementation of the GS-VQA model (version 2, with the Unified estimator) described in the paper "Graph Strategy for Interpretable Visual Question Answering".
To set up the environment, run the commands below.
git clone https://github.com/cds-mipt/x-vqa

Anaconda environment:

conda create --name gs-vqa
conda activate gs-vqa
conda install pytorch=1.10.2 torchvision=0.11.* cudatoolkit -c pytorch
Install the dependencies for VL-BERT. Inside the vlbert folder you can find the files from the original repo with the VL-BERT implementation.

cd vlbert
pip install -r requirements.txt
pip install Cython
pip install pyyaml==5.4.1
Build the library for VL-BERT:

./scripts/init.sh
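As an optional sanity check (not part of the original instructions), you can verify that the installed PyTorch build matches the expected version and sees the GPU:

```python
import torch

print(torch.__version__)          # expected: 1.10.2
print(torch.cuda.is_available())  # should print True if cudatoolkit was picked up correctly
```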
To run UnCoRd-VL on a custom dataset, you need to prepare the following data:
- Question-to-graph model checkpoint - to be placed in the ende_ctranslate2 folder.
- Faster-RCNN model checkpoint - to be placed in the estimators folder.
- VL-BERT checkpoint - to be placed in the vlbert/model/pretrained_model folder.
- List of property names and their possible values.
  You need to prepare a .txt file that lists all categorical properties in the dataset and the values they can take. Each line must have the following format (an example appears in the sketch after this list):

  property_name value_1 ... value_n
- VL-BERT answers vocabulary.
  The answer vocabulary for VL-BERT is the set of all possible values of all properties in the given dataset, plus the words 'yes' and 'no'. An example of the VL-BERT vocabulary for CLEVR is provided.
- Image directory.
- JSON file with question indexes, question texts, and the indexes of the images the questions refer to (answers are optional in test mode), for example:

  {"questions": [{"question_index": 0, "question": "What is the color of the large metal sphere?", "image_index": 0, "answer": "brown"}, ... ]}
Functions for extracting questions can be found in dataset.py.
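As an illustration of the text formats above, here is a minimal sketch that writes a properties file, an answer vocabulary, and a questions file for a CLEVR-like dataset. The property names and values are assumptions chosen for illustration, questions.json is a placeholder file name, and the one-answer-per-line layout of the vocabulary file is also an assumption; only the properties-line and JSON formats follow the description above.

```python
import json

# Hypothetical categorical properties of a CLEVR-like dataset (illustration only).
properties = {
    "color": ["gray", "red", "blue", "green", "brown", "purple", "cyan", "yellow"],
    "size": ["small", "large"],
    "shape": ["cube", "sphere", "cylinder"],
    "material": ["rubber", "metal"],
}

# Properties file: one line per property, "property_name value_1 ... value_n".
with open("properties_file.txt", "w") as f:
    for name, values in properties.items():
        f.write(name + " " + " ".join(values) + "\n")

# VL-BERT answer vocabulary: all property values plus 'yes' and 'no'
# (one answer per line is an assumed layout).
answers = sorted({v for values in properties.values() for v in values} | {"yes", "no"})
with open("answer_vocab_file.txt", "w") as f:
    f.write("\n".join(answers) + "\n")

# Questions file: question indexes, question texts, image indexes, optional answers.
questions = {
    "questions": [
        {
            "question_index": 0,
            "question": "What is the color of the large metal sphere?",
            "image_index": 0,
            "answer": "brown",  # may be omitted in test mode
        },
    ]
}
with open("questions.json", "w") as f:
    json.dump(questions, f)
```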
In addition, to work with VL-BERT, you need to download the pretrained BERT and ResNet-101 models and put them in the vlbert/model/pretrained_model/bert-base-uncased and vlbert/model/pretrained_model folders, respectively.
The script should be launched from the root folder.
Example of running the model in test mode:
python main.py --image_dir IMAGE_DIR \
--questions_file QUESTIONS_DIR/file_with_questions_and_images_indices.json \
--test_mode True --device "cuda" \
--answer_vocab_file answer_vocab_file.txt \
--properties_file properties_file.txt
The script outputs a file with the model's answers, written line by line, which can later be used for model evaluation.
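Because the answers file contains one predicted answer per line, in the same order as the questions in the input JSON, a simple accuracy check can be written in a few lines. This is only an illustrative sketch, not a script from the repository; answers.txt stands for whatever output file main.py produced, and questions.json is the input file with ground-truth answers.

```python
import json

# Ground-truth answers, in the same order as the questions in the input JSON.
with open("questions.json") as f:
    gt = [q["answer"] for q in json.load(f)["questions"]]

# Model predictions: one answer per line, as written by the script.
with open("answers.txt") as f:
    pred = [line.strip() for line in f]

correct = sum(p == g for p, g in zip(pred, gt))
print(f"Accuracy: {correct / len(gt):.4f}")
```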
Here you can find pre-trained models for CLEVR question answering. Download them and place them in the appropriate folders, as indicated in the Data processing section.