We provide installation instructions for:
- Setting up environments for inference with Video-LMMs
- Downloading and setting up model weights (if required) for Video-LMMs
Note: instructions are borrowed from the TimeChat GitHub repository.
- Run the following commands to install the environment for TimeChat:

```shell
cd Video-LMMs-Inference/TimeChat

# First, install ffmpeg
apt update
apt install ffmpeg

# Then, create and activate a conda environment
conda env create -f environment.yml
conda activate timechat

# Install PyTorch with CUDA 11.3 support
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
```
- Follow the instructions below to set up the model weights for TimeChat.

Download the EVA-ViT and InstructBLIP checkpoints:

```shell
wget https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/eva_vit_g.pth
wget https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/InstructBLIP/instruct_blip_vicuna7b_trimmed.pth
```

Use git-lfs to download the weights of Video-LLaMA (7B):

```shell
git lfs install
git clone https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-2-7B-Finetuned
```

Download the instruction-tuned TimeChat-7B weights:

```shell
git lfs install
git clone https://huggingface.co/ShuhuaiRen/TimeChat-7b
```
The resulting file structure looks like:

```
TimeChat/ckpt/
|-- Video-LLaMA-2-7B-Finetuned/
|   |-- llama-2-7b-chat-hf/
|   |-- VL_LLaMA_2_7B_Finetuned.pth
|-- instruct-blip/
|   |-- instruct_blip_vicuna7b_trimmed.pth
|-- eva-vit-g/
|   |-- eva_vit_g.pth
|-- timechat/
|   |-- timechat_7b.pth
```
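Before launching inference, it can help to confirm the checkpoints landed where this layout expects them. A minimal sketch (the helper name and script are ours, not part of the TimeChat repo):

```python
import os

# Checkpoint files expected under TimeChat/ckpt/, per the layout above
EXPECTED = [
    "Video-LLaMA-2-7B-Finetuned/VL_LLaMA_2_7B_Finetuned.pth",
    "instruct-blip/instruct_blip_vicuna7b_trimmed.pth",
    "eva-vit-g/eva_vit_g.pth",
    "timechat/timechat_7b.pth",
]

def missing_checkpoints(ckpt_root):
    """Return the expected checkpoint files that are absent under ckpt_root."""
    return [p for p in EXPECTED if not os.path.isfile(os.path.join(ckpt_root, p))]

if __name__ == "__main__":
    missing = missing_checkpoints("TimeChat/ckpt")
    if missing:
        print("Missing checkpoints:", *missing, sep="\n  ")
    else:
        print("All checkpoints in place.")
```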
Note: instructions are borrowed from the Video-LLaVA GitHub repository.
- Run the following commands to install the environment for Video-LLaVA.

The following requirements must be met for a successful installation:
- Python >= 3.10
- PyTorch == 2.0.1
- CUDA Version >= 11.7

Install the required packages:

```shell
cd Video-LMMs-Inference/Video-LLaVA

# Create and activate a conda environment
conda create -n videollava python=3.10 -y
conda activate videollava

# Install the package and its training extras
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
pip install decord opencv-python git+https://github.com/facebookresearch/pytorchvideo.git@28fe037d212663c6a24f373b94cc5d478c8c1a1d
```
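The version floors above can be checked programmatically before installing. A small sketch (a plain tuple comparison, nothing Video-LLaVA-specific):

```python
import sys

def meets_minimum(version, minimum):
    """True if a (major, minor) version tuple satisfies the given floor."""
    return tuple(version) >= tuple(minimum)

if __name__ == "__main__":
    # Video-LLaVA asks for Python >= 3.10
    if meets_minimum(sys.version_info[:2], (3, 10)):
        print("Python version OK")
    else:
        print("Python too old; need >= 3.10")
```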
Model weights: Video-LLaVA downloads its weights automatically on the first run, so there is no need to fetch them manually.
Note: We use the Google Cloud platform to perform inference with the Gemini model. Specifically, you need to set up the following:
- Configure a project on Google Cloud, or use an existing one (see the Google Cloud documentation for details).
- Create a Google Cloud Storage bucket and upload the CVRR-ES dataset to it.
- Run the following commands to install the packages:

```shell
conda create -n gemini python=3.10 -y
conda activate gemini
pip install --upgrade google-cloud-aiplatform
gcloud auth application-default login
```
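Once the project and bucket are in place, a Gemini query on a video stored in the bucket can be sketched as below. This is our illustration, not the repo's inference script: the project ID, bucket name, file path, and model name are placeholders you must replace, and the call only runs with valid application-default credentials.

```python
import os

def gcs_uri(bucket, blob_path):
    """Build the gs:// URI that Vertex AI expects for a file in a bucket."""
    return f"gs://{bucket}/{blob_path.lstrip('/')}"

def main():
    import vertexai
    from vertexai.generative_models import GenerativeModel, Part

    # Placeholders: substitute your own project, region, bucket, and video path
    vertexai.init(project="your-project-id", location="us-central1")
    model = GenerativeModel("gemini-1.0-pro-vision")

    video = Part.from_uri(
        gcs_uri("your-cvrr-es-bucket", "videos/example.mp4"),
        mime_type="video/mp4",
    )
    response = model.generate_content(
        [video, "Describe the unusual activity in this video."]
    )
    print(response.text)

# Guarded so importing this file never triggers a billable API call
if __name__ == "__main__" and os.environ.get("RUN_GEMINI_DEMO"):
    main()
```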
- Run the following commands to install the packages for GPT-4(V):

```shell
conda create -n gpt4v python=3.10 -y
conda activate gpt4v

# Install the OpenAI Python client
pip install openai==1.13.3
```
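With the client installed, sending sampled video frames to a GPT-4 vision model looks roughly like the sketch below. The helper, frame paths, and model name are our placeholders (not the repo's code); the request format is the standard base64 data-URI style of the OpenAI chat API, and the call itself requires `OPENAI_API_KEY` to be set.

```python
import base64
import os

def frames_to_messages(prompt, jpeg_frames):
    """Pack a text prompt plus raw JPEG frame bytes into the chat-message
    format the OpenAI vision endpoints accept (base64 data URIs)."""
    content = [{"type": "text", "text": prompt}]
    for frame in jpeg_frames:
        b64 = base64.b64encode(frame).decode("utf-8")
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
        })
    return [{"role": "user", "content": content}]

def main():
    from openai import OpenAI  # openai==1.13.3
    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Placeholder frame paths: use frames sampled from a CVRR-ES video
    frames = [open(p, "rb").read() for p in ["frame_0.jpg", "frame_1.jpg"]]
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",  # use a vision model your account can access
        messages=frames_to_messages("Describe what happens across these frames.", frames),
        max_tokens=300,
    )
    print(response.choices[0].message.content)

# Guarded so importing this file never triggers a billable API call
if __name__ == "__main__" and os.environ.get("RUN_GPT4V_DEMO"):
    main()
```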