
Bad performance for running run_retrieve_tevatron.sh #6

Open
acphile opened this issue Nov 20, 2024 · 0 comments

Comments


acphile commented Nov 20, 2024

Hi, I tried to build the index of the wiki corpus using the script you provide in scripts/run_retrieve_tevatron.sh. However, I found that the retrieval evaluation performance is very poor.
The commands I ran are:

```shell
for s in $(seq -f "%02g" 0 4)
do
CUDA_VISIBLE_DEVICES=${s} python -m tevatron.retriever.driver.encode \
  --output_dir=temp \
  --model_name_or_path BAAI/bge-large-en-v1.5 \
  --normalize True \
  --fp16 \
  --per_device_eval_batch_size 128 \
  --passage_max_len 512 \
  --dataset_name "TIGER-Lab/LongRAG" \
  --dataset_config "hotpot_qa_corpus" \
  --dataset_split "train" \
  --dataset_number_of_shards 4 \
  --encode_output_path emb_bge_official/corpus_emb_${s}.pkl \
  --dataset_shard_index ${s} >${s}.log 2>&1 &
done
```

```shell
CUDA_VISIBLE_DEVICES=0 python -m tevatron.retriever.driver.encode \
  --output_dir=temp \
  --model_name_or_path BAAI/bge-large-en-v1.5 \
  --normalize True \
  --query_prefix "Represent this sentence for searching relevant passages: " \
  --fp16 \
  --per_device_eval_batch_size 256 \
  --dataset_name "TIGER-Lab/LongRAG" \
  --dataset_config "hotpot_qa" \
  --dataset_split "subset_1000" \
  --encode_output_path query_hotpot_1000.pkl \
  --query_max_len 32 \
  --encode_is_query
```

```shell
CUDA_VISIBLE_DEVICES=0 python -m tevatron.retriever.driver.search \
  --query_reps query_hotpot_1000.pkl \
  --passage_reps "emb_bge_official/corpus_emb*.pkl" \
  --depth 200 \
  --batch_size -1 \
  --save_text \
  --save_ranking_to hqa_official_rank_200_new.txt
```

After checking the Tevatron implementation, I believe it does not implement the max_p design described in the 'Similarity search' part of Section 2.1 of your paper. Would you mind providing your implementation and the commands for similarity search that can be used to reproduce the BGE-large row in your Table 4? Thank you very much!
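For context, my understanding of the max_p aggregation from Section 2.1 is sketched below: each long retrieval unit is split into short passages, and a unit's score for a query is the maximum query-passage similarity over its passages, rather than scoring long units directly. This is only a minimal illustrative sketch, not code from this repo or from Tevatron; the function name and the `unit_of_passage` mapping are my own assumptions.

```python
# Hypothetical sketch of max_p aggregation (not from this repo):
# score(query, unit) = max over passages p in unit of sim(query, p).
import numpy as np

def maxp_rank(query_emb, passage_embs, unit_of_passage, depth=200):
    """Rank retrieval units by their best-matching passage.

    query_emb:       (d,) normalized query embedding
    passage_embs:    (n, d) normalized passage embeddings
    unit_of_passage: list mapping passage index -> unit id (assumed mapping)
    """
    # Inner product == cosine similarity since embeddings are normalized.
    sims = passage_embs @ query_emb
    unit_best = {}
    for pid, score in enumerate(sims):
        uid = unit_of_passage[pid]
        if score > unit_best.get(uid, float("-inf")):
            unit_best[uid] = score  # keep the max passage score per unit
    ranked = sorted(unit_best, key=lambda uid: -unit_best[uid])
    return ranked[:depth]
```

If this matches your design, the plain Tevatron search above would only return flat passage rankings, and an extra grouping step like this would be needed before evaluation.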
