
Inconsistent Evaluation Results on Slake1.0 and PathVQA Datasets #9

taindp98 opened this issue Sep 17, 2024 · 2 comments


taindp98 commented Sep 17, 2024

Hello,

I attempted to replicate the evaluation results reported in the paper on two datasets, Slake1.0 and PathVQA, using the data released at the URL in #6 (comment). However, my results do not match those reported in the paper. Below are the details:

  1. Slake1.0 dataset: the provided checkpoint appears to be fine-tuned without pretraining on the MedTrinity-25M dataset, since my results closely match those of LLaVA-Med++ (Ours, w/o) in Table 3 of the paper (see the checksum sketch after this list).
  2. PathVQA dataset: on the Closed set I was able to replicate the reported accuracy, but on the Open set my recall is significantly lower than the published value.
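In case it helps narrow down the checkpoint question, below is a generic checksum sketch I could run against a reference digest from the maintainers, to confirm we are looking at the same file. The checkpoint path is hypothetical and not from this repository:

```python
import hashlib

def sha256sum(path: str, chunk_size: int = 1 << 20) -> str:
    # Stream the file in 1 MiB chunks so large checkpoints
    # do not need to fit in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical path; compare the digest with one published by the maintainers.
print(sha256sum("checkpoints/slake_finetune.bin"))
```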
[Attached images: slake1.0_results, pathvqa_results]

To help diagnose these issues, I have attached the two images above, one for the evaluation run on each dataset.

Could you kindly verify whether the provided fine-tuning checkpoint for Slake1.0 is correct? Additionally, it would be helpful to understand any specific steps necessary to replicate the reported recall values for the PathVQA Open set.
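
For reference, my current assumption is that the Open-set recall is computed as token-level recall of the ground-truth answer against the prediction, as in LLaVA-Med-style evaluation scripts. Here is a minimal sketch of that assumption (the function names are illustrative, not taken from this repository):

```python
import re

def normalize(text: str) -> list[str]:
    # Lowercase, strip punctuation, and split on whitespace.
    return re.sub(r"[^\w\s]", "", text.lower()).split()

def open_set_recall(prediction: str, ground_truth: str) -> float:
    # Fraction of ground-truth tokens that appear anywhere
    # in the predicted answer.
    gt_tokens = normalize(ground_truth)
    pred_tokens = set(normalize(prediction))
    if not gt_tokens:
        return 0.0
    return sum(tok in pred_tokens for tok in gt_tokens) / len(gt_tokens)

# Recall is 1.0 here: both ground-truth tokens occur in the prediction.
print(open_set_recall("the lungs are clear", "lungs clear"))
```

If the paper's numbers use a different normalization or matching rule, knowing that would likely explain the gap.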

Thank you in advance for your assistance!

@yunfeixie233 (Contributor) commented

Hi @taindp98,

I apologize for any inconvenience. I will review the issues shortly.

@jinghaoliu commented


Hi Yunfei,

Thank you for your great work. I was just wondering whether this issue has been resolved yet, as I am getting results similar to @taindp98's.
