evaluation script gives wrong accuracy #7

Open
ovidiunitu opened this issue Jun 5, 2018 · 4 comments

Comments

@ovidiunitu

While testing my solution I noticed some odd behavior (see the picture below).

[image]
As you can see, my generated answer is 'none'.
According to the evaluation metric, the accuracy should be 30%, because one of the ground-truth answers matches mine.
I think this is caused by the preprocessing done before evaluation. In vqaEval.py, line 42, the answer 'none' is replaced with '0'. Because there is no '0' among the ground-truth answers, the accuracy is set to 0.00%. If I remove 'none': '0' from the manualMap dictionary, I get the right accuracy for this question (30%).
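
For reference, the arithmetic behind the 30% figure and the drop to 0% can be reproduced with a small sketch. It assumes the metric averages min(1, matches / 3) over each leave-one-out subset of the ten ground-truth answers, and that the normalization is applied to the predicted answer only. Both are assumptions about the script, not quotes from it. `normalize`, `vqa_accuracy`, `MANUAL_MAP`, and the placeholder ground-truth answers are hypothetical names and data, not taken from vqaEval.py or the annotation file.

```python
from typing import Dict, List

# Hypothetical subset of the normalization table; only the
# 'none' -> '0' entry is confirmed by this thread.
MANUAL_MAP: Dict[str, str] = {"none": "0"}


def normalize(answer: str, manual_map: Dict[str, str]) -> str:
    """Lower-case the answer and map each word through manual_map."""
    return " ".join(manual_map.get(w, w) for w in answer.lower().split())


def vqa_accuracy(prediction: str, gt_answers: List[str]) -> float:
    """Average min(1, matches / 3) over every leave-one-out subset (assumed metric)."""
    scores = []
    for i in range(len(gt_answers)):
        # Drop the i-th ground-truth answer and count matches among the rest.
        others = gt_answers[:i] + gt_answers[i + 1:]
        matches = sum(1 for a in others if a == prediction)
        scores.append(min(1.0, matches / 3))
    return sum(scores) / len(scores)


# Placeholder ground truth: ten answers, exactly one of which is 'none'.
gt = ["none"] + ["some other answer"] * 9

# No normalization on either side.
raw_acc = vqa_accuracy("none", gt)

# Normalization on the prediction only: 'none' becomes '0' and no longer matches.
pred_only_acc = vqa_accuracy(normalize("none", MANUAL_MAP), gt)

# Same normalization on both sides: the match is preserved.
both_acc = vqa_accuracy(
    normalize("none", MANUAL_MAP),
    [normalize(a, MANUAL_MAP) for a in gt],
)

print(f"{raw_acc:.2%}")        # 30.00%
print(f"{pred_only_acc:.2%}")  # 0.00%
print(f"{both_acc:.2%}")       # 30.00%
```

If these assumptions hold, either normalizing both sides the same way or dropping the 'none' entry would restore the 30% for this question.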

If it helps, the ID of this question is 411188011, and the name of the picture is COCO_val2014_000000411188.jpg.

Could you look into it further? I hope I didn't miss anything.

@AishwaryaAgrawal
Contributor

Thanks for bringing up the issue and looking into the potential cause! I will look into it further and get back to you.

@guoyang9

guoyang9 commented Dec 1, 2018

BTW, should we replace all number words with digits, e.g., 'one on left' with '1 on left'?

Another concern is that some of the answers in the annotations are just 'a' or 'the'. Is it appropriate to simply delete these?
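
To make the two questions concrete, here is a rough sketch of what a word-level normalization of this kind would do to those answers. It is only an illustration: `process_digit_article`, `MANUAL_MAP`, and `ARTICLES` are hypothetical names, and apart from the 'none' -> '0' entry mentioned earlier in this thread, the table contents are assumed.

```python
# Assumed tables: only 'none' -> '0' is confirmed in this thread;
# the remaining entries and the article set are illustrative.
MANUAL_MAP = {"none": "0", "zero": "0", "one": "1", "two": "2", "three": "3"}
ARTICLES = {"a", "an", "the"}


def process_digit_article(text: str) -> str:
    """Map number words to digits and drop articles, word by word."""
    kept = []
    for word in text.lower().split():
        word = MANUAL_MAP.get(word, word)
        if word not in ARTICLES:
            kept.append(word)
    return " ".join(kept)


examples = ["one on left", "a", "the", "none"]
normalized = {text: process_digit_article(text) for text in examples}

for original, result in normalized.items():
    print(repr(original), "->", repr(result))
# 'one on left' -> '1 on left'
# 'a' -> ''
# 'the' -> ''
# 'none' -> '0'
```

Under this rule an answer that consists only of an article becomes an empty string, so whether that is acceptable depends on whether the same rule is also applied to the ground-truth answers.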

I hope I haven't misunderstood anything.

@baiyuting

did it get solved?

@chilljudaoren

did it get solved?

Where is the annFile?
