Generated image IDs are non-unique #3

Open
msnidal opened this issue Mar 4, 2020 · 8 comments

Comments

@msnidal

msnidal commented Mar 4, 2020

Firstly, thanks for creating this script, it was a great help to me.

When I first ran it, it worked almost perfectly, but with one problem - the COCO format image IDs were all over the place, many non-unique (many 0s for example) which breaks the COCO format. I saw how you're generating them as a function of the filename, and given the image IDs have no VOC equivalent, I think it would make more sense to do a strict ordering per image.

I did a hacky solution for now, I'm leaving this issue open so I can come back to it later and open a PR with a fix. If anybody else is having this problem, look for img_id and you can try incrementing it manually for the moment.

@yukkyo
Owner

yukkyo commented Mar 7, 2020

@msnidal
Thanks for sharing!

I will look into this problem as well.
I would be grateful if you could share steps or data to reproduce it.

@davidhuangal

@msnidal can you please show us your hacky solution?

@amitkumar-delhivery

@davidhuangal, this code assumes that your file names are numbered with a unique serial integer, like image1, image2, image3 or any_name1, any_name2, and so on. If you have files named like a_1.jpg and b_1.jpg, the regex used in the code assigns them the same ID. To solve it, you can build a lookup table of unique IDs first:

img_id_dict = {}
for filename in filename_list:
    # key on the file stem; IDs start at 1 and grow with the dict
    img_id_dict[filename.split(".")[0]] = len(img_id_dict) + 1

Then replace the regex extraction inside this check with a lookup in that dict:

    if extract_num_from_imgid and isinstance(img_id, str):
        img_id = img_id_dict[img_id]
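
For anyone who wants to try the idea outside the script, here is a rough, self-contained sketch. `build_img_id_dict` is a hypothetical helper name, not something from the script, and sorting the file names first is an added assumption so that the IDs come out the same on every run.

```python
import os


def build_img_id_dict(filename_list):
    """Map each file stem to a unique 1-based integer ID (hypothetical helper)."""
    img_id_dict = {}
    # Sorting is an assumption made here so the mapping is reproducible.
    for filename in sorted(filename_list):
        stem = os.path.splitext(os.path.basename(filename))[0]
        if stem not in img_id_dict:
            img_id_dict[stem] = len(img_id_dict) + 1
    return img_id_dict


# Made-up file names that share the same trailing number.
img_id_dict = build_img_id_dict(["b_1.jpg", "a_1.jpg", "a_2.jpg"])
print(img_id_dict)  # {'a_1': 1, 'a_2': 2, 'b_1': 3}
```

Because the dict is keyed on the stem, two files with the same stem but different extensions would still share an ID, so this only helps if stems are unique in your dataset.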

@dinis-rodrigues

Yeah having the same issue.
My images are named like (example):

480_0_36.png
480_0_37.png
...
499_0_5.png
499_0_6.png

And for each filename ("X_Y_Z.png") it assumes the id is always X.
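
The collision is easy to reproduce in isolation. The sketch below assumes the script takes the first run of digits in the file stem, which is what the `re.findall(r'\d+', img_id)[0]` line quoted later in this thread suggests. `first_number_id` is a hypothetical name used only for this illustration.

```python
import os
import re


def first_number_id(filename):
    """Hypothetical helper: take the first run of digits in the file stem."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    return int(re.findall(r'\d+', stem)[0])


filenames = ["480_0_36.png", "480_0_37.png", "499_0_5.png", "499_0_6.png"]
ids = [first_number_id(name) for name in filenames]
print(ids)  # [480, 480, 499, 499]
```

Four images end up with only two distinct IDs, which is the non-uniqueness described in this issue.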

@AntonioNuAc

Is there any solution for this?
Does it also happen when using the 'annotation paths list' option?

@SubramanianKrish

Yeah, I'm seeing the same here. My test image IDs look like J073-xxxxxxxxxx. This fix works for me:

95: for img_id, a_path in enumerate(tqdm(annotation_paths)):
102: img_info['id'] = img_id
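
As a rough illustration of why the `enumerate` approach is safe, the sketch below assigns the loop index as the image ID and reuses it for every object in that file. The XML strings, file names, and record fields are made up and simplified; they are not the full COCO schema or the script's actual data structures.

```python
import xml.etree.ElementTree as ET

# Two tiny VOC-style annotations (made-up data) whose file names would
# collide if the ID were extracted from the digits in the name.
voc_xmls = [
    "<annotation><filename>a_1.jpg</filename>"
    "<object><name>cat</name></object></annotation>",
    "<annotation><filename>b_1.jpg</filename>"
    "<object><name>dog</name></object>"
    "<object><name>cat</name></object></annotation>",
]

images, annotations = [], []
for img_id, xml_text in enumerate(voc_xmls):
    root = ET.fromstring(xml_text)
    # The loop index is unique per annotation file, regardless of the file name.
    images.append({"id": img_id, "file_name": root.findtext("filename")})
    for obj in root.findall("object"):
        annotations.append({
            "id": len(annotations) + 1,       # running annotation ID
            "image_id": img_id,               # must match the image record above
            "category_name": obj.findtext("name"),  # simplified field
        })

print([im["id"] for im in images])            # [0, 1]
print([a["image_id"] for a in annotations])   # [0, 1, 1]
```

The IDs depend on the order of the input list, so the same list order is needed to get the same IDs on a later run.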

@karen-gishyan

I see that the issue is still open; I ran into it as well. Here is a quite simple solution that seems to do the job: a module-level counter generates unique IDs.

`
count = 0
def get_image_info(annotation_root, extract_num_from_imgid=True):
    global count
    path = annotation_root.findtext('path')
    if path is None:
        filename = annotation_root.findtext('filename')
    else:
        filename = os.path.basename(path)
    img_name = os.path.basename(filename)
    img_id = count
    count += 1

    # if extract_num_from_imgid and isinstance(img_id, str):
    #     img_id = int(re.findall(r'\d+', img_id)[0])

`

If you guys encounter another issue, let me know so we can take a look.

@XudongWang97

I have the same issue here. I fixed this issue in my forked repository.
