Error during exporting of split dataset (with null class values). #32

Open
isspid opened this issue Feb 27, 2022 · 8 comments

Comments

isspid commented Feb 27, 2022

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~\Anaconda3\envs\yolov5\lib\site-packages\pandas\core\ops\array_ops.py in na_arithmetic_op(left, right, op, is_cmp)
    142     try:
--> 143         result = expressions.evaluate(op, left, right)
    144     except TypeError:

~\Anaconda3\envs\yolov5\lib\site-packages\pandas\core\computation\expressions.py in evaluate(op, a, b, use_numexpr)
    232         if use_numexpr:
--> 233             return _evaluate(op, op_str, a, b)  # type: ignore
    234     return _evaluate_standard(op, op_str, a, b)

~\Anaconda3\envs\yolov5\lib\site-packages\pandas\core\computation\expressions.py in _evaluate_standard(op, op_str, a, b)
     67     with np.errstate(all="ignore"):
---> 68         return op(a, b)
     69 

TypeError: can't multiply sequence by non-int of type 'float'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_59916/1620987889.py in <module>
----> 1 dataset.export.ExportToYoloV5(output_path='model_training/labels',yaml_file='dataset.yaml', use_splits=True)

~\Anaconda3\envs\yolov5\lib\site-packages\pylabel\exporter.py in ExportToYoloV5(self, output_path, yaml_file, copy_images, use_splits, cat_id_index)
    486 
    487         yolo_dataset["center_x_scaled"] = (
--> 488             yolo_dataset["ann_bbox_xmin"] + (yolo_dataset["ann_bbox_width"] * 0.5)
    489         ) / yolo_dataset["img_width"]
    490         yolo_dataset["center_y_scaled"] = (

~\Anaconda3\envs\yolov5\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
     63         other = item_from_zerodim(other)
     64 
---> 65         return method(self, other)
     66 
     67     return new_method

~\Anaconda3\envs\yolov5\lib\site-packages\pandas\core\ops\__init__.py in wrapper(left, right)
    341         lvalues = extract_array(left, extract_numpy=True)
    342         rvalues = extract_array(right, extract_numpy=True)
--> 343         result = arithmetic_op(lvalues, rvalues, op)
    344 
    345         return left._construct_result(result, name=res_name)

~\Anaconda3\envs\yolov5\lib\site-packages\pandas\core\ops\array_ops.py in arithmetic_op(left, right, op)
    188     else:
    189         with np.errstate(all="ignore"):
--> 190             res_values = na_arithmetic_op(lvalues, rvalues, op)
    191 
    192     return res_values

~\Anaconda3\envs\yolov5\lib\site-packages\pandas\core\ops\array_ops.py in na_arithmetic_op(left, right, op, is_cmp)
    148             #  will handle complex numbers incorrectly, see GH#32047
    149             raise
--> 150         result = masked_arith_op(left, right, op)
    151 
    152     if is_cmp and (is_scalar(result) or result is NotImplemented):

~\Anaconda3\envs\yolov5\lib\site-packages\pandas\core\ops\array_ops.py in masked_arith_op(x, y, op)
    110         if mask.any():
    111             with np.errstate(all="ignore"):
--> 112                 result[mask] = op(xrav[mask], y)
    113 
    114     result, _ = maybe_upcast_putmask(result, ~mask, np.nan)

TypeError: can't multiply sequence by non-int of type 'float'

isspid commented Feb 27, 2022

This happens when exporting to YOLOv5.

isspid commented Feb 27, 2022

For information, running the example notebook on Google Colab works fine. My dataset was imported in YOLOv5 format; I will now try importing it as COCO to see if that makes a difference.

alexheat (Contributor) commented

I suspect that this is also caused by your images without annotations, which create null values.

If it works on Colab, it may be because Colab has a different version of pandas. The error message above, "TypeError: can't multiply sequence by non-int of type 'float'", means the column being multiplied holds strings rather than numbers. Pandas sometimes changes the datatype of a column, which causes errors like this.
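To illustrate the dtype explanation above, here is a minimal pandas sketch. It assumes the bounding-box columns named in the traceback (`ann_bbox_xmin`, `ann_bbox_width`, `img_width`) ended up as object columns holding strings, for example empty strings for an image without annotations. The sample rows and the helper `coerce_numeric_columns` are hypothetical, not pylabel code. Multiplying such a column by `0.5` raises the same `TypeError`; converting the columns with `pd.to_numeric(..., errors="coerce")` first lets the center calculation run, leaving `NaN` for the unannotated image.

```python
import pandas as pd

# Hypothetical rows shaped like pylabel's annotation table (column names taken
# from the traceback). The second image has no annotation, so its bbox fields
# hold empty strings, which keeps these columns at object dtype.
df = pd.DataFrame(
    {
        "img_filename": ["a.jpg", "b.jpg"],
        "ann_bbox_xmin": [10.0, ""],
        "ann_bbox_width": pd.Series(["20", ""], dtype=object),
        "img_width": [100, 100],
    }
)

# Reproduce the failure: "20" * 0.5 is string-times-float arithmetic.
try:
    df["ann_bbox_width"] * 0.5
    raised = False
except TypeError:
    raised = True


def coerce_numeric_columns(frame, columns):
    """Return a copy with `columns` converted to numbers; unparseable values become NaN."""
    out = frame.copy()
    for col in columns:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out


clean = coerce_numeric_columns(df, ["ann_bbox_xmin", "ann_bbox_width", "img_width"])

# Same normalized-center formula as in the traceback (exporter.py line 487).
clean["center_x_scaled"] = (
    clean["ann_bbox_xmin"] + clean["ann_bbox_width"] * 0.5
) / clean["img_width"]

print(raised)                             # True
print(clean["center_x_scaled"].tolist())  # [0.2, nan]
```

`errors="coerce"` is the blunt option: any value that cannot be parsed silently becomes `NaN`, which is convenient for empty annotation fields but would also hide genuinely malformed numbers.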

isspid commented Feb 27, 2022

So I imported the dataset using ImportYOLOv5 successfully, then exported it to COCO, also successfully. The problem is that when I import the newly exported dataset, errors such as the following start to occur:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
pandas\_libs\lib.pyx in pandas._libs.lib.maybe_convert_numeric()

ValueError: Unable to parse string "nan"

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_46392/991218777.py in <module>
      1 print(f"Number of images: {dataset1.analyze.num_images}")
      2 print(f"Number of classes: {dataset1.analyze.num_classes}")
----> 3 print(f"Classes:{dataset1.analyze.classes}")
      4 print(f"Class counts:\n{dataset1.analyze.class_counts}")

c:\Users\istaka\tmp\pylabel\pylabel\pylabel\analyze.py in classes(self)
     27             r"^\s*$", np.nan, regex=True
     28         )
---> 29         pd.to_numeric(self.dataset.df["cat_id"])
     30 
     31         filtered_df = self.dataset.df[self.dataset.df["cat_id"].notnull()]

~\Anaconda3\envs\yolov5\lib\site-packages\pandas\core\tools\numeric.py in to_numeric(arg, errors, downcast)
    151         try:
    152             values = lib.maybe_convert_numeric(
--> 153                 values, set(), coerce_numeric=coerce_numeric
    154             )
    155         except (ValueError, TypeError):

pandas\_libs\lib.pyx in pandas._libs.lib.maybe_convert_numeric()

ValueError: Unable to parse string "nan" at position 13

I think these problems stem from the NaN class. A better solution might be not to treat it as a class, but rather to ignore the images that don't contain labels.
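The traceback above shows `pd.to_numeric` failing on the literal string `"nan"` in `cat_id`, which suggests the COCO round trip stored missing category IDs as text rather than as real nulls. Below is a minimal sketch of one way to normalize such placeholders before converting, assuming `cat_id` is an object column. The placeholder set and the helper name `normalize_cat_id` are hypothetical choices, not pylabel code. Unlike `errors="coerce"`, this maps only the known placeholders to `NaN`, so a genuinely malformed ID still raises.

```python
import numpy as np
import pandas as pd

# Strings treated as "no label"; a hypothetical choice for this sketch.
PLACEHOLDERS = {"", "nan", "NaN", "None"}


def normalize_cat_id(series: pd.Series) -> pd.Series:
    """Map placeholder strings (and real nulls) to NaN, then convert the rest to numbers.

    Values outside PLACEHOLDERS that are not numeric still raise ValueError,
    so real data problems are not silently hidden.
    """
    # astype(str) turns real NaN into "nan", so it is caught by the same check.
    as_text = series.astype(str).str.strip()
    return pd.to_numeric(series.mask(as_text.isin(PLACEHOLDERS)))


# Hypothetical cat_id column after re-importing the exported COCO file.
raw = pd.Series(["0", "1", "nan", "", np.nan], dtype=object)
cat_id = normalize_cat_id(raw)
print(cat_id.tolist())  # [0.0, 1.0, nan, nan, nan]
```

Once the placeholders are real `NaN` values, the unlabeled images can be ignored with an ordinary null check.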

isspid commented Feb 27, 2022

> I suspect that this is also caused by your images without annotations, which creates null values.
>
> If it works on Colab it may be because Colab has a different version of pandas. I see this error message above "TypeError: can't multiply sequence by non-int of type 'float'" pandas sometimes changes the datatype of a column which causes errors like this.

I should add that I ran the Google Colab notebook with your sample dataset, not mine.

alexheat (Contributor) commented

From a machine learning standpoint, I think it is valid to train a model with images that have no labels, so I will try to get this working and add more test cases.

In the meantime, you can filter out the rows without labels after importing the dataset, like this:

dataset.df = dataset.df[dataset.df.cat_id.notnull()]
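A self-contained sketch of what that filter does, using a plain DataFrame as a hypothetical stand-in for `dataset.df` (file names and IDs are made up). It assumes, as the thread suggests, that an unlabeled image is represented by a row whose `cat_id` is null, and it also lists which images disappear from the table entirely. Note that `notnull()` only treats real nulls (`NaN`/`None`) as missing, so a literal `"nan"` string like the one in the traceback above would pass through the filter.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for dataset.df: one row per annotation, plus one row
# with a null cat_id for an image that has no labels.
df = pd.DataFrame(
    {
        "img_filename": ["a.jpg", "a.jpg", "empty.jpg"],
        "cat_id": [0, 1, np.nan],
    }
)

# The workaround from the thread: keep only rows that carry a category ID.
labeled = df[df.cat_id.notnull()]

# Images that no longer appear at all after filtering.
dropped_images = sorted(set(df.img_filename) - set(labeled.img_filename))

print(len(labeled))    # 2
print(dropped_images)  # ['empty.jpg']

# notnull() does not catch placeholder strings such as "nan".
text_ids = pd.Series(["0", "nan"], dtype=object)
print(text_ids.notnull().tolist())  # [True, True]
```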

isspid commented Feb 27, 2022

I appreciate the concern, and thanks for the tip. I will try it out and let you know if something goes wrong. In the meantime, should I close this issue or leave it open?

alexheat changed the title from "Error during exporting of split dataset." to "Error during exporting of split dataset (with null class values)." on Feb 27, 2022

alexheat (Contributor) commented

You can leave it open, and thank you for reporting it. There is a lot of diversity in how people create and save datasets, so it is helpful to see other people's examples.


2 participants