
Designing a Lightweight Edge-Guided CNN for Segmenting Mirrors and Reflective Surfaces


This work was accepted for full paper presentation at the 2023 International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision (WSCG 2023), held virtually and in person in Pilsen, Czech Republic:

  • The final version of our paper (as published in Computer Science Research Notes) can be accessed via this link.
    • Our WSCG 2023 presentation slides can be accessed via this link.
    • Our WSCG 2023 presentation video can be viewed on YouTube.
  • Our dataset of mirrors and reflective surfaces is publicly released for future researchers.

If you find our work useful, please consider citing:

@ARTICLE{2023-E59,
  author={Gonzales, Mark Edward M. and Uy, Lorene C. and Ilao, Joel P.},
  title={Designing a Lightweight Edge-Guided Convolutional Neural Network for Segmenting Mirrors and Reflective Surfaces},
  journal={Computer Science Research Notes},
  year={2023},
  volume={3301},
  pages={107-116},
  doi={10.24132/CSRN.3301.14},
  publisher={Union Agency, Science Press},
  issn={2464-4617},
  abbrev_source_title={CSRN},
  document_type={Article},
  source={Scopus}}

This repository is also archived on Zenodo.


Description

ABSTRACT: The detection of mirrors is a challenging task due to their lack of a distinctive appearance and the visual similarity of reflections with their surroundings. While existing systems have achieved some success in mirror segmentation, the design of lightweight models remains unexplored, and datasets are mostly limited to clear mirrors in indoor scenes. In this paper, we propose a new dataset consisting of 454 images of outdoor mirrors and reflective surfaces. We also present a lightweight edge-guided convolutional neural network based on PMDNet. Our model uses EfficientNetV2-Medium as its backbone and employs parallel convolutional layers and a lightweight convolutional block attention module to capture both low-level and high-level features for edge extraction. It registered maximum F-measure scores of 0.8483, 0.8117, and 0.8388 on the Mirror Segmentation Dataset (MSD), Progressive Mirror Detection (PMD) dataset, and our proposed dataset, respectively. Applying filter pruning via geometric median resulted in maximum F-measure scores of 0.8498, 0.7902, and 0.8456, respectively, performing competitively with the state-of-the-art PMDNet but with 78.20× fewer floating-point operations (FLOPs) and 238.16× fewer parameters.

INDEX TERMS: Mirror segmentation, Object detection, Convolutional neural network (CNN), CNN filter pruning

Teaser Figure

Running the Model

Training

Run the following command to train the unpruned model:

python train.py
  • The images should be saved in <training_path>/image.
  • The ground-truth masks should be saved in <training_path>/mask.
  • The ground-truth edge maps should be saved in <training_path>/edge.
  • The training checkpoints will be saved in <checkpoint_path>.
  • training_path and checkpoint_path can be set in config.py.

To retrain the pruned model, follow the instructions in prune.py.
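The pruned model relies on filter pruning via geometric median (FPGM), as stated in the abstract. The sketch below illustrates only the FPGM selection criterion for a single convolutional layer using NumPy; it is not the code in prune.py. The dependency list includes Neural Network Intelligence, which provides an FPGM pruner, so the actual pipeline presumably uses that library rather than anything like this function (fpgm_filters_to_prune is a hypothetical name).

```python
import numpy as np


def fpgm_filters_to_prune(weight, num_prune):
    """Select conv filters to prune with the FPGM criterion.

    weight: array of shape (out_channels, in_channels, kH, kW).
    Each filter is scored by the sum of its Euclidean distances to all other
    filters in the layer; filters with the smallest sums lie closest to the
    geometric median and are treated as the most replaceable.
    Returns the indices of the num_prune lowest-scoring filters, sorted.
    """
    weight = np.asarray(weight, dtype=np.float64)
    filters = weight.reshape(weight.shape[0], -1)
    # Pairwise distance matrix of shape (out_channels, out_channels).
    diffs = filters[:, None, :] - filters[None, :, :]
    distances = np.linalg.norm(diffs, axis=-1)
    scores = distances.sum(axis=1)
    # Stable sort so that ties are broken by filter index.
    order = np.argsort(scores, kind="stable")
    return np.sort(order[:num_prune])
```

The broadcasted difference tensor grows quadratically with the number of output channels, which is acceptable for inspecting one layer but is not tuned for speed.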

Prediction

Run the following command to perform prediction using the unpruned model:

python predict.py

Run the following command to perform prediction using the pruned model:

python prune.py
  • The images should be saved in <testing_path>/<dataset_name>/image.
  • The file path to the unpruned model weights should be <weights_path>.
  • The file path to the pruned model weights should be <pruned_weights_path>.
  • The predicted masks will be saved in <result_path>/<dataset_name>.
  • testing_path, dataset_name, weights_path, pruned_weights_path, and result_path can be set in config.py.
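As a rough illustration of how these config.py settings map to directories, here is a minimal sketch. The field names come from the list above, but the Config class, its methods, and all default values are hypothetical and may not match the actual config.py.

```python
import os
from dataclasses import dataclass


@dataclass
class Config:
    # Field names follow the README; the default values are placeholders.
    testing_path: str = "data/test"
    dataset_name: str = "MSD"
    weights_path: str = "weights/model.pth"
    pruned_weights_path: str = "weights/model_pruned.pth"
    result_path: str = "results"

    def image_dir(self):
        """Input images: <testing_path>/<dataset_name>/image."""
        return os.path.join(self.testing_path, self.dataset_name, "image")

    def mask_dir(self):
        """Ground-truth masks for evaluation: <testing_path>/<dataset_name>/mask."""
        return os.path.join(self.testing_path, self.dataset_name, "mask")

    def prediction_dir(self):
        """Predicted masks: <result_path>/<dataset_name>."""
        return os.path.join(self.result_path, self.dataset_name)
```

For example, Config(dataset_name="PMD").prediction_dir() gives results/PMD on POSIX systems. Deriving every path from the same few fields keeps prediction and evaluation pointed at the same dataset folder.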

Evaluation

Run the following command to perform model evaluation:

python misc.py
  • The predicted masks should be saved in <result_path>/<dataset_name>.
  • The ground-truth masks should be saved in <testing_path>/<dataset_name>/mask.
  • result_path, testing_path, and dataset_name can be set in config.py.
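The results above are reported as maximum F-measure. The sketch below shows one common way to compute it for a single predicted mask: binarize at each of 256 thresholds, compute precision and recall, and keep the best F-measure. It is not taken from misc.py. The weight β² = 0.3 is an assumption (a value common in mirror and salient-object segmentation), and dataset-level evaluation may instead average precision and recall over all images before taking the maximum.

```python
import numpy as np


def max_f_measure(pred, gt, beta2=0.3):
    """Maximum F-measure of one predicted mask over 256 thresholds.

    pred: 8-bit prediction map with values in 0..255.
    gt: ground-truth mask of the same shape; nonzero pixels count as mirror.
    beta2: the beta^2 weight; 0.3 is an assumed value, not read from misc.py.
    """
    pred = np.asarray(pred)
    gt = np.asarray(gt) > 0
    eps = 1e-8  # avoids division by zero
    best = 0.0
    for t in range(256):
        binary = pred >= t
        tp = np.logical_and(binary, gt).sum()
        precision = tp / (binary.sum() + eps)
        recall = tp / (gt.sum() + eps)
        f = (1 + beta2) * precision * recall / (beta2 * precision + recall + eps)
        best = max(best, float(f))
    return best
```

A β² below 1 weights precision more heavily than recall, so a mask that spills over into the surrounding scene is penalized more than one that misses part of the mirror.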

Models & Weights

By default, train.py, predict.py, and prune.py use the model defined in pmd.py, which employs an EfficientNetV2-Medium backbone and our proposed edge extraction and fusion module.

To explore the other feature extraction backbones that we considered in our experiments, refer to the models in models_experiments and the weights in this Drive:

| Model | Weights |
| --- | --- |
| [Best] EfficientNetV2-Medium | Link |
| [Best, Pruned] EfficientNetV2-Medium | Link |
| ResNet-50 | Link |
| ResNet-50 (+ PMD's original EDF module) | Link |
| Xception-65 | Link |
| VoVNet-39 | Link |
| MobileNetV3 | Link |
| EfficientNet-Lite | Link |
| EfficientNetEdge-Large | Link |

EDF stands for edge detection and fusion.

Note: With the exception of ResNet-50 (+ PMD's original EDF module), the models in the table above use our proposed edge extraction and fusion module.

Dataset


Our proposed dataset, DLSU-OMRS (De La Salle University – Outdoor Mirrors and Reflective Surfaces), can be downloaded from this link. Each image retains its own license, and the ground-truth masks are licensed under the BSD 3-Clause "New" or "Revised" License. Use of this dataset is restricted to noncommercial purposes.

The split PMD dataset, which we used for model training and evaluation, can be downloaded from this link. Our use of this dataset is under the BSD 3-Clause "New" or "Revised" License.

Dependencies

The following Python libraries and modules (other than those that are part of the Python Standard Library) were used:

| Library/Module | Description | License |
| --- | --- | --- |
| PyTorch | Provides tensor computation with strong GPU acceleration and deep neural networks built on a tape-based autograd system | BSD 3-Clause License |
| PyTorch Image Models | Collection of state-of-the-art computer vision models, layers, and utilities | Apache License 2.0 |
| Neural Network Intelligence | Provides tools for hyperparameter optimization, neural architecture search, model compression and feature engineering | MIT License |
| Pillow | Provides functions for opening, manipulating, and saving image files | Historical Permission Notice and Disclaimer |
| scikit-image | Provides algorithms for image processing | BSD 3-Clause "New" or "Revised" License |
| PyDenseCRF | Python wrapper to dense (fully connected) conditional random fields with Gaussian edge potentials | MIT License |
| tqdm | Allows the creation of progress bars by wrapping around any iterable | Mozilla Public Licence (MPL) v. 2.0, MIT License |
| NumPy | Provides a multidimensional array object, various derived objects, and an assortment of routines for fast operations on arrays | BSD 3-Clause "New" or "Revised" License |
| TensorBoardX | Provides visualization and tooling needed for machine learning experimentation | MIT License |

The descriptions are taken from their respective websites.

Note: Although PyDenseCRF can be installed via pip or from its official repository, we recommend that Windows users install it by running setup.py inside the pydensecrf directory of our repository to prevent potential issues with Eigen.cpp (refer to this issue for additional details).

Attributions

Attributions for reference source code are provided in the individual Python scripts and in the table below:

| Reference | License |
| --- | --- |
| H. Mei, G. P. Ji, Z. Wei, X. Yang, X. Wei, and D. P. Fan, "Camouflaged object segmentation with distraction mining," in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, TN, USA: IEEE Computer Society, June 2021, pp. 8768–8877. | BSD 3-Clause "New" or "Revised" License |
| J. Wei, S. Wang, and Q. Huang, "F³Net: Fusion, feedback and focus for salient object detection," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 07, pp. 12321–12328, Apr. 2020. | MIT License |
| J. Lin, G. Wang, and R. W. H. Lau, "Progressive mirror detection," in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Los Alamitos, CA, USA: IEEE Computer Society, June 2020, pp. 3694–3702. | BSD 3-Clause "New" or "Revised" License |

Authors

This is the major course output in a computer vision class for master's students under Dr. Joel P. Ilao of the Department of Computer Technology, De La Salle University. The task was to create an eight-week, small-scale project that applies computer vision techniques to propose a solution to an identified research problem.