This work was accepted for full paper presentation at the 2023 International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision (WSCG 2023), held virtually and in person in Pilsen, Czech Republic:
- The final version of our paper (as published in Computer Science Research Notes) can be accessed via this link.
- Our dataset of mirrors and reflective surfaces is publicly released for future researchers.
If you find our work useful, please consider citing:
@ARTICLE{2023-E59,
  author = {Gonzales, Mark Edward M. and Uy, Lorene C. and Ilao, Joel P.},
  title = {Designing a Lightweight Edge-Guided Convolutional Neural Network for Segmenting Mirrors and Reflective Surfaces},
  journal = {Computer Science Research Notes},
  year = {2023},
  volume = {3301},
  pages = {107-116},
  doi = {10.24132/CSRN.3301.14},
  publisher = {Union Agency, Science Press},
  issn = {2464-4617},
  abbrev_source_title = {CSRN},
  document_type = {Article},
  source = {Scopus}
}
This repository is also archived on Zenodo.
ABSTRACT: The detection of mirrors is a challenging task due to their lack of a distinctive appearance and the visual similarity of reflections with their surroundings. While existing systems have achieved some success in mirror segmentation, the design of lightweight models remains unexplored, and datasets are mostly limited to clear mirrors in indoor scenes. In this paper, we propose a new dataset consisting of 454 images of outdoor mirrors and reflective surfaces. We also present a lightweight edge-guided convolutional neural network based on PMDNet. Our model uses EfficientNetV2-Medium as its backbone and employs parallel convolutional layers and a lightweight convolutional block attention module to capture both low-level and high-level features for edge extraction. It registered maximum F-measure scores of 0.8483, 0.8117, and 0.8388 on the Mirror Segmentation Dataset (MSD), Progressive Mirror Detection (PMD) dataset, and our proposed dataset, respectively. Applying filter pruning via geometric median resulted in maximum F-measure scores of 0.8498, 0.7902, and 0.8456, respectively, performing competitively with the state-of-the-art PMDNet but with 78.20× fewer floating-point operations per second and 238.16× fewer parameters.
INDEX TERMS: Mirror segmentation, Object detection, Convolutional neural network (CNN), CNN filter pruning
Run the following command to train the unpruned model:
python train.py
- The images should be saved in `<training_path>/image`.
- The ground-truth masks should be saved in `<training_path>/mask`.
- The ground-truth edge maps should be saved in `<training_path>/edge`.
- The training checkpoints will be saved in `<checkpoint_path>`.

`training_path` and `checkpoint_path` can be set in `config.py`, as in the sketch below.
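The values below are placeholders only; they illustrate how the training-related variables in `config.py` might be set:

```python
# config.py (illustrative sketch; the paths are placeholders, not the repository defaults)
training_path = "./data/train"     # must contain the image/, mask/, and edge/ subfolders
checkpoint_path = "./checkpoints"  # training checkpoints are written here
```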
To retrain the pruned model, follow the instructions in `prune.py`.
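For context, the pruned model is obtained through filter pruning via geometric median (FPGM), as described in the abstract. The snippet below is a rough, hypothetical sketch of how such pruning can be set up with Neural Network Intelligence (NNI); it is not the exact procedure in `prune.py`, and the NNI API differs across versions:

```python
# Hypothetical FPGM pruning sketch using NNI 2.x; NOT the exact code in prune.py.
import torch.nn as nn
from nni.compression.pytorch.pruning import FPGMPruner

# Toy stand-in for the trained segmentation network.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)

# Prune a fraction of the filters in every Conv2d layer (the ratio is a placeholder).
config_list = [{"sparsity": 0.5, "op_types": ["Conv2d"]}]
pruner = FPGMPruner(model, config_list)
masked_model, masks = pruner.compress()  # masked model + per-layer pruning masks

# The pruned model is then fine-tuned (retrained) to recover accuracy.
```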
Run the following command to perform prediction using the unpruned model:
python predict.py
Run the following command to perform prediction using the pruned model:
python prune.py
- The images should be saved in `<testing_path>/<dataset_name>/image`.
- The file path to the unpruned model weights should be `<weights_path>`.
- The file path to the pruned model weights should be `<pruned_weights_path>`.
- The predicted masks will be saved in `<result_path>/<dataset_name>`.

`testing_path`, `dataset_name`, `weights_path`, `pruned_weights_path`, and `result_path` can be set in `config.py`, as in the sketch below.
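The values below are placeholders only; they illustrate how the prediction- and evaluation-related variables in `config.py` might be set:

```python
# config.py (illustrative sketch; the values are placeholders, not the repository defaults)
testing_path = "./data/test"                  # contains <dataset_name>/image and <dataset_name>/mask
dataset_name = "PMD"                          # e.g., "MSD", "PMD", or "DLSU-OMRS"
weights_path = "./weights/unpruned.pth"       # unpruned model weights
pruned_weights_path = "./weights/pruned.pth"  # pruned model weights
result_path = "./results"                     # predicted masks go to <result_path>/<dataset_name>
```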
Run the following command to perform model evaluation:
python misc.py
- The predicted masks should be saved in `<result_path>/<dataset_name>`.
- The ground-truth masks should be saved in `<testing_path>/<dataset_name>/mask`.

`result_path`, `testing_path`, and `dataset_name` can be set in `config.py` (see the sketch above). An illustrative sketch of the maximum F-measure computation is given below.
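The metrics themselves are computed by `misc.py`. As a point of reference, the maximum F-measure reported in the abstract is the F-measure maximized over binarization thresholds; the sketch below uses the beta^2 = 0.3 weighting common in this line of work and may differ in detail from the repository's implementation:

```python
# Illustrative sketch of the maximum F-measure; may differ from misc.py in detail.
import numpy as np

def max_f_measure(pred, gt, beta_sq=0.3):
    """pred: float array in [0, 1]; gt: binary mask of the same shape."""
    gt = gt > 0.5
    best_f = 0.0
    for t in np.linspace(0.0, 1.0, 256):           # sweep binarization thresholds
        binary = pred >= t
        tp = np.logical_and(binary, gt).sum()
        precision = tp / (binary.sum() + 1e-8)
        recall = tp / (gt.sum() + 1e-8)
        f = (1 + beta_sq) * precision * recall / (beta_sq * precision + recall + 1e-8)
        best_f = max(best_f, f)
    return best_f
```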
By default, `train.py`, `predict.py`, and `prune.py` use the model defined in `pmd.py`, which employs an EfficientNetV2-Medium backbone and our proposed edge extraction and fusion module.
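The EfficientNetV2-Medium backbone comes from PyTorch Image Models (timm). As a minimal sketch (the variant name and input size below are assumptions rather than the exact configuration in `pmd.py`), multi-scale backbone features of the kind consumed by the edge extraction and fusion module can be obtained like this:

```python
# Minimal sketch: multi-scale features from an EfficientNetV2-Medium backbone via timm.
# The variant name and input size are assumptions; see pmd.py for the actual setup.
import timm
import torch

backbone = timm.create_model("tf_efficientnetv2_m", pretrained=True, features_only=True)
x = torch.randn(1, 3, 384, 384)   # dummy input image
features = backbone(x)            # list of feature maps at progressively lower resolutions
for f in features:
    print(f.shape)
```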
To explore the other feature extraction backbones that we considered in our experiments, refer to the models in `models_experiments` and the weights in this Drive:
Model | Weights |
---|---|
[Best] EfficientNetV2-Medium | Link |
[Best, Pruned] EfficientNetV2-Medium | Link |
ResNet-50 | Link |
ResNet-50 (+ PMD's original EDF module) | Link |
Xception-65 | Link |
VoVNet-39 | Link |
MobileNetV3 | Link |
EfficientNet-Lite | Link |
EfficientNetEdge-Large | Link |
EDF stands for edge detection and fusion.
Note: With the exception of ResNet-50 (+ PMD's original EDF module), the models in the table above use our proposed edge extraction and fusion module.
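After downloading a weights file from the links above, it can be loaded into the corresponding model definition with PyTorch. The class and file names below are hypothetical placeholders; check `pmd.py` (or the relevant script in `models_experiments`) for the actual names:

```python
# Hypothetical loading sketch; the class name PMD and the filename are placeholders.
import torch
from pmd import PMD  # hypothetical class name; see pmd.py for the actual definition

model = PMD()
state_dict = torch.load("efficientnetv2_m_best.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
```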
Our proposed dataset, DLSU-OMRS (De La Salle University – Outdoor Mirrors and Reflective Surfaces), can be downloaded from this link. The images have their respective licenses, and the ground-truth masks are licensed under the BSD 3-Clause "New" or "Revised" License. The use of this dataset is restricted to noncommercial purposes only.
The split PMD dataset, which we used for model training and evaluation, can be downloaded from this link. Our use of this dataset is under the BSD 3-Clause "New" or "Revised" License.
The following Python libraries and modules (other than those that are part of the Python Standard Library) were used:
Library/Module | Description | License |
---|---|---|
PyTorch | Provides tensor computation with strong GPU acceleration and deep neural networks built on a tape-based autograd system | BSD 3-Clause License |
PyTorch Image Models | Collection of state-of-the-art computer vision models, layers, and utilities | Apache License 2.0 |
Neural Network Intelligence | Provides tools for hyperparameter optimization, neural architecture search, model compression and feature engineering | MIT License |
Pillow | Provides functions for opening, manipulating, and saving image files | Historical Permission Notice and Disclaimer |
scikit-image | Provides algorithms for image processing | BSD 3-Clause "New" or "Revised" License |
PyDenseCRF | Python wrapper for dense (fully connected) conditional random fields with Gaussian edge potentials | MIT License |
tqdm | Allows the creation of progress bars by wrapping around any iterable | Mozilla Public License (MPL) v2.0, MIT License |
NumPy | Provides a multidimensional array object, various derived objects, and an assortment of routines for fast operations on arrays | BSD 3-Clause "New" or "Revised" License |
TensorBoardX | Provides visualization and tooling needed for machine learning experimentation | MIT License |
The descriptions are taken from their respective websites.
Note: Although PyDenseCRF can be installed via `pip` or its official repository, we recommend that Windows users install it by running `setup.py` inside the `pydensecrf` directory of our repository to prevent potential issues with `Eigen.cpp` (refer to this issue for additional details).
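For reference, a minimal example of the dense-CRF refinement that PyDenseCRF provides is sketched below; the parameter values are placeholders and do not necessarily match how this repository applies CRF post-processing:

```python
# Minimal PyDenseCRF sketch: refining a two-class (background/mirror) probability map.
# The pairwise parameters are placeholders, not the repository's actual settings.
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

H, W = 256, 256
image = np.zeros((H, W, 3), dtype=np.uint8)        # placeholder RGB image
probs = np.full((2, H, W), 0.5, dtype=np.float32)  # placeholder softmax probabilities

d = dcrf.DenseCRF2D(W, H, 2)
d.setUnaryEnergy(unary_from_softmax(probs))
d.addPairwiseGaussian(sxy=3, compat=3)
d.addPairwiseBilateral(sxy=60, srgb=5, rgbim=np.ascontiguousarray(image), compat=5)
Q = d.inference(5)
refined_mask = np.argmax(Q, axis=0).reshape(H, W)  # 0 = background, 1 = mirror
```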
Attributions for reference source code are provided in the individual Python scripts and in the table below:
- Mark Edward M. Gonzales ([email protected])
- Lorene C. Uy ([email protected])
- Dr. Joel P. Ilao ([email protected])
This project is the major course output in a computer vision class for master's students under Dr. Joel P. Ilao of the Department of Computer Technology, De La Salle University. The task is to complete an eight-week, small-scale project that applies computer vision techniques to present a solution to an identified research problem.