Download the BD3-Dataset.
The inspection of urban built environments is critical to maintaining structural integrity and safety. However, traditional manual inspection methods are often time-consuming, labor-intensive, prone to human error, and difficult to scale across large urban environments. These limitations become particularly evident in fast-growing cities, where aging infrastructure and the demand for regular inspection outpace the capacity of human inspectors. To overcome these challenges, the use of automated building inspection techniques powered by computer vision has gained significant interest. By employing technologies such as drones and multi-robot systems, these techniques promise to make building inspections faster, more accurate, and scalable. Despite these advances, a major barrier to the development and deployment of robust automated inspection systems is the lack of comprehensive and publicly available datasets. Most existing datasets fail to capture a wide range of structural defects or do not provide enough diversity in image samples, limiting the ability of machine learning models to generalize across different environments and defect types.
To address this gap, we present BD3: Building Defects Detection Dataset, specifically designed to evaluate and develop computer vision techniques for automated building inspections. BD3 consists of two subsets:
- Original dataset: 3,965 RGB images annotated to cover six common structural defects, along with normal wall images.
- Augmented dataset: 14,000 images created using geometric transformations (rotations, flips) and color adjustments (brightness, contrast, saturation, and hue). This augmented subset is intended to increase the dataset's diversity, improving the robustness and generalizability of models trained on BD3 (a sketch of such a pipeline is shown below).
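The augmentation scripts themselves live in the repository (see the structure below); the following is a minimal sketch of how such a pipeline could be assembled with torchvision.transforms. The parameter values and the input path are illustrative assumptions, not the exact settings used to build BD3.

```python
from PIL import Image
from torchvision import transforms

# Hypothetical augmentation pipeline: geometric transformations
# (rotation, flips) plus color adjustments (brightness, contrast,
# saturation, hue), mirroring the operations described above.
# All parameter values are illustrative, not those used for BD3.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.3, contrast=0.3,
                           saturation=0.3, hue=0.05),
])

img = Image.open("path/to/wall_image.jpg")  # hypothetical input path
augmented = augment(img)
augmented.save("augmented_sample.jpg")
```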
By making BD3 publicly available, we aim to accelerate the development of reliable and scalable automated inspection systems that can help ensure the safety and longevity of our urban infrastructure.
The BD3 dataset contains six defect classes and normal wall images. Below are the descriptions of these defect classes along with the number of image samples available for each:
Defect Name | Description | Number of Images |
---|---|---|
Algae | Fungus-like green, brown, or black patches or slime on the surface | 624 |
Major Crack | Cracks with visible gaps | 620 |
Minor Crack | Cracks without visible gaps | 580 |
Peeling | Loss of the outer covering of paint | 520 |
Spalling | Surface break exposing inner material | 500 |
Stain | Visible man-made or natural color marks | 521 |
Normal | Clean walls with no visible signs of defects | 600 |
Figure: Sample images of each class: (a) Algae, (b) Major Crack, (c) Minor Crack, (d) Peeling, (e) Spalling, (f) Stain, (g) Normal.
Image collection began with inspecting and identifying building structures in maintained condition. More than 50 buildings, constructed at different times and ranging in age from 10 to 60 years, were visited. Images were captured with a high-resolution smartphone camera, with all samples taken approximately 1 meter from the walls. Images were collected both indoors and outdoors across various campus buildings with different surface materials, such as concrete and stone. The collected data were then assembled for preprocessing and cleaning, and annotation was performed with respect to the specific defect classes to produce the final dataset.
Figure: Dataset preparation workflow
To assess the utility and practical usefulness of the BD3 dataset, we benchmarked five deep learning image classifiers: Vision Transformer (ViT), VGG16, ResNet18, AlexNet, and MobileNetV2. All models are initialized with pre-trained weights from torchvision.models. The dataset is split into training, validation, and test sets in a 60/20/20 ratio (a minimal setup sketch follows).
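The exact training configuration is in the model-train scripts; below is a minimal sketch of how one of the benchmarked models could be set up with pre-trained torchvision weights and a 60/20/20 split. The dataset root path and the use of ImageFolder are assumptions for illustration.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, models, transforms

NUM_CLASSES = 7  # six defect classes + Normal

# Basic preprocessing; the normalization constants are the ImageNet
# statistics that torchvision's pre-trained weights expect.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# An ImageFolder-style root named "BD3/" is an assumption for illustration.
dataset = datasets.ImageFolder("BD3/", transform=preprocess)

# 60/20/20 train/validation/test split, as described above.
n = len(dataset)
n_train, n_val = int(0.6 * n), int(0.2 * n)
train_set, val_set, test_set = random_split(
    dataset, [n_train, n_val, n - n_train - n_val],
    generator=torch.Generator().manual_seed(42))

# Load a pre-trained ResNet18 and replace its classification head.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = torch.nn.Linear(model.fc.in_features, NUM_CLASSES)
```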
Table: Model-wise benchmark results on the original and augmented datasets.

Model | Precision (Original) | Recall (Original) | F1-score (Original) | Precision (Augmented) | Recall (Augmented) | F1-score (Augmented) |
---|---|---|---|---|---|---|
ResNet18 | 0.8320 | 0.8308 | 0.8301 | 0.9915 | 0.9516 | 0.9711 |
VGG16 | 0.8409 | 0.8359 | 0.8363 | 0.9066 | 0.9057 | 0.9056 |
MobileNetV2 | 0.8479 | 0.8422 | 0.8419 | 0.8756 | 0.8750 | 0.8746 |
AlexNet | 0.8842 | 0.8801 | 0.8803 | 0.9399 | 0.9389 | 0.9391 |
ViT (patch16) | 0.9342 | 0.9318 | 0.9323 | 0.9880 | 0.9879 | 0.9879 |
Table: Class-wise results on the original and augmented datasets.

Class | Precision (Original) | Recall (Original) | F1-score (Original) | Precision (Augmented) | Recall (Augmented) | F1-score (Augmented) |
---|---|---|---|---|---|---|
Algae | 0.9915 | 0.9516 | 0.9711 | 1.0000 | 0.9975 | 0.9987 |
Major crack | 0.8761 | 0.8534 | 0.8646 | 0.9794 | 0.9550 | 0.9670 |
Minor crack | 0.8417 | 0.9435 | 0.8897 | 0.9612 | 0.9925 | 0.9766 |
Peeling | 0.9595 | 0.9134 | 0.9359 | 0.9851 | 0.9925 | 0.9887 |
Stain | 0.9166 | 0.9519 | 0.9339 | 0.9950 | 0.9975 | 0.9962 |
Normal | 1.0000 | 0.9916 | 0.9958 | 0.9974 | 0.9925 | 0.9949 |
Figure: Vision Transformer confusion matrices on the original and augmented datasets.
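The per-class numbers above are standard precision/recall/F1 computations. A minimal sketch with scikit-learn follows; using sklearn here is an assumption (the repository's Result Analysis scripts may compute these differently), and the random labels are placeholders for predictions from a trained model.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

CLASS_NAMES = ["Algae", "Major Crack", "Minor Crack", "Peeling",
               "Spalling", "Stain", "Normal"]

# y_true / y_pred would come from running the trained model on the
# test split; random labels here are placeholders for illustration.
rng = np.random.default_rng(0)
y_true = rng.integers(0, len(CLASS_NAMES), size=200)
y_pred = rng.integers(0, len(CLASS_NAMES), size=200)

# Per-class precision, recall, and F1-score, as in the tables above.
print(classification_report(y_true, y_pred, target_names=CLASS_NAMES))

# Confusion matrix, as visualized in the figure above.
print(confusion_matrix(y_true, y_pred))
```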
The repository code is organized as follows:
- Data Pre-processing - Python scripts for renaming images, resizing, and other preprocessing functions (a minimal sketch is shown after this list).
- Image Augmentation - Python code for generating the augmented dataset.
- Dataset Splitting - Python code for splitting the dataset into training, validation, and test sets.
- Model Training and Evaluation - Scripts for training and evaluating the deep learning models.
- Result Analysis - Python scripts to analyze and visualize the results.
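As referenced in the list above, a minimal sketch of the kind of renaming and resizing the preprocessing scripts perform is shown here. The directory layout, the target size, and the sequential naming scheme are all assumptions for illustration, not the exact scheme used in the repository.

```python
from pathlib import Path
from PIL import Image

SRC = Path("raw_images/Algae")   # hypothetical source folder
DST = Path("processed/Algae")    # hypothetical output folder
DST.mkdir(parents=True, exist_ok=True)

# Rename images to a sequential scheme and resize to a fixed size.
# The "cls00_{i:03d}.jpg" pattern echoes the sample filenames in the
# directory tree below, but is an assumption, not the exact scheme.
for i, src_path in enumerate(sorted(SRC.glob("*.jpg")), start=1):
    img = Image.open(src_path).convert("RGB")
    img = img.resize((224, 224))
    img.save(DST / f"cls00_{i:03d}.jpg")
```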
```
.
├── code                        # All Python code
│   ├── data-process            # Data pre-processing code
│   ├── data-augment-Technq     # Image augmentation code
│   ├── train-test-split        # Data split code
│   ├── model-train             # Model training and evaluation code
│   └── Results                 # Results analysis
├── sample images               # Dataset image sample files
│   ├── class_images
│   │   ├── Algae
│   │   │   ├── ...cls00_001.jpg
│   │   │   :
│   │   :
├── Results
│   ├── model-wise results
│   :
```
TODO...