Download the BD3-Dataset.
The inspection of urban built environments is critical to maintaining structural integrity and safety. However, traditional manual inspection methods are often time-consuming, labor-intensive, prone to human error, and difficult to scale across large urban environments. These limitations become particularly evident in fast-growing cities, where aging infrastructure and the demand for regular inspection outpace the capacity of human inspectors. To overcome these challenges, the use of automated building inspection techniques powered by computer vision has gained significant interest. By employing technologies such as drones and multi-robot systems, these techniques promise to make building inspections faster, more accurate, and scalable. Despite these advances, a major barrier to the development and deployment of robust automated inspection systems is the lack of comprehensive and publicly available datasets. Most existing datasets fail to capture a wide range of structural defects or do not provide enough diversity in image samples, limiting the ability of machine learning models to generalize across different environments and defect types.
To address this gap, we present BD3: Building Defects Detection Dataset, specifically designed to evaluate and develop computer vision techniques for automated building inspections. BD3 consists of two subsets:
- Original dataset: 3,965 RGB images annotated to cover six common structural defects, along with normal wall images.
- Augmented dataset: 14,000 images created using geometric transformations (rotations, flips) and color adjustments (brightness, contrast, saturation, and hue). This augmented subset is intended to increase the dataset's diversity, improving the robustness and generalizability of models trained on BD3 (a sketch of such a pipeline is shown below).
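The augmentation scripts themselves live in the repository (see the structure below); the following is a minimal sketch of how such a pipeline could be assembled with torchvision.transforms. The parameter values and the input path are illustrative assumptions, not the exact settings used to build BD3.

```python
from PIL import Image
from torchvision import transforms

# Hypothetical augmentation pipeline: geometric transformations
# (rotation, flips) plus color adjustments (brightness, contrast,
# saturation, hue), mirroring the operations described above.
# All parameter values are illustrative, not those used for BD3.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.3, contrast=0.3,
                           saturation=0.3, hue=0.05),
])

img = Image.open("path/to/wall_image.jpg")  # hypothetical input path
augmented = augment(img)
augmented.save("augmented_sample.jpg")
```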
By making BD3 publicly available, we aim to accelerate the development of reliable and scalable automated inspection systems that can help ensure the safety and longevity of our urban infrastructure.
The BD3 dataset contains six defect classes and normal wall images. Below are the descriptions of these defect classes along with the number of image samples available for each:
Defect Name | Description | Number of Images |
---|---|---|
Algae | Fungus-like green, brown, or black patches or slime on the surface | 624 |
Major Crack | Cracks with visible gaps | 620 |
Minor Crack | Cracks without visible gaps | 580 |
Peeling | Loss of the outer covering of paint | 520 |
Spalling | Surface break exposing inner material | 500 |
Stain | Visible man-made or natural color marks | 521 |
Normal | Clean walls with no visible signs of defects | 600 |
Figure: Sample images of each class: (a) Algae, (b) Major Crack, (c) Minor Crack, (d) Peeling, (e) Spalling, (f) Stain, (g) Normal.
Image collection began with inspecting and identifying building structures in maintained condition. More than 50 buildings, constructed at different times and ranging in age from 10 to 60 years, were visited. Images were captured with a high-resolution smartphone camera, with all samples taken approximately 1 meter from the walls. Images were collected both indoors and outdoors across various campus buildings with different surface materials, such as concrete and stone. The collected data were then assembled for preprocessing and cleaning, and annotation was performed with respect to the specific defect classes to produce the final dataset.
Figure: Dataset preparation workflow
To assess the utility and practical usefulness of the BD3 dataset, we benchmarked five deep learning image classifiers: Vision Transformer (ViT), VGG16, ResNet18, AlexNet, and MobileNetV2. All models are initialized with pre-trained weights from torchvision.models. The dataset is split into training, validation, and test sets in a 60/20/20 ratio (a minimal setup sketch follows).
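The exact training configuration is in the model-train scripts; below is a minimal sketch of how one of the benchmarked models could be set up with pre-trained torchvision weights and a 60/20/20 split. The dataset root path and the use of ImageFolder are assumptions for illustration.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, models, transforms

NUM_CLASSES = 7  # six defect classes + Normal

# Basic preprocessing; the normalization constants are the ImageNet
# statistics that torchvision's pre-trained weights expect.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# An ImageFolder-style root named "BD3/" is an assumption for illustration.
dataset = datasets.ImageFolder("BD3/", transform=preprocess)

# 60/20/20 train/validation/test split, as described above.
n = len(dataset)
n_train, n_val = int(0.6 * n), int(0.2 * n)
train_set, val_set, test_set = random_split(
    dataset, [n_train, n_val, n - n_train - n_val],
    generator=torch.Generator().manual_seed(42))

# Load a pre-trained ResNet18 and replace its classification head.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = torch.nn.Linear(model.fc.in_features, NUM_CLASSES)
```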
Table: Model-wise benchmark results on the original and augmented datasets.

Model | Precision (Original) | Recall (Original) | F1-score (Original) | Precision (Augmented) | Recall (Augmented) | F1-score (Augmented) |
---|---|---|---|---|---|---|
ResNet18 | 0.8320 | 0.8308 | 0.8301 | 0.9915 | 0.9516 | 0.9711 |
VGG16 | 0.8409 | 0.8359 | 0.8363 | 0.9066 | 0.9057 | 0.9056 |
MobileNetV2 | 0.8479 | 0.8422 | 0.8419 | 0.8756 | 0.8750 | 0.8746 |
AlexNet | 0.8842 | 0.8801 | 0.8803 | 0.9399 | 0.9389 | 0.9391 |
ViT (patch16) | 0.9342 | 0.9318 | 0.9323 | 0.9880 | 0.9879 | 0.9879 |
Table: Class-wise results on the original and augmented datasets.

Class | Precision (Original) | Recall (Original) | F1-score (Original) | Precision (Augmented) | Recall (Augmented) | F1-score (Augmented) |
---|---|---|---|---|---|---|
Algae | 0.9915 | 0.9516 | 0.9711 | 1.0000 | 0.9975 | 0.9987 |
Major crack | 0.8761 | 0.8534 | 0.8646 | 0.9794 | 0.9550 | 0.9670 |
Minor crack | 0.8417 | 0.9435 | 0.8897 | 0.9612 | 0.9925 | 0.9766 |
Peeling | 0.9595 | 0.9134 | 0.9359 | 0.9851 | 0.9925 | 0.9887 |
Stain | 0.9166 | 0.9519 | 0.9339 | 0.9950 | 0.9975 | 0.9962 |
Normal | 1.0000 | 0.9916 | 0.9958 | 0.9974 | 0.9925 | 0.9949 |
Figure: Vision Transformer confusion matrices on the original and augmented datasets.
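The per-class numbers above are standard precision/recall/F1 computations. A minimal sketch with scikit-learn follows; using sklearn here is an assumption (the repository's Result Analysis scripts may compute these differently), and the random labels are placeholders for predictions from a trained model.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

CLASS_NAMES = ["Algae", "Major Crack", "Minor Crack", "Peeling",
               "Spalling", "Stain", "Normal"]

# y_true / y_pred would come from running the trained model on the
# test split; random labels here are placeholders for illustration.
rng = np.random.default_rng(0)
y_true = rng.integers(0, len(CLASS_NAMES), size=200)
y_pred = rng.integers(0, len(CLASS_NAMES), size=200)

# Per-class precision, recall, and F1-score, as in the tables above.
print(classification_report(y_true, y_pred, target_names=CLASS_NAMES))

# Confusion matrix, as visualized in the figure above.
print(confusion_matrix(y_true, y_pred))
```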
The repository code is organized as follows:
- Data Pre-processing - Python scripts for renaming images, resizing, and other preprocessing functions (a minimal sketch is shown after this list).
- Image Augmentation - Python code for generating the augmented dataset.
- Dataset Splitting - Python code for splitting the dataset into training, validation, and test sets.
- Model Training and Evaluation - Scripts for training and evaluating the deep learning models.
- Result Analysis - Python scripts to analyze and visualize the results.
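As referenced in the list above, a minimal sketch of the kind of renaming and resizing the preprocessing scripts perform is shown here. The directory layout, the target size, and the sequential naming scheme are all assumptions for illustration, not the exact scheme used in the repository.

```python
from pathlib import Path
from PIL import Image

SRC = Path("raw_images/Algae")   # hypothetical source folder
DST = Path("processed/Algae")    # hypothetical output folder
DST.mkdir(parents=True, exist_ok=True)

# Rename images to a sequential scheme and resize to a fixed size.
# The "cls00_{i:03d}.jpg" pattern echoes the sample filenames in the
# directory tree below, but is an assumption, not the exact scheme.
for i, src_path in enumerate(sorted(SRC.glob("*.jpg")), start=1):
    img = Image.open(src_path).convert("RGB")
    img = img.resize((224, 224))
    img.save(DST / f"cls00_{i:03d}.jpg")
```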
```
.
├── code                        # All Python code
│   ├── data-process            # Data pre-processing code
│   ├── data-augment-Technq     # Image augmentation code
│   ├── train-test-split        # Data split code
│   ├── model-train             # Model training and evaluation code
│   └── Results                 # Results analysis
├── sample images               # Dataset image sample files
│   ├── class_images
│   │   ├── Algae
│   │   │   ├── ...cls00_001.jpg
│   │   │   :
│   │   :
├── Results
│   ├── model-wise results
│   :
```
TODO...