This project is a from-scratch implementation of batch normalization in the LeNet network, using tensorflow.keras on the MNIST dataset. The goal is to study and characterize the impact of batch normalization on the performance of a neural network.
Parameter | Explanation | Default |
---|---|---|
batch_size | Batch size for training | 64 |
output_size | Output size (number of classes) | 10 |
epochs | Number of training epochs | 10 |
optimizer | Optimizer used for training | 'adam' |
epsilon | Small constant added to the variance for numerical stability when standardizing (also applied to the moving statistics at inference time) | 0.00000001 |
window_size | Window size for averaging the batch statistics in the batch-norm algorithm | 5 |
verbose | Verbosity level | 1 |
flag_visualizations | Flag for showing plots of the training process | 1 |
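To make the roles of `epsilon` and `window_size` concrete, here is a minimal sketch of a from-scratch batch-normalization layer that keeps a window of recent batch statistics. The class name `CustomBatchNorm` and its details are assumptions for illustration, not the repo's actual implementation:

```python
import tensorflow as tf

class CustomBatchNorm(tf.keras.layers.Layer):
    """Illustrative from-scratch batch normalization (not the repo's actual class)."""

    def __init__(self, epsilon=1e-8, window_size=5, **kwargs):
        super().__init__(**kwargs)
        self.epsilon = epsilon          # small constant for numerical stability
        self.window_size = window_size  # number of recent batches averaged for inference statistics

    def build(self, input_shape):
        dim = int(input_shape[-1])
        self.gamma = self.add_weight(name='gamma', shape=(dim,),
                                     initializer='ones', trainable=True)
        self.beta = self.add_weight(name='beta', shape=(dim,),
                                    initializer='zeros', trainable=True)
        # Non-trainable buffers holding the last `window_size` batch means/variances.
        self.mean_window = self.add_weight(name='mean_window', shape=(self.window_size, dim),
                                           initializer='zeros', trainable=False)
        self.var_window = self.add_weight(name='var_window', shape=(self.window_size, dim),
                                          initializer='ones', trainable=False)

    def call(self, x, training=False):
        reduce_axes = list(range(len(x.shape) - 1))  # all axes except the channel axis
        if training:
            mean, var = tf.nn.moments(x, axes=reduce_axes)
            # Slide the window: drop the oldest statistics, append the newest.
            self.mean_window.assign(tf.concat([self.mean_window[1:], mean[None, :]], axis=0))
            self.var_window.assign(tf.concat([self.var_window[1:], var[None, :]], axis=0))
        else:
            # At inference, standardize with the average of the windowed statistics.
            mean = tf.reduce_mean(self.mean_window, axis=0)
            var = tf.reduce_mean(self.var_window, axis=0)
        x_hat = (x - mean) / tf.sqrt(var + self.epsilon)  # standardize
        return self.gamma * x_hat + self.beta             # scale and shift
```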
a. main.py - the entry point. For execution you need to choose the network for training and evaluation from the following options (a minimal selection sketch follows the list):
- 'lenet' - the original LeNet network, with no batch normalization involved;
- 'lenet_bn1' - the same network, but with batch normalization added on the first convolutional layer;
- 'lenet_bn2' - the same as 'lenet_bn1', but with batch normalization also on the second convolutional layer;
- 'lenet_fc_bn1' - the same as 'lenet_bn2', but with batch normalization also on the first fully connected layer;
- 'lenet_fc_bn2' - the same as 'lenet_fc_bn1', but with batch normalization also on the second fully connected layer.
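A rough sketch of how the selection and parameters might look; the variable names here are illustrative assumptions, not necessarily the actual interface of main.py:

```python
# Hypothetical selection snippet; the actual main.py may differ.
network_name = 'lenet_bn2'  # one of: 'lenet', 'lenet_bn1', 'lenet_bn2', 'lenet_fc_bn1', 'lenet_fc_bn2'

params = {
    'batch_size': 64,
    'output_size': 10,
    'epochs': 10,
    'optimizer': 'adam',
    'epsilon': 1e-8,
    'window_size': 5,
    'verbose': 1,
    'flag_visualizations': 1,
}
```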
b. models_handling.py - a lower level than main.py. Here the selected model is recognized, compiled, trained and evaluated.
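A minimal sketch of that compile/train/evaluate flow with tensorflow.keras; the function name `run_experiment` and its exact arguments are assumptions for illustration:

```python
def run_experiment(model, x_train, y_train, x_test, y_test,
                   optimizer='adam', batch_size=64, epochs=10, verbose=1):
    # Compile, train and evaluate a Keras model, returning the training history.
    model.compile(optimizer=optimizer,
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    history = model.fit(x_train, y_train,
                        batch_size=batch_size, epochs=epochs,
                        validation_split=0.1, verbose=verbose)
    test_loss, test_acc = model.evaluate(x_test, y_test, verbose=verbose)
    return history, test_loss, test_acc
```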
c. lenet.py - where the network classes are defined. Here you can find the original network and the variations that include batch normalization.
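For orientation, a rough LeNet-5-style model in tensorflow.keras; `build_lenet`, the layer sizes and activations are assumptions for illustration, and the actual classes in lenet.py (including the from-scratch batch-norm variants) may differ:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_lenet(output_size=10, use_bn_conv1=False):
    # Sketch of a LeNet-5-style model for 28x28 MNIST images.
    inputs = keras.Input(shape=(28, 28, 1))
    x = layers.Conv2D(6, 5, padding='same')(inputs)
    if use_bn_conv1:
        # 'lenet_bn1' variant: batch normalization on the first conv layer.
        # The built-in layer is used here only as a stand-in for the repo's from-scratch one.
        x = layers.BatchNormalization()(x)
    x = layers.Activation('tanh')(x)
    x = layers.AveragePooling2D(2)(x)
    x = layers.Conv2D(16, 5, activation='tanh')(x)
    x = layers.AveragePooling2D(2)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(120, activation='tanh')(x)
    x = layers.Dense(84, activation='tanh')(x)
    outputs = layers.Dense(output_size, activation='softmax')(x)
    return keras.Model(inputs, outputs)
```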
d. utils.py - functions mainly for loading and preprocessing the data.
e. visualizations.py - where the functions for plotting the accuracy and loss graphs are defined.
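As an illustration of the kind of loading and plotting helpers involved; `load_mnist` and `plot_history` are assumed names, not necessarily those used in utils.py / visualizations.py:

```python
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras

def load_mnist():
    # Load MNIST, scale pixels to [0, 1] and add a channel axis for the conv layers.
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    x_train = x_train.astype('float32')[..., np.newaxis] / 255.0
    x_test = x_test.astype('float32')[..., np.newaxis] / 255.0
    return (x_train, y_train), (x_test, y_test)

def plot_history(history, metric='accuracy'):
    # Plot training vs. validation curves from a Keras History object.
    plt.plot(history.history[metric], label='train')
    plt.plot(history.history['val_' + metric], label='validation')
    plt.xlabel('epoch')
    plt.ylabel(metric)
    plt.legend()
    plt.show()
```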
f. results - where you can view the training and validation accuracy plots recorded during the training of each of the networks.
Comparing the performance of LeNet with the addition of BatchNormalization layers:
The best results are obtained by LeNet with BatchNormalization on every layer, both convolutional and fully connected. However, BatchNormalization on the last (fully connected) layer does not seem to contribute much. Overall there is a clear trend: BatchNormalization both accelerates learning and increases the final accuracy.
Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." In International conference on machine learning, pp. 448-456. PMLR, 2015.
@inproceedings{ioffe2015batch,
title={Batch normalization: Accelerating deep network training by reducing internal covariate shift},
author={Ioffe, Sergey and Szegedy, Christian},
booktitle={International conference on machine learning},
pages={448--456},
year={2015},
organization={PMLR}
}
LeCun, Yann, Léon Bottou, Yoshua Bengio, and Patrick Haffner. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86, no. 11 (1998): 2278-2324.
@ARTICLE{726791,
author={Lecun, Y. and Bottou, L. and Bengio, Y. and Haffner, P.},
journal={Proceedings of the IEEE},
title={Gradient-based learning applied to document recognition},
year={1998},
volume={86},
number={11},
pages={2278-2324},
doi={10.1109/5.726791}}