All required packages can be found here
In this project, I wanted to use Convolutional Neural Networks (CNNs) to classify age from face images. The easiest and fastest way to implement the CNN was with FastAI.
I leveraged transfer learning to train my model faster. I also applied data augmentations/transformations to the images to improve generalization.
I formed 14 age brackets as the outputs:
0-4, 5-9, 10-14, 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-64, and 65+
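As a minimal sketch of how an age in years could be mapped onto these 14 brackets (the names `BRACKETS` and `age_to_bracket` are hypothetical and not taken from the project code):

```python
# Labels for the 14 brackets listed above: thirteen 5-year bins plus an open-ended "65+".
BRACKETS = [f"{lo}-{lo + 4}" for lo in range(0, 65, 5)] + ["65+"]


def age_to_bracket(age: int) -> str:
    """Map an age in whole years to its bracket label (hypothetical helper)."""
    if age < 0:
        raise ValueError("age must be non-negative")
    # Integer division by 5 picks the bin; every age from 65 up shares the last bin.
    return BRACKETS[min(age // 5, len(BRACKETS) - 1)]
```

For example, `age_to_bracket(23)` falls in the `20-24` bracket, and any age of 65 or above maps to `65+`.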
After training the models, I wanted to create a webapp using Streamlit so people can try it out with test images or by uploading images of their own.
In a terminal with your Python environment active, go to the faceapp folder and run:
streamlit run run.py
The webapp is deployed on Heroku at the above link. It may take a while to start.
There were 3 datasets that I tried to use.
- UTKFace - The cleanest and most structured of the datasets, although some ages had labelling errors. ~30,000 images
- IMDb - A large dataset of celebrity images. Many are incorrectly labelled or broken. ~460k images
- WIKI - Less messy than IMDb but not clean either. ~62k images
Both the IMDb and Wiki datasets were very messy and some images were incorrectly labelled or broken.
Moreover, the faces in the images were not aligned or cropped as well as in the UTKFace dataset.
This made incorporating the images difficult and raises the question:
Is more data better, even when it may be incorrectly labelled or unstructured?
Due to this problem, I attempted 3 versions of the CNN.
- Using only the UTKFace dataset, with incorrectly labelled images manually deleted (there weren't many). About 19,000 images were used.
- Incorporating some IMDb and Wiki images, manually adding only those that seemed correctly labelled and not broken.
- Using all of the data.
- Approach v1 achieved over 80% accuracy and would probably have done better if trained for more epochs. The data transforms were the standard transforms that FastAI's get_transform uses. The pretrained model was resnet50. This model performs well on face images that are cropped and aligned perfectly. Although it attained high accuracy on this data, it is not very good at predicting new images that aren't structured that way. This can be seen in the webapp.
- Approach v2 did not perform as well, reaching 48% accuracy. However, much of this is due to incorrectly labelled data: the top losses show cases where the model made a very good guess and the recorded class was wrong. That being said, it still performed a lot better than v1 on test images, especially ones that weren't cropped or aligned perfectly. The data transforms were the standard transforms that FastAI's get_transform uses plus random_resize_crop, since not all of the images were properly cropped or aligned. The pretrained model was mobilenet_v2.
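The idea behind inspecting the top losses can be sketched in plain Python. This is an illustration of the concept under simple assumptions, not FastAI's implementation: the function name `top_losses`, its return format, and the probabilities below are all made up for the example.

```python
import math


def top_losses(probs, labels, k=3):
    """Return the k samples with the highest cross-entropy loss.

    probs  : list of per-class probability lists, one per image
    labels : list of recorded class indices, one per image
    Each result is a tuple (loss, sample_index, predicted_class, recorded_class).
    """
    losses = []
    for i, (p, y) in enumerate(zip(probs, labels)):
        # Cross-entropy for one sample: -log of the probability given to the recorded label.
        loss = -math.log(max(p[y], 1e-12))
        predicted = max(range(len(p)), key=p.__getitem__)
        losses.append((loss, i, predicted, y))
    # Highest loss first.
    losses.sort(key=lambda t: t[0], reverse=True)
    return losses[:k]


# Made-up probabilities over three classes, for illustration only.
probs = [[0.7, 0.2, 0.1], [0.05, 0.9, 0.05], [0.1, 0.1, 0.8]]
labels = [0, 2, 2]  # sample 1 is confidently predicted as class 1 but recorded as class 2
worst = top_losses(probs, labels, k=1)
```

A sample that is predicted confidently but disagrees with its recorded label ends up at the top of this list, which makes it a natural candidate for a manual label check.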
Below is a demo of the webapp using a picture of myself at age 23 (without a beard I do look younger). Here we see the predictions of approach 1 and approach 2 respectively. Both models predicted age brackets adjacent to the one I actually belonged to.
There are a few options to improve this model.
- Cleaning the data is the obvious option, but it isn't very realistic. It may instead be possible to use GANs to generate similar data for the same classes.
- Another option is to keep the first model, trained on the cleaned data, and use OpenCV's face cascade to crop the images that are put into the webapp.
- There also wasn't much hyperparameter tuning or feature engineering done. These could improve the model as well.
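The face-cascade cropping idea could look roughly like the sketch below. It assumes the `opencv-python` package and its bundled `haarcascade_frontalface_default.xml` file; the helper names, the 20% margin, and the detector parameters are illustrative choices, not values from this project.

```python
def expand_box(x, y, w, h, img_w, img_h, margin=0.2):
    """Grow a face box by `margin` of its size on each side, clipped to the image bounds."""
    dx, dy = int(w * margin), int(h * margin)
    x0, y0 = max(x - dx, 0), max(y - dy, 0)
    x1, y1 = min(x + w + dx, img_w), min(y + h + dy, img_h)
    return x0, y0, x1, y1


def crop_largest_face(image, margin=0.2):
    """Return a crop around the largest detected face, or the image unchanged if none is found.

    `image` is expected to be a BGR array as returned by cv2.imread.
    """
    import cv2  # imported here so expand_box can be used without OpenCV installed

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return image
    # Keep the detection with the largest area.
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
    img_h, img_w = image.shape[:2]
    x0, y0, x1, y1 = expand_box(int(x), int(y), int(w), int(h), img_w, img_h, margin)
    return image[y0:y1, x0:x1]
```

Falling back to the original image when no face is detected keeps the webapp usable for uploads the cascade cannot handle, at the cost of sending an uncropped image to the model.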