Transfer Learning with ResNet50 — Lucy (French Bulldog) Classifier

Jgalvin · Published in Nerd For Tech · Jun 16, 2022

Disclaimer: this article isn’t going to spit out code and steps for fine-tuning an existing neural network; there are already plenty of such articles, and you can find the code for this project on GitHub. Instead, this is about the thought process behind my project and what I learned from it.

Lucy. Image by author.

Motivation for the Project

Everybody who knows me knows how much I love French Bulldogs. Even those who don’t know me would be able to tell just from looking at my website. I’ve become increasingly interested in computer vision and wanted to do a personal project, but didn’t want to toss yet another MNIST digit, Fashion-MNIST, or Iris classifier onto the internet.

Computer vision tasks vary in level of difficulty, with image classification at the “easy” end and full-blown object detection (which entails both classification and localization of an object within an image) toward the “hard” end, so I thought it best to start with classification and work my way up. Note that “level of difficulty” here refers not to the computer vision task itself, but to the actual process of building such a classifier or detector.

Data and Model Selection

I decided I would need a custom dataset if I were to make a truly unique image classifier. Naturally, I have loads of images of my Frenchie, Lucy, at my fingertips. Despite having many, I certainly don’t have enough to train my own neural network from scratch.

Transfer learning is perfect for such a circumstance. For those who aren’t familiar, transfer learning, as used here, means freezing the weights of a pre-trained neural network and training only a custom classification head on top of it. ResNet50, an existing convolutional neural network, was trained on over one million images from the ImageNet database, which includes French Bulldogs as a class.
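As a rough illustration of the freezing step, here is how a ResNet50 base might be loaded in Keras; the input size and pooling choice are assumptions on my part, not necessarily what the project used:

```python
from tensorflow.keras.applications import ResNet50

# Load ResNet50 with ImageNet weights, dropping its original
# 1000-class classification top so a custom head can be attached.
base_model = ResNet50(
    weights="imagenet",
    include_top=False,
    input_shape=(224, 224, 3),  # assumed input size
    pooling="avg",              # emit a flat feature vector per image
)

# Freeze the pre-trained weights; only the new head will be trained.
base_model.trainable = False
```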

I wanted the network to do more than classify an image of a French Bulldog (as it already could). More specifically, I wanted it to discriminate between Lucy and any other random Frenchie. In addition to labeling my own images of Lucy, I sourced images of random French Bulldogs from the Stanford Dogs Dataset. My dataset consisted of training, validation, and test images for two classes — Lucy and “other frenchie.”
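For those curious about the mechanics, here is a minimal loading sketch for a two-class dataset like this, using Keras’s ImageDataGenerator; the folder names, image size, and batch size are my assumptions, not necessarily those of the actual project:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Assumed folder layout (one subfolder per class in each split):
# data/
#   train/lucy/...   train/other_frenchie/...
#   val/lucy/...     val/other_frenchie/...
#   test/lucy/...    test/other_frenchie/...

datagen = ImageDataGenerator(rescale=1.0 / 255)  # rescale pixels to [0, 1]

train_gen = datagen.flow_from_directory(
    "data/train", target_size=(224, 224), class_mode="binary", batch_size=32
)
val_gen = datagen.flow_from_directory(
    "data/val", target_size=(224, 224), class_mode="binary", batch_size=32
)
```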

Classification Head

It took quite a bit of experimentation to arrive at the classification head on my final model. In an effort to heed Occam’s razor, I started with the most parsimonious head possible: a Dense layer of 64 neurons, Dropout and Normalization layers to combat overfitting, and a final classification layer.

I wanted the final validation accuracy to be greater than 0.9. No matter how I played with the hyperparameters, I simply could not reach this target without adding more Dense layers. As the Jupyter notebook shows, I ended up adding three more Dense layers, each followed by its own Dropout and Normalization layers.
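As a sketch of what that final architecture might look like in Keras (the widths of the added Dense layers, the Dropout rate, and the exact layer ordering are my guesses; the notebook is the source of truth):

```python
from tensorflow.keras import layers, models

# Frozen ResNet50 base from the earlier sketch, plus four Dense blocks
# (the original Dense(64) and the three added later) and a sigmoid output.
model = models.Sequential([
    base_model,
    layers.Dense(64, activation="relu"),
    layers.BatchNormalization(),
    layers.Dropout(0.3),  # assumed rate
    layers.Dense(64, activation="relu"),
    layers.BatchNormalization(),
    layers.Dropout(0.3),
    layers.Dense(64, activation="relu"),
    layers.BatchNormalization(),
    layers.Dropout(0.3),
    layers.Dense(64, activation="relu"),
    layers.BatchNormalization(),
    layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid"),  # binary: Lucy vs. other Frenchie
])

model.compile(
    optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"]
)
```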

Yes, I know I could have used cross-validation, but I wanted to experiment with the behavior of the network firsthand.

Results

After fifteen epochs of training, my model reached a training accuracy of 0.95, a validation accuracy of 0.92, and a test accuracy of 0.96 (the higher test accuracy is likely a result of the test set being smaller than the training and validation sets).
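For completeness, and assuming the model and generators from the sketches above, the training run and final evaluation reduce to a couple of calls:

```python
# Train the head for fifteen epochs, tracking validation metrics per epoch.
history = model.fit(train_gen, validation_data=val_gen, epochs=15)

# Evaluate once on the held-out test split (assumed to live in data/test).
test_gen = datagen.flow_from_directory(
    "data/test", target_size=(224, 224), class_mode="binary",
    batch_size=32, shuffle=False,
)
test_loss, test_acc = model.evaluate(test_gen)
```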

Loss decreases steadily with each epoch and accuracy increases incrementally, as can be seen in Figures 1 and 2 below.

Figure 1. Validation loss decreases with each epoch.
Figure 2. Accuracy on the validation set increases to a peak value of 0.95 and a final value of 0.92.

Learnings

Here is what I learned from what I thought would be a relatively simple project:

  1. Data processing can take more time than the actual model training and tuning. Finding, labeling, splitting, rescaling, and resizing images within the correct folder structure (at least for ImageDataGenerators, as in the loading sketch earlier) can be time-consuming and challenging. Moving forward, I will never underestimate the complexity this can add to a project.
  2. Training and validation data sets need to be relatively similar. I knew this before the project, but I experienced it firsthand when I began training with images I scraped from Unsplash (before I used the Stanford dataset). The network’s validation and training accuracies were way too high to be believable. On closer examination, the Unsplash pictures turned out to be very high quality, with the Frenchie in each one front and center under very consistent lighting. This is not representative of my images of Lucy.
  3. Image classification (and machine learning tasks generally) is more easily understood than implemented. Having a deep understanding of how a neural network like ResNet50 classifies images is one thing, but actually building a classifier requires an entirely different skill set. Doing it takes more than just knowing how it works; that’s why I did this project. I think I’m ready for the next one (object detection).

Stay tuned.

