Model regularization and why you need it


Hello Reader,

Welcome to another edition of the AIFEE newsletter!

AIFEE stands for Artificial Intelligence For Everyone and Everything!

Another week, another Machine Learning topic!

The Curse of Overfitting


In machine learning, there is a problem that always pops up: overfitting!

This happens when your model starts "memorizing" your data instead of learning from it.

We usually detect this by computing some metric (accuracy, loss, etc.) on the training set and then on the validation set. If there is a significant gap between the metric values on these two sets, then your model has overfit the training set.

Regularization comes to the rescue!

Regularization is a set of techniques used to help the model avoid overfitting. There are several such techniques, but two widely used ones are data augmentation and dropout.

Regularization using data augmentation

In TensorFlow, you can do data augmentation in several different ways. If you mostly work on computer vision applications (like I do), TensorFlow offers a variety of ways to augment your images. Below are three of them.

Data augmentation using TensorFlow layers

TensorFlow gives you the ability to add augmentation of your data within the model itself. This can be done by incorporating specific preprocessing layers inside your model. Examples of these layers include Rescaling and RandomContrast.

Here's an example implementation:

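A minimal sketch of such a function, reconstructing the setup described below (the input shape, layer sizes, and the `is_training` flag are illustrative assumptions, not the original code):

```python
import tensorflow as tf

def build_model(num_classes=10, is_training=True):
    # Input shape and layer sizes are hypothetical choices for illustration.
    layers = [
        tf.keras.Input(shape=(224, 224, 3)),   # input layer
        tf.keras.layers.Rescaling(1.0 / 255),  # preprocessing/augmentation layer
    ]
    if is_training:
        # Random contrast only makes sense as training-time augmentation,
        # so it is added conditionally via the is_training argument.
        layers.append(tf.keras.layers.RandomContrast(factor=0.2))
    layers += [
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ]
    return tf.keras.Sequential(layers)

model = build_model(is_training=True)
```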

This is a function that returns a TensorFlow model. The first layer is the input layer, and the next two layers are augmentation layers. Now, some augmentation should only be applied during training (random contrast in the example above). In this case, you can control which layers are used during training and which ones are used during testing by adding an argument to your function.

Data augmentation using tf.image

TensorFlow has a feature-rich module for operations applied to images. This module can be accessed as tf.image.

You can use the utilities in this module to create a function that augments your images. Then you can pass this function to your data pipeline.

For example, if you created your dataset using the tf.data.Dataset API, you can augment it by mapping your function over the dataset.

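A minimal sketch of this pattern, assuming a small in-memory dataset and two illustrative tf.image operations (random horizontal flip and random brightness):

```python
import tensorflow as tf

def augment(image, label):
    # Both operations come from the tf.image module; the specific
    # choices and parameters here are illustrative.
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.1)
    return image, label

# A tiny in-memory dataset standing in for your real images and labels.
images = tf.zeros((8, 64, 64, 3))
labels = tf.zeros((8,), dtype=tf.int32)

dataset = tf.data.Dataset.from_tensor_slices((images, labels))
dataset = dataset.map(augment, num_parallel_calls=tf.data.AUTOTUNE).batch(4)
```

Mapping the function into the pipeline means the augmentation runs on the fly as batches are consumed, so each epoch sees slightly different images.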

Data augmentation using generators

You can also augment your image dataset using generators. TensorFlow has a utility class called ImageDataGenerator. You can use it to create generators that augment your images during training. Here's an example of how to use it.

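A runnable sketch of this approach. The 'data/images' directory is created here with a couple of dummy images so the snippet is self-contained, and the augmentation parameters are illustrative choices:

```python
import os
import numpy as np
from PIL import Image
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Stand-in for your real 'data/images' directory: one class subfolder
# containing a couple of dummy images.
os.makedirs("data/images/class_a", exist_ok=True)
for i in range(2):
    Image.fromarray(np.zeros((64, 64, 3), dtype=np.uint8)).save(
        f"data/images/class_a/img_{i}.png")

# The augmentation parameters below are illustrative.
datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=15,
    horizontal_flip=True,
    zoom_range=0.1,
)

generator = datagen.flow_from_directory(
    "data/images",
    target_size=(64, 64),
    batch_size=2,
    class_mode="categorical",
)
batch_x, batch_y = next(generator)
```

You can pass such a generator directly to model.fit, and each batch it yields is freshly augmented.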

In the code above, you read your images from the directory 'data/images' and then augment them using ImageDataGenerator. I've personally used this approach many times.

Regularization using Dropout layer

Another approach to tackling the overfitting problem is the Dropout layer. This layer takes an argument called rate, which sets the fraction of its inputs that are randomly zeroed out. The remaining inputs are scaled up by 1 / (1 - rate) so that the expected sum of the inputs stays unchanged. Here's an example of how to use it.

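A minimal sketch with a hypothetical dense architecture (the layer sizes and rate=0.3 are illustrative; only the Dropout placement matters here):

```python
import tensorflow as tf

def build_model(num_classes=10, is_training=True):
    # Hypothetical architecture for illustration.
    layers = [
        tf.keras.Input(shape=(784,)),
        tf.keras.layers.Dense(128, activation="relu"),
    ]
    if is_training:
        # rate=0.3: 30% of the inputs are zeroed at random; the rest are
        # scaled by 1 / (1 - 0.3) to keep the expected sum unchanged.
        layers.append(tf.keras.layers.Dropout(rate=0.3))
    layers.append(tf.keras.layers.Dense(num_classes, activation="softmax"))
    return tf.keras.Sequential(layers)

model = build_model(is_training=True)
```

Note that even when the layer is present, Keras only activates Dropout when the model is called with training=True (which model.fit does for you); at inference time it is a no-op.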

Dropout is another layer that you would only use during training, hence we condition its use on an argument (is_training).

Conclusion

This week's edition has been about using regularization techniques like data augmentation and dropout to tackle the problem of overfitting. I also showed you some TensorFlow utilities that you can use to apply these techniques. If you have further questions or remarks, do not hesitate to contact me on LinkedIn, Twitter, or by responding to this email!
