
Normalization

A project log for TinyML meets dog training

Learning ML on microcontrollers and perhaps building something fun on the way!

kasikkasik • 05/19/2024 at 11:31 • 0 Comments

I realized I haven't mentioned normalization so far. In machine learning it is common to pre-process the data as a first step, before training. This often means zero-centering the data (subtracting the mean) and normalizing by the standard deviation. Zero-centering helps reduce the effect of biases in the network: since the network learns from data centered around zero, it is less likely to develop biases towards certain features or patterns. The goal of normalization is to ensure that all features fall within a similar range and thus contribute equally.
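As a rough sketch, this is what that pre-processing could look like in plain NumPy (the array names and shapes below are just placeholders, not the project's actual data):

```python
import numpy as np

def standardize(train, other):
    # Compute the statistics on the training set only and reuse them
    # for validation/test data, so no information leaks between splits.
    mean = train.mean(axis=0)
    std = train.std(axis=0) + 1e-7   # small epsilon avoids division by zero
    return (train - mean) / std, (other - mean) / std

# Placeholder data: 100 training and 20 validation samples with 8 features
x_train = np.random.rand(100, 8).astype(np.float32)
x_val = np.random.rand(20, 8).astype(np.float32)

x_train_n, x_val_n = standardize(x_train, x_val)
```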

An interesting technique is batch normalization, which normalizes the input to each layer so that the activations stay roughly Gaussian (zero mean, unit variance) across a mini-batch. Batch normalization enables the use of much higher learning rates during training, as normalizing the inputs prevents them from becoming too large or too small. This directly helps with the exploding and vanishing gradient issues often faced with high learning rates and deep architectures. Batch normalization can simply be added as a layer in the network.
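For illustration, here is a minimal sketch of batch normalization used as a layer in a small Keras-style CNN. I'm assuming TensorFlow/Keras here, and the input shape and layer sizes are placeholders, not the model I actually use:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(96, 96, 1)),       # placeholder grayscale input size
    layers.Conv2D(8, 3, padding="same"),
    layers.BatchNormalization(),           # normalize activations over the batch
    layers.Activation("relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(2, activation="softmax"),
])
model.summary()
```

Placing BatchNormalization between the convolution and its activation is a common choice; the layer learns its own scale and shift, so the network can still undo the normalization if that turns out to be useful.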

I must admit this somehow wasn't very intuitive to me for my current project, as I am working with grayscale images whose pixel values already sit in a fixed range of 0 to 255. Yet, the results speak for themselves: with normalization the validation loss dropped to about 0.3 and the validation accuracy got close to 0.9!
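In practice the fix is only a couple of lines before training, something like this sketch (placeholder array, not my actual dataset):

```python
import numpy as np

# Placeholder grayscale images with integer pixel values in 0..255
x_train = np.random.randint(0, 256, size=(100, 96, 96, 1)).astype(np.float32)

x_train /= 255.0         # scale pixels to [0, 1]
x_train -= x_train.mean()  # zero-center around the training-set mean
```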

Loss and accuracy - no normalization

Confusion matrix - no normalization


Loss and accuracy - normalization

Confusion matrix - normalization

I ran the above multiple times to make sure I wasn't just lucky with the weight initializations.
