Building the Perfect AI – Part 3: Optimizing and Regularizing Neural Networks


Overview

In Part 2, you deepened your neural network by adding more layers, which gave it more capacity to learn complex patterns. However, as you noticed, the improvement in accuracy may not have been dramatic. This can happen for several reasons, such as a suboptimal learning rate, insufficient training time, or poor generalization to unseen data.

In Part 3, we’ll focus on improving performance by using advanced optimization techniques and regularization. These strategies will help prevent overfitting (where the model memorizes training data but fails to generalize) and ensure faster, more efficient learning.

By the end, you’ll see tangible improvements in how the network performs, not just in accuracy but also in stability and learning efficiency.


Step 1: Optimizing with Advanced Optimizers

In Part 2, we used Stochastic Gradient Descent (SGD), a basic optimizer that updates weights based on gradients calculated for each mini-batch of data. While effective, it can be slow and doesn’t always find the most efficient path toward minimizing the loss function.

To address this, we’ll introduce Adam (Adaptive Moment Estimation), one of the most popular optimizers. It combines momentum (a running average of past gradients) with per-parameter adaptive learning rates, in the spirit of RMSProp.

Code Walkthrough: Switching to Adam Optimizer

  1. Changing the Optimizer:
    In Part 2, you used SGD. Now let’s switch to Adam:
   optimizer = optim.Adam(model.parameters(), lr=0.001)

Explanation:

  • Adam adjusts the learning rate for each parameter individually, based on estimates of first and second moments of the gradients. This allows for faster convergence and more stable updates compared to vanilla SGD.
  • We lowered the learning rate from 0.01 to 0.001 since Adam typically works better with smaller values.
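For context, here is a minimal before-and-after sketch of the change, assuming the model object from Part 2 is already defined:

import torch.optim as optim

# Part 2: plain SGD with a larger learning rate
# optimizer = optim.SGD(model.parameters(), lr=0.01)

# Part 3: Adam with a smaller learning rate and per-parameter adaptive updates
optimizer = optim.Adam(model.parameters(), lr=0.001)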

Step 2: Regularization – Preventing Overfitting

When a model performs well on training data but struggles with unseen data, it’s likely overfitting. This is a common problem when a model is too complex (e.g., has too many parameters) for the amount of training data.
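One quick way to check for overfitting is to compare accuracy on the training data with accuracy on held-out test data; a large gap suggests the model is memorizing rather than generalizing. Here is a minimal sketch using a hypothetical accuracy helper, assuming the train_loader, test_loader, and trained model from the earlier parts:

import torch

def accuracy(model, loader):
    # Hypothetical helper: percentage of correctly classified samples in a DataLoader
    model.eval()  # evaluation mode (this matters once dropout is added below)
    correct, total = 0, 0
    with torch.no_grad():
        for inputs, labels in loader:
            outputs = model(inputs)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    return 100 * correct / total

# Training accuracy far above test accuracy is a classic sign of overfitting
print(f'Train accuracy: {accuracy(model, train_loader):.2f}%')
print(f'Test accuracy:  {accuracy(model, test_loader):.2f}%')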

To combat overfitting, we’ll introduce two regularization techniques:

  1. Dropout: Randomly “drops” neurons during training, forcing the network to rely on different neurons each time, which helps it generalize better.
  2. L2 Regularization (Weight Decay): Adds a penalty for large weights, preventing the model from relying too much on any single weight.

Code Walkthrough: Adding Dropout and Weight Decay

  1. Modify the Neural Network with Dropout:
   import torch
   import torch.nn as nn  # both are typically already imported from the earlier parts

   class RegularizedNN(nn.Module):
       def __init__(self):
           super(RegularizedNN, self).__init__()
           self.fc1 = nn.Linear(28 * 28, 256)
           self.dropout1 = nn.Dropout(0.5)
           self.fc2 = nn.Linear(256, 128)
           self.dropout2 = nn.Dropout(0.5)
           self.fc3 = nn.Linear(128, 10)

       def forward(self, x):
           x = x.view(-1, 28 * 28)
           x = torch.relu(self.fc1(x))
           x = self.dropout1(x)  # Apply dropout
           x = torch.relu(self.fc2(x))
           x = self.dropout2(x)  # Apply dropout
           x = self.fc3(x)
           return x

   model = RegularizedNN()

Explanation:

  • Dropout layers are added after each hidden layer. The Dropout(0.5) randomly disables 50% of the neurons in each forward pass during training, which encourages the network to avoid over-relying on specific neurons.
  2. Add L2 Regularization (Weight Decay) to the Optimizer:
   optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=0.01)

Explanation:

  • The weight_decay parameter penalizes large weights, encouraging the model to use smaller weights, which leads to better generalization.
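
Because dropout behaves differently during training and evaluation, it helps to see it on its own. Below is a minimal standalone sketch (which values get zeroed is random on each call):

import torch
import torch.nn as nn

drop = nn.Dropout(0.5)
x = torch.ones(8)

drop.train()    # training mode: roughly half the values are zeroed,
print(drop(x))  # and the surviving values are scaled by 1 / (1 - 0.5) = 2

drop.eval()     # evaluation mode: dropout is a no-op and passes values through unchanged
print(drop(x))

This is also why we will switch the model into evaluation mode with model.eval() before measuring test accuracy in Step 4.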

Step 3: Training with Regularization

Let’s train this new regularized model. You’ll follow the same training procedure but with the added regularization techniques.

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=0.01)

# Training loop
num_epochs = 5
for epoch in range(num_epochs):
    model.train()  # make sure dropout is active during training
    running_loss = 0.0
    for inputs, labels in train_loader:
        optimizer.zero_grad()  # Reset gradients
        outputs = model(inputs)  # Forward pass
        loss = criterion(outputs, labels)  # Calculate loss
        loss.backward()  # Backpropagation
        optimizer.step()  # Update weights

        running_loss += loss.item()

    print(f'Epoch {epoch+1}, Loss: {running_loss/len(train_loader):.4f}')

print('Finished Training')

Step 4: Evaluating the Regularized Model

After training, evaluate the model using the test set as before:

correct = 0
total = 0
model.eval()  # switch to evaluation mode so the dropout layers are disabled
with torch.no_grad():
    for inputs, labels in test_loader:
        outputs = model(inputs)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy on the test set: {100 * correct / total}%')

Tangible Output: Observing the Results

After running the code, you’ll notice something like this in Spyder:

Epoch 1, Loss: 0.2200
Epoch 2, Loss: 0.1900
...
Accuracy on the test set: 96.5%

What’s Different?

  • The model should now be more robust: the loss values during training are typically more stable, and the gap between training and test performance should narrow.
  • Test-set accuracy should improve thanks to better generalization, with the dropout layers and weight decay keeping the network from overfitting the training data.

Conceptual Understanding: Striking a Balance in Learning

The introduction of advanced optimizers like Adam and regularization techniques like dropout and weight decay is about balance—helping the model learn efficiently without overfitting.

  • Adam Optimizer improves how the model navigates the “error landscape,” helping it find optimal solutions more quickly.
  • Dropout introduces randomness, allowing the network to discover different pathways for learning.
  • L2 Regularization ensures the model doesn’t become too reliant on any particular connection (weight), encouraging more robust learning.
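
To make the L2 idea concrete, passing weight_decay to the optimizer is roughly equivalent (up to a constant factor) to adding a squared-weight penalty to the loss yourself. A minimal sketch of that manual alternative inside the training loop:

# Manual L2 penalty: an approximate stand-in for the optimizer's weight_decay argument
l2_lambda = 0.01  # mirrors the weight_decay value used above
l2_penalty = sum(p.pow(2).sum() for p in model.parameters())
loss = criterion(outputs, labels) + l2_lambda * l2_penalty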

You’re starting to see how a “perfect” AI balances learning with generalization. Each added technique serves as a “safety net,” keeping the AI learning effectively without becoming too focused on the specifics of the training data.


Next Steps

In the next tutorial, we’ll go beyond accuracy metrics. You’ll start visualizing what the neural network actually “sees” and how it processes information internally. We’ll introduce convolutional neural networks (CNNs) for image processing tasks, giving your AI a more powerful way to interpret and analyze visual data.

Each tutorial brings you closer to building a versatile, efficient, and perfect AI—one that feels more intuitive and capable with each step.

Now on to Part 4!
