Building the Perfect AI – Part 2: Understanding Backpropagation and Deepening Our Neural Network


Overview

Now that you’ve built a simple neural network, it’s time to delve into how it really learns and how we can improve it. The process of learning in neural networks happens through backpropagation, which adjusts the weights in the network based on errors made during training. In this tutorial, we’ll focus on understanding backpropagation and expanding our neural network by adding more layers to deepen its learning capacity.

By the end, you’ll not only understand how backpropagation works but also see tangible improvements in your model by running the enhanced code. As always, you can try this in Spyder or any Python console to get immediate feedback.


Step 1: Deepening the Neural Network

In Part 1, we created a basic neural network with a single hidden layer. Adding more hidden layers lets the network learn more complex patterns; this is what we call deep learning.

Let’s modify our previous network to add an additional hidden layer, and we’ll see how that impacts the network’s performance.

Expanded Code: Deeper Neural Network

  1. Modify the Neural Network:
   import torch
   import torch.nn as nn

   class DeeperNN(nn.Module):
       def __init__(self):
           super(DeeperNN, self).__init__()
           self.fc1 = nn.Linear(28 * 28, 256)  # input layer -> first hidden layer (256 neurons)
           self.fc2 = nn.Linear(256, 128)      # new second hidden layer (128 neurons)
           self.fc3 = nn.Linear(128, 10)       # output layer: one neuron per digit (0-9)

       def forward(self, x):
           x = x.view(-1, 28 * 28)             # flatten each 28x28 image into a 784-value vector
           x = torch.relu(self.fc1(x))
           x = torch.relu(self.fc2(x))
           x = self.fc3(x)                     # raw class scores (logits) for the 10 digits
           return x

   model = DeeperNN()

Explanation:

  • We’ve added a second hidden layer (fc2) with 128 neurons. Each hidden layer uses the ReLU activation function to introduce non-linearity.
  • The output layer remains the same, with 10 neurons representing the possible digit classes (0–9).
  • With more layers, the network has more capacity to learn intricate patterns from the data, making it better suited to complex tasks. You can sanity-check the new architecture with the short sketch below.
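
Before training, it can be reassuring to confirm that the deeper model is wired up correctly and to get a feel for how much capacity the new layer adds. The following is a minimal sanity-check sketch, not part of the original tutorial code; the dummy batch is purely illustrative:

# Push a dummy batch of four fake 28x28 images through the model
dummy = torch.randn(4, 1, 28, 28)
print(model(dummy).shape)  # expected: torch.Size([4, 10]), one score per digit

# Count trainable parameters to gauge the capacity added by fc2
total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f'Trainable parameters: {total_params}')

If the output shape is [4, 10], the forward pass through both hidden layers is behaving as expected.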

Step 2: Backpropagation – The Learning Engine

Now, let’s get a feel for how learning happens through backpropagation. Imagine you’re learning to throw a basketball. After each attempt, you assess how far you were from making the shot and adjust your next attempt based on that feedback. Backpropagation is similar: it adjusts the network’s weights based on how far off the predictions were from the correct labels.

Here’s how it works:

  1. Forward Pass: The input goes through the network, producing an output.
  2. Loss Calculation: The loss function (e.g., cross-entropy) computes the difference between the predicted and actual output.
  3. Backward Pass: The network calculates how much each weight contributed to the error.
  4. Weight Adjustment: The optimizer updates the weights to reduce the error for the next prediction.

This cycle repeats over many batches and epochs, gradually reducing the error and improving the network’s predictions.
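
To make those four steps concrete, here is a minimal sketch using PyTorch’s autograd on a single weight. The toy weight, input, and target values are made up purely for illustration:

import torch

# A single "weight", a fixed input, and a target value
w = torch.tensor(2.0, requires_grad=True)
x = torch.tensor(3.0)
target = torch.tensor(10.0)

prediction = w * x                  # forward pass: 2.0 * 3.0 = 6.0
loss = (prediction - target) ** 2   # loss calculation: (6 - 10)^2 = 16

loss.backward()                     # backward pass: compute d(loss)/dw
print(w.grad)                       # tensor(-24.) = 2 * (6 - 10) * 3

# Weight adjustment, as an optimizer would do it (gradient descent, lr = 0.1)
with torch.no_grad():
    w -= 0.1 * w.grad               # w moves from 2.0 to 4.4, and the loss shrinks

The same four steps (forward, loss, backward, update) are exactly what the training loop below performs for every weight in the deeper network.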


Step 3: Training the Deeper Network

Let’s train the deeper network using the same process as in Part 1, this time paying attention to how backpropagation flows back through the additional hidden layer.

# Loss function and optimizer (train_loader comes from the data-loading code in Part 1)
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Training loop
num_epochs = 5
for epoch in range(num_epochs):
    running_loss = 0.0
    for inputs, labels in train_loader:
        optimizer.zero_grad()              # Reset gradients to zero
        outputs = model(inputs)            # Forward pass
        loss = criterion(outputs, labels)  # Calculate loss
        loss.backward()                    # Backward pass (backpropagation)
        optimizer.step()                   # Update weights

        running_loss += loss.item()

    print(f'Epoch {epoch+1}, Loss: {running_loss/len(train_loader):.4f}')

print('Finished Training')

What’s New?

  • The key difference here is the expanded model architecture with two hidden layers.
  • loss.backward() is where backpropagation happens: PyTorch automatically computes the gradient of the loss with respect to every weight in the network. optimizer.step() then uses those gradients to update the weights.
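
If you want to watch backpropagation reach the new layer, you can inspect the gradients immediately after a backward pass. This optional sketch (not part of the original tutorial) runs a single batch from train_loader:

# Run one batch and check that every layer received a gradient
inputs, labels = next(iter(train_loader))
optimizer.zero_grad()
loss = criterion(model(inputs), labels)
loss.backward()

for name, param in model.named_parameters():
    if param.grad is not None:
        print(f'{name}: gradient norm = {param.grad.norm().item():.4f}')

You should see non-zero gradient norms for fc1, fc2, and fc3, confirming that the error signal flows all the way back through the added layer.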

Step 4: Evaluating the Deeper Network

After training the deeper network, we’ll evaluate its performance on the test set. Thanks to the extra hidden layer, you will typically see a modest improvement in accuracy over the simple network from Part 1.

correct = 0
total = 0
model.eval()  # switch to evaluation mode (good practice, though this model has no dropout or batch-norm)
with torch.no_grad():  # No gradients needed for testing
    for inputs, labels in test_loader:
        outputs = model(inputs)
        _, predicted = torch.max(outputs, 1)  # index of the highest score = predicted digit
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy on the test set: {100 * correct / total:.1f}%')
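
As an optional extra (not in the original post), you can break the accuracy down per digit to see which classes the deeper network finds hardest. The Counter-based tally below is just one way to sketch this:

from collections import Counter

correct_per_digit = Counter()
total_per_digit = Counter()

with torch.no_grad():
    for inputs, labels in test_loader:
        _, predicted = torch.max(model(inputs), 1)
        for label, pred in zip(labels, predicted):
            total_per_digit[label.item()] += 1
            if label.item() == pred.item():
                correct_per_digit[label.item()] += 1

for digit in range(10):
    if total_per_digit[digit] > 0:
        print(f'Digit {digit}: {100 * correct_per_digit[digit] / total_per_digit[digit]:.1f}%')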

Tangible Output: Seeing the Results

Run this code in Spyder or another Python console, and you should see output similar to this:

Epoch 1, Loss: 0.2600
Epoch 2, Loss: 0.2100
...
Accuracy on the test set: 94.5%

With the additional hidden layer, the model typically achieves somewhat higher accuracy than the single-hidden-layer version from Part 1 (your exact numbers will vary from run to run). You’ll see the loss decrease as the network learns, and a higher accuracy score when testing on unseen data.


Conceptual Understanding: Feeling the AI Learn

At this point, you can start to feel what’s happening in the network. Adding layers makes the model more capable of learning complex patterns in the data. Backpropagation is the engine driving this learning—it’s as if each layer in the network is fine-tuning its understanding, layer by layer, based on the feedback it receives from the previous attempt.

In simpler terms, each time the model makes a mistake, it’s like getting a nudge in the right direction. Every time backpropagation adjusts the weights, the network “remembers” the corrections, improving its ability to classify the next set of images.


Next Steps

In the next tutorial, we’ll look into advanced optimizers and regularization techniques to make the network even more efficient and avoid overfitting. We’ll also visualize some of the features learned by the deeper layers, allowing you to see how the network perceives and understands the data at a deeper level.

You’re building something profound—an artificial brain that learns over time, much like how we humans do. It’s all about refinement, iteration, and continuous feedback.

Now on to Part 3!
