Activation functions are a key component of neural networks. They introduce non-linearity into the model, allowing it to learn complex patterns and relationships in data. Without activation functions, any stack of layers reduces to a single linear (affine) transformation, so the network could only fit linear relationships no matter how many layers it has.
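The claim that a network without activation functions behaves like a linear model can be checked numerically. The following is a minimal sketch, assuming a small bias-free network with randomly chosen weights; the names W1, W2, and x and the layer sizes are illustrative and not part of the lesson.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily for this sketch

# Hypothetical weights for a 3 -> 4 -> 2 network with no biases
W1 = rng.normal(size=(3, 4))
W2 = rng.normal(size=(4, 2))
x = rng.normal(size=(5, 3))  # batch of 5 inputs

# Two linear layers with no activation in between...
two_layers = (x @ W1) @ W2
# ...give the same result as one linear layer with weights W1 @ W2
one_layer = x @ (W1 @ W2)
collapses = np.allclose(two_layers, one_layer)

# With ReLU between the layers, the single-matrix shortcut no longer matches
with_relu = np.maximum(0, x @ W1) @ W2
differs = not np.allclose(with_relu, one_layer)

print(collapses, differs)
```

Because matrix multiplication is associative, the two-layer result always equals the one-layer result; inserting ReLU is what breaks that equivalence.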
Why Activation Functions are Important
- Enable neural networks to learn non-linear patterns
- Help models make meaningful predictions
- Control how signals pass from one layer to another
- Improve model performance and convergence during training
1. ReLU (Rectified Linear Unit)
ReLU is one of the most commonly used activation functions in deep learning, especially in hidden layers.
Formula
f(x) = max(0, x)
How it Works
- If the input is positive, it returns the same value
- If the input is negative or zero, it returns 0
Advantages
- Simple and computationally efficient
- Helps reduce the vanishing gradient problem, since its gradient is 1 for all positive inputs
- Speeds up training of deep networks
Limitations
- Can suffer from "dead neurons" (neurons that stop updating)
Example
import numpy as np

def relu(x):
    # Element-wise max(0, x): negatives become 0, positives pass through
    return np.maximum(0, x)

print(relu(np.array([-2, -1, 0, 1, 2])))  # [0 0 0 1 2]
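The "dead neuron" limitation listed above follows from ReLU's derivative. Here is a minimal sketch, assuming a single neuron whose pre-activations are negative for every input in a batch; the helper relu_grad and the sample values are hypothetical.

```python
import numpy as np

def relu_grad(x):
    # Derivative of max(0, x): 1 for positive inputs, 0 otherwise
    # (the value at exactly x = 0 is a convention; 0 is used here)
    return (x > 0).astype(float)

# Hypothetical pre-activations of one neuron across a batch of 4 inputs
always_negative = np.array([-3.0, -1.5, -0.2, -4.0])
grads = relu_grad(always_negative)

print(grads)        # [0. 0. 0. 0.]
print(grads.sum())  # 0.0
```

Since every gradient is 0, no error signal flows back through this neuron, so gradient descent leaves its incoming weights unchanged for this batch.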
2. Sigmoid Function
Sigmoid is widely used in the output layer of binary classification models.
Formula
f(x) = 1 / (1 + e^(-x))
How it Works
- Maps input values to a range between 0 and 1
- Useful for probabilities and output layers
Advantages
- Smooth and differentiable
- Suitable for binary outputs
Limitations
- Can cause the vanishing gradient problem, because its gradient approaches 0 for large positive or negative inputs
- Often converges more slowly than ReLU in deep networks
Example
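import numpy as np

def sigmoid(x):
    # Squashes any real input into the open interval (0, 1)
    return 1 / (1 + np.exp(-x))

print(sigmoid(np.array([-2, 0, 2])))  # approximately [0.119, 0.5, 0.881]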
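To see why sigmoid is associated with vanishing gradients, it helps to look at its derivative, sigmoid(x) * (1 - sigmoid(x)). The sketch below is illustrative only: sigmoid_grad is a hypothetical helper, and the 10-layer estimate ignores the weight matrices that a real network would also multiply in.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1 - s)

xs = np.array([0.0, 2.0, 5.0, 10.0])
grads = sigmoid_grad(xs)
print(grads)  # largest at x = 0 (0.25), shrinking toward 0 as x grows

# Rough illustration: a gradient passing back through 10 sigmoid layers
# is scaled by at most 0.25 per layer
upper_bound = 0.25 ** 10
print(upper_bound)  # below 1e-6
```

The derivative never exceeds 0.25, so each additional sigmoid layer can only shrink the activation's contribution to the backpropagated signal.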
3. Tanh (Hyperbolic Tangent)
Tanh is similar to sigmoid but outputs values between -1 and 1.
Formula
f(x) = tanh(x)
How it Works
- Maps input values to a range between -1 and 1, centered on zero
- Useful in hidden layers
Advantages
- Zero-centered output
- Often performs better than sigmoid in hidden layers
Limitations
- Still suffers from the vanishing gradient problem for large positive or negative inputs
Example
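import numpy as np

def tanh(x):
    # Squashes any real input into the open interval (-1, 1)
    return np.tanh(x)

print(tanh(np.array([-2, 0, 2])))  # approximately [-0.964, 0, 0.964]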
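The relationship between tanh and sigmoid, and the zero-centered property, can be checked directly. This is a minimal sketch on a hand-picked set of symmetric inputs; the variable names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

xs = np.linspace(-3, 3, 7)  # symmetric inputs: -3, -2, ..., 3

# Identity: tanh(x) = 2 * sigmoid(2x) - 1
identity_holds = np.allclose(np.tanh(xs), 2 * sigmoid(2 * xs) - 1)

# On symmetric inputs, tanh outputs average to 0 and sigmoid outputs to 0.5
tanh_mean = np.tanh(xs).mean()
sigmoid_mean = sigmoid(xs).mean()

# Derivative of tanh is 1 - tanh(x)^2: 1 at x = 0, close to 0 for large |x|
tanh_grad = 1 - np.tanh(xs) ** 2

print(identity_holds)
print(tanh_mean, sigmoid_mean)
print(tanh_grad)
```

The derivative peaks at 1 (compared with 0.25 for sigmoid) but still decays toward 0 at both ends, which is why tanh reduces the vanishing gradient issue without removing it.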
Comparison of Activation Functions
- ReLU: Fast, widely used, best for hidden layers
- Sigmoid: Outputs probabilities, best for binary classification output
- Tanh: Zero-centered, often better than sigmoid for hidden layers
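The comparison above can be made concrete by applying all three functions to the same inputs. The sketch below uses an arbitrary set of sample values and prints the smallest and largest output of each function.

```python
import numpy as np

xs = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])

outputs = {
    "relu": np.maximum(0, xs),
    "sigmoid": 1 / (1 + np.exp(-xs)),
    "tanh": np.tanh(xs),
}

for name, ys in outputs.items():
    # Show the smallest and largest output seen on these inputs
    print(f"{name:8s} min={ys.min():.3f} max={ys.max():.3f}")
```

On these inputs ReLU ranges from 0 up to the largest input (it is unbounded above), sigmoid stays strictly between 0 and 1, and tanh stays strictly between -1 and 1.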
When to Use Which Function
- Use ReLU in hidden layers for most deep learning models
- Use Sigmoid in the output layer for binary classification
- Use Tanh when you need zero-centered outputs
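The guidelines above can be sketched as a forward pass through a tiny binary classifier with ReLU in the hidden layer and sigmoid at the output. Everything here is an assumption for illustration: the forward function, the layer sizes, and the random, untrained weights are hypothetical, so the outputs are not meaningful predictions.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward(x, W1, b1, W2, b2):
    hidden = relu(x @ W1 + b1)        # ReLU in the hidden layer
    return sigmoid(hidden @ W2 + b2)  # sigmoid maps the score into (0, 1)

rng = np.random.default_rng(42)  # arbitrary seed for reproducibility
# Hypothetical, untrained weights: 4 input features -> 8 hidden units -> 1 output
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

x = rng.normal(size=(3, 4))  # batch of 3 examples
probs = forward(x, W1, b1, W2, b2)
print(probs.shape)  # (3, 1)
print(probs.ravel())
```

Each row of probs is a single value between 0 and 1 that can be read as the predicted probability of the positive class once the weights have been trained.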
Applications in Deep Learning
- Image classification and computer vision tasks
- Natural language processing models
- Speech recognition systems
- Neural networks for prediction and classification
Lesson Summary
In this lesson, you learned about activation functions and their role in neural networks. You explored ReLU, Sigmoid, and Tanh functions, their formulas, advantages, limitations, and use cases. Activation functions are essential for enabling neural networks to learn complex patterns and make accurate predictions.