Activation functions are mathematical functions applied to the output of each neuron (its weighted sum of inputs plus a bias) in a neural network. They introduce non-linearity into the network, allowing it to learn complex patterns and relationships in data. Without activation functions, a stack of layers collapses into a single linear transformation, so the network would behave like a simple linear model regardless of its depth.
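The claim that depth alone adds nothing without an activation can be checked numerically. The sketch below assumes two small, randomly initialized layers (the shapes and the names W1, b1, W2, b2 are illustrative, not from the article) and shows that they reduce to one linear layer with merged weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical linear layers with no activation in between.
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)

# Passing x through both layers in sequence...
two_layers = W2 @ (W1 @ x + b1) + b2

# ...matches a single linear layer with merged weights and bias.
W_merged = W2 @ W1
b_merged = W2 @ b1 + b2
one_layer = W_merged @ x + b_merged

print("Equivalent to one linear layer:", np.allclose(two_layers, one_layer))

# Inserting a ReLU between the layers makes the mapping non-linear,
# so it can no longer be merged into a single weight matrix in general.
with_relu = W2 @ np.maximum(0, W1 @ x + b1) + b2
print("Output with ReLU in between:", with_relu)
```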
Why Activation Functions are Important
- Enable the network to learn non-linear patterns
- Control the output of each neuron
- A well-chosen activation can help the network converge faster during training
- Affect gradient flow, influencing backpropagation and learning
Types of Activation Functions
1. Sigmoid Function
- Formula: σ(x) = 1 / (1 + e^(-x))
- Maps input to a value between 0 and 1
- Commonly used in the output layer for binary classification
- Drawback: Suffers from vanishing gradients, because its derivative peaks at 0.25 (at x = 0) and approaches 0 for large positive or negative inputs
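To make the vanishing gradient drawback concrete, here is a minimal sketch that evaluates the sigmoid derivative, σ'(x) = σ(x)(1 - σ(x)), at a few sample points. The helper names sigmoid and sigmoid_grad are illustrative choices, not part of any library.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid: s * (1 - s), where s = sigmoid(x)."""
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
grads = sigmoid_grad(x)

# The gradient is largest at x = 0 (exactly 0.25) and shrinks
# towards 0 as |x| grows, which is what slows learning.
for xi, gi in zip(x, grads):
    print(f"x = {xi:6.1f}  gradient = {gi:.6f}")
```

Because backpropagation multiplies these per-layer derivatives together, several saturated sigmoid layers in a row can shrink the gradient very quickly.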
2. Tanh (Hyperbolic Tangent)
- Formula: tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
- Maps input to a value between -1 and 1
- Zero-centered output often helps convergence compared with Sigmoid
- Still susceptible to vanishing gradients, because it saturates for large positive or negative inputs
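The two properties above can be illustrated with a short sketch. It assumes a small, symmetric set of sample inputs and compares the average output of tanh and sigmoid, then evaluates the tanh derivative, 1 - tanh(x)^2, to show the saturation.

```python
import numpy as np

# Symmetric sample inputs: -3, -2, -1, 0, 1, 2, 3
x = np.linspace(-3.0, 3.0, 7)

tanh_out = np.tanh(x)
sigmoid_out = 1.0 / (1.0 + np.exp(-x))

# On symmetric inputs tanh averages to about 0 (zero-centered),
# while sigmoid averages to about 0.5.
print("mean tanh output:   ", tanh_out.mean())
print("mean sigmoid output:", sigmoid_out.mean())

# tanh is a rescaled sigmoid: tanh(x) = 2 * sigmoid(2x) - 1
tanh_from_sigmoid = 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0
print("matches np.tanh:", np.allclose(tanh_out, tanh_from_sigmoid))

# Derivative: 1 - tanh(x)^2. It is 1 at x = 0 and shrinks towards 0
# as |x| grows, so tanh still saturates.
tanh_grad = 1.0 - tanh_out ** 2
print("tanh gradients:", tanh_grad)
```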
3. ReLU (Rectified Linear Unit)
- Formula: ReLU(x) = max(0, x)
- Output is 0 for negative values and linear for positive values
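- Advantages: Fast to compute and reduces the vanishing gradient problem, since the gradient is 1 for all positive inputs
- Drawback: Can lead to dead neurons: if a neuron's input always stays negative, its output and gradient are both 0 and its weights stop updating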
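The dead neuron drawback can be sketched with a single hypothetical neuron whose bias is so negative that every input in a small batch lands in the zero region. The data, weights, and the assumption of an upstream gradient of 1 per sample are all made up for illustration.

```python
import numpy as np

def relu(z):
    """ReLU(z) = max(0, z)."""
    return np.maximum(0.0, z)

def relu_grad(z):
    """Derivative of ReLU: 1 where z > 0, otherwise 0."""
    return (z > 0).astype(float)

# Hypothetical batch of 3 samples with 2 features each.
X = np.array([[0.5, 1.0],
              [1.5, 0.2],
              [0.1, 0.9]])
w = np.array([0.3, -0.2])
b = -5.0  # large negative bias pushes every pre-activation below 0

z = X @ w + b   # pre-activations, all negative here
a = relu(z)     # outputs, all zero

# Gradient of the loss w.r.t. the weights, assuming an upstream
# gradient of 1 for each sample (an illustrative assumption).
upstream = np.ones_like(z)
grad_w = X.T @ (upstream * relu_grad(z))

print("pre-activations:", z)
print("outputs:        ", a)
print("weight gradient:", grad_w)  # all zeros, so w never updates
```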
4. Leaky ReLU
- Formula: LeakyReLU(x) = x if x > 0, else αx (α is a small constant, e.g., 0.01)
- Mitigates the dead neuron problem of ReLU
- Allows a small gradient for negative inputs
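A minimal NumPy sketch of Leaky ReLU and its gradient follows, assuming α = 0.01 as in the formula above. The function names are illustrative, and the gradient at exactly x = 0 is set to α by convention here.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """Return x for positive inputs and alpha * x otherwise."""
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    """Gradient is 1 for positive inputs and alpha otherwise."""
    return np.where(x > 0, 1.0, alpha)

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])

outputs = leaky_relu(x)
grads = leaky_relu_grad(x)

print("outputs:  ", outputs)
print("gradients:", grads)  # never exactly 0, so weights can keep updating
```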
5. Softmax Function
- Formula: softmax(x_i) = e^(x_i) / Σ_j e^(x_j), where the sum runs over all classes j
- Converts outputs into probabilities that sum to 1
- Used in multi-class classification
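Evaluating the softmax formula directly can overflow when the inputs are large. A common workaround, sketched below, is to subtract the maximum input before exponentiating; this leaves the result unchanged because the shift cancels between numerator and denominator. The logits used here are made up for illustration.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)   # largest entry becomes 0, so exp() cannot overflow
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# A naive np.exp(logits) would overflow to inf for inputs this large.
logits = np.array([1000.0, 1001.0, 1002.0])
probs = softmax(logits)

print("probabilities:", probs)
print("sum:", probs.sum())

# Shifting all inputs by a constant does not change the result.
print("shift-invariant:", np.allclose(probs, softmax(np.array([0.0, 1.0, 2.0]))))
```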
How Activation Functions Work in Neural Networks
- Inputs are multiplied by weights, summed, and a bias is added
- The result (weighted sum) is passed through an activation function
- The output becomes input for the next layer or the final prediction
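The three steps above can be sketched as a tiny two-layer forward pass. The helper dense_forward, the weights, and the input values are all hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense_forward(x, W, b, activation):
    """One layer: weighted sum plus bias, then the activation."""
    z = W @ x + b
    return activation(z)

x = np.array([1.0, -2.0, 0.5])          # input features

# Hidden layer with 2 neurons and ReLU.
W1 = np.array([[ 0.2, -0.5, 1.0],
               [-1.0,  0.3, 0.8]])
b1 = np.array([0.1, -0.2])
h = dense_forward(x, W1, b1, relu)      # weighted sums are [1.8, -1.4]

# Output layer with 1 neuron and sigmoid (binary classification).
W2 = np.array([[0.5, -1.0]])
b2 = np.array([0.0])
y = dense_forward(h, W2, b2, sigmoid)   # hidden output feeds the next layer

print("hidden activations:", h)
print("prediction:", y)
```

The second hidden neuron receives a negative weighted sum, so ReLU sets it to 0 and only the first neuron contributes to the final prediction.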
Implementation Example (Python)
import numpy as np

# Input values
x = np.array([-1, 0, 1, 2])
# Sigmoid
sigmoid = 1 / (1 + np.exp(-x))
# Tanh
tanh = np.tanh(x)
# ReLU
relu = np.maximum(0, x)
# Softmax
softmax = np.exp(x) / np.sum(np.exp(x))

print("Sigmoid:", sigmoid)
print("Tanh:", tanh)
print("ReLU:", relu)
print("Softmax:", softmax)
Applications
- Sigmoid: Binary classification, logistic regression
- Tanh: Hidden layers where zero-centered activations are preferred
- ReLU / Leaky ReLU: Hidden layers in deep networks, CNNs
- Softmax: Output layer for multi-class classification
Best Practices
- Use ReLU or Leaky ReLU for hidden layers in deep networks
- Use Sigmoid in the output layer for binary classification and Softmax for multi-class classification
- Avoid using multiple activation functions unnecessarily in the same layer
- Monitor vanishing or exploding gradients during training
Conclusion
Activation functions are essential for introducing non-linearity in neural networks. The choice of activation function affects learning speed, gradient flow, and model performance. Selecting one that suits the task and the layer's role helps make neural network training efficient and accurate.