Activation functions are mathematical functions applied to the output of each neuron in a neural network. They introduce non-linearity into the network, allowing it to learn complex patterns and relationships in data. Without activation functions, a neural network would collapse to a single linear model regardless of its depth, because a composition of linear layers is itself linear.
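The claim about depth can be checked numerically. The sketch below is only an illustration: the layer sizes, the random weights, and the variable names are assumptions, not part of any particular model. It stacks two layers with no activation and shows that one linear layer reproduces the same output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two stacked layers with no activation (shapes and values are illustrative)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)

# Output of the two-layer network without any activation function
two_layer = W2 @ (W1 @ x + b1) + b2

# The same mapping written as a single linear layer
W_combined = W2 @ W1
b_combined = W2 @ b1 + b2
one_layer = W_combined @ x + b_combined

print(np.allclose(two_layer, one_layer))  # True
```

Inserting a non-linear function between the two layers breaks this equivalence, which is what gives additional layers their expressive power.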

Why Activation Functions are Important

Enable the network to learn non-linear patterns
Control the output of each neuron
Influence how quickly the network converges during training
Affect gradient flow, influencing backpropagation and learning

Types of Activation Functions

1. Sigmoid Function

Formula: σ(x) = 1 / (1 + e^(-x))
Maps input to a value between 0 and 1
Commonly used for binary classification
Drawbacks: Can suffer from vanishing gradients for large positive or negative inputs
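To see why large inputs cause vanishing gradients, it helps to look at the sigmoid's derivative, σ'(x) = σ(x)(1 - σ(x)). The following is a minimal sketch; the helper names sigmoid and sigmoid_grad and the sample inputs are chosen here for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: s * (1 - s)
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
grads = sigmoid_grad(x)

# The gradient peaks at 0.25 for x = 0 and shrinks toward 0 as |x| grows
print(grads)
```

Because the derivative never exceeds 0.25, multiplying many such factors together during backpropagation can shrink the gradient that reaches the early layers.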

2. Tanh (Hyperbolic Tangent)

Formula: tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
Maps input to a value between -1 and 1
Zero-centered output helps in faster convergence
Still susceptible to vanishing gradient issues
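The zero-centered property can be illustrated with a small comparison against the sigmoid. This is a sketch under simple assumptions: the inputs are a symmetric grid around zero chosen for illustration, and the sigmoid helper is defined locally.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-3.0, 3.0, 7)  # symmetric inputs around zero

tanh_out = np.tanh(x)
sigmoid_out = sigmoid(x)

# tanh is a rescaled, shifted sigmoid: tanh(x) = 2 * sigmoid(2x) - 1
print(np.allclose(tanh_out, 2.0 * sigmoid(2.0 * x) - 1.0))  # True

# For symmetric inputs, tanh outputs average to about 0, sigmoid outputs to 0.5
print(tanh_out.mean(), sigmoid_out.mean())
```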

3. ReLU (Rectified Linear Unit)

Formula: ReLU(x) = max(0, x)
Output is 0 for negative values and linear for positive values
Advantages: Fast to compute and reduces the vanishing gradient problem, since the gradient is 1 for all positive inputs
Drawback: Can lead to dead neurons, which always output 0 and stop learning if their input stays negative
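The dead neuron issue follows directly from ReLU's gradient. The sketch below uses made-up pre-activation values for a single neuron across a batch of five inputs; the names relu_grad, healthy, and dead are illustrative only.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    # Common convention: gradient is 1 for z > 0 and 0 otherwise
    return (z > 0).astype(float)

# Hypothetical pre-activations of one neuron across a batch of five inputs
healthy = np.array([-1.5, 0.3, 2.0, -0.2, 1.1])
dead = np.array([-3.0, -0.7, -1.2, -5.0, -0.1])

print(relu(healthy), relu_grad(healthy))
print(relu(dead), relu_grad(dead))  # all zeros: no gradient reaches the weights
```

When every input in the batch produces a negative pre-activation, both the output and the gradient are zero, so gradient descent has no signal with which to move the neuron back into the active region.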

4. Leaky ReLU

Formula: LeakyReLU(x) = x if x > 0 else αx (α small, e.g., 0.01)
Mitigates the dead neuron problem of ReLU
Allows a small gradient for negative inputs
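A minimal NumPy sketch of Leaky ReLU and its gradient follows. It assumes α = 0.01, as in the formula, and the function names are chosen here for illustration.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # x for positive inputs, alpha * x otherwise
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    # Gradient is 1 for positive inputs and alpha otherwise (never exactly 0)
    return np.where(x > 0, 1.0, alpha)

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
print(leaky_relu(x))
print(leaky_relu_grad(x))
```

Since the gradient for negative inputs is α rather than 0, a neuron whose inputs are all negative still receives a small update.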

5. Softmax Function

Converts outputs into probabilities that sum to 1
Used in multi-class classification
Formula: softmax(x_i) = e^(x_i) / Σ e^(x_j) for all j
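Evaluating the formula exactly as written can overflow when the inputs are large, because e^(x_i) grows very quickly. A common workaround, sketched below, subtracts the maximum input before exponentiating, which leaves the result unchanged. The function name and the sample logits are illustrative.

```python
import numpy as np

def softmax(x):
    # Subtracting the max does not change the result but avoids overflow
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

logits = np.array([1000.0, 1001.0, 1002.0])
probs = softmax(logits)

# Probabilities are finite and sum to 1 even for very large inputs
print(probs, probs.sum())
```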

How Activation Functions Work in Neural Networks

Inputs are multiplied by weights, summed, and a bias is added
The result (weighted sum) is passed through an activation function
The output becomes input for the next layer or the final prediction
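The three steps above can be sketched for a single layer as follows. The weights, bias, and input values are made up for illustration, and ReLU is chosen only as an example activation.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# Illustrative values for a layer with 3 inputs and 2 neurons
x = np.array([0.5, -1.0, 2.0])
W = np.array([[0.2, 0.4, -0.1],
              [-0.3, 0.1, 0.5]])
b = np.array([0.1, -0.2])

z = W @ x + b   # step 1: weighted sum plus bias
a = relu(z)     # step 2: apply the activation function
print(z, a)     # step 3: `a` feeds the next layer or is the final prediction
```

Here the first neuron's weighted sum is negative, so ReLU sets its output to 0, while the second neuron's positive sum passes through unchanged.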

Implementation Example (Python)

import numpy as np

# Input values
x = np.array([-1, 0, 1, 2])
# Sigmoid
sigmoid = 1 / (1 + np.exp(-x))
# Tanh
tanh = np.tanh(x)
# ReLU
relu = np.maximum(0, x)
# Softmax
softmax = np.exp(x) / np.sum(np.exp(x))

print("Sigmoid:", sigmoid)
print("Tanh:", tanh)
print("ReLU:", relu)
print("Softmax:", softmax)

Applications

Sigmoid: Binary classification, logistic regression
Tanh: Hidden layers for faster convergence
ReLU / Leaky ReLU: Hidden layers in deep networks, CNNs
Softmax: Output layer for multi-class classification

Best Practices

Use ReLU or Leaky ReLU for hidden layers in deep networks
Use Sigmoid in the output layer for binary classification and Softmax for multi-class classification
Avoid mixing different activation functions within the same layer unless there is a clear reason
Monitor vanishing or exploding gradients during training
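One simple way to monitor gradients is to track the norm of each layer's gradient during training. The sketch below is a hypothetical helper, not part of any library: the name check_gradients, the thresholds low and high, and the sample gradients are all arbitrary assumptions that would need tuning for a real model.

```python
import numpy as np

def check_gradients(grads, low=1e-6, high=1e3):
    """Label each layer's gradient by its L2 norm (thresholds are arbitrary)."""
    report = []
    for i, g in enumerate(grads):
        norm = float(np.linalg.norm(g))
        if not np.isfinite(norm) or norm > high:
            status = "exploding"
        elif norm < low:
            status = "vanishing"
        else:
            status = "ok"
        report.append((i, norm, status))
    return report

# Made-up gradients for three layers
grads = [np.full((4, 4), 1e-9), np.full((4, 4), 0.01), np.full((4, 4), 1e4)]
for layer, norm, status in check_gradients(grads):
    print(layer, norm, status)
```

In practice the gradient arrays would come from the training framework in use, and the norms are usually more informative when logged over time than when checked at a single step.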

Conclusion

Activation functions are essential for introducing non-linearity in neural networks. The choice of activation function affects learning speed, gradient flow, and model performance. Selecting one that suits the task and the layer's role helps the network train efficiently and accurately.
