Activation Functions (ReLU, Sigmoid, Tanh)

Activation functions are a key component of neural networks. They introduce non-linearity into the model, allowing it to learn complex patterns and relationships in data. Without activation functions, any stack of layers would collapse into a single linear transformation, so the network would behave like a simple linear model and fail to capture the non-linear relationships found in real-world data.
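
To make the point about linearity concrete, here is a minimal sketch. It assumes two tiny layers with hand-picked weights (W1, W2 and the input x are illustrative values, not part of the lesson) and omits biases for brevity. It checks that two stacked linear layers equal one linear layer, and that placing ReLU between them changes the result.

```python
import numpy as np

# Hypothetical weights for two stacked layers (no biases).
W1 = np.array([[1.0, -1.0],
               [2.0,  0.0]])   # layer 1: 2 inputs -> 2 units
W2 = np.array([[1.0, 1.0]])    # layer 2: 2 units -> 1 output
x = np.array([1.0, 2.0])

# Two linear layers applied one after the other ...
two_layers = W2 @ (W1 @ x)

# ... give the same result as one layer with weight matrix W2 @ W1.
one_layer = (W2 @ W1) @ x
print(np.allclose(two_layers, one_layer))   # True

# With ReLU between the layers, the negative hidden value is zeroed,
# so the network is no longer equivalent to that single linear layer.
with_relu = W2 @ np.maximum(0, W1 @ x)
print(two_layers, with_relu)                # [1.] [2.]
```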

Why Activation Functions are Important

  • Enable neural networks to learn non-linear patterns
  • Shape outputs into a useful range, such as probabilities between 0 and 1
  • Control how signals pass from one layer to another
  • Affect how quickly and reliably the model converges during training

1. ReLU (Rectified Linear Unit)
ReLU is one of the most commonly used activation functions in deep learning, especially in hidden layers.

Formula
f(x) = max(0, x)

How it Works

  • If the input is positive, it returns the same value
  • If the input is zero or negative, it returns 0

Advantages

  • Simple and computationally efficient
  • Helps reduce the vanishing gradient problem, because its gradient is exactly 1 for positive inputs
  • Speeds up training of deep networks

Limitations

  • Can suffer from “dead neurons”: a neuron whose input stays negative always outputs 0, receives zero gradient, and stops updating

Example

import numpy as np

def relu(x):
    return np.maximum(0, x)

print(relu(np.array([-2, -1, 0, 1, 2])))
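
The advantages and limitations listed for ReLU both follow from its gradient. The sketch below is illustrative only: relu_grad is a helper name introduced here, and treating the gradient at exactly 0 as 0 is a common convention assumed for this example, since the derivative is undefined at that point.

```python
import numpy as np

def relu_grad(x):
    # Derivative of max(0, x): 1 for x > 0, 0 for x < 0.
    # At x == 0 the derivative is undefined; 0 is used here by convention.
    return (x > 0).astype(float)

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(relu_grad(x))             # [0. 0. 0. 1. 1.]

# A "dead" neuron: if its input is negative for every example,
# every gradient is 0, so its weights never receive an update.
dead_inputs = np.array([-3.0, -0.5, -1.2])
print(relu_grad(dead_inputs))   # [0. 0. 0.]
```

For positive inputs the gradient passes through unchanged, which is why ReLU helps with vanishing gradients in deep networks.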

2. Sigmoid Function
Sigmoid is widely used for binary classification problems.

Formula
f(x) = 1 / (1 + e^(-x))

How it Works

  • Maps any real input to a value strictly between 0 and 1
  • The output can be read as a probability, which makes it useful in output layers

Advantages

  • Smooth and differentiable
  • Suitable for binary outputs

Limitations

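  • Can cause the vanishing gradient problem: its gradient is at most 0.25 and approaches 0 for inputs with large magnitude
  • Usually converges more slowly than ReLU when used in hidden layers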

Example

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

print(sigmoid(np.array([-2, 0, 2])))
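
The vanishing gradient limitation can be seen directly from the derivative of sigmoid, which is sigmoid(x) * (1 - sigmoid(x)). The following is a rough sketch; sigmoid_grad is a helper name introduced for this example, and the depth loop is a best-case simplification that ignores the weights.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1 - s)

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
# The gradient peaks at 0.25 for x = 0 and is close to 0 at the extremes.
print(sigmoid_grad(x))

# Backpropagation multiplies one such factor per layer, so even in the
# best case (0.25 per layer) the product shrinks quickly with depth.
for depth in (1, 5, 10):
    print(depth, 0.25 ** depth)
```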

3. Tanh (Hyperbolic Tangent)
Tanh is similar to sigmoid but outputs values between -1 and 1.

Formula
f(x) = tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

How it Works

  • Maps any real input to a value between -1 and 1, so outputs are centered around zero
  • Useful in hidden layers

Advantages

  • Zero-centered output
  • Often performs better than sigmoid in hidden layers

Limitations

  • Still suffers from the vanishing gradient problem for inputs with large magnitude

Example

import numpy as np

def tanh(x):
    return np.tanh(x)

print(tanh(np.array([-2, 0, 2])))
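
The similarity between tanh and sigmoid can be checked numerically: tanh is a rescaled and shifted sigmoid, tanh(x) = 2 * sigmoid(2x) - 1. The sketch below verifies that identity and also shows the tanh gradient, 1 - tanh(x)^2. The name tanh_grad is introduced here for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh_grad(x):
    # d/dx tanh(x) = 1 - tanh(x)^2
    return 1 - np.tanh(x) ** 2

x = np.linspace(-3, 3, 7)

# tanh is a sigmoid stretched to the range (-1, 1).
print(np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1))   # True

# The gradient peaks at 1 for x = 0 (sigmoid peaks at 0.25),
# but it still approaches 0 as |x| grows.
print(tanh_grad(np.array([0.0, 2.0, 5.0])))
```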

Comparison of Activation Functions

  • ReLU: Fast, widely used, a common default for hidden layers
  • Sigmoid: Outputs values between 0 and 1, well suited to the output layer in binary classification
  • Tanh: Zero-centered, often better than sigmoid for hidden layers

When to Use Which Function

  • Use ReLU in hidden layers for most deep learning models
  • Use Sigmoid in the output layer for binary classification
  • Use Tanh when you need zero-centered outputs
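
These guidelines can be combined in a small forward pass. The sketch below assumes a toy network with 3 inputs, 4 hidden units, and 1 output. The forward function, the layer sizes, and the random weights are all hypothetical choices for illustration, and nothing here is trained.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward(x, W1, b1, W2, b2):
    hidden = relu(W1 @ x + b1)          # hidden layer: ReLU
    return sigmoid(W2 @ hidden + b2)    # output layer: value between 0 and 1

# Randomly initialized, untrained weights (illustrative only).
rng = np.random.default_rng(42)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

x = np.array([0.5, -1.0, 2.0])
p = forward(x, W1, b1, W2, b2)
print(p)   # a single value that can be read as the probability of the positive class
```

Swapping relu for np.tanh in the hidden layer would give zero-centered hidden activations while leaving the output layer unchanged.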

Applications in Deep Learning

  • Image classification and computer vision tasks
  • Natural language processing models
  • Speech recognition systems
  • Neural networks for prediction and classification

Lesson Summary
In this lesson, you learned about activation functions and their role in neural networks. You explored ReLU, Sigmoid, and Tanh functions, their formulas, advantages, limitations, and use cases. Activation functions are essential for enabling neural networks to learn complex patterns and make accurate predictions.
