Activation Functions

Activation functions are mathematical functions applied to the output of each neuron in a neural network (more precisely, to the weighted sum of its inputs plus a bias). They introduce non-linearity into the network, allowing it to learn complex patterns and relationships in data. Without activation functions, a neural network would behave like a single linear model regardless of its depth, because a composition of linear transformations is itself linear.
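The claim that depth alone does not help without non-linearity can be checked numerically. The following is a minimal NumPy sketch with random weights and arbitrary layer sizes chosen only for illustration. It shows that two stacked linear layers with no activation between them give exactly the same result as one linear layer with combined weights.

```python
import numpy as np

# Two hypothetical "layers" with no activation function between them
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)

# Stacked linear layers: W2 @ (W1 @ x + b1) + b2
two_layer = W2 @ (W1 @ x + b1) + b2

# Equivalent single linear layer: W = W2 @ W1, b = W2 @ b1 + b2
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layer, one_layer))
```

Inserting a non-linear function such as ReLU between the two layers breaks this equivalence in general, which is what lets deeper networks represent more complex functions.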

Why Activation Functions are Important

  • Enable the network to learn non-linear patterns
  • Determine the output range of each neuron (for example, between 0 and 1 for sigmoid)
  • Influence how quickly and reliably the network converges during training
  • Affect gradient flow, influencing backpropagation and learning

Types of Activation Functions

1. Sigmoid Function

  • Formula: σ(x) = 1 / (1 + e^-x)
  • Maps input to a value between 0 and 1
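  • Commonly used in the output layer for binary classification
  • Drawback: Saturates for large positive or negative inputs, where its gradient approaches 0 (its maximum gradient is only 0.25, at x = 0), which can cause vanishing gradients in deep networks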
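To make the saturation drawback concrete, here is a small sketch of the sigmoid and its derivative, σ'(x) = σ(x)(1 - σ(x)). The function names `sigmoid` and `sigmoid_grad` and the sample inputs are illustrative choices, not part of any library.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(sigmoid(x))       # all values lie strictly between 0 and 1
print(sigmoid_grad(x))  # largest (0.25) at x = 0, close to 0 at |x| = 10
```

The gradient at |x| = 10 is on the order of 1e-5, so any neuron operating in that region passes almost no gradient back to earlier layers.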

2. Tanh (Hyperbolic Tangent)

  • Formula: tanh(x) = (e^x - e^-x) / (e^x + e^-x)
  • Maps input to a value between -1 and 1
  • Zero-centered output often helps optimization converge faster than with sigmoid
  • Still susceptible to vanishing gradients, since it also saturates for inputs with large magnitude
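The sketch below writes out the tanh formula explicitly, checks it against NumPy's built-in `np.tanh`, and computes the derivative 1 - tanh(x)^2. The helper names are hypothetical and the inputs are chosen symmetric around 0 to illustrate the zero-centered output.

```python
import numpy as np

def tanh_from_formula(x):
    """tanh(x) = (e^x - e^-x) / (e^x + e^-x), written out explicitly."""
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - np.tanh(x) ** 2

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(tanh_from_formula(x))  # agrees with np.tanh(x) up to rounding
print(np.tanh(x).mean())     # close to 0 for inputs symmetric around 0
print(tanh_grad(x))          # 1.0 at x = 0, shrinking as |x| grows
```

The peak gradient of tanh (1.0) is four times that of sigmoid (0.25), but it still decays toward 0 for large |x|, which is why the vanishing gradient issue remains.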

3. ReLU (Rectified Linear Unit)

  • Formula: ReLU(x) = max(0, x)
  • Output is 0 for negative values and linear for positive values
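  • Advantages: Fast to compute, and because its gradient is 1 for positive inputs, it reduces the vanishing gradient problem
  • Drawback: Can produce "dead" neurons: if a neuron's input stays negative, its output and gradient are always 0, so its weights stop updating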
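A short sketch of ReLU and its gradient shows where dead neurons come from: every negative input produces a gradient of exactly 0. The names `relu` and `relu_grad` are illustrative, and using a gradient of 0 at x = 0 is a common convention rather than something the formula dictates.

```python
import numpy as np

def relu(x):
    """ReLU(x) = max(0, x), applied element-wise."""
    return np.maximum(0, x)

def relu_grad(x):
    """Gradient of ReLU: 1 for x > 0, 0 otherwise (0 at x = 0 by convention)."""
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # negative inputs become 0, positive inputs pass through
print(relu_grad(x))  # 0 for every non-positive input, 1 for positive inputs
```

If every input a neuron sees falls on the left side of this plot, `relu_grad` is 0 for all of them and backpropagation never changes that neuron's weights.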

4. Leaky ReLU

  • Formula: LeakyReLU(x) = x if x > 0 else αx (α small, e.g., 0.01)
  • Mitigates the dead neuron problem of ReLU
  • Allows a small gradient for negative inputs
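Here is a minimal sketch of Leaky ReLU and its gradient, assuming the example slope α = 0.01 from the formula above (the parameter name `alpha` and the helper names are illustrative). Unlike ReLU, the gradient for negative inputs is α rather than 0.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """LeakyReLU(x) = x if x > 0 else alpha * x."""
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    """Gradient: 1 for x > 0, alpha otherwise, so it never becomes exactly 0."""
    return np.where(x > 0, 1.0, alpha)

x = np.array([-2.0, -0.5, 0.5, 2.0])
print(leaky_relu(x))       # negative inputs are scaled by alpha instead of zeroed
print(leaky_relu_grad(x))  # alpha for negative inputs, 1.0 for positive inputs
```

Because the gradient stays positive everywhere, a neuron with mostly negative inputs still receives small weight updates and can recover.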

5. Softmax Function

  • Converts outputs into probabilities that sum to 1
  • Used in multi-class classification
  • Formula: softmax(x_i) = e^(x_i) / Σ e^(x_j) for all j
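The softmax formula can be implemented directly, but `np.exp` overflows for large inputs. A widely used trick, shown in this sketch, is to subtract max(x) before exponentiating: the common factor e^(-max(x)) cancels between numerator and denominator, so the probabilities are unchanged. The function name and sample logits are illustrative.

```python
import numpy as np

def softmax(x):
    """softmax(x_i) = e^(x_i) / sum_j e^(x_j).

    Subtracting max(x) first does not change the result (the factor
    e^(-max) cancels) but prevents overflow in np.exp for large inputs.
    """
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs, probs.sum())  # probabilities that sum to 1

# Without the shift, np.exp(1000.0) would overflow to inf
print(softmax(np.array([1000.0, 1001.0, 1002.0])))
```

The last call returns the same probabilities as `softmax(np.array([0.0, 1.0, 2.0]))`, since softmax depends only on the differences between inputs.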

How Activation Functions Work in Neural Networks

  1. Each input is multiplied by its weight, the products are summed, and a bias is added
  2. The result (weighted sum) is passed through an activation function
  3. The output becomes input for the next layer or the final prediction
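The three steps above can be sketched for a single fully connected layer. The weights, bias, and layer size here are hand-picked for illustration, and `dense_forward` is a hypothetical helper name, not a library function.

```python
import numpy as np

def relu(z):
    """ReLU activation used for this example layer."""
    return np.maximum(0, z)

def dense_forward(x, W, b, activation):
    """One layer: weighted sum z = W @ x + b, then the activation is applied."""
    z = W @ x + b          # step 1: multiply inputs by weights, sum, add bias
    return activation(z)   # step 2: pass the weighted sum through the activation

# Hypothetical layer with 3 inputs and 2 neurons
x = np.array([1.0, 2.0, 3.0])
W = np.array([[0.5, -1.0, 0.25],
              [1.0,  0.5, -0.5]])
b = np.array([0.1, -0.2])

h = dense_forward(x, W, b, relu)  # step 3: h becomes the input to the next layer
print(h)
```

The first neuron's weighted sum is -0.65, so ReLU outputs 0; the second neuron's weighted sum is 0.3, which passes through unchanged.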

Implementation Example (Python)

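import numpy as np

# Input values
x = np.array([-1, 0, 1, 2])
# Sigmoid: maps each value into (0, 1)
sigmoid = 1 / (1 + np.exp(-x))
# Tanh: maps each value into (-1, 1)
tanh = np.tanh(x)
# ReLU: 0 for negative values, unchanged for positive values
relu = np.maximum(0, x)
# Softmax: subtracting the max avoids overflow without changing the result
exp_x = np.exp(x - np.max(x))
softmax = exp_x / np.sum(exp_x)

print("Sigmoid:", sigmoid)
print("Tanh:", tanh)
print("ReLU:", relu)
print("Softmax:", softmax)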

Applications

  • Sigmoid: Binary classification, logistic regression
  • Tanh: Hidden layers for faster convergence
  • ReLU / Leaky ReLU: Hidden layers in deep networks, CNNs
  • Softmax: Output layer for multi-class classification
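As a sketch of how the two output-layer functions are typically turned into predictions, the example below applies sigmoid to a single logit for binary classification and softmax to a vector of logits for multi-class classification. The logit values are made up, and the 0.5 threshold is a common default rather than a fixed rule.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Binary classification: one output neuron, sigmoid gives P(class = 1)
binary_logit = 0.8                      # hypothetical raw output of the last layer
p_positive = sigmoid(binary_logit)
binary_label = int(p_positive >= 0.5)   # 0.5 threshold is a common default

# Multi-class classification: one output per class, softmax gives a distribution
class_logits = np.array([0.3, 2.1, -1.0])  # hypothetical logits for 3 classes
class_probs = softmax(class_logits)
predicted_class = int(np.argmax(class_probs))

print(p_positive, binary_label)
print(class_probs, predicted_class)
```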

Best Practices

  • Use ReLU or Leaky ReLU for hidden layers in deep networks
  • In the output layer, use Sigmoid for binary classification and Softmax for multi-class classification
  • Avoid mixing different activation functions within the same layer unless there is a clear reason to
  • Monitor gradient magnitudes during training to detect vanishing or exploding gradients

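A simplified way to see why gradient monitoring matters is to multiply the activation derivatives that backpropagation chains together across layers. This sketch deliberately ignores the weights and assumes a depth of 10; it compares sigmoid at its best case (all pre-activations at 0, where its derivative is 0.25) with ReLU on positive pre-activations. In a real network, you would instead inspect per-layer gradient norms reported by your framework.

```python
import numpy as np

def sigmoid_grad(z):
    """Derivative of the sigmoid: s * (1 - s)."""
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

def relu_grad(z):
    """Derivative of ReLU: 1 for z > 0, else 0."""
    return (z > 0).astype(float)

# Simplified chain: only the product of activation derivatives across layers
depth = 10
z_sigmoid = np.zeros(depth)  # sigmoid's derivative is largest (0.25) at z = 0
z_relu = np.ones(depth)      # positive pre-activations for ReLU

sigmoid_factor = np.prod(sigmoid_grad(z_sigmoid))  # 0.25 ** 10, about 9.5e-7
relu_factor = np.prod(relu_grad(z_relu))           # 1.0: passes through unchanged

print(f"sigmoid chain: {sigmoid_factor:.2e}")
print(f"relu chain:    {relu_factor:.2e}")
```

Even in sigmoid's most favorable case, the gradient reaching the first layer shrinks by roughly a factor of a million over ten layers, which is one reason ReLU-family functions are preferred in deep hidden layers.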
Conclusion

Activation functions are essential for introducing non-linearity in neural networks. The choice of activation function affects learning speed, gradient flow, and model performance. Selecting activation functions appropriately for each layer and task type helps make neural network training efficient and accurate.
