NumPy & Data Handling

Introduction

NumPy is a powerful Python library used for numerical computing and handling large datasets efficiently. It forms the foundation for data science, machine learning, and scientific computing in Python.

This training will help you understand how to use NumPy for data storage, manipulation, and analysis, making your data tasks faster and more efficient.

Objectives

By the end of this training, you will be able to:

  • Understand the basics of NumPy arrays
  • Perform operations on arrays
  • Handle and manipulate large datasets
  • Apply NumPy in real-world data analysis tasks

1. Understanding NumPy

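NumPy, short for Numerical Python, lets you work with multidimensional arrays and matrices. A Python list stores references to separate Python objects. A NumPy array instead stores elements of a single data type in one contiguous block of memory. This makes arrays more memory-efficient and lets NumPy run operations on them in compiled code, which is much faster than looping over a list.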

Key features:

  • Supports large, multidimensional arrays and matrices
  • Provides mathematical functions to operate on arrays
  • Efficient memory usage and performance
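The memory claim can be checked directly. The short sketch below compares roughly how much memory one million integers take as a Python list and as a NumPy array. It fixes the dtype to int64 so the array size is the same on every platform. The list figure is only an estimate, because CPython shares some small integer objects and the exact object sizes vary by Python version.

```python
import sys
import numpy as np

n = 1_000_000

# A Python list holds pointers to separate int objects.
py_list = list(range(n))
# Rough estimate: the list's pointer array plus every int object it references
# (slightly overcounts, since CPython shares small ints like 0..256).
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# A NumPy array holds raw 8-byte integers in one contiguous buffer.
arr = np.arange(n, dtype=np.int64)
array_bytes = arr.nbytes  # size * itemsize = 8,000,000 bytes

print(f"list:  ~{list_bytes / 1e6:.1f} MB")
print(f"array:  {array_bytes / 1e6:.1f} MB")
```

On a typical 64-bit CPython build, the list comes out several times larger than the array.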

2. Creating Arrays

You can create arrays using:

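  • numpy.array(): converts a list or tuple into an array
  • numpy.zeros(): creates an array of a given shape filled with zeros
  • numpy.ones(): creates an array of a given shape filled with ones
  • numpy.arange(): creates an array of values from a start up to a stop (excluded) with a fixed step, similar to Python's range()
  • numpy.linspace(): creates an array with a given number of evenly spaced values between a start and a stop (included by default)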

Example:

import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr)
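The example above covers only np.array(). The sketch below tries out the other creation functions from the list. The specific shapes and ranges are arbitrary choices made for illustration.

```python
import numpy as np

zeros = np.zeros(3)              # float64 by default: three 0.0 values
ones = np.ones((2, 3))           # a shape tuple: 2 rows x 3 columns of 1.0
evens = np.arange(0, 10, 2)      # start, stop (excluded), step: 0, 2, 4, 6, 8
points = np.linspace(0, 1, 5)    # 5 values from 0 to 1, both ends included

for name, a in [("zeros", zeros), ("ones", ones), ("arange", evens), ("linspace", points)]:
    print(name, a.shape, a.dtype)
    print(a)
```

Use arange() when you know the step size and linspace() when you know how many points you need.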

3. Array Operations

NumPy supports vectorized operations, which act on whole arrays at once and are much faster than equivalent Python loops:

  • Arithmetic operations: +, -, *, / (applied element by element)
  • Statistical operations: mean(), max(), min()
  • Aggregation: sum(), prod(), and cumsum() (a running total)

Example:

arr = np.array([1, 2, 3, 4])
print(arr + 5)
print(arr.mean())
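The sketch below extends the example to arithmetic between two arrays of the same shape, followed by the statistical and aggregation methods from the list. The second array b is made up for illustration.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

# Element-wise arithmetic: no explicit loop needed
print(a + b)   # [11 22 33 44]
print(a * b)   # [ 10  40  90 160]
print(b / a)   # true division returns floats: [10. 10. 10. 10.]

# Statistics and aggregation reduce the array to a single value
print(a.mean(), a.min(), a.max())   # 2.5 1 4
print(a.sum(), a.prod())            # 10 24

# cumsum() keeps one running total per element
print(a.cumsum())   # [ 1  3  6 10]
```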

4. Indexing and Slicing

Accessing elements in NumPy arrays is easy:

  • Indexing: arr[0] gives the first element
  • Slicing: arr[1:3] gives the elements at indices 1 and 2 (the stop index is excluded)
  • Boolean indexing: arr[arr > 2] filters elements greater than 2
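The sketch below runs the three access patterns on a small array. It also shows negative indexing and row/column indexing on a 2-D array, which follow the same rules.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

first = arr[0]          # 1
middle = arr[1:3]       # [2 3]: the stop index 3 is excluded
last = arr[-1]          # 4: negative indices count from the end
big = arr[arr > 2]      # [3 4]: the mask [False False True True] picks elements

# On 2-D arrays, index each axis separately: [row, column]
grid = np.arange(6).reshape(2, 3)   # rows [0 1 2] and [3 4 5]
print(grid[1, 2])   # 5
print(grid[:, 0])   # first column: [0 3]
```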

5. Reshaping and Combining Arrays

  • reshape() changes the shape of an array without changing its data
  • concatenate() joins a sequence of arrays along an existing axis
  • hstack() and vstack() stack arrays horizontally (column-wise) or vertically (row-wise)

Example:

arr = np.arange(6)
arr_reshaped = arr.reshape(2, 3)
print(arr_reshaped)
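The example above only reshapes. The sketch below also combines two small 2×2 arrays, made up for illustration, so you can compare the resulting shapes. It also uses -1 in reshape(), which asks NumPy to infer that dimension.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# concatenate joins along an existing axis (axis=0 by default: adds rows)
rows = np.concatenate([a, b])          # shape (4, 2)
cols = np.concatenate([a, b], axis=1)  # shape (2, 4)

# vstack/hstack are shortcuts for the same two cases on 2-D arrays
v = np.vstack([a, b])   # same result as rows
h = np.hstack([a, b])   # same result as cols

# reshape can infer one dimension from the total size using -1
col_vector = np.arange(6).reshape(-1, 1)   # shape (6, 1)

print(rows.shape, cols.shape, col_vector.shape)
```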

6. Handling Missing Data

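NumPy represents missing or invalid values in floating-point arrays as NaN (Not a Number) and provides functions to detect and replace them:

  • np.isnan() returns a Boolean array that is True wherever a value is NaN
  • np.nan_to_num() replaces NaN with 0.0 by default (or a value you choose) and infinities with large finite numbers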
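The sketch below applies both functions to a small made-up array of readings. It also shows one common alternative to replacement: dropping the missing values with a Boolean mask. Which option suits your data is a judgment call. This is not a fixed rule.

```python
import numpy as np

# np.nan marks a missing reading (only float arrays can hold NaN)
readings = np.array([2.5, np.nan, 3.0, np.nan, 4.5])

mask = np.isnan(readings)        # [False  True False  True False]
print(mask.sum(), "missing values")

# Option 1: drop missing values using the inverted mask
clean = readings[~mask]          # 2.5, 3.0, 4.5

# Option 2: replace NaN with a chosen number (nan=0.0 is also the default)
filled = np.nan_to_num(readings, nan=0.0)

# A plain mean is "poisoned" by any NaN, so clean the data first
print(readings.mean())           # nan
print(clean.mean())              # 10 / 3, about 3.33
```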

7. Working with Large Datasets

NumPy is highly efficient for large datasets:

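  • Stores elements of one data type in a contiguous block of memory
  • Runs array operations in optimized compiled code instead of Python loops
  • Integrates with libraries such as pandas, SciPy, and Matplotlib, which build on or accept NumPy arrays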
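The sketch below illustrates the first two points on one million random floats. It checks that the array occupies one contiguous buffer, then computes the same sum of squares with a Python loop and with a vectorized expression. The array size and random seed are arbitrary. Timings depend heavily on the machine, but the vectorized version is usually far faster.

```python
import time
import numpy as np

data = np.random.default_rng(seed=0).random(1_000_000)   # 1M floats in [0, 1)

print(data.flags["C_CONTIGUOUS"])   # True: one contiguous buffer

# Pure-Python loop: every element is handled one at a time by the interpreter
start = time.perf_counter()
loop_total = 0.0
for x in data:
    loop_total += x * x
loop_seconds = time.perf_counter() - start

# Vectorized: the same sum of squares runs in compiled code
start = time.perf_counter()
vec_total = np.sum(data * data)
vec_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.3f}s, vectorized: {vec_seconds:.4f}s")
# The totals agree up to small floating-point rounding differences
print(np.isclose(loop_total, vec_total))
```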

8. Real-World Applications

  • Data cleaning and preprocessing
  • Statistical analysis
  • Image processing
  • Machine learning input preparation
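The sketch below combines two of these applications: data cleaning and machine learning input preparation. It uses a small hypothetical feature matrix. Filling a gap with its column mean and then standardizing each column to mean 0 and standard deviation 1 is one common choice, assumed here for illustration. Other strategies may suit a given dataset better.

```python
import numpy as np

# Hypothetical feature matrix: 4 samples (rows) x 2 features (columns),
# e.g. height in cm and weight in kg, with one missing value.
X = np.array([
    [170.0, 65.0],
    [160.0, 72.0],
    [180.0, np.nan],
    [175.0, 80.0],
])

# Data cleaning: fill each NaN with its column's mean, ignoring NaNs
col_means = np.nanmean(X, axis=0)
rows, cols = np.where(np.isnan(X))     # positions of the missing values
X[rows, cols] = col_means[cols]

# Input preparation: standardize each column (broadcasting applies the
# per-column mean and standard deviation to every row)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.round(2))
```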

Conclusion

NumPy is an essential library for anyone working with data in Python. Mastering NumPy will help you handle large datasets efficiently, perform fast computations, and prepare data for further analysis in data science and machine learning projects.
