Introduction
NumPy is a powerful Python library used for numerical computing and handling large datasets efficiently. It forms the foundation for data science, machine learning, and scientific computing in Python.
This training will help you understand how to use NumPy for data storage, manipulation, and analysis, making your data tasks faster and more efficient.
Objectives
By the end of this training, you will be able to:
- Understand the basics of NumPy arrays
- Perform operations on arrays
- Handle and manipulate large datasets
- Apply NumPy in real-world data analysis tasks
1. Understanding NumPy
NumPy, short for Numerical Python, lets you work with multidimensional arrays and matrices. Unlike a Python list, a NumPy array stores elements of a single data type in one block of memory, which generally makes it faster and more memory-efficient for numerical work.
Key features:
- Supports large, multidimensional arrays and matrices
- Provides mathematical functions to operate on arrays
- Efficient memory usage and performance
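To make the features above concrete, here is a minimal sketch that inspects a few basic array attributes. The variable names and the array size are illustrative assumptions, not part of the original training material.

```python
import numpy as np

# A 1-D array of 1000 integers, stored as raw 64-bit values in one buffer
arr = np.arange(1000, dtype=np.int64)

print(arr.ndim)    # number of dimensions: 1
print(arr.shape)   # size along each dimension: (1000,)
print(arr.dtype)   # element type shared by every item: int64
print(arr.nbytes)  # bytes used by the data: 1000 * 8 = 8000

# A 2-D array (matrix) with 2 rows and 3 columns
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix.shape)  # (2, 3)
```

Because every element has the same type and size, the memory footprint is predictable: the number of elements multiplied by the size of one element.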
2. Creating Arrays
You can create arrays using:
- numpy.array(): converts a list or tuple into an array
- numpy.zeros(): creates an array filled with zeros
- numpy.ones(): creates an array filled with ones
- numpy.arange(): creates arrays with a range of numbers
- numpy.linspace(): creates arrays with evenly spaced numbers
Example:
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr)
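The other creation functions can be sketched in the same way. The shapes and ranges below are arbitrary values chosen for illustration.

```python
import numpy as np

zeros = np.zeros((2, 3))             # 2x3 array of 0.0 (float64 by default)
ones = np.ones(4)                    # four elements, all 1.0
steps = np.arange(0, 10, 2)          # 0, 2, 4, 6, 8 (the stop value is excluded)
points = np.linspace(0.0, 1.0, 5)    # 0.0, 0.25, 0.5, 0.75, 1.0 (both ends included)
from_tuple = np.array((1.5, 2.5))    # a tuple converted to a float array

print(zeros.shape, ones.shape)
print(steps)
print(points)
```

Note the difference between the last two range helpers: arange() takes a step size, while linspace() takes the number of points you want.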
3. Array Operations
NumPy supports vectorized operations, which apply to whole arrays at once and are usually much faster than explicit Python loops:
- Arithmetic operations: +, -, *, /
- Statistical operations: mean(), sum(), max(), min()
- Aggregation: sum(), cumsum(), prod()
Example:
arr = np.array([1, 2, 3, 4])
print(arr + 5)
print(arr.mean())
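As a further sketch, the remaining operations listed above can be tried on two small arrays. The arrays a and b are made-up sample data.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

# Arithmetic works element by element on arrays of the same shape
print(a + b)   # 11, 22, 33, 44
print(a * b)   # 10, 40, 90, 160
print(b / a)   # 10.0 for every element (division returns floats)

# Statistics and aggregations reduce or accumulate over the array
print(a.sum())           # 10
print(a.max(), a.min())  # 4 1
print(a.cumsum())        # running total: 1, 3, 6, 10
print(a.prod())          # product of all elements: 24
```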
4. Indexing and Slicing
Accessing elements in NumPy arrays is easy:
- Indexing: arr[0] gives the first element
- Slicing: arr[1:3] gives elements from index 1 to 2
- Boolean indexing: arr[arr > 2] filters elements greater than 2
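A short sketch of these access patterns follows. The sample values and the 2-D grid are assumptions added for illustration.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

first = arr[0]          # 10
last = arr[-1]          # 50 (negative indices count from the end)
middle = arr[1:3]       # 20, 30 (the end index 3 is excluded)
large = arr[arr > 25]   # 30, 40, 50 (boolean mask keeps matching elements)

# In a 2-D array, separate the row and column index with a comma
grid = np.array([[0, 1, 2, 3],
                 [4, 5, 6, 7],
                 [8, 9, 10, 11]])
cell = grid[1, 2]       # row 1, column 2 -> 6
column = grid[:, 0]     # first column of every row: 0, 4, 8

print(first, last, middle, large, cell, column)
```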
5. Reshaping and Combining Arrays
- reshape() changes the shape of an array
- concatenate() combines multiple arrays
- hstack() and vstack() stack arrays horizontally or vertically
Example:
arr = np.arange(6)
arr_reshaped = arr.reshape(2, 3)
print(arr_reshaped)
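Combining arrays can be sketched in a similar way. The two input arrays here are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

joined = np.concatenate([a, b])  # one 1-D array: 1, 2, 3, 4, 5, 6
side = np.hstack([a, b])         # same result as concatenate for 1-D inputs
stacked = np.vstack([a, b])      # 2-D array with a as row 0 and b as row 1

# Passing -1 lets NumPy infer that dimension from the total size
flat = stacked.reshape(-1)

print(joined.shape)   # (6,)
print(stacked.shape)  # (2, 3)
print(flat)
```

reshape() only succeeds when the new shape holds exactly the same number of elements as the original array.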
6. Handling Missing Data
NumPy provides ways to handle missing or invalid data:
- np.isnan() to check NaN values
- np.nan_to_num() to replace NaN with a number
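The sketch below shows both functions on a small made-up array. It assumes a reasonably recent NumPy release, since the nan keyword of np.nan_to_num() is not available in very old versions.

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0, np.nan])

mask = np.isnan(data)       # True where the value is NaN
missing = int(mask.sum())   # number of NaN entries: 2

# Replace every NaN with 0.0 (any other fill value works the same way)
filled = np.nan_to_num(data, nan=0.0)

# Alternatively, drop the NaN entries with a boolean mask
valid = data[~mask]

print(missing)
print(filled)
print(valid)
```

Whether to fill or drop missing values depends on the analysis; filling with zero can distort statistics such as the mean.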
7. Working with Large Datasets
NumPy is highly efficient for large datasets:
- Stores data in contiguous memory
- Fast mathematical operations on arrays
- Integration with libraries like Pandas, SciPy, and Matplotlib
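The points above can be illustrated with a rough sketch. The array size and variable names are assumptions, and no timing figures are claimed here because they depend on the machine.

```python
import numpy as np

n = 1_000_000
data = np.arange(n, dtype=np.float64)

# A freshly created array sits in one contiguous block of memory
print(data.flags["C_CONTIGUOUS"])  # True
print(data.nbytes)                 # 1_000_000 * 8 = 8_000_000 bytes

# A strided slice is a view on the same memory, so it is not contiguous
every_other = data[::2]
print(every_other.flags["C_CONTIGUOUS"])  # False

# Vectorized sum of squares versus an explicit Python loop on a small sample
small = data[:1000]
vector_total = float((small ** 2).sum())
loop_total = sum(x * x for x in small.tolist())
print(vector_total == loop_total)  # both approaches give the same answer
```

The vectorized form runs the loop in compiled code, which is where most of the speed advantage on large arrays comes from.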
8. Real-World Applications
- Data cleaning and preprocessing
- Statistical analysis
- Image processing
- Machine learning input preparation
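As one possible illustration of input preparation, the sketch below standardizes each column of a small feature table to zero mean and unit standard deviation. The data is invented, and standardization is only one of several common preprocessing choices.

```python
import numpy as np

# Hypothetical feature table: 3 samples (rows), 2 features (columns)
features = np.array([[1.0, 200.0],
                     [2.0, 300.0],
                     [3.0, 400.0]])

col_mean = features.mean(axis=0)  # mean of each column
col_std = features.std(axis=0)    # standard deviation of each column

# Broadcasting applies the per-column values to every row
scaled = (features - col_mean) / col_std

print(scaled.mean(axis=0))  # approximately 0 for each column
print(scaled.std(axis=0))   # approximately 1 for each column
```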
Conclusion
NumPy is an essential library for anyone working with data in Python. Mastering NumPy will help you handle large datasets efficiently, perform fast computations, and prepare data for further analysis in data science and machine learning projects.