Window functions are a type of SQL function that perform calculations across a set of table rows related to the current row. Unlike aggregate functions, window functions do not collapse rows; they preserve the original row structure while providing additional insights.
Key Concepts
- Window
- A window defines the set of rows over which the window function operates.
- The window can be defined using
PARTITION BYandORDER BYclauses.
- Partitioning
PARTITION BYdivides the dataset into subsets (partitions) and the window function is applied within each subset.- Example:
PARTITION BY departmentcalculates results per department.
- Ordering
ORDER BYdefines the sequence of rows within each partition for calculation.- Useful for functions like ranking or running totals.
Common Window Functions
- ROW_NUMBER()
- Assigns a unique sequential number to rows within a partition.
- Example: Identify top-performing employees per department.
- RANK()
- Assigns a rank to each row within a partition, with gaps for ties.
- Useful for competitive ranking scenarios.
- DENSE_RANK()
- Similar to
RANK()but without gaps for ties. - Example: Assign continuous ranks to sales figures.
- Similar to
- SUM() OVER()
- Calculates cumulative sums without collapsing rows.
- Example: Running total of sales.
- AVG() OVER()
- Calculates the average of a column over a window.
- Example: Average salary per department.
- LEAD() and LAG()
LEAD()accesses data from the next row in the window.LAG()accesses data from the previous row.- Useful for comparisons and trend analysis.
Syntax Example
SELECT
employee_id,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num,
SUM(salary) OVER (PARTITION BY department ORDER BY salary) AS cumulative_salary,
LAG(salary, 1) OVER (PARTITION BY department ORDER BY salary) AS prev_salary
FROM
employees;
Best Practices
- Always define a clear window using
PARTITION BYandORDER BYto avoid unexpected results. - Use window functions when you need row-level insights along with aggregate calculations.
- Avoid overusing window functions on very large datasets as they can impact performance.
Use Cases
- Ranking top sales performers by region.
- Calculating running totals or cumulative averages.
- Comparing current row values with previous or next rows.
- Detecting trends and patterns within partitions of data.