FIT3003 - W8 - SQL Notes about RANK() OVER (PARTITION BY x ORDER BY y)

type

Post

Created date

Oct 3, 2022 08:22 AM

`RANK() OVER (PARTITION BY x ORDER BY y)`

What is PARTITION BY x ORDER BY y?

PostgreSQL: Documentation: 14: 3.5. Window Functions

As shown here, the rank function produces a numerical rank for each distinct ORDER BY value in the current row's partition, using the order defined by the ORDER BY clause. rank needs no explicit parameter, because its behavior is entirely determined by the OVER clause.

The rows considered by a window function are those of the “virtual table” produced by the query's FROM clause as filtered by its WHERE, GROUP BY, and HAVING clauses if any. For example, a row removed because it does not meet the WHERE condition is not seen by any window function. A query can contain multiple window functions that slice up the data in different ways using different OVER clauses, but they all act on the same collection of rows defined by this virtual table.

We already saw that ORDER BY can be omitted if the ordering of rows is not important. It is also possible to omit PARTITION BY, in which case there is a single partition containing all rows.

There is another important concept associated with window functions: for each row, there is a set of rows within its partition called its window frame. Some window functions act only on the rows of the window frame, rather than of the whole partition. By default, if ORDER BY is supplied then the frame consists of all rows from the start of the partition up through the current row, plus any following rows that are equal to the current row according to the ORDER BY clause.

when do you know to use rank and/or partition by? : SQL (reddit.com)

you should look up “window functions” or “analytical functions” because that’s what you’re talking about. Window functions (“partition by”) just let you group by something without having to use a group by. For ex: count(*) over (partition by sale_date) from table you can think of it as: count(*) from table group by sale_date. The difference is that the window function will return the result for every row of the table.

Rank is a window function that lets you order something. you use it when you want to order your data by a column.

if you use a partition with the rank then you’re basically grouping your data by the partitioned column first, and then ordering it. So if I do: rank() over (partition by sale_date order by price) then 1) you’re looking at each date, and 2) ordering the records according to the item price within that date only.

What is difference between PARTITION BY x ORDER BY y and GROUP BY ?

Syntax


SELECT carmodel, sum (NUMSEATS) over(partition by carbodytype)
FROM car;

Example 1: From FIT3003 textbook p.521

Example 2: (Here) difference between PARTITION BY x ORDER BY y and GROUP BY

Get the sum of each user

Dataset


# user$raw
6 rows:
name number_of_registered_entities
User_1 | 8
User_2 | 10
User_3 | 8
User_2 | 1
User_3 | 5
User_1 | 7

Query 1


# SQL query 1 GROUP BY:
SELECT name, SUM(number_of_registered_entities) entitysum 
from user$raw
GROUP BY name

# Output 1
3 rows:
name entitysum
User_1 | 15
User_2 | 11
User_3 | 13

# SQL query 1 GROUP BY:

Query 2


# SQL query 2 PARTITION BY:
SELECT SUM(number_of_registered_entities) 
OVER (PARTITION BY name) AS name,
entitysum 
FROM user$raw

# Output 2
6 rows:
name entitysum
User_1 | 15
User_1 | 15
User_2 | 11
User_2 | 11
User_3 | 13
User_3 | 13

# SQL query 2 PARTITION BY:

Explanation

You can use the SQL PARTITION BY clause with the OVER clause to specify the column on which we need to perform aggregation. PARTITION BY gives aggregated columns with each record in the specified table. If we have 15 records in the table, the query output SQL PARTITION BY also gets 15 rows. On the other hand, GROUP BY gives one row per group in result set.

A GROUP BY normally reduces the number of rows returned by rolling them up and calculating averages or sums for each row.

PARTITION BY does not affect the number of rows returned, but it changes how a window function's result is calculated.

The OVER clause defines a window or user-specified set of rows within a query result set. A window function then computes a value for each row in the window. You can use the OVER clause with functions to compute aggregated values such as moving averages, cumulative aggregates, running totals, or a top N per group results.

Example 3: (Here) difference between PARTITION BY x ORDER BY y and GROUP BY

Relevant materials

OVER Clause (Transact-SQL) - SQL Server | Microsoft Learn

SQL PARTITION BY Clause overview (sqlshack.com)

SQL SERVER - What is the OVER Clause? - Notes from the Field #101 - SQL Authority with Pinal Dave

How is “PARTITION BY” different from “GROUP BY”? | Benchling Help Center

SQL PARTITION BY Clause overview (sqlshack.com)

PostgreSQL: Documentation: 14: 3.5. Window Functions

DENSE RANK and normal RANK

If there is no tie (打和), there will be no difference between rank() over and dense_rank() over functions