type
Post
Created date
Oct 3, 2022 08:22 AM
category
Data Science
tags
Data Warehousing
status
Published
Language
English
From
School
summary
slug
password
Author
Jason Ching Yuen Siu
Priority
Featured
Featured
Cover
Origin
Type
URL
Youtube
Youtube
icon

RANK() OVER (PARTITION BY x ORDER BY y)

What is PARTITION BY x ORDER BY y?
PostgreSQL: Documentation: 14: 3.5. Window Functions
As shown here, the rank function produces a numerical rank for each distinct ORDER BY value in the current row's partition, using the order defined by the ORDER BY clause. rank needs no explicit parameter, because its behavior is entirely determined by the OVER clause.
The rows considered by a window function are those of the “virtual table” produced by the query's FROM clause as filtered by its WHERE, GROUP BY, and HAVING clauses if any. For example, a row removed because it does not meet the WHERE condition is not seen by any window function. A query can contain multiple window functions that slice up the data in different ways using different OVER clauses, but they all act on the same collection of rows defined by this virtual table.
We already saw that ORDER BY can be omitted if the ordering of rows is not important. It is also possible to omit PARTITION BY, in which case there is a single partition containing all rows.
There is another important concept associated with window functions: for each row, there is a set of rows within its partition called its window frame. Some window functions act only on the rows of the window frame, rather than of the whole partition. By default, if ORDER BY is supplied then the frame consists of all rows from the start of the partition up through the current row, plus any following rows that are equal to the current row according to the ORDER BY clause.
when do you know to use rank and/or partition by? : SQL (reddit.com)
you should look up “window functions” or “analytical functions” because that’s what you’re talking about. Window functions (“partition by”) just let you group by something without having to use a group by. For ex: count(*) over (partition by sale_date) from table you can think of it as: count(*) from table group by sale_date. The difference is that the window function will return the result for every row of the table.
Rank is a window function that lets you order something. you use it when you want to order your data by a column.
if you use a partition with the rank then you’re basically grouping your data by the partitioned column first, and then ordering it. So if I do: rank() over (partition by sale_date order by price) then 1) you’re looking at each date, and 2) ordering the records according to the item price within that date only.
What is difference between PARTITION BY x ORDER BY y and GROUP BY ?
you should look up “window functions” or “analytical functions” because that’s what you’re talking about. Window functions (“partition by”) just let you group by something without having to use a group by. For ex: count(*) over (partition by sale_date) from table you can think of it as: count(*) from table group by sale_date. The difference is that the window function will return the result for every row of the table.
 
 
Syntax
SELECT carmodel, sum (NUMSEATS) over(partition by carbodytype) FROM car;
Example 1: From FIT3003 textbook p.521
notion image
Example 2: (Here) difference between PARTITION BY x ORDER BY y and GROUP BY

Get the sum of each user

Dataset

# user$raw 6 rows: name number_of_registered_entities User_1 | 8 User_2 | 10 User_3 | 8 User_2 | 1 User_3 | 5 User_1 | 7

Query 1

# SQL query 1 GROUP BY: SELECT name, SUM(number_of_registered_entities) entitysum from user$raw GROUP BY name # Output 1 3 rows: name entitysum User_1 | 15 User_2 | 11 User_3 | 13
# SQL query 1 GROUP BY:

Query 2

# SQL query 2 PARTITION BY: SELECT SUM(number_of_registered_entities) OVER (PARTITION BY name) AS name, entitysum FROM user$raw # Output 2 6 rows: name entitysum User_1 | 15 User_1 | 15 User_2 | 11 User_2 | 11 User_3 | 13 User_3 | 13
# SQL query 2 PARTITION BY:
Explanation
You can use the SQL PARTITION BY clause with the OVER clause to specify the column on which we need to perform aggregation. PARTITION BY gives aggregated columns with each record in the specified table. If we have 15 records in the table, the query output SQL PARTITION BY also gets 15 rows. On the other hand, GROUP BY gives one row per group in result set.
GROUP BY normally reduces the number of rows returned by rolling them up and calculating averages or sums for each row.
PARTITION BY does not affect the number of rows returned, but it changes how a window function's result is calculated.
The OVER clause defines a window or user-specified set of rows within a query result set. A window function then computes a value for each row in the window. You can use the OVER clause with functions to compute aggregated values such as moving averages, cumulative aggregates, running totals, or a top N per group results.
 
Example 3: (Here) difference between PARTITION BY x ORDER BY y and GROUP BY
notion image
notion image
Relevant materials

DENSE RANK and normal RANK

If there is no tie (打和), there will be no difference between rank() over and dense_rank() over functions
notion image
notion image
 
 
FIT3003 - W8 - SQL Notes about Cumulative and Moving AggregatesFIT3003 - W8 - CUBE & ROLLUP in SQL (With partial CUBE and partial ROLLUP) (W8)