type
Post
Created date
Aug 9, 2022 02:58 AM
category
Data Science
tags
Data Warehousing
status
Published
Language
English
From
School
summary
slug
password
Author
Jason Ching Yuen Siu
Priority
Featured
Featured
Cover
Origin
Type
URL
Youtube
Youtube
icon

What is a snowflake?

A Hierarchy is formed -- which is snowflake -- when a dimension is broken down to two or more dimensions in a hierarchical manner.

What is a bridge table?

  • is a table that links two dimensions, and only one of these two dimensions is linked to the fact. As a result, the star schema becomes a snowflake schema.
  • A Bridge Table is also a snowflake, but the two dimensions that are linked through a Bridge Table. The relationship inbetween has a cardinality of 1-many and many-1.

When do we need a bridge table?

  • Is needed when a dimension cannot be connected directly to the Fact as it has to go through another dimension.
For example, Dim-4b cannot be connected directly to the fact without going through Dim-4a.
For example, Dim-4b cannot be connected directly to the fact without going through Dim-4a. For Dim-4b to be connected to Dim-4a, it has to go through a Bridge Table. This is because the cardinality relationship between Dim-4a and Dim-4b is actually a many-many relationship which is why a Bridge Table needs to be an intermediate table between these two dimensions

Why does one dim cannot be DIRECTLY to the fact?

  • (a) The Fact Table has a fact measure, and the dimension has a key identity. In order to connect a dimension to the Fact, the dimension’s key identity must contribute directly to the calculation of the fact measure. Unfortunately, this cannot happen if the operational database does not have this data.
  • (b) The operational database does not have this data if the relationship between two entities in the operational database that hold the information about the dimension’s key identity and the intended fact measure is a <mark>many-many</mark> relationship

How to check if we need a bridge?

This question is asked to check if we need a bridge. There are 2 reasons:
  1. Is there a DIRECT relationship between dim and fact?
  1. Is Dim1 information available in Fact of a Dim2?

In principle, a Bridge Table is used:

(a) When it is impossible to have a dimension connected directly to the Fact Table, because simply there is no relationship between this dimension and the Fact Table (e.g. in the Product Sales case study, it is impossible to have a direct link from SupplierDim to ProductSalesFact)
(b) When an entity (which will become a dimension) has a many-many relationship with another entity (dimension) in the E/R schema of the operational database (e.g. Supplier and Stock has a many-many relationship).
(c) When the temporality aspect (data history) is maintained in the operational database and the Bridge Table can be used to accommodate a dimension that has temporal attributes (e.g. product supply history is maintained in the second snowflake schema example).

When a Bridge Table is used in the schema, there are two additional options:

(a) A Weight Factor is used to estimate the contribution of a dimension in the calculation of the fact measure. Because this is only an estimate, a weight factor is an option.
(b) Every snowflake schema (whether it has Weight Factor or not) can be implemented in two ways: a List Aggregate version and a non-List Aggregate version.
 

A case study about Product Sales:

notion image
notion image
notion image

Why isn't this report correct?

  • This report seems correct but it is impossible to be produced.
  • Because in the E/R diagram, it shows that one supplier may supply many products (Many2Many) -- so the sales do not record which supplier supplied a product.
  • Therefore, we can conclud that 1) there is no direct relationship between suppliers and total sales. 2) supplier information is not available in the sales of a particular product.

If the report is not correct, what should we do?

  • We move the Supplier Dimension from being connected to the Fact and to create a relationship with the Product Dimension — that means we added a Bridge Table between Product Dimension and Supplier Dimension.
  • The bridge table has three information in which we can “drill down” for information on products: location, date and quantity of supplies.

Why are there three kinds of representations? What is the difference?

notion image
notion image
notion image
notion image
Figure 5.7 for further details. 5.7 is where you — as the manager —want to see the detail of the dim that cannot be directly connected (i.e., Supplier). For example, we know that product A is performing particularly well, and we want to further information about details of the supply history and the suppliers.
Figure 5.8 for temporal dim. 5.8 is where the operational database does not maintain a master list of all suppliers (assume that each supplier may supply any product in an ad hoc manner), we will not have the Supplier Dimension in the schema in the figure 5.8. Instead, we will only have the Bridge Table; however, for each Product-Supplier pair, we will have Location and Date implemented as a weak entity, because for each productsupplier, there are multiple supplies.
Therefore, the Bridge Table here acts as a temporal dimension. This means, for each product, there is a list of the history of supplies.
Figure 5.9 for non-temporal warehouse. 5.9 is where the history is not needed (in a non-temporal data warehouse). You can see that the Product Supplier Bridge Table will only have ProductNo and SupplierName attributes, without the history of supplies.

A Truck Delivery Case Study

A trucking company is responsible for picking up goods from the warehouses of a retail chain company and delivering the goods to individual retail stores. A truck carries goods on a single trip, which is identified by TripID and delivers these goods to multiple stores. Trucks have different capacities for both the volumes they can hold and the weights they can carry.
Currently, a truck makes several trips each week. An operational database is being used to keep track of the deliveries, including the scheduling of trucks to provide timely deliveries to stores.
notion image
(i) a trip may pick up goods from many warehouses (i.e. a many-many relationship between Warehouse and Trip);
(ii) a trip uses one truck only, and obviously a truck may have many trips in the history (i.e. a many1 relationship between Trip and Truck);
(iii) a trip delivers goods (e.g. TVs, fridges, etc.) potentially to several stores (a many-many relationship between Trip and Store, which is represented by the Destination table)
notion image

Why isn't this report correct?

If you look at the Tables 5.9, 5.10, 5.11, 5.12, 5.13, and 5.14, you will see that there is no direct relationship between Store and Total Delivery Cost in the Fact Table.
This is due to the many-many relationship between the Trip entity and Store entity in the E/R diagram of the operational database.

If the report is not correct, what should we do?

To solve this problem, a Bridge Table can be used, as shown in Fig. 5.12.
notion image
 
FIT3003 - W11 - Active Data WarehousingFIT3003 - W3 - Average in the Fact