FIT3003 - W11 - Active Data Warehousing

type

Post

Created date

Oct 30, 2022 05:39 AM

Chapter Summary

Active Data Warehousing is solely about interdependency between the operational database and the data warehouse itself, as well as among the star schemas of various levels of granularity in the data warehouse.

There are three types of interdependencies:

Incremental updates, which is when new records are inserted into or old records are deleted from the operational database, they are immediately reflected in the star schema;

Changes in the data warehouse, which is when the data warehouse creates a new star schema or deletes an existing star schema, as well as changes to the upper levels of star schemas when the lower-level star schemas change; and

changes in the operational database, which is when the underlying operational database has changed and has an impact on the data warehouse.

Since interdependency can be quite complex in many cases, Active Data Warehousing is used only for applications that really need active data in the data warehouse. This includes the monitoring system which may rely on real-time data and not so much on historical data. In this case, an Active Data Warehousing is appropriate and useful.

Summary of all these changes in 3 tables

Passive Data Warehouse (一成不變)

The concepts learnt previously are based on Passive Data Warehouse; it has a few characteristics:

1. Star schemas are subject-oriented, including multi-fact star schema.

The purpose of the data warehouse is decided at the design state

They are generally designed for one specific purpose. Some examples are:

A Product Sales star schema is for the sole purpose of analysing product sales and nothing else.
A Computer Lab Activities star schema is for the sole purpose of analysing the use of computer labs by students.
A Hospital Admission star schema is for the sole purpose of analysing patient admissions to hospitals.

2. They are built based on historical data with a specific time range.

So, traditional DW is not only subject-oriented but also non-volatile and time-variant.

Non-volatile in a sense that the data in the data warehouse will never change

Time-variant means that it is for a specific range of time, which is a past time period.

ACTIVE DATA WAREHOUSE (一成可變)

Problem of Passive DW:

Passive DW can cater the general need, but there are applications where there is a need of monitoring sensors or real-time systems using current data.

Examples of continuous monitoring:

air traffic control,

the stock market, certain medical monitoring systems, etc.

Solution:

We need ACTIVE DATA WAREHOUSE. When the operation database is updated, the data warehouse is immediately updated.

In another words, Active DW is dynamic. And here we look into the how updates are impacted in different levels of DW.

Create viewmight be used in various levels of the star schema, which directly reflects any changes in the underlying tables (second part in the graph), as views are virtual tables based on the underlying physical tables. Any changes in the underlying physical tables will automatically be reflected in the virtual tables.

Additionally, certain database triggers can be used to automatically insert, update or delete the records in the data warehouse when an event (e.g. insert, update or delete) occurs in the underlying physical tables.

What is CREATE VIEW ?

You can think of it as a Pre-Defined filter, or saved query.

A view is essentially a saved query, which you can query as if it were a table itself.

The data for a view is typically not stored at all - just the SQL query you used to define the view. When you write a query which selects from the view, the SQL for the view is effectively substituted-in dynamically to your main query.

So this:


create view v as (select a+b as total from some_table);
select max(total) from v;

Is effectively the same as:


select max(total) from (select a+b as total from some_table);

Why are they useful? For a few reasons:

Convenience - if you have some often used SQL, you can save it as a view to avoid re-typing it each time

Clarity - giving a descriptive name to the view helps to document what it means, and what it is used for

Encapsulation - if the users of the database are querying via views, it might be possible to change the underlying table structures and then update the views so that they still work, and look the same as before

Security - you can grant different permissions for accessing the view, vs the underlying tables. So you could create a view which omits certain columns, or masks the data in some other way, and then grant people access to that

VIEW vs TABLE

A table is something that you can modify (update, delete, insert). A view is read only and usually presents related information (say, from two related tables).

Architecture of Active DW

Unlike Passive DW (top-down), ADW adopts a bottom-up approach, where the star schema is built from Level-0, and the upper levels (e.g. Level-1, Level-2, etc) are built on top of the immediate lower levels.

PK-FK constraints

Why do we need PK-FK in the operational databases?

We need this to maintain data integrity in the database because insert, update and delete of records are frequently performed and maintaining entity integrity and referential integrity is of the utmost importance.

Why we DONT ❌ need PK-FK in traditional DW

Because once the star schema is created and built, no records will be updated in the star schema (e.g. dimensions nor fact); there will be no insert as the data warehouse is passive and no delete.

Why we DO ✔️ need PK-FK in active DW

Active Data Warehousing is actively updated when there are changes in the operational database, the PK-FK constraint between dimensions and the fact becomes important to minimise data anomalies due to updates.

Therefore, the PK constraint must be enforced in the dimension and the PK-FK constraint in the fact.

Why do we prefer using `CREATE TABLE` to `CREATE VIEW`?

DIM: If using CREATE VIEW for creating the dimensions in Level-0, it is very difficult to assume that the primary keys have been implemented correctly and primary keys cannot be built on top of views. Therefore, it is advisable that dimensions in the Level-0 star schema are created using the create table statement, and not through create view.

How do you enforce a PK-FK constraint?


alter table DegreeDim
add constraint DegreeDimPK primary key (DegreeCode);

1. INCREMENTAL UPDATES / INSTANT UPDATE

Three ways of incremental updates includes 1) Auto updates, 2) Expiry Date, and 3) Date warehouse rules changed.

Type 1 — Auto updates

Automatic updates refer to updates that occur INSTANTLY, once the operational database has changed.

Level 0

In creating tables for lv 0, one must consider:

Dim: can be created using CREATE VIEW (if the data is directly retrieving from op. database), or CREATE TABLE (but it needs a trigger to monitor instant insert)

Fact: table must be created using CREATE TABLE

Level 1

These rules applied to both fact and dim.

Rule 1: Since there are some aggregated measures in Level-1, we CANNOT use CREATE VIEW ; we need to use CREATE TABLE instead.

Rule 2: If the records are timeless (e.g., Semester start and end Date), there is no need to create a trigger, otherwise we MUST create a trigger.

A database trigger must be created to ensure that for every record inserted into the TempFact Level-1, it will increase the fact measure Number of Logins by one.

Rule 3: Same as level 0, we need to enforce PK-FK contraints.


alter table TempComputerLabFactLevel1
add (SemesterID varchar2(10))

Level 2

Quote from the textbook:

In Level-2, we basically remove the Student Dimension to lower the granularity of the fact measure Number of Logins. All other dimensions are reused. Fact Level-2 reuses Fact Level-1, where the aggregation of the fact measure is increased. There is no need to have a TempFact in Level-2 because we can create the fact in Level-2 by using create view from the fact in Level-1. As a result of this, there won’t be any need for a database trigger.

Type 2 — Expiry Date

Sometimes we need to remove the unneeded-obsolete data (because decision makers are interested in recent data), so DW can take consideration of “recentness” of the data. To achieve so, we can set an expiry date and remove the expired records from star schema.

Bear in mind that, the old data should NOT removed from the operational databases because the operational databases should not only support day-to-day operations but also act as an archival repository.

Level 0

As mentioned above, in level 0 can have two ways of creation: 1) CREATE VIEW (only when the schema without fact measure); and 2) CREATE TABLE. Here is how they can be implemented:

Implementation for CREATE VIEW — Implementation for `CREATE VIEW`

Implementation for CREATE TABLE — Implementation for `CREATE TABLE`

Level 1

To implemented the level 1 one, we must delete those expired records MANUALLY.

Level 2

Same as level 1.

Type 3 — Date warehouse rules changed

As time goes by, the data warehousing rules may have changed, so the data warehousing needs to adapt to the new rules.

2. DATA WAREHOUSING SCHEMA EVOLUTION

Four types of occurrence include: 1) Changes propagating to next levels; 2) Changes not affecting the next levels; 3) Inserting new star schema; and 4) Delete star schema.

Type 1 — Changes propagating to next levels

Since Active DW is sensitive to changes of star schema which may propagate to the upper levels of the star schema. For example, if a Level-1 star schema changes its structure due to new requirements, the changes might be propagated to the Level-2 star schema and so on.

Level 0

Level 1

Type 2 — Changes not affecting the next levels

Level 0

Level 1

Type 3 — Inserting new star schema

Type 4 — Delete star schema

3. OPERATIONAL DATABASE EVOLUTION

As the operational database is changed, the data warehousing requirements may also change.

The operational databases which feed into the data warehouse may also evolve over time. This may then impact the data warehouse, as in Active Data Warehousing, updates or changes in the operational databases are reflected in the data warehouse in real time.

As the operational database is changed, the data warehousing requirements may also change. In this section, we are going to see how the changes in the operational database (at various degrees of severity) will change the data warehousing requirements from small changes to the table structure and changes to the entities and relationships to major changes to the operational database.

Some of the changes in the operational database may affect the Level-0 star schema only, while some will propagate to all star schemas in various levels of granularity. In the worst scenario, the entire data warehouse (e.g. all star schemas) will be overhauled, and the old data warehouse is fully decommissioned and replaced with the new one.

There are three types of changes: 1) Changes in the table structure, 2) Changes in the ER diagram, 3) Changes in the operational database.

Type 1 — Changes in the table structure

There are two possible options to deal with this: 1) update the existing star schemas; 2) build a new data warehouse, since the star schema now has new requirements.

Type 2 — Changes in the ER diagram

Type 3 — Changes in the Op. database

Like ERP transition the old system is decommissioned and the new system is deployed. This would affect Op. database (e.g., Table names may change, table structures may change; the entire design may change.). Here is the case:

There are two possible options to deal with this: 1) update the existing star schemas; 2) build a new data warehouse, since the star schema now has new requirements.

Chapter Summary

Summary of all these changes in 3 tables

Passive Data Warehouse (一成不變)

1. Star schemas are subject-oriented, including multi-fact star schema.

2. They are built based on historical data with a specific time range.

ACTIVE DATA WAREHOUSE (一成可變)

Problem of Passive DW:

Solution:

VIEW vs TABLE

Architecture of Active DW

PK-FK constraints

Why do we need PK-FK in the operational databases?

Why we DONT ❌ need PK-FK in traditional DW

Why we DO ✔️ need PK-FK in active DW

Why do we prefer using CREATE TABLE to CREATE VIEW?

How do you enforce a PK-FK constraint?

1. INCREMENTAL UPDATES / INSTANT UPDATE

Type 1 — Auto updates

Level 0

Level 1

Level 2

Type 2 — Expiry Date

Level 0

Level 1

Level 2

Type 3 — Date warehouse rules changed

2. DATA WAREHOUSING SCHEMA EVOLUTION

Type 1 — Changes propagating to next levels

Level 0

Level 1

Type 2 — Changes not affecting the next levels

Level 0

Level 1

Type 3 — Inserting new star schema

Type 4 — Delete star schema

3. OPERATIONAL DATABASE EVOLUTION

As the operational database is changed, the data warehousing requirements may also change.

Type 1 — Changes in the table structure

Type 2 — Changes in the ER diagram

Type 3 — Changes in the Op. database

Why do we prefer using `CREATE TABLE` to `CREATE VIEW`?