type
Post
Created date
Nov 10, 2022 01:03 AM
category
Data Science
tags
Data Warehousing
status
Published
Language
English
From
School
summary
slug
password
Author
Jason Ching Yuen Siu
Priority
Featured
Featured
Cover
Origin
Type
URL
Youtube
Youtube
icon

1. Duplicate Problem

SELECT student_id, COUNT(*) as duplicate_Student_records FROM dw.student GROUP BY student_id HAVING COUNT (*) >1;
To check duplicate records in Student table
CREATE TABLE student AS SELECT DISTINCT * FROM dw.student;
To clean duplicate records:
 

2. Relationship Problem

SELECT * FROM <<table 1>> WHERE <<FK>> NOT IN (SELECT <<PK>> FROM <<table 2>>);
To check invalid FK values
DELETE FROM <<table 1>> WHERE <<FK>> NOT IN (SELECT <<PK>> FROM <<table 2>>);
To resolve this issue (simplest approach)
Case study
notion image

3. Inconsistent Values

Inconsistent values can be different in units, codes or different precisions such as two different attributes conflict to each other
For example: Cable Television case study - Workshop activity week 4 Error: Contract Start Time > Contract End Time
SELECT * FROM contract WHERE starttime > endtime
To check if there is any inconsistent values:
DELETE FROM contract WHERE starttime > endtime;
If there is any inconsistent values, to resolve the issue (simplest approach)

4. Incorrect Values

Incorrect values can be incorrect spelling, attributes fall outside the correct data range, illogical value of the attribute or incorrect data type
SELECT * FROM charter WHERE char_distance < 0;
To check if there is any incorrect values:
DELETE FROM charter WHERE char_distance <0;
If there is any inconsistent values, to resolve the issue (simplest approach):

5. Null Value Problems

SELECT * FROM major WHERE major_name IS NULL;
To check if there is any null value in Major table
DELETE FROM major WHERE major_name IS NULL;
If there is any null value, to resolve the issue (simplest approach):
 
FIT3003 - W3 - Bridge tableFIT3003 - W4 - Multi-fact star schema