type: Post
Created date: Jun 16, 2022 01:21 PM
category: Data Science
tags: Machine Learning
status: Published

Week 9: Classification


Confusion Matrix

notion image
 

Interpretation of indicator: (Here)
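
As a rough illustration of how the four cells of a binary confusion matrix are counted, here is a minimal Python sketch. The function name `confusion_counts`, the "yes"/"no" labels, and the toy data are all made up for this example; they do not come from the lecture material.

```python
def confusion_counts(y_true, y_pred, positive="yes"):
    """Count (TP, FP, FN, TN) for a binary problem.

    Hypothetical helper: `positive` is the label treated as the positive class.
    """
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted YES, actually YES
            else:
                fp += 1  # predicted YES, actually NO
        else:
            if actual == positive:
                fn += 1  # predicted NO, actually YES
            else:
                tn += 1  # predicted NO, actually NO
    return tp, fp, fn, tn


# Toy labels, invented for illustration only
y_true = ["yes", "yes", "no", "no", "yes", "no"]
y_pred = ["yes", "no", "no", "yes", "yes", "no"]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)
```

Every metric in these notes is some ratio of these four counts.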

Accuracy

  • Proportion of times the classifier is correct.

Error rate

  • Proportion of times the classifier is wrong, i.e. 1 - accuracy.
Useful code for computing these can be found in the ETC2420 week 9 lecture slides.
notion image
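
A minimal Python sketch of accuracy and error rate computed from confusion-matrix counts. This is not the ETC2420 lecture code; the function names and the counts below are hypothetical and only meant to show the arithmetic.

```python
def accuracy(tp, fp, fn, tn):
    """Proportion of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def error_rate(tp, fp, fn, tn):
    """Proportion of all predictions that are wrong (equals 1 - accuracy)."""
    return (fp + fn) / (tp + fp + fn + tn)


# Hypothetical counts, not taken from any real data set
counts = {"tp": 40, "fp": 10, "fn": 5, "tn": 45}

acc = accuracy(**counts)
err = error_rate(**counts)
print(f"accuracy = {acc:.2f}, error rate = {err:.2f}")
```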

Precision (a "no" mistaken for a "yes")

  • Use precision to choose the best model when the cost of a False Positive is high.
    • For instance, email spam detection.
      • In email spam detection, a false positive means that a non-spam email (actual negative) has been identified as spam (predicted positive).
      • The email user might lose important emails if the precision of the spam detection model is not high.
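
To make the spam example concrete: precision is TP / (TP + FP), the share of predicted positives that really are positive. The sketch below is a minimal Python version. The function name and the email counts are invented for illustration, and returning NaN when nothing is predicted positive is just one possible convention.

```python
def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    predicted_positives = tp + fp
    if predicted_positives == 0:
        # Assumption: report NaN when the model never predicts the positive class
        return float("nan")
    return tp / predicted_positives


# Hypothetical spam filter: 40 emails flagged as spam,
# 30 are really spam (TP) and 10 are legitimate emails (FP)
spam_precision = precision(tp=30, fp=10)
print(f"precision = {spam_precision:.2f}")
```

In this made-up case, one in four flagged emails is a legitimate email the user could lose.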

Recall (a "yes" mistaken for a "no"), aka Sensitivity

The percentage of actual YES cases predicted as YES. Low values imply that the model does not predict the positive class well.
  • Use recall to choose the best model when the cost associated with a False Negative is high.
  • It measures how many of the Actual Positives the model captures by labeling them as Positive (True Positives).
    • For instance, in fraud detection,
      • If a fraudulent transaction (Actual Positive) is predicted as non-fraudulent (Predicted Negative), the consequence can be very bad for the bank.
    • For instance, in sick patient detection,
      • Similarly, a sick patient (Actual Positive) may go through the test and be predicted as not sick (Predicted Negative).
      • The cost associated with this False Negative will be extremely high if the sickness is contagious.
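
A minimal Python sketch of using recall, TP / (TP + FN), to pick between models when false negatives are costly. The two "models" and their counts are hypothetical and exist only to show the comparison.

```python
def recall(tp, fn):
    """Share of actual positives that the model labels as positive."""
    actual_positives = tp + fn
    if actual_positives == 0:
        # Assumption: report NaN when there are no actual positives
        return float("nan")
    return tp / actual_positives


# Hypothetical results for two fraud models on the same 50 fraudulent transactions
models = {
    "model_a": {"tp": 45, "fn": 5},
    "model_b": {"tp": 30, "fn": 20},
}

recalls = {name: recall(**counts) for name, counts in models.items()}
best = max(recalls, key=recalls.get)  # model that misses the fewest frauds
print(recalls, best)
```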

Specificity

The percentage of actual NO cases predicted as NO. Low values imply that the model does not predict the negative class well.
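
For reference, specificity written next to sensitivity in confusion-matrix terms. The numbers in the worked line are hypothetical.

```latex
\[
\text{Specificity} = \frac{TN}{TN + FP},
\qquad
\text{Sensitivity (Recall)} = \frac{TP}{TP + FN}
\]

% Worked example with made-up counts: TN = 90, FP = 10
\[
\text{Specificity} = \frac{90}{90 + 10} = 0.9
\]
```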
 

FAQ

If the classes are highly imbalanced, do we always go for balanced accuracy over overall accuracy?
Generally yes. Otherwise it might as well be a single-class problem, because the best strategy under overall accuracy is to always predict the majority class!
However, it can depend on the application, and a different metric that combines the class accuracies in another way may be more suitable.
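
The majority-class point can be checked with a small Python sketch. It takes balanced accuracy to be the average of sensitivity and specificity (the usual binary definition), and the 950/50 class split is invented for illustration.

```python
def accuracy_and_balanced_accuracy(tp, fp, fn, tn):
    """Return (overall accuracy, balanced accuracy) for a binary problem.

    Assumes both classes are present, so neither denominator is zero.
    """
    overall = (tp + tn) / (tp + fp + fn + tn)
    sensitivity = tp / (tp + fn)  # accuracy on the positive class
    specificity = tn / (tn + fp)  # accuracy on the negative class
    balanced = (sensitivity + specificity) / 2
    return overall, balanced


# Hypothetical imbalanced data: 950 negatives, 50 positives.
# A classifier that always predicts the majority (negative) class:
acc, bal = accuracy_and_balanced_accuracy(tp=0, fp=0, fn=50, tn=950)
print(f"accuracy = {acc:.2f}, balanced accuracy = {bal:.2f}")
```

Overall accuracy looks high even though no positive case is ever found, while balanced accuracy drops to the level of guessing.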
 
Cross Validation (CV)
Linear Discriminant Analysis (LDA)