Created date
Jun 16, 2022 01:21 PM
Data Science
Machine Learning

Ensemble method

  • It is the creation of a better classifier from a collection of weaker classifiers. (Unity is strength.)
  • They can make the model more robust (unlikely to be influenced by small changes in the training data).
    • Boosting tries to achieve better accuracy.
    • Bagging tries to reduce variance and prevent overfitting.
  • You can think of Boosting as an advanced version of Bagging.
    • A few ensemble methods are covered here, namely Bagging, Boosting and Random Forest.

Assumptions:
  • The individual classifiers are moderately (> 50%) accurate.
  • Individual classifiers are created independently.
  • Pooling the results of each classifier reduces the variance of the overall classification.
  • Decision trees work well as the individual classifiers.
  • Disadvantage: the model produced is not as easy to interpret as a single tree.

Bagging (Bootstrap Aggregation)

  • Bagging is a method to decrease variance.
  • It aims to create several subsets of data from the training sample, chosen randomly with replacement. Each subset is used to train its own decision tree. [indiaMag]
  • It works well for high-variance machine learning algorithms, typically decision trees.
Bagging is nothing but SAMPLING WITH REPLACEMENT, using the same sample size as the original data set.
  • (That means the same observation can occur more than once in the bootstrap data set.)
Each bootstrap replicate may contain multiple instances of the same original data points, and covers approximately 63.2% of the original data set.
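The ≈63% figure follows because, in n draws with replacement, each point is missed with probability (1 − 1/n)^n ≈ 1/e ≈ 0.368, so about 63.2% of the original points appear at least once. This can be checked with a quick Python sketch (the sample size n here is arbitrary):

```python
import random

random.seed(0)
n = 10_000
data = range(n)

# Draw one bootstrap replicate: n samples with replacement.
replicate = [random.choice(data) for _ in range(n)]

# Fraction of distinct original points that made it into the replicate.
unique_fraction = len(set(replicate)) / n
print(f"unique fraction: {unique_fraction:.3f}")  # close to 1 - 1/e ≈ 0.632
```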

The way it works :

notion image
For example, in the samples drawn from the set 0-L (called a Replicate here), some observations can occur more than once (sampling with replacement).
Step 2: Construct a single classifier for each replicate.
Step 3: Combine the classifiers by taking an average of the predictions from the different trees (or a majority vote for classification).
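The three steps can be sketched in Python; the decision-stump learner and the toy 1-D data set below are made-up illustrations of the idea, not from the source:

```python
import random
from collections import Counter

random.seed(1)

# Toy 1-D data set: points above 5 are class 1, with 10% label noise.
X = [random.uniform(0, 10) for _ in range(200)]
y = [int(x > 5) ^ (random.random() < 0.1) for x in X]

def fit_stump(xs, ys):
    """Return the threshold t minimising training error for 'x > t => 1'."""
    best_t, best_err = 0.0, float("inf")
    for t in xs:
        err = sum(int(x > t) != label for x, label in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def bagging_fit(X, y, n_trees=25):
    """Steps 1-2: draw bootstrap replicates, fit one classifier per replicate."""
    n = len(X)
    stumps = []
    for _ in range(n_trees):
        idx = [random.randrange(n) for _ in range(n)]  # sample with replacement
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps

def bagging_predict(stumps, x):
    """Step 3: combine the classifiers by majority vote."""
    votes = Counter(int(x > t) for t in stumps)
    return votes.most_common(1)[0][0]

stumps = bagging_fit(X, y)
print(bagging_predict(stumps, 1.0), bagging_predict(stumps, 9.0))
```

Each stump sees a different bootstrap replicate, so their thresholds vary with the noise; averaging the votes washes that variance out.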


Boosting

  • Learners are trained sequentially: early learners fit simple models to the data, and the data is then analysed for errors. Consecutive trees (on random samples) are fitted, and at every step the goal is to improve on the accuracy of the prior tree.
  • If an observation was classified incorrectly, its weight is increased, and vice versa.

How to perform Boosting

  • Assign equal weights to each point in the training set; fit a basic tree.
  • Repeat for n iterations:
    • update the weights of misclassified items and normalise;
    • update the tree, building on the current tree.
  • Output the final classifier as a weighted sum of votes from each tree.
  1. Initialise the dataset and assign equal weight to each data point.
  2. Provide this as input to the model and identify the wrongly classified data points.
  3. Increase the weights of the wrongly classified data points.
  4. If the required results are achieved → go to step 5; else → go to step 2.
  5. End
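The numbered steps map onto an AdaBoost-style procedure. A minimal sketch, assuming decision stumps as the weak learners and a made-up, cleanly separable 1-D data set (all names are illustrative):

```python
import math
import random

random.seed(2)

# Toy 1-D data set with labels in {-1, +1}.
X = [random.uniform(0, 10) for _ in range(100)]
y = [1 if x > 5 else -1 for x in X]

def weighted_stump(X, y, w):
    """Fit 'd if x > t else -d' minimising weighted error; returns (t, d, err)."""
    best = (0.0, 1, float("inf"))
    for t in X:
        for d in (1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if (d if xi > t else -d) != yi)
            if err < best[2]:
                best = (t, d, err)
    return best

def adaboost(X, y, rounds=5):
    n = len(X)
    w = [1.0 / n] * n                            # step 1: equal weights
    ensemble = []                                # list of (alpha, t, d)
    for _ in range(rounds):
        t, d, err = weighted_stump(X, y, w)      # step 2: fit, find mistakes
        err = max(err, 1e-10)                    # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # weight of this classifier
        for i, (xi, yi) in enumerate(zip(X, y)):
            pred = d if xi > t else -d
            # step 3: up-weight misclassified points, down-weight correct ones
            w[i] *= math.exp(-alpha * yi * pred)
        s = sum(w)
        w = [wi / s for wi in w]                 # normalise (back to step 2)
        ensemble.append((alpha, t, d))
    return ensemble

def predict(ensemble, x):
    """Final classifier: weighted sum of votes from each stump."""
    score = sum(alpha * (d if x > t else -d) for alpha, t, d in ensemble)
    return 1 if score > 0 else -1

model = adaboost(X, y)
print(predict(model, 2.0), predict(model, 8.0))
```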

The differences between the two are explained below. This link is very useful.

1. How we sample for each replicate

In the case of Bagging, any element has the same probability (1/N, where N is the size of the data set) of appearing in a new data set. For Boosting, however, the observations are weighted, so some of them will take part in the new sets more often.
notion image
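The two sampling schemes can be contrasted with the standard library; the weights below are made-up values standing in for boosted (previously misclassified) points:

```python
import random

random.seed(3)
data = list(range(10))

# Bagging: uniform sampling with replacement, probability 1/N per draw.
bag_sample = random.choices(data, k=len(data))

# Boosting-style: weighted sampling; heavier points are drawn more often.
weights = [4.0 if i in (2, 7) else 1.0 for i in data]
boost_sample = random.choices(data, weights=weights, k=len(data))

print(bag_sample)
print(boost_sample)
```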

2. How we model

The training stage of the two models is different: parallel for Bagging (i.e., each model is built independently) and sequential for Boosting.
Boosting is sequential in the sense that each classifier is trained on data that takes the previous classifiers' success into account.
After each training step, the weights are redistributed. Misclassified data (false positives and false negatives) gets higher weights to emphasise the most difficult cases, so that subsequent learners prioritise them during training.
notion image

3. How we produce the result

The way they produce the result is different: a plain average for Bagging and a weighted average for Boosting.
In Bagging, the result is obtained by averaging the responses of the N learners (or by majority vote).
However, Boosting assigns a second set of weights, this time to the N classifiers, in order to take a weighted average of their estimates.
notion image
Boosting learns via evaluation: a learner with a good classification result on the training data is assigned a higher weight than a poor one.
That is why the Boosting algorithm allocates a weight to each resulting model.
notion image
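A toy numeric contrast of the two combination rules (the predictions and classifier weights are made-up values):

```python
# Combining N learners' outputs: plain average (Bagging) vs
# weighted average (Boosting).
preds = [0.9, 0.4, 0.7]    # each learner's estimate for one input
alphas = [0.5, 0.2, 0.3]   # Boosting's per-classifier weights (sum to 1)

bagging_result = sum(preds) / len(preds)
boosting_result = sum(a * p for a, p in zip(preds, alphas))

print(round(bagging_result, 3), round(boosting_result, 3))
```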

notion image
notion image


FAQ for Bagging :
When to use bagging
  • Useful when there is noise in the data.
  • Useful for unstable classifiers – that is, classifiers where small changes in the training data cause large changes in the classifier. Unstable classifiers include decision trees, neural networks and linear regression.
  • Not recommended for stable classifiers such as K Nearest Neighbours and Naïve Bayes.
The advantages and disadvantages [indiaMag]
  • During the sampling of the training data, many observations overlap, so combining these learners helps to overcome the high variance.
  • Handles higher-dimensional data very well.
  • Maintains accuracy even with missing data.
  • Since the final prediction is based on the average of the subset trees' predictions, it won't give precise values for the classification and regression model.
  • Not helpful in the case of bias or underfitting in the data.
FAQ for Boosting:
The advantages and disadvantages [indiaMag]
  • Supports different loss functions (we have used 'binary:logistic' for this example).
  • Works well with interactions.
  • Decreases the bias error and builds strong predictive models.
  • Helps when we are dealing with bias or underfitting in the data set.
  • Prone to over-fitting.
  • Requires careful tuning of different hyper-parameters.
  • Increases the complexity of the classification.
  • Time and computation can be a bit expensive.


R code

R interpretation



Extra Resource

Autopilot: The Mind’s Three Favorite Options
Random Forest