Bias, Variance, Underfitting and Overfitting

In this article, we will see what bias and variance are, and then we will use these concepts to understand underfitting and overfitting.

What is Bias? Let’s see what Wikipedia says.

(Bias is a disproportionate weight in favor of or against an idea or thing.)

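In machine learning, bias has a more specific meaning: it is the error that comes from overly simplistic assumptions in the learning algorithm, that is, the systematic difference between the model’s average prediction and the true value. A high-bias model misses relevant relationships between the features and the target. This is the same definition we hear everywhere.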

What is Variance?

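(Variance measures how far a set of numbers is spread out from their average value.) In machine learning, variance measures how much a model’s predictions change when it is trained on different training sets. Again, the same definition we hear everywhere.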
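To make the machine-learning meaning of both terms concrete, the sketch below trains two deliberately extreme models on 200 freshly drawn training sets and measures, at fixed points, how far their average prediction is from the truth (bias²) and how far the individual predictions spread around that average (variance, the same "spread from the average" idea as in the definition above). Everything in it is an assumption for illustration: the true function sin(2πx), Gaussian noise with standard deviation 0.3, and the hypothetical models `mean_model` (always predicts the average training label) and `nearest_neighbour_model` (copies the label of the closest training point). The mean model should show much larger bias², and the nearest-neighbour model much larger variance.

```python
import math
import random
import statistics

NOISE_SD = 0.3  # assumed noise level for this sketch

def true_f(x):
    # Hypothetical ground-truth function, chosen for illustration.
    return math.sin(2 * math.pi * x)

def make_training_set(rng, n=30):
    # n noisy samples (x, y) with y = true_f(x) + Gaussian noise.
    xs = [rng.random() for _ in range(n)]
    return [(x, true_f(x) + rng.gauss(0.0, NOISE_SD)) for x in xs]

def mean_model(train, x):
    # Ignores x and predicts the average training label: a very simple model.
    return statistics.fmean(y for _, y in train)

def nearest_neighbour_model(train, x):
    # Copies the label of the closest training point: a very flexible model.
    return min(train, key=lambda p: abs(p[0] - x))[1]

def bias_variance(model, n_trials=200, seed=0):
    """Average bias^2 and variance of `model` over fixed evaluation points."""
    rng = random.Random(seed)
    training_sets = [make_training_set(rng) for _ in range(n_trials)]
    test_xs = [i / 20 for i in range(21)]  # 21 evenly spaced points in [0, 1]
    bias_sq, variance = [], []
    for x in test_xs:
        # One prediction at x per training set.
        preds = [model(train, x) for train in training_sets]
        avg = statistics.fmean(preds)
        bias_sq.append((avg - true_f(x)) ** 2)             # systematic error
        variance.append(statistics.pvariance(preds, avg))  # spread around avg
    return statistics.fmean(bias_sq), statistics.fmean(variance)

for name, model in [("mean", mean_model), ("1-NN", nearest_neighbour_model)]:
    b, v = bias_variance(model)
    print(f"{name}: bias^2 = {b:.3f}, variance = {v:.3f}")
```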

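In simple terms, bias shows up as training-set error, and variance shows up as the gap between training-set error and test-set error: a high-variance model does much worse on data it has not seen than on the data it was trained on.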

Now let’s see what overfitting and underfitting are.

What is an Overfitted Model?

An overfitted model performs really well on the training set (it has high training accuracy) but does not perform well on the test set. It has fitted the noise in the training data instead of just the underlying pattern, so if we give it a new point it has not seen before, it is likely to predict that point poorly.

In the case of overfitting, the training error is low, so the model has low bias, but the test error is much higher than the training error, so the model has high variance.

Overfitted Model — Low Bias and High Variance

A decision tree is very prone to overfitting, especially when the tree is particularly deep. One way to solve this problem is pruning, but we will not discuss it here; we will stick to the topic at hand :)
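To see why depth matters, here is a minimal pure-Python regression tree, written for this article rather than taken from any library; the data (40 noisy samples of sin(2πx)) and the names `build_tree`, `predict` and `max_depth` are assumptions for illustration. With `max_depth=None` the tree keeps splitting until every leaf holds a single training point, so it reproduces every training label, noise included: its training error is exactly zero while its test error is not. Limiting `max_depth` is only a crude stand-in for the pruning mentioned above, not pruning itself.

```python
import math
import random

def make_data(rng, n, noise_sd=0.3):
    # Hypothetical data: noisy samples of sin(2*pi*x) for x in [0, 1).
    xs = [rng.random() for _ in range(n)]
    return [(x, math.sin(2 * math.pi * x) + rng.gauss(0.0, noise_sd)) for x in xs]

def mean(values):
    return sum(values) / len(values)

def sse(points):
    # Squared error of a leaf that predicts the mean label of its points.
    m = mean([y for _, y in points])
    return sum((y - m) ** 2 for _, y in points)

def build_tree(points, depth=0, max_depth=None):
    """Greedy 1-D regression tree: a leaf is a float, a node is a tuple."""
    if len(points) == 1 or (max_depth is not None and depth >= max_depth):
        return mean([y for _, y in points])
    points = sorted(points)
    best = None
    for i in range(1, len(points)):
        # Try every split between consecutive x values; keep the cheapest.
        cost = sse(points[:i]) + sse(points[i:])
        if best is None or cost < best[0]:
            best = (cost, i)
    i = best[1]
    threshold = (points[i - 1][0] + points[i][0]) / 2
    return (threshold,
            build_tree(points[:i], depth + 1, max_depth),
            build_tree(points[i:], depth + 1, max_depth))

def predict(tree, x):
    # Walk down the tree until reaching a leaf value.
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree

def mse(tree, data):
    return mean([(predict(tree, x) - y) ** 2 for x, y in data])

rng = random.Random(0)
train, test = make_data(rng, 40), make_data(rng, 200)
for max_depth in (1, 3, None):  # None = grow until every leaf has one point
    tree = build_tree(train, max_depth=max_depth)
    print(f"max_depth={max_depth}: train MSE={mse(tree, train):.3f}, "
          f"test MSE={mse(tree, test):.3f}")
```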

What is an Underfitted Model?

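An underfitted model does not perform well on either the training set or the test set: it is too simple to capture the underlying pattern, so both its training error and its test error are high.

In the case of underfitting, the training error is high, so the model has high bias. The test error is also high, but it is close to the training error: the model makes the same kind of mistakes on new data as on the training data. So an underfitted model typically has high bias and low variance.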
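A minimal pure-Python sketch of underfitting, assuming a curved target y = x² with Gaussian noise of standard deviation 0.1 (noise variance 0.01), fitted with a straight line by ordinary least squares. The helpers `make_data`, `fit_line` and `mse` are hypothetical. Because a line cannot bend, it misses the curve: both the training and the test error should come out around 0.1, roughly ten times the noise variance, and close to each other.

```python
import random

def make_data(rng, n, noise_sd=0.1):
    # Hypothetical curved target: y = x^2 + noise, x uniform on [-1, 1].
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, x * x + rng.gauss(0.0, noise_sd)) for x in xs]

def fit_line(data):
    # Closed-form ordinary least squares for y = a + b * x.
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    b = sxy / sxx
    return my - b * mx, b

def mse(params, data):
    a, b = params
    return sum((a + b * x - y) ** 2 for x, y in data) / len(data)

rng = random.Random(1)
train, test = make_data(rng, 200), make_data(rng, 200)
line = fit_line(train)
train_mse, test_mse = mse(line, train), mse(line, test)
print(f"train MSE={train_mse:.3f}, test MSE={test_mse:.3f}, noise variance=0.010")
```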

Now that we know what underfitted and overfitted models are, let’s see what a balanced model is.

What is a Balanced Model?

A balanced model performs well on both the training set and the test set. It may not reach the training accuracy of an overfitted model, but unlike an overfitted model, it also performs well on the test set.

A Balanced model will have low bias and low variance.
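A k-nearest-neighbours regressor shows all three cases with a single knob: k = 1 is very flexible, while k equal to the size of the training set always predicts the overall mean. In this pure-Python sketch (the data, the seed and the names `knn_predict` and `mse` are assumptions for illustration), k = 1 has exactly zero training error but a noticeably higher test error, k = 200 has high errors on both sets, and a moderate k = 15 should sit in between with the lowest test error of the three.

```python
import math
import random
import statistics

def make_data(rng, n, noise_sd=0.3):
    # Hypothetical data: noisy samples of sin(2*pi*x) for x in [0, 1).
    xs = [rng.random() for _ in range(n)]
    return [(x, math.sin(2 * math.pi * x) + rng.gauss(0.0, noise_sd)) for x in xs]

def knn_predict(train, x, k):
    # Average the labels of the k training points closest to x.
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return statistics.fmean(y for _, y in nearest)

def mse(train, data, k):
    return statistics.fmean((knn_predict(train, x, k) - y) ** 2 for x, y in data)

rng = random.Random(42)
train, test = make_data(rng, 200), make_data(rng, 500)
results = {}
for k in (1, 15, len(train)):  # flexible -> moderate -> rigid (global mean)
    results[k] = (mse(train, train, k), mse(train, test, k))
    print(f"k={k}: train MSE={results[k][0]:.3f}, test MSE={results[k][1]:.3f}")
```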

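Let’s take three examples to understand overfitting, underfitting and a balanced model. In each case, assume the best achievable error on the task is close to 0%, so an error of 20% or 30% counts as high.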

1. A model with training error of 2% and test error of 20%

  • Low training error: low bias
  • Test error much higher than training error: high variance
  • This is an overfitted model.

2. A model with training error of 30% and test error of 30%

  • High training error: high bias
  • Test error close to training error: low variance
  • This is an underfitted model.

3. A model with training error of 4% and test error of 3%

  • Low training error: low bias
  • Test error close to training error: low variance
  • This is a balanced model.
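The three verdicts (overfitted, underfitted, balanced) follow one rule of thumb: the training error indicates bias, and how much the test error exceeds the training error indicates variance. The hypothetical helper `diagnose` below encodes that rule; its thresholds (a training error above 10% counts as high, a gap above 5 percentage points counts as large) are assumptions chosen for these examples, not standard values, and in practice they depend on the task and the best achievable error.

```python
def diagnose(train_error, test_error, high_error=0.10, large_gap=0.05):
    """Rough bias/variance reading from training and test error (as fractions).

    high_error and large_gap are hypothetical thresholds for this sketch.
    """
    high_bias = train_error > high_error
    high_variance = (test_error - train_error) > large_gap
    if high_bias and high_variance:
        verdict = "high bias and high variance"
    elif high_bias:
        verdict = "underfitted"
    elif high_variance:
        verdict = "overfitted"
    else:
        verdict = "balanced"
    return high_bias, high_variance, verdict

# The three examples: (training error, test error).
for train_err, test_err in [(0.02, 0.20), (0.30, 0.30), (0.04, 0.03)]:
    print(train_err, test_err, diagnose(train_err, test_err))
```

The function also covers a case the examples do not: a high training error together with a large gap, where a model suffers from both problems at once.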

Thank you so much for reading this. I hope you liked this article :)

Summary
This article explains the concepts of bias and variance in machine learning, and how they relate to underfitting and overfitting. Bias refers to the error due to overly simplistic assumptions in the learning algorithm, while variance measures how much the model's predictions vary for different training sets. An overfitted model performs well on the training set but poorly on the test set, characterized by low bias and high variance. Conversely, an underfitted model struggles on both the training and test sets, exhibiting high bias; because its test error is close to its training error, its variance is typically low. The article also introduces the concept of a balanced model, which performs well on both training and test sets, featuring low bias and low variance. Three examples illustrate these concepts: an overfitted model with a training error of 2% and a test error of 20%, an underfitted model with both training and test errors at 30%, and a balanced model with training and test errors of 4% and 3%, respectively. The balanced model is the one to aim for in practice.