Loss Functions Part-2

 

This post is a continuation of Part 1 of this series on loss functions.

Loss Functions used for Classification 

For regression problems we use the least squares error as the loss function. It gives us a convex loss function, which we can optimize by finding its global minimum. For logistic regression the situation changes completely: the least squares error gives us a non-convex loss function with more than one local minimum. The curve becomes wavy because of the non-linear sigmoid function in the logistic regression hypothesis, and these multiple local minima are bad for gradient descent, which is what we use to find the minimum.
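
As a rough illustration (a minimal sketch in NumPy; the toy data and weight grid are made up for this example), the snippet below sweeps a single weight of a logistic model and checks the shape of both loss curves: the squared error turns out to be non-convex in the weight, while the cross-entropy stays convex.

```python
# A minimal sketch (NumPy assumed; toy data made up for illustration):
# compare the shape of the squared-error loss and the cross-entropy loss
# for a one-parameter logistic model y_hat = sigmoid(w * x).
import numpy as np

x = np.array([-4.0, -2.0, -1.0, 1.0, 3.0, 5.0])   # single input feature
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])      # binary labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softplus(z):
    # numerically stable log(1 + exp(z))
    return np.maximum(z, 0.0) + np.log1p(np.exp(-np.abs(z)))

# Sweep the single weight w and record both losses at each value.
ws = np.linspace(-10.0, 10.0, 201)
mse = np.array([np.mean((sigmoid(w * x) - y) ** 2) for w in ws])
# softplus(z) - y*z equals the binary cross-entropy -[y log s(z) + (1-y) log(1-s(z))]
ce = np.array([np.mean(softplus(w * x) - y * (w * x)) for w in ws])

# On a grid, a convex curve has non-negative second differences.
def looks_convex(curve):
    return bool(np.all(np.diff(curve, 2) >= -1e-8))

print("squared error convex in w?", looks_convex(mse))   # False: wavy curve
print("cross-entropy convex in w?", looks_convex(ce))    # True: single bowl
```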

Cross-Entropy Loss 

Cross-entropy is the most common choice for classification problems. The loss increases as the predicted probability diverges from the actual label, and an important property is that it heavily penalizes predictions that are confident but wrong. We cannot give equal weight to all false results.

For example, suppose one wrong prediction is made with low confidence and another wrong prediction is made with very high confidence. The error function should assign the larger penalty to the prediction made with higher confidence. Below is the derivation of the binary cross-entropy loss function.
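
One way to see it (a sketch of the usual maximum-likelihood argument): for a single example with label $y \in \{0, 1\}$ and predicted probability $\hat{y} = \sigma(w^{T}x)$, the Bernoulli likelihood of the label is $\hat{y}^{y}(1-\hat{y})^{1-y}$. Taking the negative log and averaging over $N$ training examples gives the binary cross-entropy cost:

$$J(w) = -\frac{1}{N}\sum_{i=1}^{N}\Big[y_i \log \hat{y}_i + (1 - y_i)\log\big(1 - \hat{y}_i\big)\Big]$$

A confident wrong prediction (say $\hat{y}_i \to 1$ while $y_i = 0$) drives $-\log(1-\hat{y}_i)$ toward infinity, which is exactly the heavy penalty described above.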


Below, we find the gradient of this cost function for logistic regression.
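
As a sketch of that computation: with $\hat{y}_i = \sigma(w^{T}x_i)$ and the identity $\sigma'(z) = \sigma(z)\big(1 - \sigma(z)\big)$, the chain rule collapses the derivative of $J$ into a very simple form:

$$\frac{\partial J}{\partial w_j} = \frac{1}{N}\sum_{i=1}^{N}\big(\hat{y}_i - y_i\big)\,x_{ij}$$

A minimal NumPy sketch (the data, learning rate, and iteration count are made up for illustration) that plugs this gradient into plain gradient descent:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy_grad(w, X, y):
    # (1/N) * X^T (y_hat - y): the gradient derived above
    y_hat = sigmoid(X @ w)
    return X.T @ (y_hat - y) / len(y)

# Toy data: a bias column plus two features, binary labels.
X = np.array([[1.0, -2.0,  1.0],
              [1.0, -1.0,  2.0],
              [1.0,  1.5, -1.0],
              [1.0,  3.0,  0.5]])
y = np.array([0.0, 0.0, 1.0, 1.0])

w = np.zeros(X.shape[1])
for _ in range(2000):                      # plain gradient descent
    w -= 0.1 * cross_entropy_grad(w, X, y)

print("learned weights:", w)
print("predicted probabilities:", sigmoid(X @ w))
```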


Hinge Loss
We use hinge loss for Support Vector Machines, where it is the loss that trains the classifier. The difference is that hinge loss is convex but not differentiable everywhere. It penalizes points that land on the wrong side of the hyperplane, but because it is not differentiable at the kink, plain gradient descent or stochastic gradient descent cannot be applied directly, which is one reason we use cross-entropy most of the time.
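
As a sketch (with labels taken as $y \in \{-1, +1\}$ and toy data made up for illustration): the hinge loss for a linear score $f(x) = w^{T}x$ is $\max(0,\ 1 - y\,f(x))$. It is zero once a point is correctly classified with enough margin, it grows linearly on the wrong side, and at the kink $y\,f(x) = 1$ it has no derivative, so training falls back on a subgradient.

```python
# A minimal sketch (NumPy assumed) of the hinge loss and its subgradient
# for a linear classifier with labels in {-1, +1}.
import numpy as np

def hinge_loss(w, X, y):
    # max(0, 1 - y * (X @ w)), averaged over the batch
    margins = y * (X @ w)
    return np.mean(np.maximum(0.0, 1.0 - margins))

def hinge_subgradient(w, X, y):
    # Not differentiable at margin == 1, so use a subgradient:
    # -y_i * x_i where the margin is violated (margin < 1), 0 elsewhere.
    margins = y * (X @ w)
    violated = (margins < 1.0).astype(float)
    return -(X.T @ (violated * y)) / len(y)

# Toy data: bias column plus one feature, labels in {-1, +1}.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.5], [1.0, 3.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])

w = np.zeros(X.shape[1])
for _ in range(500):                       # subgradient descent steps
    w -= 0.1 * hinge_subgradient(w, X, y)

print("hinge loss after training:", hinge_loss(w, X, y))
```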
