Loss Functions Part-2

 

This post is a continuation of Part 1 of this series on loss functions.

Loss Functions used for Classification 

For regression problems we use the least squares error as the loss function. It gives us a convex loss function, which we can optimize by finding its global minimum. For logistic regression the situation changes completely: the least squares error gives us a non-convex loss function with more than one local minimum. The curve becomes wavy because of the non-linear sigmoid function in the logistic regression hypothesis, and these multiple local minima are bad for gradient descent, which is what we use to find the minimum.
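
As a rough illustration (a minimal sketch in NumPy; the toy data and weight grid are made up for this example), the snippet below sweeps a single weight of a logistic model and checks the shape of both loss curves: the squared error turns out to be non-convex in the weight, while the cross-entropy stays convex.

```python
# A minimal sketch (NumPy assumed; toy data made up for illustration):
# compare the shape of the squared-error loss and the cross-entropy loss
# for a one-parameter logistic model y_hat = sigmoid(w * x).
import numpy as np

x = np.array([-4.0, -2.0, -1.0, 1.0, 3.0, 5.0])   # single input feature
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])      # binary labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softplus(z):
    # numerically stable log(1 + exp(z))
    return np.maximum(z, 0.0) + np.log1p(np.exp(-np.abs(z)))

# Sweep the single weight w and record both losses at each value.
ws = np.linspace(-10.0, 10.0, 201)
mse = np.array([np.mean((sigmoid(w * x) - y) ** 2) for w in ws])
# softplus(z) - y*z equals the binary cross-entropy -[y log s(z) + (1-y) log(1-s(z))]
ce = np.array([np.mean(softplus(w * x) - y * (w * x)) for w in ws])

# On a grid, a convex curve has non-negative second differences.
def looks_convex(curve):
    return bool(np.all(np.diff(curve, 2) >= -1e-8))

print("squared error convex in w?", looks_convex(mse))   # False: wavy curve
print("cross-entropy convex in w?", looks_convex(ce))    # True: single bowl
```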

Cross-Entropy Loss 

Cross-entropy is the most common choice for classification problems. The loss increases as the predicted probability diverges from the actual label, and an important property is that it heavily penalizes predictions that are confident but wrong. We cannot give equal weight to all false results.

For example, suppose one wrong prediction is made with low confidence and another wrong prediction is made with very high confidence. The error function should assign the larger penalty to the prediction made with higher confidence. Below is the derivation of the binary cross-entropy loss function.
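
One way to see it (a sketch of the usual maximum-likelihood argument): for a single example with label $y \in \{0, 1\}$ and predicted probability $\hat{y} = \sigma(w^{T}x)$, the Bernoulli likelihood of the label is $\hat{y}^{y}(1-\hat{y})^{1-y}$. Taking the negative log and averaging over $N$ training examples gives the binary cross-entropy cost:

$$J(w) = -\frac{1}{N}\sum_{i=1}^{N}\Big[y_i \log \hat{y}_i + (1 - y_i)\log\big(1 - \hat{y}_i\big)\Big]$$

A confident wrong prediction (say $\hat{y}_i \to 1$ while $y_i = 0$) drives $-\log(1-\hat{y}_i)$ toward infinity, which is exactly the heavy penalty described above.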


Below, we find the gradient of this cost function for logistic regression.
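
As a sketch of that computation: with $\hat{y}_i = \sigma(w^{T}x_i)$ and the identity $\sigma'(z) = \sigma(z)\big(1 - \sigma(z)\big)$, the chain rule collapses the derivative of $J$ into a very simple form:

$$\frac{\partial J}{\partial w_j} = \frac{1}{N}\sum_{i=1}^{N}\big(\hat{y}_i - y_i\big)\,x_{ij}$$

A minimal NumPy sketch (the data, learning rate, and iteration count are made up for illustration) that plugs this gradient into plain gradient descent:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy_grad(w, X, y):
    # (1/N) * X^T (y_hat - y): the gradient derived above
    y_hat = sigmoid(X @ w)
    return X.T @ (y_hat - y) / len(y)

# Toy data: a bias column plus two features, binary labels.
X = np.array([[1.0, -2.0,  1.0],
              [1.0, -1.0,  2.0],
              [1.0,  1.5, -1.0],
              [1.0,  3.0,  0.5]])
y = np.array([0.0, 0.0, 1.0, 1.0])

w = np.zeros(X.shape[1])
for _ in range(2000):                      # plain gradient descent
    w -= 0.1 * cross_entropy_grad(w, X, y)

print("learned weights:", w)
print("predicted probabilities:", sigmoid(X @ w))
```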


Hinge Loss
We use hinge loss for Support Vector Machines, where it is the loss that trains the classifier. The difference is that hinge loss is convex but not differentiable everywhere. It penalizes points that land on the wrong side of the hyperplane, but because it is not differentiable at the kink, plain gradient descent or stochastic gradient descent cannot be applied directly, which is one reason we use cross-entropy most of the time.
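
As a sketch (with labels taken as $y \in \{-1, +1\}$ and toy data made up for illustration): the hinge loss for a linear score $f(x) = w^{T}x$ is $\max(0,\ 1 - y\,f(x))$. It is zero once a point is correctly classified with enough margin, it grows linearly on the wrong side, and at the kink $y\,f(x) = 1$ it has no derivative, so training falls back on a subgradient.

```python
# A minimal sketch (NumPy assumed) of the hinge loss and its subgradient
# for a linear classifier with labels in {-1, +1}.
import numpy as np

def hinge_loss(w, X, y):
    # max(0, 1 - y * (X @ w)), averaged over the batch
    margins = y * (X @ w)
    return np.mean(np.maximum(0.0, 1.0 - margins))

def hinge_subgradient(w, X, y):
    # Not differentiable at margin == 1, so use a subgradient:
    # -y_i * x_i where the margin is violated (margin < 1), 0 elsewhere.
    margins = y * (X @ w)
    violated = (margins < 1.0).astype(float)
    return -(X.T @ (violated * y)) / len(y)

# Toy data: bias column plus one feature, labels in {-1, +1}.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.5], [1.0, 3.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])

w = np.zeros(X.shape[1])
for _ in range(500):                       # subgradient descent steps
    w -= 0.1 * hinge_subgradient(w, X, y)

print("hinge loss after training:", hinge_loss(w, X, y))
```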
