Learning Optimization (SGD) Through Examples
Introduction

The aim of optimization is to minimize the cost function. We will learn more about optimization in the later sections of the paper.

Batch Gradient Descent

Here we sum the gradients over all training examples on every iteration while performing the updates of the weights or parameters. So for every weight update, we need to sum over all examples. The weights and bias are updated based on the gradient and the learning rate (η). It is mainly advantageous when there is a straight trajectory towards the minimum; it also gives an unbiased estimate of the gradients and uses a fixed learning rate during training. It is disadvantageous because, even with a vectorized implementation, we have to go over the whole training set again and again: an update only happens after we have gone through all the data, even when some examples are redundant and contribute nothing to the update. A sketch of this update rule is given at the end of this section.

Stochastic Gradient Descent

Here, unlike Batch Gradient Descent, we update the parameters on each example, so learning happens on every example. So it converges more quickly than Batch Gradient Descent (a sketch also follows at the end of this section).
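To make the Batch Gradient Descent update concrete, here is a minimal sketch in Python, assuming a linear-regression model with a mean-squared-error cost; the function name batch_gradient_descent and the parameters lr (the learning rate η) and epochs are illustrative, not from the original text.

```python
import numpy as np

def batch_gradient_descent(X, y, lr=0.1, epochs=100):
    """One weight/bias update per full pass over the training set."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)  # weights
    b = 0.0                   # bias
    for _ in range(epochs):
        # Error of the current model on *all* training examples
        error = X @ w + b - y
        # Gradients are averaged over the whole training set
        grad_w = X.T @ error / n_samples
        grad_b = error.mean()
        # A single update uses every example
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

Note that every single update has to touch every row of X, which is exactly the disadvantage described above.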
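For comparison, here is a minimal Stochastic Gradient Descent sketch under the same assumed linear-regression setup; again, the names stochastic_gradient_descent, lr, epochs, and seed are illustrative.

```python
import numpy as np

def stochastic_gradient_descent(X, y, lr=0.01, epochs=10, seed=0):
    """One weight/bias update per individual training example."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)  # weights
    b = 0.0                   # bias
    for _ in range(epochs):
        for i in rng.permutation(n_samples):  # shuffle examples each epoch
            error = X[i] @ w + b - y[i]       # error on a single example
            w -= lr * error * X[i]            # noisy, per-example gradient step
            b -= lr * error
    return w, b
```

Because the parameters move after every example, progress starts immediately instead of waiting for a full pass over the data, which is why it typically converges more quickly than Batch Gradient Descent, at the cost of noisier steps.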