Abstract

Aitken-Based Acceleration Methods for Assessing Convergence of Multilayer Neural Networks

Ramani S. Pilla , Sagar V. Kamarthi and Bruce G. Lindsay

Suppose a nonlinear and non quadratic objective function is being optimized over a high dimensional parameter space. Often a closed-form solution does not exist and iterative methods are employed to nd a local optimum of the function. However, algorithms designed for such high-dimensional optimization problems tend to be very slow in order to ensure reliable convergence behaviors. This problem occurs frequently, for example, in training multilayer neural networks (NNs) using a gradient-descent (backpropagation) algorithm. Lack of measures of algorithmic convergence force one to use ad hoc criteria to stop the training process.
This paper rst develops the ideas of Aitken Æ 2 method to accelerate the rate of con- vergence of an error sequence (value of the objective function at each step) obtained by training a NN with a sigmoidal activation function via the backpropagation algorithm. The Aitken method is exact when the error sequence is exactly geometric. However, theoretical and empirical evidence suggests that the best possible rate of convergence obtainable for such an error sequence is log-geometric (an inverse power of the epoch n). The current paper develops a new Invariant Extended-Aitken acceleration method for accelerating log-geometric sequences. The resulting accelerated sequence enables one to predict the nal value of the error function. These predictions can in turn be used to assess the distance between the current and nal solution and thereby provides a stopping criterion for a desired accuracy. Each of the techniques described in the paper is applicable to a wide range of problems. The invariant extended-Aitken acceleration approach shows improved acceleration as well as outstanding prediction of the nal error in the practical problems considered.

Key Words: Acceleration of sequences; Aitken $\sigma^2$ ; Cross-validation; Extrapolation; Learning rate; Log-geometric convergence; Linear convergence; Multilayer neural networks; Minimization; Rate of convergence; Rate index; Shift invariance properties; Sublinear convergence; Stopping rule.