Submitted by Jun Siong Ang.
With vast improvements in computational power, increased accessibility to big data, and rapid innovation in computing algorithms, the use of neural networks for both engineering and business purposes has been met with renewed interest since the early 2000s. Amidst substantial development, the Softplus and Rectified Linear Unit (ReLU) activation functions were introduced in 2000 and 2001 respectively, with the latter emerging as the more popular choice of activation function in neural networks. Notably, the ReLU activation function maintains a high degree of gradient propagation while offering greater model sparsity and computational efficiency than Softplus. As an alternative to ReLU, a family of modified Softplus activation functions – the “Smoothing” activation functions of the form g(z) = μ log(1 + e^(z/μ)) – has been proposed. Theoretically, the Smoothing activation function will leverage the high degree of gradient propagation and model simplicity characteristic of the ReLU function, while eliminating possible issues associated with the non-differentiability of ReLU at the origin. In this research, the performance of the Smoothing family of activation functions vis-à-vis the ReLU activation function is examined.
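The two activations can be sketched in a few lines of NumPy. This is an illustrative implementation, not code from the thesis; the function names and the use of `logaddexp` for numerical stability are my own choices. Note that as μ → 0 the Smoothing function approaches ReLU, and μ = 1 recovers the standard Softplus.

```python
import numpy as np

def relu(z):
    """Rectified Linear Unit: max(0, z). Non-differentiable at z = 0."""
    return np.maximum(0.0, z)

def smoothing(z, mu=1.0):
    """Smoothing activation g(z) = mu * log(1 + exp(z / mu)).

    Everywhere differentiable; approaches ReLU as mu -> 0.
    logaddexp(0, x) computes log(1 + exp(x)) without overflow
    for large x, so large inputs are handled safely.
    """
    return mu * np.logaddexp(0.0, z / mu)
```

For example, `smoothing(0.0)` equals μ·log 2 rather than ReLU's hard 0, while for large positive z both functions return approximately z.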
To read the complete thesis, visit DSpace at the MIT Libraries.