Activation functions introduce the nonlinearity that allows a neural network to model complex relationships and maintain high performance, so the choice of activation function can significantly affect the training results of a network model. Owing to its simplicity and efficiency, ReLU has long served as the standard activation function for many deep learning applications. Research on alternatives has nevertheless continued. In recent years, the gated activation functions Swish and Mish have been proposed in succession, and both perform well. Inspired by this work, we set out to design a similar function, built from learnable parameters and hyperparameters, in order to train higher-performing neural network models. During the design process, we found that if no restrictions are applied, the learnable parameters can drift over an excessively large range. This causes abnormal fluctuations of the gradient during training and makes it difficult to further improve the performance of the network model. To solve this problem, we propose a new activation function, Pish, which restricts the values of its learnable parameters and hyperparameters to a specific range. This restriction produces a gradient space conducive to neural network training and thereby further improves the accuracy of the model.
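For orientation, the following minimal sketch shows the standard definitions of ReLU, Swish, and Mish in plain Python, together with a hypothetical example of keeping a learnable parameter inside a fixed range. The function `bounded_swish` and its bounds `beta_min` and `beta_max` are assumptions made purely for illustration; they are not the definition of Pish, which is not given in this abstract.

```python
import math


def relu(x: float) -> float:
    """ReLU: max(0, x)."""
    return max(0.0, x)


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softplus(x: float) -> float:
    """Softplus log(1 + exp(x)), written to avoid overflow for large x."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def swish(x: float, beta: float = 1.0) -> float:
    """Swish: x * sigmoid(beta * x)."""
    return x * sigmoid(beta * x)


def mish(x: float) -> float:
    """Mish: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def bounded_swish(x: float, beta: float,
                  beta_min: float = 0.5, beta_max: float = 2.0) -> float:
    """Hypothetical illustration only (NOT Pish): a Swish whose parameter
    beta is clamped to [beta_min, beta_max] before it is used. The bounds
    are arbitrary values chosen for this example."""
    return swish(x, clamp(beta, beta_min, beta_max))
```

In this sketch, a value of `beta` that has drifted far outside the chosen interval behaves exactly like the nearest bound, which is one simple way to keep the shape of the activation, and hence its gradient, within a controlled family of curves.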
To evaluate Pish, we conducted experiments on five publicly available datasets (CIFAR-10, CIFAR-100, STL-10, SVHN, and ImageNet) using a range of network architectures: SqueezeNet, MobileNet, DenseNet-121, ResNet-50, SE-ResNet-18, Inception-v3, and ShuffleNet. The results show that Pish consistently outperforms other activation functions, such as ReLU, Swish, and Mish, across multiple tasks and architectures. Specifically, Pish achieved its highest accuracy of 91.66% on CIFAR-10, 67.64% on CIFAR-100, and 82.51% on STL-10, all with ResNet-50; 95.54% on SVHN with SE-ResNet-18; and 73.51% on ImageNet with Inception-v3. These results highlight the advantages of Pish in improving model performance, stability, and convergence speed, and demonstrate its effectiveness and applicability across diverse deep learning tasks and network architectures.