Activation Functions

1 minute read

The activation function maps a neuron's input to its output: it converts the summed input signal of a unit in the network into an output signal. Most activation functions are nonlinear, and this is what makes the output of a multi-layer perceptron nonlinear, allowing the network to approximate arbitrary nonlinear functions and thus be applied to a wide range of nonlinear models.
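
A minimal NumPy sketch of why the nonlinearity matters (layer shapes made up for illustration): without an activation between them, two linear layers collapse into a single linear map.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.standard_normal((4, 3)), rng.standard_normal((2, 4))
x = rng.standard_normal(3)

# Two stacked linear layers are just one linear map W2 @ W1:
# no extra expressive power without an activation in between.
assert np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x)

# With a nonlinearity (here tanh) in between, the composition is
# no longer equivalent to any single linear map.
h = W2 @ np.tanh(W1 @ x)
```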

Sigmoid Family

\(sigmoid(x) = \frac{1}{1+e^{-x}}\)
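
A sketch of a numerically stable implementation (NumPy): `exp(-|x|)` never overflows, and the branch picks the algebraically equivalent form for each sign.

```python
import numpy as np

def sigmoid(x):
    # For x >= 0: 1 / (1 + exp(-x)); for x < 0: exp(x) / (1 + exp(x)).
    e = np.exp(-np.abs(x))
    return np.where(x >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

print(sigmoid(np.array([-1000.0, 0.0, 1000.0])))  # [0.  0.5 1. ]
```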

Hard Sigmoid

\(HardSigmoid(x) = max(0,\ min(1,\ 0.2x+0.5))\)

  • A piecewise-linear approximation of sigmoid; the exact slope varies by library (Keras uses $0.2x+0.5$, PyTorch uses $x/6+1/2$)
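
A one-line sketch using the Keras-style slope (the PyTorch variant only changes the two constants):

```python
import numpy as np

def hard_sigmoid(x):
    # Keras-style slope 0.2; PyTorch's Hardsigmoid uses x / 6 + 1 / 2.
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)
```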

Swish

\(swish(x) = x\cdot sigmoid(x) = \frac{x}{1+e^{-x}}\)
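
A minimal sketch with the $\beta$ parameter from the Swish paper ($\beta = 1$ recovers the SiLU):

```python
import numpy as np

def swish(x, beta=1.0):
    # swish(x) = x * sigmoid(beta * x); beta can also be learned per layer.
    return x / (1.0 + np.exp(-beta * x))
```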

Maxout

\(Maxout(x) = max(w_1^Tx+b_1,\ \dots,\ w_k^Tx+b_k)\)

  • Takes the maximum over $k$ learned affine functions; generalizes ReLU and Leaky ReLU, at the cost of $k$ times as many parameters
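
A sketch of a single maxout layer (NumPy); the tensor shapes here are hypothetical, chosen just to make the max-over-pieces structure explicit.

```python
import numpy as np

def maxout(x, W, b):
    # W: (k, d_out, d_in), b: (k, d_out) -- k affine pieces, elementwise max.
    return np.max(np.einsum('koi,i->ko', W, x) + b, axis=0)

rng = np.random.default_rng(0)
W, b = rng.standard_normal((3, 2, 4)), rng.standard_normal((3, 2))
print(maxout(rng.standard_normal(4), W, b))  # one (2,)-vector, max over k=3 pieces
```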

ReLU Family (Rectified Linear Unit)

\(ReLU(x) = max(0,x)\)

  • Suffers from the dying ReLU problem: a unit whose input stays negative outputs zero and receives zero gradient, so it may never recover (see the sketch below)
  • Mitigates the vanishing gradient issue: the gradient is exactly 1 for $x > 0$
  • Cannot avoid the exploding gradient issue
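
A short sketch of the function and its gradient, which makes both bullet points concrete:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # Gradient is exactly 1 for x > 0 (no vanishing) and 0 for x < 0:
    # a unit stuck in the negative region gets no gradient and can "die".
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.5, 2.0])
print(relu(x), relu_grad(x))  # [0.  0.  0.5 2. ] [0. 0. 1. 1.]
```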

ELU

\(ELU(x) = \begin{cases} x, & x \geq 0 \\ \alpha(e^x-1), & x < 0 \end{cases}\)

  • Avoids the dying neuron issue (nonzero gradient for $x < 0$)
  • Cannot avoid the exploding gradient issue
  • Computationally expensive (because of the exponential)
  • $\alpha$ is a hyper-parameter (usually set to 1, the default in most frameworks)
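
A minimal sketch (NumPy):

```python
import numpy as np

def elu(x, alpha=1.0):
    # np.minimum guards the exp against overflow for large positive x,
    # since np.where evaluates both branches.
    return np.where(x >= 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))
```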

Leaky ReLU

\(LeakyReLU(x) = \begin{cases} x, & x \geq 0 \\ \alpha x, & x < 0 \end{cases}\)

  • Avoids the dying neuron issue (the small slope keeps a gradient alive for $x < 0$)
  • Not computationally expensive
  • $\alpha$ is a hyper-parameter (typically between 0.01 and 0.3; PyTorch defaults to 0.01, Keras to 0.3)
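
A minimal sketch (NumPy):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # alpha is the slope for x < 0 (PyTorch's default; Keras uses 0.3).
    return np.where(x >= 0, x, alpha * x)
```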

SELU (Scaled Exponential Linear Units)

\(SELU(x) = \begin{cases} \lambda x, & x \geq 0 \\ \lambda\alpha(e^x-1), & x < 0 \end{cases}\) with $\alpha \approx 1.6733$ and $\lambda \approx 1.0507$.
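
A sketch with the full-precision constants from Klambauer et al. (2017), which (with suitable initialization) keep activations approximately zero-mean and unit-variance across layers:

```python
import numpy as np

ALPHA = 1.6732632423543772   # self-normalizing constants,
LAMBDA = 1.0507009873554805  # Klambauer et al. (2017)

def selu(x):
    # Scaled ELU; np.minimum guards the exp against overflow for x > 0.
    return LAMBDA * np.where(x >= 0, x, ALPHA * (np.exp(np.minimum(x, 0.0)) - 1.0))
```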

GELU (Gaussian Error Linear Unit)

\(GELU(x) = xP(X\leq x) = x\Phi(x) = \frac{x}{2}\left[1+\mathrm{erf}\left(\frac{x}{\sqrt{2}}\right)\right]\) where $X \sim \mathcal{N}(0,1)$.

  • GELU is used in BERT, GPT-3, and most other Transformer models, often via the tanh approximation sketched below.
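
A sketch of both the exact form and the tanh approximation (NumPy plus `scipy.special.erf` for the error function):

```python
import numpy as np
from scipy.special import erf

def gelu(x):
    # Exact form: x * Phi(x), with Phi the standard normal CDF.
    return 0.5 * x * (1.0 + erf(x / np.sqrt(2.0)))

def gelu_tanh(x):
    # tanh approximation from Hendrycks & Gimpel, used in BERT/GPT codebases.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))
```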

Tanh Family

\(Tanh(x) = \frac{e^x-e^{-x}}{e^x+e^{-x}}\)

  • Suffers from the vanishing gradient problem: the function saturates for large $|x|$ (see the gradient check below)
  • Zero-centered output (symmetric about the origin), unlike sigmoid
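
A quick numerical check of the saturation (NumPy), using $\tanh'(x) = 1 - \tanh^2(x)$:

```python
import numpy as np

# Near zero the gradient is ~1, but it collapses toward 0 as |x| grows.
x = np.array([0.0, 2.0, 5.0])
print(1.0 - np.tanh(x) ** 2)  # approximately [1.  0.0707  0.0002]
```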

HardTanh

\(HardTanh(x) = \begin{cases} -1, & x < -1 \\ x, & -1 \leq x \leq 1 \\ 1, & x > 1 \end{cases}\)

  • A cheaper, more computationally efficient piecewise-linear approximation of the tanh activation.
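
A one-line sketch (NumPy), with the same two bounds that `torch.nn.Hardtanh` exposes:

```python
import numpy as np

def hardtanh(x, min_val=-1.0, max_val=1.0):
    # Clipped identity: linear on [min_val, max_val], constant outside.
    return np.clip(x, min_val, max_val)
```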

TanhShrink

\(TanhShrink(x) = x - Tanh(x)\)
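
A one-line sketch (NumPy):

```python
import numpy as np

def tanhshrink(x):
    # ~0 near the origin; approaches x - sign(x) for large |x|.
    return x - np.tanh(x)
```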

Softmax

\(Softmax(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}\)

  • Maps a vector of scores (logits) to a probability distribution: non-negative entries that sum to 1

LogSoftMax

\(LogSoftmax(x_i) = \log\left(\frac{e^{x_i}}{\sum_j e^{x_j}}\right) = x_i - \log\sum_j e^{x_j}\)

Softmin

\(Softmin(x_i) = Softmax(-x)_i = \frac{e^{-x_i}}{\sum_j e^{-x_j}}\)
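
A sketch of all three (NumPy), using the usual max-subtraction trick for numerical stability:

```python
import numpy as np

def softmax(x):
    # Softmax is shift-invariant, so subtracting max(x) avoids overflow in exp.
    z = np.exp(x - np.max(x))
    return z / z.sum()

def log_softmax(x):
    # x_i - logsumexp(x): more stable than log(softmax(x)).
    shifted = x - np.max(x)
    return shifted - np.log(np.sum(np.exp(shifted)))

def softmin(x):
    # softmin(x) = softmax(-x): puts the most weight on the smallest entries.
    return softmax(-x)
```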