Activation function

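In a computational network, the activation function of a node defines the output of that node given an input or a set of inputs. A standard integrated circuit can be viewed as a digital network of activation functions whose outputs are either on (1) or off (0), depending on the input. This is similar to the behavior of the linear perceptron in neural networks. However, only nonlinear activation functions allow such networks to compute nontrivial problems using a small number of nodes. In artificial neural networks, this function is also called the transfer function.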
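One way to see why nonlinearity matters is that layers with a purely linear (identity) activation collapse into a single linear map, so stacking them adds no expressive power. The following is a minimal sketch of that fact in plain Python; the weight matrices `W1`, `W2` and the input `x` are arbitrary illustrative values, not taken from any source.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def relu(v):
    """Apply the rectified linear unit elementwise."""
    return [max(0.0, t) for t in v]

# Illustrative (hypothetical) weights and input.
W1 = [[1.0, -2.0], [0.5, 1.0]]
W2 = [[2.0, 1.0], [-1.0, 3.0]]
x = [1.0, 2.0]

# Two layers with the identity activation ...
stacked_linear = matvec(W2, matvec(W1, x))
# ... equal one layer whose weight matrix is the product W2 * W1.
collapsed = matvec(matmul(W2, W1), x)
# Inserting a nonlinearity between the layers breaks this equivalence.
with_relu = matvec(W2, relu(matvec(W1, x)))

print(stacked_linear)  # [-3.5, 10.5]
print(collapsed)       # [-3.5, 10.5]
print(with_relu)       # [2.5, 7.5]
```

With the identity activation the two-layer network is indistinguishable from a one-layer one, while the ReLU variant produces a different output for the same weights.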

Activation functions of a single variable

Name	Equation	Derivative	Range	Order of continuity^[1]	Monotonic	Monotonic derivative	Approximates identity near the origin
Identity	$f(x)=x$	$f'(x)=1$	$(-\infty ,\infty )$	$C^{\infty }$	Yes	Yes	Yes
Binary step	$f(x)={\begin{cases}0&{\text{for }}x<0\\1&{\text{for }}x\geq 0\end{cases}}$	$f'(x)={\begin{cases}0&{\text{for }}x\neq 0\\{\text{undefined}}&{\text{for }}x=0\end{cases}}$	$\{0,1\}$	$C^{-1}$	Yes	No	No
Logistic function (a type of sigmoid function)	$f(x)=\sigma (x)={\frac {1}{1+e^{-x}}}$ ^[2]	$f'(x)=f(x)(1-f(x))$	$(0,1)$	$C^{\infty }$	Yes	No	No
Hyperbolic tangent (tanh)	$f(x)=\tanh(x)={\frac {(e^{x}-e^{-x})}{(e^{x}+e^{-x})}}$	$f'(x)=1-f(x)^{2}$	$(-1,1)$	$C^{\infty }$	Yes	No	Yes
Arctangent (arctan)	$f(x)=\tan ^{-1}(x)$	$f'(x)={\frac {1}{x^{2}+1}}$	$\left(-{\frac {\pi }{2}},{\frac {\pi }{2}}\right)$	$C^{\infty }$	Yes	No	Yes
Softsign^[1]^[2]	$f(x)={\frac {x}{1+|x|}}$	$f'(x)={\frac {1}{(1+|x|)^{2}}}$	$(-1,1)$	$C^{1}$	Yes	No	Yes
Inverse square root unit (ISRU)^[3]	$f(x)={\frac {x}{\sqrt {1+\alpha x^{2}}}}$	$f'(x)=\left({\frac {1}{\sqrt {1+\alpha x^{2}}}}\right)^{3}$	$\left(-{\frac {1}{\sqrt {\alpha }}},{\frac {1}{\sqrt {\alpha }}}\right)$	$C^{\infty }$	Yes	No	Yes
Rectified linear unit (ReLU)	$f(x)={\begin{cases}0&{\text{for }}x<0\\x&{\text{for }}x\geq 0\end{cases}}$	$f'(x)={\begin{cases}0&{\text{for }}x<0\\1&{\text{for }}x\geq 0\end{cases}}$	$[0,\infty )$	$C^{0}$	Yes	Yes	No
Leaky rectified linear unit (Leaky ReLU)	$f(x)={\begin{cases}0.01x&{\text{for }}x<0\\x&{\text{for }}x\geq 0\end{cases}}$	$f'(x)={\begin{cases}0.01&{\text{for }}x<0\\1&{\text{for }}x\geq 0\end{cases}}$	$(-\infty ,\infty )$	$C^{0}$	Yes	Yes	No
Parametric rectified linear unit (PReLU)^[4]	$f(\alpha ,x)={\begin{cases}\alpha x&{\text{for }}x<0\\x&{\text{for }}x\geq 0\end{cases}}$	$f'(\alpha ,x)={\begin{cases}\alpha &{\text{for }}x<0\\1&{\text{for }}x\geq 0\end{cases}}$	$(-\infty ,\infty )$	$C^{0}$	Yes iff $\alpha \geq 0$	Yes	Yes iff $\alpha =1$
Randomized leaky rectified linear unit (RReLU)^[5]	$f(\alpha ,x)={\begin{cases}\alpha x&{\text{for }}x<0\\x&{\text{for }}x\geq 0\end{cases}}$ ^[3]	$f'(\alpha ,x)={\begin{cases}\alpha &{\text{for }}x<0\\1&{\text{for }}x\geq 0\end{cases}}$	$(-\infty ,\infty )$	$C^{0}$	Yes	Yes	No
Exponential linear unit (ELU)^[6]	$f(\alpha ,x)={\begin{cases}\alpha (e^{x}-1)&{\text{for }}x<0\\x&{\text{for }}x\geq 0\end{cases}}$	$f'(\alpha ,x)={\begin{cases}f(\alpha ,x)+\alpha &{\text{for }}x<0\\1&{\text{for }}x\geq 0\end{cases}}$	$(-\alpha ,\infty )$	${\begin{cases}C^{1}&{\text{when }}\alpha =1\\C^{0}&{\text{otherwise }}\end{cases}}$	Yes iff $\alpha \geq 0$	Yes iff $0\leq \alpha \leq 1$	Yes iff $\alpha =1$
Scaled exponential linear unit (SELU)^[7]	$f(\alpha ,x)=\lambda {\begin{cases}\alpha (e^{x}-1)&{\text{for }}x<0\\x&{\text{for }}x\geq 0\end{cases}}$ with $\lambda =1.0507$ and $\alpha =1.67326$	$f'(\alpha ,x)=\lambda {\begin{cases}\alpha (e^{x})&{\text{for }}x<0\\1&{\text{for }}x\geq 0\end{cases}}$	$(-\lambda \alpha ,\infty )$	$C^{0}$	Yes	No	No
S-shaped rectified linear activation unit (SReLU)^[8]	$f_{t_{l},a_{l},t_{r},a_{r}}(x)={\begin{cases}t_{l}+a_{l}(x-t_{l})&{\text{for }}x\leq t_{l}\\x&{\text{for }}t_{l}<x<t_{r}\\t_{r}+a_{r}(x-t_{r})&{\text{for }}x\geq t_{r}\end{cases}}$ where $t_{l},a_{l},t_{r},a_{r}$ are parameters.	$f'_{t_{l},a_{l},t_{r},a_{r}}(x)={\begin{cases}a_{l}&{\text{for }}x\leq t_{l}\\1&{\text{for }}t_{l}<x<t_{r}\\a_{r}&{\text{for }}x\geq t_{r}\end{cases}}$	$(-\infty ,\infty )$	$C^{0}$	No	No	No
Inverse square root linear unit (ISRLU)^[3]	$f(x)={\begin{cases}{\frac {x}{\sqrt {1+\alpha x^{2}}}}&{\text{for }}x<0\\x&{\text{for }}x\geq 0\end{cases}}$	$f'(x)={\begin{cases}\left({\frac {1}{\sqrt {1+\alpha x^{2}}}}\right)^{3}&{\text{for }}x<0\\1&{\text{for }}x\geq 0\end{cases}}$	$\left(-{\frac {1}{\sqrt {\alpha }}},\infty \right)$	$C^{2}$	Yes	Yes	Yes
Adaptive piecewise linear (APL)^[9]	$f(x)=\max(0,x)+\sum _{s=1}^{S}a_{i}^{s}\max(0,-x+b_{i}^{s})$	$f'(x)=H(x)-\sum _{s=1}^{S}a_{i}^{s}H(-x+b_{i}^{s})$ ^[4]	$(-\infty ,\infty )$	$C^{0}$	No	No	No
SoftPlus^[10]	$f(x)=\ln(1+e^{x})$	$f'(x)={\frac {1}{1+e^{-x}}}$	$(0,\infty )$	$C^{\infty }$	Yes	Yes	No
Bent identity	$f(x)={\frac {{\sqrt {x^{2}+1}}-1}{2}}+x$	$f'(x)={\frac {x}{2{\sqrt {x^{2}+1}}}}+1$	$(-\infty ,\infty )$	$C^{\infty }$	Yes	Yes	Yes
Sigmoid-weighted linear unit (SiLU)^[11] (also known as Swish^[12])	$f(x)=x\cdot \sigma (x)$ ^[5]	$f'(x)=f(x)+\sigma (x)(1-f(x))$ ^[6]	$[\approx -0.28,\infty )$	$C^{\infty }$	No	No	No
Soft exponential^[13]	$f(\alpha ,x)={\begin{cases}-{\frac {\ln(1-\alpha (x+\alpha ))}{\alpha }}&{\text{for }}\alpha <0\\x&{\text{for }}\alpha =0\\{\frac {e^{\alpha x}-1}{\alpha }}+\alpha &{\text{for }}\alpha >0\end{cases}}$	$f'(\alpha ,x)={\begin{cases}{\frac {1}{1-\alpha (\alpha +x)}}&{\text{for }}\alpha <0\\e^{\alpha x}&{\text{for }}\alpha \geq 0\end{cases}}$	$(-\infty ,\infty )$	$C^{\infty }$	Yes	Yes	Yes iff $\alpha =0$
Sinusoid	$f(x)=\sin(x)$	$f'(x)=\cos(x)$	$[-1,1]$	$C^{\infty }$	No	No	Yes
Sinc	$f(x)={\begin{cases}1&{\text{for }}x=0\\{\frac {\sin(x)}{x}}&{\text{for }}x\neq 0\end{cases}}$	$f'(x)={\begin{cases}0&{\text{for }}x=0\\{\frac {\cos(x)}{x}}-{\frac {\sin(x)}{x^{2}}}&{\text{for }}x\neq 0\end{cases}}$	$[\approx -0.217234,1]$	$C^{\infty }$	No	No	No
Gaussian	$f(x)=e^{-x^{2}}$	$f'(x)=-2xe^{-x^{2}}$	$(0,1]$	$C^{\infty }$	No	No	No
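The closed-form derivatives in the table can be sanity-checked against a numerical derivative. The sketch below does this for a handful of rows (logistic, tanh, SoftPlus, ELU and SiLU) using a central finite difference. The helper names, the step size `h` and the sample points are illustrative choices, and the points deliberately avoid the kink at $x=0$.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def softplus(x):
    return math.log1p(math.exp(x))

def elu(x, alpha=1.0):
    return x if x >= 0 else alpha * (math.exp(x) - 1.0)

def silu(x):
    return x * logistic(x)

# Derivatives written exactly as they appear in the table.
def d_logistic(x):
    f = logistic(x)
    return f * (1.0 - f)

def d_tanh(x):
    return 1.0 - math.tanh(x) ** 2

def d_softplus(x):
    return 1.0 / (1.0 + math.exp(-x))

def d_elu(x, alpha=1.0):
    return 1.0 if x >= 0 else elu(x, alpha) + alpha

def d_silu(x):
    f = silu(x)
    return f + logistic(x) * (1.0 - f)

def numeric_derivative(f, x, h=1e-5):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

PAIRS = {
    "logistic": (logistic, d_logistic),
    "tanh": (math.tanh, d_tanh),
    "softplus": (softplus, d_softplus),
    "elu": (elu, d_elu),
    "silu": (silu, d_silu),
}

POINTS = [-2.0, -0.5, 0.3, 1.7]  # sample points away from x = 0

def max_abs_error(f, df, points=POINTS):
    """Largest gap between the table formula and the numerical derivative."""
    return max(abs(numeric_derivative(f, x) - df(x)) for x in points)

for name, (f, df) in PAIRS.items():
    print(name, max_abs_error(f, df))
```

Each printed error should be tiny compared with the derivative values themselves; a large discrepancy would indicate a mistyped formula.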

Notes

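^ A function is said to be of class $C^{0}$ if it is continuous. If a function is $n$ times differentiable and its $n$-th derivative is continuous, it is of class $C^{n}$ ($n\geq 1$). If a function is of class $C^{n}$ for every $n$, it is of class $C^{\infty }$, also called a smooth function.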

^ Here $H$ is the unit step function.

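^ $\alpha$ is a random variable sampled from a uniform distribution at training time and fixed to the expectation of that distribution at test time.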

^ ^ ^ Here $\sigma$ is the logistic function.
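The note on RReLU describes two modes: a randomly drawn slope during training and a fixed slope at test time. A minimal sketch of that behavior follows. The function name `rrelu`, its parameters, and the bounds 0.1 and 0.3 are hypothetical illustrations, since the text does not specify the bounds of the uniform distribution.

```python
import random

def rrelu(x, lower, upper, training, rng=random):
    """Randomized leaky ReLU for a scalar input.

    lower/upper are the (assumed) bounds of the uniform distribution
    from which the negative-side slope alpha is drawn.
    """
    if training:
        # Training: sample a fresh slope for this call.
        alpha = rng.uniform(lower, upper)
    else:
        # Test: use the expectation of U(lower, upper).
        alpha = (lower + upper) / 2.0
    return x if x >= 0 else alpha * x

rng = random.Random(0)  # seeded generator so runs are repeatable
train_out = rrelu(-1.0, 0.1, 0.3, training=True, rng=rng)
test_out = rrelu(-1.0, 0.1, 0.3, training=False)
print(train_out)  # some value between -0.3 and -0.1
print(test_out)   # about -0.2, the midpoint slope applied to -1.0
```

Non-negative inputs pass through unchanged in both modes, so the randomness only affects the negative half of the input range.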

Activation functions of multiple variables

Name	Equation	Derivatives	Range	Order of continuity
Softmax	$f_{i}({\vec {x}})={\frac {e^{x_{i}}}{\sum _{j=1}^{J}e^{x_{j}}}}$ for $i=1,\ldots ,J$	${\frac {\partial f_{i}({\vec {x}})}{\partial x_{j}}}=f_{i}({\vec {x}})(\delta _{ij}-f_{j}({\vec {x}}))$ ^[7]	$(0,1)$	$C^{\infty }$
Maxout^[14]	$f({\vec {x}})=\max _{i}x_{i}$	${\frac {\partial f}{\partial x_{j}}}={\begin{cases}1&{\text{for }}j={\underset {i}{\operatorname {argmax} }}\,x_{i}\\0&{\text{for }}j\neq {\underset {i}{\operatorname {argmax} }}\,x_{i}\end{cases}}$	$(-\infty ,\infty )$	$C^{0}$

Notes

^ Here $\delta$ is the Kronecker delta.
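The softmax Jacobian $f_{i}(\delta _{ij}-f_{j})$ and the Maxout gradient from the table can be written out directly. The sketch below is one possible plain-Python rendering. Subtracting the maximum before exponentiating is a common numerical-stability step that leaves the result unchanged, and breaking argmax ties in favor of the first index is an assumption made here, not something the table specifies.

```python
import math

def softmax(xs):
    """Softmax of a list of floats."""
    m = max(xs)  # shift by the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(xs):
    """Matrix J with J[i][j] = f_i * (delta_ij - f_j)."""
    f = softmax(xs)
    n = len(f)
    return [
        [f[i] * ((1.0 if i == j else 0.0) - f[j]) for j in range(n)]
        for i in range(n)
    ]

def maxout(xs):
    """Maxout over the inputs: the largest component."""
    return max(xs)

def maxout_gradient(xs):
    """One-hot gradient at the argmax (first index wins on ties)."""
    k = max(range(len(xs)), key=lambda i: xs[i])
    return [1.0 if j == k else 0.0 for j in range(len(xs))]

x = [0.2, 1.5, -0.3]
probs = softmax(x)
jac = softmax_jacobian(x)
print(sum(probs))                 # close to 1
print([sum(row) for row in jac])  # each row sums to roughly 0
print(maxout(x), maxout_gradient(x))
```

Each Jacobian row sums to zero because the softmax outputs always sum to one, so a change in any input can only redistribute the total.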

References

[1] Bergstra, James; Desjardins, Guillaume; Lamblin, Pascal; Bengio, Yoshua. Quadratic polynomials learn better image features. Technical Report 1337. Département d’Informatique et de Recherche Opérationnelle, Université de Montréal. 2009. (Archived from the original on 2018-09-25).
[2] Glorot, Xavier; Bengio, Yoshua. Understanding the difficulty of training deep feedforward neural networks (PDF). International Conference on Artificial Intelligence and Statistics (AISTATS’10), Society for Artificial Intelligence and Statistics. 2010. (Archived (PDF) from the original on 2017-04-01).
[3] Carlile, Brad; Delamarter, Guy; Kinney, Paul; Marti, Akiko; Whitney, Brian. Improving Deep Learning by Inverse Square Root Linear Units (ISRLUs). 2017-11-09. arXiv:1710.09967 [cs.LG].
[4] He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. 2015-02-06. arXiv:1502.01852 [cs.CV].
[5] Xu, Bing; Wang, Naiyan; Chen, Tianqi; Li, Mu. Empirical Evaluation of Rectified Activations in Convolutional Network. 2015-05-04. arXiv:1505.00853 [cs.LG].
[6] Clevert, Djork-Arné; Unterthiner, Thomas; Hochreiter, Sepp. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs). 2015-11-23. arXiv:1511.07289 [cs.LG].
[7] Klambauer, Günter; Unterthiner, Thomas; Mayr, Andreas; Hochreiter, Sepp. Self-Normalizing Neural Networks. 2017-06-08. arXiv:1706.02515 [cs.LG].
[8] Jin, Xiaojie; Xu, Chunyan; Feng, Jiashi; Wei, Yunchao; Xiong, Junjun; Yan, Shuicheng. Deep Learning with S-shaped Rectified Linear Activation Units. 2015-12-22. arXiv:1512.07030 [cs.CV].
[9] Agostinelli, Forest; Hoffman, Matthew; Sadowski, Peter; Baldi, Pierre. Learning Activation Functions to Improve Deep Neural Networks. 2014-12-21. arXiv:1412.6830 [cs.NE].
[10] Glorot, Xavier; Bordes, Antoine; Bengio, Yoshua. Deep sparse rectifier neural networks (PDF). International Conference on Artificial Intelligence and Statistics. 2011. (Archived (PDF) from the original on 2018-06-19).
[11] Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning. [2018-06-13]. (Archived from the original on 2018-06-13).
[12] Searching for Activation Functions. [2018-06-13]. (Archived from the original on 2018-06-13).
[13] Godfrey, Luke B.; Gashler, Michael S. A continuum among logarithmic, linear, and exponential functions, and its potential to improve generalization in neural networks. 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management: KDIR. 2016-02-03, 1602: 481–486. Bibcode:2016arXiv160201321G. arXiv:1602.01321.
[14] Goodfellow, Ian J.; Warde-Farley, David; Mirza, Mehdi; Courville, Aaron; Bengio, Yoshua. Maxout Networks. JMLR WCP. 2013-02-18, 28 (3): 1319–1327. Bibcode:2013arXiv1302.4389G. arXiv:1302.4389.
