激活函数

在计算网络中，一个节点的激活函数定义了该节点在给定的输入或输入的集合下的输出。标准的计算机芯片电路可以看作是根据输入得到开（1）或关（0）输出的數位電路激活函数。这与神经网络中的线性感知机的行为类似。然而，只有非線性激活函数才允許這種網絡僅使用少量節點來計算非平凡問題。在人工神經網絡中，這個功能也被稱為傳遞函數。

单变量输入激活函數

名稱	方程式	導數	區間	连续性^[1]	單調	一阶导数单调	原点近似恒等
恆等函數	$f(x)=x$	$f'(x)=1$	$(-\infty ,\infty )$	$C^{\infty }$	是	是	是
單位階躍函數	$f(x)={\begin{cases}0&{\text{for }}x<0\\1&{\text{for }}x\geq 0\end{cases}}$	$f'(x)={\begin{cases}0&{\text{for }}x\neq 0\\{\text{不存在}}&{\text{for }}x=0\end{cases}}$	$\{0,1\}$	$C^{-1}$	是	否	否
邏輯函數 (S函數的一种)	$f(x)=\sigma (x)={\frac {1}{1+e^{-x}}}$ ^[2]	$f'(x)=f(x)(1-f(x))$	$(0,1)$	$C^{\infty }$	是	否	否
雙曲正切函數	$f(x)=\tanh(x)={\frac {(e^{x}-e^{-x})}{(e^{x}+e^{-x})}}$	$f'(x)=1-f(x)^{2}$	$(-1,1)$	$C^{\infty }$	是	否	是
反正切函數	$f(x)=\tan ^{-1}(x)$	$f'(x)={\frac {1}{x^{2}+1}}$	$\left(-{\frac {\pi }{2}},{\frac {\pi }{2}}\right)$	$C^{\infty }$	是	否	是
Softsign 函數^[1]^[2]	$f(x)={\frac {x}{1+\|x\|}}$	$f'(x)={\frac {1}{(1+\|x\|)^{2}}}$	$(-1,1)$	$C^{1}$	是	否	是
反平方根函數 (ISRU)^[3]	$f(x)={\frac {x}{\sqrt {1+\alpha x^{2}}}}$	$f'(x)=\left({\frac {1}{\sqrt {1+\alpha x^{2}}}}\right)^{3}$	$\left(-{\frac {1}{\sqrt {\alpha }}},{\frac {1}{\sqrt {\alpha }}}\right)$	$C^{\infty }$	是	否	是
線性整流函數 (ReLU)	$f(x)={\begin{cases}0&{\text{for }}x<0\\x&{\text{for }}x\geq 0\end{cases}}$	$f'(x)={\begin{cases}0&{\text{for }}x<0\\1&{\text{for }}x\geq 0\end{cases}}$	$[0,\infty )$	$C^{0}$	是	是	否
帶泄露線性整流函數 (Leaky ReLU)	$f(x)={\begin{cases}0.01x&{\text{for }}x<0\\x&{\text{for }}x\geq 0\end{cases}}$	$f'(x)={\begin{cases}0.01&{\text{for }}x<0\\1&{\text{for }}x\geq 0\end{cases}}$	$(-\infty ,\infty )$	$C^{0}$	是	是	否
參數化線性整流函數 (PReLU)^[4]	$f(\alpha ,x)={\begin{cases}\alpha x&{\text{for }}x<0\\x&{\text{for }}x\geq 0\end{cases}}$	$f'(\alpha ,x)={\begin{cases}\alpha &{\text{for }}x<0\\1&{\text{for }}x\geq 0\end{cases}}$	$(-\infty ,\infty )$	$C^{0}$	Yes iff $\alpha \geq 0$	是	Yes iff $\alpha =1$
帶泄露隨機線性整流函數 (RReLU)^[5]	$f(\alpha ,x)={\begin{cases}\alpha x&{\text{for }}x<0\\x&{\text{for }}x\geq 0\end{cases}}$ ^[3]	$f'(\alpha ,x)={\begin{cases}\alpha &{\text{for }}x<0\\1&{\text{for }}x\geq 0\end{cases}}$	$(-\infty ,\infty )$	$C^{0}$	是	是	否
指數線性函數 (ELU)^[6]	$f(\alpha ,x)={\begin{cases}\alpha (e^{x}-1)&{\text{for }}x<0\\x&{\text{for }}x\geq 0\end{cases}}$	$f'(\alpha ,x)={\begin{cases}f(\alpha ,x)+\alpha &{\text{for }}x<0\\1&{\text{for }}x\geq 0\end{cases}}$	$(-\alpha ,\infty )$	${\begin{cases}C_{1}&{\text{when }}\alpha =1\\C_{0}&{\text{otherwise }}\end{cases}}$	Yes iff $\alpha \geq 0$	Yes iff $0\leq \alpha \leq 1$	Yes iff $\alpha =1$
擴展指數線性函數 (SELU)^[7]	$f(\alpha ,x)=\lambda {\begin{cases}\alpha (e^{x}-1)&{\text{for }}x<0\\x&{\text{for }}x\geq 0\end{cases}}$ with $\lambda =1.0507$ and $\alpha =1.67326$	$f'(\alpha ,x)=\lambda {\begin{cases}\alpha (e^{x})&{\text{for }}x<0\\1&{\text{for }}x\geq 0\end{cases}}$	$(-\lambda \alpha ,\infty )$	$C^{0}$	是	否	否
S 型線性整流激活函數 (SReLU)^[8]	$f_{t_{l},a_{l},t_{r},a_{r}}(x)={\begin{cases}t_{l}+a_{l}(x-t_{l})&{\text{for }}x\leq t_{l}\\x&{\text{for }}t_{l}<x<t_{r}\\t_{r}+a_{r}(x-t_{r})&{\text{for }}x\geq t_{r}\end{cases}}$ $t_{l},a_{l},t_{r},a_{r}$ are parameters.	$f'_{t_{l},a_{l},t_{r},a_{r}}(x)={\begin{cases}a_{l}&{\text{for }}x\leq t_{l}\\1&{\text{for }}t_{l}<x<t_{r}\\a_{r}&{\text{for }}x\geq t_{r}\end{cases}}$	$(-\infty ,\infty )$	$C^{0}$	否	否	否
反平方根線性函數 (ISRLU)^[3]	$f(x)={\begin{cases}{\frac {x}{\sqrt {1+\alpha x^{2}}}}&{\text{for }}x<0\\x&{\text{for }}x\geq 0\end{cases}}$	$f'(x)={\begin{cases}\left({\frac {1}{\sqrt {1+\alpha x^{2}}}}\right)^{3}&{\text{for }}x<0\\1&{\text{for }}x\geq 0\end{cases}}$	$\left(-{\frac {1}{\sqrt {\alpha }}},\infty \right)$	$C^{2}$	是	是	是
自適應分段線性函數 (APL)^[9]	$f(x)=\max(0,x)+\sum _{s=1}^{S}a_{i}^{s}\max(0,-x+b_{i}^{s})$	$f'(x)=H(x)-\sum _{s=1}^{S}a_{i}^{s}H(-x+b_{i}^{s})$ ^[4]	$(-\infty ,\infty )$	$C^{0}$	否	否	否
SoftPlus 函數^[10]	$f(x)=\ln(1+e^{x})$	$f'(x)={\frac {1}{1+e^{-x}}}$	$(0,\infty )$	$C^{\infty }$	是	是	否
彎曲恆等函數	$f(x)={\frac {{\sqrt {x^{2}+1}}-1}{2}}+x$	$f'(x)={\frac {x}{2{\sqrt {x^{2}+1}}}}+1$	$(-\infty ,\infty )$	$C^{\infty }$	是	是	是
S 型线性加权函数 (SiLU)^[11] (也被稱為Swish^[12])	$f(x)=x\cdot \sigma (x)$ ^[5]	$f'(x)=f(x)+\sigma (x)(1-f(x))$ ^[6]	$[\approx -0.28,\infty )$	$C^{\infty }$	否	否	否
软指数函數^[13]	$f(\alpha ,x)={\begin{cases}-{\frac {\ln(1-\alpha (x+\alpha ))}{\alpha }}&{\text{for }}\alpha <0\\x&{\text{for }}\alpha =0\\{\frac {e^{\alpha x}-1}{\alpha }}+\alpha &{\text{for }}\alpha >0\end{cases}}$	$f'(\alpha ,x)={\begin{cases}{\frac {1}{1-\alpha (\alpha +x)}}&{\text{for }}\alpha <0\\e^{\alpha x}&{\text{for }}\alpha \geq 0\end{cases}}$	$(-\infty ,\infty )$	$C^{\infty }$	是	是	Yes iff $\alpha =0$
正弦函數	$f(x)=\sin(x)$	$f'(x)=\cos(x)$	$[-1,1]$	$C^{\infty }$	否	否	是
Sinc 函數	$f(x)={\begin{cases}1&{\text{for }}x=0\\{\frac {\sin(x)}{x}}&{\text{for }}x\neq 0\end{cases}}$	$f'(x)={\begin{cases}0&{\text{for }}x=0\\{\frac {\cos(x)}{x}}-{\frac {\sin(x)}{x^{2}}}&{\text{for }}x\neq 0\end{cases}}$	$[\approx -0.217234,1]$	$C^{\infty }$	否	否	否
高斯函數	$f(x)=e^{-x^{2}}$	$f'(x)=-2xe^{-x^{2}}$	$(0,1]$	$C^{\infty }$	否	否	否

说明

^ 若一函数是连续的，则称其为

C^{0}

函数；若一函数

n

阶可导，并且其

n

阶导函数连续，则为

C^{n}

函数（

n\geq 1

）；若一函数对于所有

n

都属于

C^{n}

函数，则称其为 $C^{\infty }$ 函数，也称光滑函数。

^ 此處

H

是單位階躍函數。

^

α

是在訓練時間從均勻分佈中抽取的隨機變量，並且在測試時間固定為分佈的期望值。

^ ^ ^ 此處

\sigma

是邏輯函數。

多变量输入激活函数

名稱	方程式	導數	區間	光滑性
Softmax函數	$f_{i}({\vec {x}})={\frac {e^{x_{i}}}{\sum _{j=1}^{J}e^{x_{j}}}}$ for $i$ = 1, …, $J$	${\frac {\partial f_{i}({\vec {x}})}{\partial x_{j}}}=f_{i}({\vec {x}})(\delta _{ij}-f_{j}({\vec {x}}))$ ^[7]	$(0,1)$	$C^{\infty }$
Maxout函數^[14]	$f({\vec {x}})=\max _{i}x_{i}$	${\frac {\partial f}{\partial x_{j}}}={\begin{cases}1&{\text{for }}j={\underset {i}{\operatorname {argmax} }}\,x_{i}\\0&{\text{for }}j\neq {\underset {i}{\operatorname {argmax} }}\,x_{i}\end{cases}}$	$(-\infty ,\infty )$	$C^{0}$

说明

^ 此處 $δ$ 是克羅內克δ函數。

參見

參考資料

^ Bergstra, James; Desjardins, Guillaume; Lamblin, Pascal; Bengio, Yoshua. Quadratic polynomials learn better image features". Technical Report 1337. Département d’Informatique et de Recherche Opérationnelle, Université de Montréal. 2009. （原始内容存档于2018-09-25）.
^ Glorot, Xavier; Bengio, Yoshua, Understanding the difficulty of training deep feedforward neural networks (PDF), International Conference on Artificial Intelligence and Statistics (AISTATS’10), Society for Artificial Intelligence and Statistics, 2010, （原始内容存档 (PDF)于2017-04-01）
^ ^3.0 ^3.1 Carlile, Brad; Delamarter, Guy; Kinney, Paul; Marti, Akiko; Whitney, Brian. Improving Deep Learning by Inverse Square Root Linear Units (ISRLUs). 2017-11-09. arXiv:1710.09967  [cs.LG].
^ He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. 2015-02-06. arXiv:1502.01852  [cs.CV].
^ Xu, Bing; Wang, Naiyan; Chen, Tianqi; Li, Mu. Empirical Evaluation of Rectified Activations in Convolutional Network. 2015-05-04. arXiv:1505.00853  [cs.LG].
^ Clevert, Djork-Arné; Unterthiner, Thomas; Hochreiter, Sepp. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs). 2015-11-23. arXiv:1511.07289  [cs.LG].
^ Klambauer, Günter; Unterthiner, Thomas; Mayr, Andreas; Hochreiter, Sepp. Self-Normalizing Neural Networks. 2017-06-08. arXiv:1706.02515  [cs.LG].
^ Jin, Xiaojie; Xu, Chunyan; Feng, Jiashi; Wei, Yunchao; Xiong, Junjun; Yan, Shuicheng. Deep Learning with S-shaped Rectified Linear Activation Units. 2015-12-22. arXiv:1512.07030  [cs.CV].
^ Forest Agostinelli; Matthew Hoffman; Peter Sadowski; Pierre Baldi. Learning Activation Functions to Improve Deep Neural Networks. 21 Dec 2014. arXiv:1412.6830  [cs.NE].
^ Glorot, Xavier; Bordes, Antoine; Bengio, Yoshua. Deep sparse rectifier neural networks (PDF). International Conference on Artificial Intelligence and Statistics. 2011. （原始内容存档 (PDF)于2018-06-19）.
^ Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning. [2018-06-13]. （原始内容存档于2018-06-13）.
^ Searching for Activation Functions. [2018-06-13]. （原始内容存档于2018-06-13）.
^ Godfrey, Luke B.; Gashler, Michael S. A continuum among logarithmic, linear, and exponential functions, and its potential to improve generalization in neural networks. 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management: KDIR. 2016-02-03, 1602: 481–486. Bibcode:2016arXiv160201321G. arXiv:1602.01321  .
^ Goodfellow, Ian J.; Warde-Farley, David; Mirza, Mehdi; Courville, Aaron; Bengio, Yoshua. Maxout Networks. JMLR WCP. 2013-02-18, 28 (3): 1319–1327. Bibcode:2013arXiv1302.4389G. arXiv:1302.4389  .

[1] Bergstra, James; Desjardins, Guillaume; Lamblin, Pascal; Bengio, Yoshua. Quadratic polynomials learn better image features". Technical Report 1337. Département d’Informatique et de Recherche Opérationnelle, Université de Montréal. 2009. （原始内容存档于2018-09-25）.

[2] Glorot, Xavier; Bengio, Yoshua, Understanding the difficulty of training deep feedforward neural networks (PDF), International Conference on Artificial Intelligence and Statistics (AISTATS’10), Society for Artificial Intelligence and Statistics, 2010, （原始内容存档 (PDF)于2017-04-01）

[isrlu-3] 3.0 ^3.1 Carlile, Brad; Delamarter, Guy; Kinney, Paul; Marti, Akiko; Whitney, Brian. Improving Deep Learning by Inverse Square Root Linear Units (ISRLUs). 2017-11-09. arXiv:1710.09967  [cs.LG].

[4] He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. 2015-02-06. arXiv:1502.01852  [cs.CV].

[5] Xu, Bing; Wang, Naiyan; Chen, Tianqi; Li, Mu. Empirical Evaluation of Rectified Activations in Convolutional Network. 2015-05-04. arXiv:1505.00853  [cs.LG].

[6] Clevert, Djork-Arné; Unterthiner, Thomas; Hochreiter, Sepp. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs). 2015-11-23. arXiv:1511.07289  [cs.LG].

[7] Klambauer, Günter; Unterthiner, Thomas; Mayr, Andreas; Hochreiter, Sepp. Self-Normalizing Neural Networks. 2017-06-08. arXiv:1706.02515  [cs.LG].

[8] Jin, Xiaojie; Xu, Chunyan; Feng, Jiashi; Wei, Yunchao; Xiong, Junjun; Yan, Shuicheng. Deep Learning with S-shaped Rectified Linear Activation Units. 2015-12-22. arXiv:1512.07030  [cs.CV].

[9] Forest Agostinelli; Matthew Hoffman; Peter Sadowski; Pierre Baldi. Learning Activation Functions to Improve Deep Neural Networks. 21 Dec 2014. arXiv:1412.6830  [cs.NE].

[10] Glorot, Xavier; Bordes, Antoine; Bengio, Yoshua. Deep sparse rectifier neural networks (PDF). International Conference on Artificial Intelligence and Statistics. 2011. （原始内容存档 (PDF)于2018-06-19）.

[11] Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning. [2018-06-13]. （原始内容存档于2018-06-13）.

[12] Searching for Activation Functions. [2018-06-13]. （原始内容存档于2018-06-13）.

[13] Godfrey, Luke B.; Gashler, Michael S. A continuum among logarithmic, linear, and exponential functions, and its potential to improve generalization in neural networks. 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management: KDIR. 2016-02-03, 1602: 481–486. Bibcode:2016arXiv160201321G. arXiv:1602.01321  .

[14] Goodfellow, Ian J.; Warde-Farley, David; Mirza, Mehdi; Courville, Aaron; Bengio, Yoshua. Maxout Networks. JMLR WCP. 2013-02-18, 28 (3): 1319–1327. Bibcode:2013arXiv1302.4389G. arXiv:1302.4389  .

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]