Activation function

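In a computational network, the activation function of a node defines the output of that node given an input or a set of inputs. A standard computer chip circuit can be viewed as a digital network of activation functions whose output is either on (1) or off (0) depending on the input. This is similar to the behavior of the linear perceptron in neural networks. However, only nonlinear activation functions allow such networks to compute nontrivial problems using only a small number of nodes. In artificial neural networks, this function is also called the transfer function.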
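The role of nonlinearity can be illustrated with a minimal sketch. The tiny 1-2-1 network below, its weights, and the helper names are hypothetical and chosen only for illustration. With the identity activation the network stays a linear map of its input (here the zero map), while swapping in a ReLU lets the same two hidden nodes compute $|x|$, which no linear map can represent.

```python
def identity(x):
    """Identity activation: f(x) = x."""
    return x


def relu(x):
    """Rectified linear unit: f(x) = max(0, x)."""
    return max(0.0, x)


def two_layer(x, activation):
    """Tiny 1-2-1 network with hypothetical weights (+1 and -1).

    The hidden nodes compute activation(x) and activation(-x);
    the output node sums them with unit weights.
    """
    h1 = activation(1.0 * x)
    h2 = activation(-1.0 * x)
    return h1 + h2


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Identity activation: x + (-x), so the whole network is the zero map.
linear_out = [two_layer(x, identity) for x in xs]

# ReLU activation: relu(x) + relu(-x) equals |x|.
relu_out = [two_layer(x, relu) for x in xs]

print(linear_out)
print(relu_out)
```

Any other choice of weights with the identity activation would still give an output of the form $w\cdot x$ for a fixed $w$, so adding more such layers does not add expressive power.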

Activation functions of a single variable

Name	Equation	Derivative	Range	Order of continuity^[1]	Monotonic	Monotonic derivative	Approximates identity near the origin
Identity	$f(x)=x$	$f'(x)=1$	$(-\infty ,\infty )$	$C^{\infty }$	Yes	Yes	Yes
Unit step (binary step)	$f(x)={\begin{cases}0&{\text{for }}x<0\\1&{\text{for }}x\geq 0\end{cases}}$	$f'(x)={\begin{cases}0&{\text{for }}x\neq 0\\{\text{undefined}}&{\text{for }}x=0\end{cases}}$	$\{0,1\}$	$C^{-1}$	Yes	No	No
Logistic function (a kind of sigmoid function)	$f(x)=\sigma (x)={\frac {1}{1+e^{-x}}}$ ^[2]	$f'(x)=f(x)(1-f(x))$	$(0,1)$	$C^{\infty }$	Yes	No	No
Hyperbolic tangent (tanh)	$f(x)=\tanh(x)={\frac {(e^{x}-e^{-x})}{(e^{x}+e^{-x})}}$	$f'(x)=1-f(x)^{2}$	$(-1,1)$	$C^{\infty }$	Yes	No	Yes
Arctangent	$f(x)=\tan ^{-1}(x)$	$f'(x)={\frac {1}{x^{2}+1}}$	$\left(-{\frac {\pi }{2}},{\frac {\pi }{2}}\right)$	$C^{\infty }$	Yes	No	Yes
Softsign^[1]^[2]	$f(x)={\frac {x}{1+|x|}}$	$f'(x)={\frac {1}{(1+|x|)^{2}}}$	$(-1,1)$	$C^{1}$	Yes	No	Yes
Inverse square root unit (ISRU)^[3]	$f(x)={\frac {x}{\sqrt {1+\alpha x^{2}}}}$	$f'(x)=\left({\frac {1}{\sqrt {1+\alpha x^{2}}}}\right)^{3}$	$\left(-{\frac {1}{\sqrt {\alpha }}},{\frac {1}{\sqrt {\alpha }}}\right)$	$C^{\infty }$	Yes	No	Yes
Rectified linear unit (ReLU)	$f(x)={\begin{cases}0&{\text{for }}x<0\\x&{\text{for }}x\geq 0\end{cases}}$	$f'(x)={\begin{cases}0&{\text{for }}x<0\\1&{\text{for }}x\geq 0\end{cases}}$	$[0,\infty )$	$C^{0}$	Yes	Yes	No
Leaky rectified linear unit (Leaky ReLU)	$f(x)={\begin{cases}0.01x&{\text{for }}x<0\\x&{\text{for }}x\geq 0\end{cases}}$	$f'(x)={\begin{cases}0.01&{\text{for }}x<0\\1&{\text{for }}x\geq 0\end{cases}}$	$(-\infty ,\infty )$	$C^{0}$	Yes	Yes	No
Parametric rectified linear unit (PReLU)^[4]	$f(\alpha ,x)={\begin{cases}\alpha x&{\text{for }}x<0\\x&{\text{for }}x\geq 0\end{cases}}$	$f'(\alpha ,x)={\begin{cases}\alpha &{\text{for }}x<0\\1&{\text{for }}x\geq 0\end{cases}}$	$(-\infty ,\infty )$	$C^{0}$	Yes iff $\alpha \geq 0$	Yes	Yes iff $\alpha =1$
Randomized leaky rectified linear unit (RReLU)^[5]	$f(\alpha ,x)={\begin{cases}\alpha x&{\text{for }}x<0\\x&{\text{for }}x\geq 0\end{cases}}$ ^[3]	$f'(\alpha ,x)={\begin{cases}\alpha &{\text{for }}x<0\\1&{\text{for }}x\geq 0\end{cases}}$	$(-\infty ,\infty )$	$C^{0}$	Yes	Yes	No
Exponential linear unit (ELU)^[6]	$f(\alpha ,x)={\begin{cases}\alpha (e^{x}-1)&{\text{for }}x<0\\x&{\text{for }}x\geq 0\end{cases}}$	$f'(\alpha ,x)={\begin{cases}f(\alpha ,x)+\alpha &{\text{for }}x<0\\1&{\text{for }}x\geq 0\end{cases}}$	$(-\alpha ,\infty )$	${\begin{cases}C^{1}&{\text{when }}\alpha =1\\C^{0}&{\text{otherwise }}\end{cases}}$	Yes iff $\alpha \geq 0$	Yes iff $0\leq \alpha \leq 1$	Yes iff $\alpha =1$
Scaled exponential linear unit (SELU)^[7]	$f(\alpha ,x)=\lambda {\begin{cases}\alpha (e^{x}-1)&{\text{for }}x<0\\x&{\text{for }}x\geq 0\end{cases}}$ with $\lambda =1.0507$ and $\alpha =1.67326$	$f'(\alpha ,x)=\lambda {\begin{cases}\alpha (e^{x})&{\text{for }}x<0\\1&{\text{for }}x\geq 0\end{cases}}$	$(-\lambda \alpha ,\infty )$	$C^{0}$	Yes	No	No
S-shaped rectified linear activation unit (SReLU)^[8]	$f_{t_{l},a_{l},t_{r},a_{r}}(x)={\begin{cases}t_{l}+a_{l}(x-t_{l})&{\text{for }}x\leq t_{l}\\x&{\text{for }}t_{l}<x<t_{r}\\t_{r}+a_{r}(x-t_{r})&{\text{for }}x\geq t_{r}\end{cases}}$ where $t_{l},a_{l},t_{r},a_{r}$ are parameters.	$f'_{t_{l},a_{l},t_{r},a_{r}}(x)={\begin{cases}a_{l}&{\text{for }}x\leq t_{l}\\1&{\text{for }}t_{l}<x<t_{r}\\a_{r}&{\text{for }}x\geq t_{r}\end{cases}}$	$(-\infty ,\infty )$	$C^{0}$	No	No	No
Inverse square root linear unit (ISRLU)^[3]	$f(x)={\begin{cases}{\frac {x}{\sqrt {1+\alpha x^{2}}}}&{\text{for }}x<0\\x&{\text{for }}x\geq 0\end{cases}}$	$f'(x)={\begin{cases}\left({\frac {1}{\sqrt {1+\alpha x^{2}}}}\right)^{3}&{\text{for }}x<0\\1&{\text{for }}x\geq 0\end{cases}}$	$\left(-{\frac {1}{\sqrt {\alpha }}},\infty \right)$	$C^{2}$	Yes	Yes	Yes
Adaptive piecewise linear (APL)^[9]	$f(x)=\max(0,x)+\sum _{s=1}^{S}a_{i}^{s}\max(0,-x+b_{i}^{s})$	$f'(x)=H(x)-\sum _{s=1}^{S}a_{i}^{s}H(-x+b_{i}^{s})$ ^[4]	$(-\infty ,\infty )$	$C^{0}$	No	No	No
SoftPlus^[10]	$f(x)=\ln(1+e^{x})$	$f'(x)={\frac {1}{1+e^{-x}}}$	$(0,\infty )$	$C^{\infty }$	Yes	Yes	No
Bent identity	$f(x)={\frac {{\sqrt {x^{2}+1}}-1}{2}}+x$	$f'(x)={\frac {x}{2{\sqrt {x^{2}+1}}}}+1$	$(-\infty ,\infty )$	$C^{\infty }$	Yes	Yes	Yes
Sigmoid-weighted linear unit (SiLU)^[11] (also known as Swish^[12])	$f(x)=x\cdot \sigma (x)$ ^[5]	$f'(x)=f(x)+\sigma (x)(1-f(x))$ ^[6]	$[\approx -0.28,\infty )$	$C^{\infty }$	No	No	No
Soft exponential^[13]	$f(\alpha ,x)={\begin{cases}-{\frac {\ln(1-\alpha (x+\alpha ))}{\alpha }}&{\text{for }}\alpha <0\\x&{\text{for }}\alpha =0\\{\frac {e^{\alpha x}-1}{\alpha }}+\alpha &{\text{for }}\alpha >0\end{cases}}$	$f'(\alpha ,x)={\begin{cases}{\frac {1}{1-\alpha (\alpha +x)}}&{\text{for }}\alpha <0\\e^{\alpha x}&{\text{for }}\alpha \geq 0\end{cases}}$	$(-\infty ,\infty )$	$C^{\infty }$	Yes	Yes	Yes iff $\alpha =0$
Sinusoid	$f(x)=\sin(x)$	$f'(x)=\cos(x)$	$[-1,1]$	$C^{\infty }$	No	No	Yes
Sinc	$f(x)={\begin{cases}1&{\text{for }}x=0\\{\frac {\sin(x)}{x}}&{\text{for }}x\neq 0\end{cases}}$	$f'(x)={\begin{cases}0&{\text{for }}x=0\\{\frac {\cos(x)}{x}}-{\frac {\sin(x)}{x^{2}}}&{\text{for }}x\neq 0\end{cases}}$	$[\approx -0.217234,1]$	$C^{\infty }$	No	No	No
Gaussian	$f(x)=e^{-x^{2}}$	$f'(x)=-2xe^{-x^{2}}$	$(0,1]$	$C^{\infty }$	No	No	No
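Several of the closed-form derivatives in the table can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names, the sample points, and the step size `h` are assumptions made for illustration. It compares each tabulated derivative against a central finite difference.

```python
import math


def logistic(x):
    """Logistic function: 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def d_logistic(x):
    """Table entry: f'(x) = f(x) (1 - f(x))."""
    f = logistic(x)
    return f * (1.0 - f)


def d_tanh(x):
    """Table entry: f'(x) = 1 - f(x)^2."""
    return 1.0 - math.tanh(x) ** 2


def softplus(x):
    """SoftPlus: ln(1 + e^x); its derivative is the logistic function."""
    return math.log(1.0 + math.exp(x))


def silu(x):
    """SiLU / Swish: x * sigma(x)."""
    return x * logistic(x)


def d_silu(x):
    """Table entry: f'(x) = f(x) + sigma(x) (1 - f(x))."""
    f = silu(x)
    return f + logistic(x) * (1.0 - f)


def elu(x, alpha=1.0):
    """ELU: alpha (e^x - 1) for x < 0, x otherwise."""
    return x if x >= 0 else alpha * (math.exp(x) - 1.0)


def d_elu(x, alpha=1.0):
    """Table entry: f(alpha, x) + alpha for x < 0, 1 otherwise."""
    return 1.0 if x >= 0 else elu(x, alpha) + alpha


def central_difference(f, x, h=1e-5):
    """Second-order finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# (function, tabulated derivative) pairs to compare.
pairs = [
    (logistic, d_logistic),
    (math.tanh, d_tanh),
    (softplus, logistic),
    (silu, d_silu),
    (elu, d_elu),
]
points = [-2.0, -0.5, 0.3, 1.7]  # avoids the kink of piecewise functions at 0

max_error = max(
    abs(central_difference(f, x) - df(x)) for f, df in pairs for x in points
)
print(max_error)
```

The sample points deliberately exclude $x=0$, where the piecewise functions in the table may not be differentiable and a finite difference would not be meaningful.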

Notes

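^ A function is said to be of class $C^{0}$ if it is continuous; it is of class $C^{n}$ (for $n\geq 1$) if it is $n$ times differentiable and its $n$-th derivative is continuous; a function that is of class $C^{n}$ for every $n$ is said to be of class $C^{\infty }$, also called a smooth function.

^ Here $H$ is the unit step function.

^ $\alpha$ is a random variable sampled from a uniform distribution at training time and fixed to the expectation value of that distribution at test time.

^ ^ ^ Here $\sigma$ is the logistic function.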
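The train-time versus test-time behavior of the RReLU slope described in the note above can be sketched as follows. The bounds `lower` and `upper` of the uniform distribution, the `training` flag, and the function name are assumptions made for illustration; the note itself does not specify them.

```python
import random


def rrelu(x, lower=0.125, upper=1.0 / 3.0, training=False, rng=random):
    """Randomized leaky ReLU for a scalar input.

    lower/upper are hypothetical bounds of the uniform distribution.
    In training mode the negative-side slope alpha is sampled per call;
    otherwise it is fixed to the mean of the distribution.
    """
    if x >= 0:
        return x
    if training:
        alpha = rng.uniform(lower, upper)
    else:
        alpha = (lower + upper) / 2.0  # expectation of U(lower, upper)
    return alpha * x


rng = random.Random(0)  # seeded generator so the sketch is repeatable
train_samples = [rrelu(-1.0, training=True, rng=rng) for _ in range(5)]
test_value = rrelu(-1.0)

print(train_samples)
print(test_value)
```

Fixing the slope to the expectation at test time makes the function deterministic, which is why the table lists RReLU with the same piecewise form as PReLU.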

Activation functions of multiple variables

Name	Equation	Derivative	Range	Smoothness
Softmax	$f_{i}({\vec {x}})={\frac {e^{x_{i}}}{\sum _{j=1}^{J}e^{x_{j}}}}$ for $i$ = 1, …, $J$	${\frac {\partial f_{i}({\vec {x}})}{\partial x_{j}}}=f_{i}({\vec {x}})(\delta _{ij}-f_{j}({\vec {x}}))$ ^[7]	$(0,1)$	$C^{\infty }$
Maxout^[14]	$f({\vec {x}})=\max _{i}x_{i}$	${\frac {\partial f}{\partial x_{j}}}={\begin{cases}1&{\text{for }}j={\underset {i}{\operatorname {argmax} }}\,x_{i}\\0&{\text{for }}j\neq {\underset {i}{\operatorname {argmax} }}\,x_{i}\end{cases}}$	$(-\infty ,\infty )$	$C^{0}$

Notes

^ Here $\delta$ is the Kronecker delta.
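The softmax Jacobian $f_{i}(\delta _{ij}-f_{j})$ and the maxout gradient from the table can be written out directly. This is a minimal standard-library sketch; the function names are hypothetical, subtracting the maximum before exponentiating is a common numerical-stability choice rather than part of the table's formula, and breaking argmax ties by taking the first index is an assumption.

```python
import math


def softmax(xs):
    """Softmax of a list of floats.

    Subtracting max(xs) leaves the result unchanged but avoids overflow.
    """
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_jacobian(xs):
    """Matrix of partial derivatives d f_i / d x_j = f_i (delta_ij - f_j)."""
    f = softmax(xs)
    n = len(f)
    return [
        [f[i] * ((1.0 if i == j else 0.0) - f[j]) for j in range(n)]
        for i in range(n)
    ]


def maxout(xs):
    """Maxout: the largest input."""
    return max(xs)


def maxout_gradient(xs):
    """1 at the argmax position, 0 elsewhere (first index wins on ties)."""
    k = max(range(len(xs)), key=lambda i: xs[i])
    return [1.0 if j == k else 0.0 for j in range(len(xs))]


x = [1.0, 2.0, 0.5]
probs = softmax(x)
jac = softmax_jacobian(x)
grad = maxout_gradient(x)

print(probs)
print(jac)
print(grad)
```

Because the softmax outputs always sum to 1, each row of the Jacobian sums to zero, which is a quick sanity check on the formula.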

References

[1] Bergstra, James; Desjardins, Guillaume; Lamblin, Pascal; Bengio, Yoshua. Quadratic polynomials learn better image features. Technical Report 1337. Département d'Informatique et de Recherche Opérationnelle, Université de Montréal. 2009. (Archived from the original on 2018-09-25).
[2] Glorot, Xavier; Bengio, Yoshua. Understanding the difficulty of training deep feedforward neural networks (PDF). International Conference on Artificial Intelligence and Statistics (AISTATS'10), Society for Artificial Intelligence and Statistics. 2010. (Archived (PDF) from the original on 2017-04-01).
[3] Carlile, Brad; Delamarter, Guy; Kinney, Paul; Marti, Akiko; Whitney, Brian. Improving Deep Learning by Inverse Square Root Linear Units (ISRLUs). 2017-11-09. arXiv:1710.09967 [cs.LG].
[4] He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. 2015-02-06. arXiv:1502.01852 [cs.CV].
[5] Xu, Bing; Wang, Naiyan; Chen, Tianqi; Li, Mu. Empirical Evaluation of Rectified Activations in Convolutional Network. 2015-05-04. arXiv:1505.00853 [cs.LG].
[6] Clevert, Djork-Arné; Unterthiner, Thomas; Hochreiter, Sepp. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs). 2015-11-23. arXiv:1511.07289 [cs.LG].
[7] Klambauer, Günter; Unterthiner, Thomas; Mayr, Andreas; Hochreiter, Sepp. Self-Normalizing Neural Networks. 2017-06-08. arXiv:1706.02515 [cs.LG].
[8] Jin, Xiaojie; Xu, Chunyan; Feng, Jiashi; Wei, Yunchao; Xiong, Junjun; Yan, Shuicheng. Deep Learning with S-shaped Rectified Linear Activation Units. 2015-12-22. arXiv:1512.07030 [cs.CV].
[9] Agostinelli, Forest; Hoffman, Matthew; Sadowski, Peter; Baldi, Pierre. Learning Activation Functions to Improve Deep Neural Networks. 2014-12-21. arXiv:1412.6830 [cs.NE].
[10] Glorot, Xavier; Bordes, Antoine; Bengio, Yoshua. Deep sparse rectifier neural networks (PDF). International Conference on Artificial Intelligence and Statistics. 2011. (Archived (PDF) from the original on 2018-06-19).
[11] Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning. Retrieved 2018-06-13. (Archived from the original on 2018-06-13).
[12] Searching for Activation Functions. Retrieved 2018-06-13. (Archived from the original on 2018-06-13).
[13] Godfrey, Luke B.; Gashler, Michael S. A continuum among logarithmic, linear, and exponential functions, and its potential to improve generalization in neural networks. 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management: KDIR. 2016-02-03, 1602: 481–486. Bibcode:2016arXiv160201321G. arXiv:1602.01321.
[14] Goodfellow, Ian J.; Warde-Farley, David; Mirza, Mehdi; Courville, Aaron; Bengio, Yoshua. Maxout Networks. JMLR WCP. 2013-02-18, 28 (3): 1319–1327. Bibcode:2013arXiv1302.4389G. arXiv:1302.4389.

