LogSumExp

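The LogSumExp function (LSE, also called RealSoftMax^[1] or multivariable softplus) is a smooth maximum: a smooth approximation to the maximum function, used mainly in machine learning algorithms.^[2] It is defined as the logarithm of the sum of the exponentials of its arguments: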

\mathrm {LSE} (x_{1},\dots ,x_{n})=\log \left(\exp(x_{1})+\cdots +\exp(x_{n})\right).

Properties

The domain of the LogSumExp function is $\mathbb {R} ^{n}$ (the real coordinate space), and its codomain is $\mathbb {R}$ (the real line). It approximates the maximum $\max _{i}x_{i}$ with the following bounds:

\max {\{x_{1},\dots ,x_{n}\}}\leq \mathrm {LSE} (x_{1},\dots ,x_{n})\leq \max {\{x_{1},\dots ,x_{n}\}}+\log(n).

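The first inequality is strict unless $n=1$. The second inequality holds with equality only when all arguments are equal. (Proof: let $m=\max _{i}x_{i}$. Then $\exp(m)\leq \sum _{i=1}^{n}\exp(x_{i})\leq n\exp(m)$. Taking the logarithm of each side gives the result.)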
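The bounds can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper name `lse` and the sample values are illustrative and not part of the source.

```python
import math

def lse(xs):
    """Direct evaluation of log(exp(x_1) + ... + exp(x_n))."""
    return math.log(sum(math.exp(x) for x in xs))

# Distinct arguments: both inequalities are strict.
xs = [1.0, 2.0, 3.0]
lower = max(xs)
upper = max(xs) + math.log(len(xs))
value = lse(xs)
print(lower, value, upper)

# All arguments equal: the upper bound is attained (up to rounding).
equal = [2.0, 2.0, 2.0]
gap = lse(equal) - (max(equal) + math.log(len(equal)))
print(gap)
```

This direct evaluation is adequate for moderate arguments only; very large or very small inputs need the shifted formula given later in the article.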

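In addition, the function can be rescaled to tighten the bounds. Consider the function ${\frac {1}{t}}\mathrm {LSE} (tx)$ with $t>0$. Then, for $n>1$ (so that the lower bound is strict),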

\max {\{x_{1},\dots ,x_{n}\}}<{\frac {1}{t}}\mathrm {LSE} (tx)\leq \max {\{x_{1},\dots ,x_{n}\}}+{\frac {\log(n)}{t}}

(Proof: replace each $x_{i}$ in the inequalities above with $tx_{i}$ for some $t>0$, which gives

\max {\{tx_{1},\dots ,tx_{n}\}}<\mathrm {LSE} (tx_{1},\dots ,tx_{n})\leq \max {\{tx_{1},\dots ,tx_{n}\}}+\log(n)

Since $t>0$,

t\max {\{x_{1},\dots ,x_{n}\}}<\mathrm {LSE} (tx_{1},\dots ,tx_{n})\leq t\max {\{x_{1},\dots ,x_{n}\}}+\log(n)

Finally, dividing by $t$ gives the result.)

Moreover, using a negative scale factor $-t$ (with $t>0$) yields a corresponding inequality for the $\min$ function:

\min {\{x_{1},\dots ,x_{n}\}}-{\frac {\log(n)}{t}}\leq {\frac {1}{-t}}\mathrm {LSE} (-tx)<\min {\{x_{1},\dots ,x_{n}\}}.
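To illustrate how the scale factor tightens both bounds, here is a small sketch. The function name `scaled_lse`, the sample vector, and the chosen values of $t$ are assumptions made for this example; it evaluates the formula directly, so $t$ is kept moderate to avoid overflow.

```python
import math

def scaled_lse(xs, t):
    """(1/t) * LSE(t * x) for a nonzero scale t (negative t gives a smooth minimum)."""
    return math.log(sum(math.exp(t * x) for x in xs)) / t

xs = [0.5, 1.0, 2.0]
n = len(xs)
rows = []
for t in (1.0, 5.0, 20.0):
    smooth_max = scaled_lse(xs, t)    # lies in (max, max + log(n)/t]
    smooth_min = scaled_lse(xs, -t)   # lies in [min - log(n)/t, min)
    slack = math.log(n) / t
    rows.append((t, smooth_min, smooth_max, slack))
    print(f"t={t:5.1f}  smooth_min={smooth_min:.6f}  smooth_max={smooth_max:.6f}  slack={slack:.6f}")
```

As $t$ grows, the slack $\log(n)/t$ shrinks and the two values approach $\min$ and $\max$ of the vector.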

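The LogSumExp function is convex, and it is strictly increasing in each argument everywhere in its domain.^[3] (It is not, however, strictly convex everywhere.^[4])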

Writing $\mathbf {x} =(x_{1},\dots ,x_{n})$, the partial derivatives are:

{\frac {\partial }{\partial x_{i}}}{\mathrm {LSE} (\mathbf {x} )}={\frac {\exp x_{i}}{\sum _{j}\exp {x_{j}}}},

which shows that the gradient of LogSumExp is the softmax function.
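The relationship between the gradient and softmax can be verified with a central finite difference. This is only a sketch: the helper names (`lse`, `softmax`, `numerical_gradient`), the step size `h`, and the test point are choices made for illustration.

```python
import math

def lse(xs):
    """Direct evaluation of log(sum(exp(x_i)))."""
    return math.log(sum(math.exp(x) for x in xs))

def softmax(xs):
    """exp(x_i) / sum_j exp(x_j) for each component."""
    total = sum(math.exp(x) for x in xs)
    return [math.exp(x) / total for x in xs]

def numerical_gradient(f, xs, h=1e-6):
    """Central-difference approximation of the gradient of f at xs."""
    grad = []
    for i in range(len(xs)):
        up = list(xs)
        down = list(xs)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

xs = [0.1, -1.3, 2.0]
analytic = softmax(xs)
numeric = numerical_gradient(lse, xs)
for a, g in zip(analytic, numeric):
    print(f"softmax={a:.8f}  finite difference={g:.8f}")
```

Since every softmax component is positive, the same check also illustrates that LSE is strictly increasing in each argument.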

The convex conjugate of LogSumExp is the negative entropy.

The log-sum-exp trick for log-domain calculations

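The LSE function is often encountered when the usual arithmetic computations are performed on a logarithmic scale, as with log probabilities.^[5]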

Just as multiplication on a linear scale becomes simple addition on a log scale, addition on a linear scale becomes LSE on a log scale:

\mathrm {LSE} (\log(x_{1}),...,\log(x_{n}))=\log(x_{1}+\dots +x_{n})
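As a concrete illustration of this correspondence, the sketch below works with three probabilities stored as logarithms. The probability values and the helper name `lse` are made up for the example.

```python
import math

def lse(xs):
    """Direct evaluation of log(sum(exp(x_i)))."""
    return math.log(sum(math.exp(x) for x in xs))

probs = [0.2, 0.3, 0.1]
log_probs = [math.log(p) for p in probs]

# Multiplication on the linear scale is addition on the log scale.
log_product = sum(log_probs)   # equals log(0.2 * 0.3 * 0.1)

# Addition on the linear scale is LSE on the log scale.
log_sum = lse(log_probs)       # equals log(0.2 + 0.3 + 0.1)

print(math.exp(log_product), math.exp(log_sum))
```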

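A common purpose of log-domain computation is to improve accuracy and to avoid underflow and overflow problems that arise when very small or very large numbers are represented directly (that is, in the linear domain) using limited-precision floating-point numbers.^[6]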

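Unfortunately, evaluating LSE directly can still cause overflow or underflow in some cases, and the following equivalent formula must be used instead (especially when the accuracy of the "max" approximation above is not sufficient). For this reason many math libraries, such as IT++, provide a default LSE routine that uses this formula internally.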

\mathrm {LSE} (x_{1},\dots ,x_{n})=x^{*}+\log \left(\exp(x_{1}-x^{*})+\cdots +\exp(x_{n}-x^{*})\right)

where $x^{*}=\max {\{x_{1},\dots ,x_{n}\}}$.
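The shifted formula can be sketched as follows. The function names `lse_naive` and `lse_stable` are illustrative, the input is assumed to be a non-empty list of floats, and the early return for an infinite maximum is an added assumption to keep the subtraction from producing NaN; this is not the routine of any particular library.

```python
import math

def lse_naive(xs):
    """Direct evaluation; overflows or underflows for extreme inputs."""
    return math.log(sum(math.exp(x) for x in xs))

def lse_stable(xs):
    """x* + log(sum(exp(x_i - x*))) with x* = max(xs)."""
    m = max(xs)
    if math.isinf(m):
        # All entries are -inf, or some entry is +inf: the result is m itself.
        return m
    # Every exponent is <= 0 and the largest term is exp(0) = 1,
    # so the sum lies in [1, n] and cannot overflow or vanish.
    return m + math.log(sum(math.exp(x - m) for x in xs))

large = [1000.0, 1000.5, 999.0]
try:
    lse_naive(large)
    naive_overflowed = False
except OverflowError:
    # math.exp(1000.0) exceeds the range of a double.
    naive_overflowed = True

stable_large = lse_stable(large)
stable_small = lse_stable([-1000.0, -1001.0])
print(naive_overflowed, stable_large, stable_small)
```

With the small inputs the direct version fails in the opposite direction: every exponential underflows to zero, and the logarithm of zero is undefined.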

A strictly convex log-sum-exp type function

LSE is convex but not strictly convex. A strictly convex log-sum-exp type function can be defined by adding an extra argument fixed at zero^[7]:

\mathrm {LSE} _{0}^{+}(x_{1},...,x_{n})=\mathrm {LSE} (0,x_{1},...,x_{n})

This function is a proper Bregman generator (strictly convex and differentiable). It is encountered in machine learning, for example, as the cumulant of the multinomial/binomial family.
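One way to see the difference is to look along the direction $(1,\dots ,1)$, where $\mathrm {LSE} (\mathbf {x} +c\mathbf {1} )=\mathrm {LSE} (\mathbf {x} )+c$ is affine in $c$, so strict convexity fails. The sketch below compares midpoint gaps on such a segment; the names `lse0_plus` and `midpoint_gap` and the sample points are assumptions made for illustration.

```python
import math

def lse(xs):
    """Shifted evaluation of log(sum(exp(x_i)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def lse0_plus(xs):
    """LSE with an extra argument fixed at zero: LSE(0, x_1, ..., x_n)."""
    return lse([0.0] + list(xs))

def midpoint_gap(f, a, b):
    """(f(a) + f(b)) / 2 - f((a + b) / 2); positive under strict convexity."""
    mid = [(u + v) / 2 for u, v in zip(a, b)]
    return (f(a) + f(b)) / 2 - f(mid)

a = [1.0, 2.0]
b = [4.0, 5.0]   # b = a + 3 * (1, 1)

gap_lse = midpoint_gap(lse, a, b)          # zero up to rounding: affine on this segment
gap_lse0 = midpoint_gap(lse0_plus, a, b)   # strictly positive
print(gap_lse, gap_lse0)

# With a single argument, LSE(0, x) is the softplus function log(1 + exp(x)).
softplus_value = lse0_plus([0.7])
```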

In tropical analysis, LSE is the sum in the log semiring.

References

^[1] Zhang, Aston; Lipton, Zack; Li, Mu; Smola, Alex. Dive into Deep Learning, Chapter 3 Exercises. www.d2l.ai. Retrieved 2020-06-27. (Archived from the original on 2022-03-31.)
^[2] Nielsen, Frank; Sun, Ke. Guaranteed bounds on the Kullback-Leibler divergence of univariate mixtures using piecewise log-sum-exp inequalities. Entropy. 2016, 18 (12): 442. Bibcode:2016Entrp..18..442N. S2CID 17259055. arXiv:1606.05850. doi:10.3390/e18120442.
^[3] El Ghaoui, Laurent. Optimization Models and Applications. 2017. Retrieved 2022-10-16. (Archived from the original on 2020-12-19.)
^[4] convex analysis - About the strictly convexity of log-sum-exp function - Mathematics Stack Exchange. stackexchange.com.
^[5] McElreath, Richard. Statistical Rethinking. OCLC 1107423386.
^[6] Practical issues: Numeric stability. CS231n Convolutional Neural Networks for Visual Recognition. Retrieved 2022-10-16. (Archived from the original on 2022-12-06.)
^[7] Nielsen, Frank; Hadjeres, Gaetan. Monte Carlo Information Geometry: The dually flat case. 2018. Bibcode:2018arXiv180307225N. arXiv:1803.07225.
