Covariance matrix

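In statistics and probability theory, a covariance matrix is a matrix whose entries are the covariances between the components of two sequences of random variables (random vectors, also called multivariate random variables). It is the direct generalization of covariance to random vectors, and it is a square matrix when a sequence is paired with itself.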

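Sample points from a multivariate Gaussian distribution with standard deviation 3 along the lower-left to upper-right direction and standard deviation 1 in the orthogonal direction. Because the x and y components co-vary (are correlated), the variances of x and y alone cannot fully describe the distribution. The arrows point along the eigenvectors of the covariance matrix, and their lengths are the square roots of the corresponding eigenvalues.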
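The caption's link between eigenvalues and arrow lengths can be checked with a small sketch. It assumes NumPy is available and that the major axis lies at 45°; `covariance_from_axes` is a hypothetical helper introduced only for this illustration, not something taken from the figure's source.

```python
import numpy as np

def covariance_from_axes(angle_rad, std_major, std_minor):
    """Hypothetical helper: covariance matrix whose principal axes are rotated
    by angle_rad, with the given standard deviations along those axes."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rotation = np.array([[c, -s], [s, c]])        # columns are the principal axes
    variances = np.diag([std_major**2, std_minor**2])
    return rotation @ variances @ rotation.T      # Sigma = R D R^T

# Std 3 along the lower-left/upper-right diagonal, std 1 across it
sigma = covariance_from_axes(np.pi / 4, 3.0, 1.0)
print(np.round(sigma, 6))        # approximately [[5, 4], [4, 5]]

# eigh suits symmetric matrices; eigenvalues are returned in ascending order
eigvals, eigvecs = np.linalg.eigh(sigma)
print(np.sqrt(eigvals))          # arrow lengths, approximately [1, 3]
print(eigvecs)                   # arrow directions, one eigenvector per column
```

The signs of the eigenvectors are arbitrary; only their directions matter for the arrows.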

Definition

Definition:
Let $(\Omega ,\,\Sigma ,\,P)$ be a probability space, and let $X=\{x_{i}\}_{i=1}^{m}$ and $Y=\{y_{j}\}_{j=1}^{n}$ be two sequences of real-valued random variables defined on $\Omega$.

If their expected values are, respectively,

E(x_{i})=\int _{\Omega }x_{i}\,dP=\mu _{i}

E(y_{j})=\int _{\Omega }y_{j}\,dP=\nu _{j}

then the covariance matrix of these two sequences of random variables is

\operatorname {\mathbf {cov} } (X,Y):={\left[\,\operatorname {cov} (x_{i},y_{j})\,\right]}_{m\times n}={{\bigg [}\,\operatorname {E} [(x_{i}-\mu _{i})(y_{j}-\nu _{j})]\,{\bigg ]}}_{m\times n}

Written out in full as a matrix, this is

\operatorname {\mathbf {cov} } (X,Y)={\begin{bmatrix}\operatorname {cov} (x_{1},y_{1})&\operatorname {cov} (x_{1},y_{2})&\cdots &\operatorname {cov} (x_{1},y_{n})\\\operatorname {cov} (x_{2},y_{1})&\operatorname {cov} (x_{2},y_{2})&\cdots &\operatorname {cov} (x_{2},y_{n})\\\vdots &\vdots &\ddots &\vdots \\\operatorname {cov} (x_{m},y_{1})&\operatorname {cov} (x_{m},y_{2})&\cdots &\operatorname {cov} (x_{m},y_{n})\end{bmatrix}}

={\begin{bmatrix}\mathrm {E} [(x_{1}-\mu _{1})(y_{1}-\nu _{1})]&\mathrm {E} [(x_{1}-\mu _{1})(y_{2}-\nu _{2})]&\cdots &\mathrm {E} [(x_{1}-\mu _{1})(y_{n}-\nu _{n})]\\\mathrm {E} [(x_{2}-\mu _{2})(y_{1}-\nu _{1})]&\mathrm {E} [(x_{2}-\mu _{2})(y_{2}-\nu _{2})]&\cdots &\mathrm {E} [(x_{2}-\mu _{2})(y_{n}-\nu _{n})]\\\vdots &\vdots &\ddots &\vdots \\\mathrm {E} [(x_{m}-\mu _{m})(y_{1}-\nu _{1})]&\mathrm {E} [(x_{m}-\mu _{m})(y_{2}-\nu _{2})]&\cdots &\mathrm {E} [(x_{m}-\mu _{m})(y_{n}-\nu _{n})]\end{bmatrix}}

By the linearity of the (measure-theoretic) integral, the covariance matrix can be further simplified to

\operatorname {\mathbf {cov} } (X,Y)={\left[\,\operatorname {E} (x_{i}y_{j})-\mu _{i}\nu _{j}\,\right]}_{m\times n}
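To make the definition and its simplified form concrete, here is a sketch on a finite probability space, where every integral over $\Omega$ reduces to a probability-weighted sum. The outcome probabilities and random-variable values are made up for illustration, and NumPy is assumed.

```python
import numpy as np

# Assumption for illustration: Omega = {w_0, ..., w_3} with P({w_k}) = p[k]
p = np.array([0.1, 0.2, 0.3, 0.4])            # probabilities, summing to 1

# x[i, k] = x_i(w_k) for m = 2 variables; y[j, k] = y_j(w_k) for n = 3 variables
x = np.array([[1.0, 2.0, 0.0, 3.0],
              [0.0, 1.0, 1.0, 2.0]])
y = np.array([[2.0, 0.0, 1.0, 1.0],
              [1.0, 1.0, 4.0, 0.0],
              [0.0, 3.0, 2.0, 5.0]])

def expect(values):
    """E over Omega: probability-weighted sum along the last axis."""
    return values @ p

mu = expect(x)    # mu_i
nu = expect(y)    # nu_j

# Definition: entry (i, j) is E[(x_i - mu_i)(y_j - nu_j)]
xc = x - mu[:, None]
yc = y - nu[:, None]
cov_def = (xc * p) @ yc.T

# Simplified form: E(x_i y_j) - mu_i nu_j
cov_simple = (x * p) @ y.T - np.outer(mu, nu)

print(cov_def.shape)                          # (2, 3), i.e. m x n
print(np.allclose(cov_def, cov_simple))       # True
```

The two results agree because expanding the product and applying linearity of the weighted sum gives exactly the simplified form.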

Matrix notation

The sequences $X$ and $Y$ in the definition above can also be written as the column vectors $\mathbf {X} :={\left[x_{i}\right]}_{m}$ and $\mathbf {Y} :={\left[y_{j}\right]}_{n}$ respectively; in other words,

\mathbf {X} :={\begin{bmatrix}x_{1}\\x_{2}\\\vdots \\x_{m}\end{bmatrix}}

\mathbf {Y} :={\begin{bmatrix}y_{1}\\y_{2}\\\vdots \\y_{n}\end{bmatrix}}

Then, for a matrix $\mathbf {A} ={\left[\,a_{ij}\,\right]}_{m\times n}$ whose $m\times n$ entries $a_{ij}$ are random variables defined on $\Omega$, define

\mathrm {E} [\mathbf {A} ]:={\left[\,\operatorname {E} (a_{ij})\,\right]}_{m\times n}

that is,

\mathrm {E} [\mathbf {A} ]:={\begin{bmatrix}\operatorname {E} (a_{11})&\operatorname {E} (a_{12})&\cdots &\operatorname {E} (a_{1n})\\\operatorname {E} (a_{21})&\operatorname {E} (a_{22})&\cdots &\operatorname {E} (a_{2n})\\\vdots &\vdots &\ddots &\vdots \\\operatorname {E} (a_{m1})&\operatorname {E} (a_{m2})&\cdots &\operatorname {E} (a_{mn})\end{bmatrix}}

The covariance matrix defined in the previous subsection can then be written as

\operatorname {\mathbf {cov} } (X,Y)=\mathrm {E} \left[\left(\mathbf {X} -\mathrm {E} [\mathbf {X} ]\right)\left(\mathbf {Y} -\mathrm {E} [\mathbf {Y} ]\right)^{\rm {T}}\right]

Hence the covariance matrix can also be defined directly in terms of $\mathbf {X}$ and $\mathbf {Y}$:

\operatorname {\mathbf {cov} } (\mathbf {X} ,\mathbf {Y} ):=\mathrm {E} \left[\left(\mathbf {X} -\mathrm {E} [\mathbf {X} ]\right)\left(\mathbf {Y} -\mathrm {E} [\mathbf {Y} ]\right)^{\rm {T}}\right]
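In practice the expectation is often replaced by a sample mean. The sketch below, assuming NumPy and synthetic data, estimates $\mathrm {E} \left[\left(\mathbf {X} -\mathrm {E} [\mathbf {X} ]\right)\left(\mathbf {Y} -\mathrm {E} [\mathbf {Y} ]\right)^{\rm {T}}\right]$ with the 1/N plug-in estimator and checks it against the corresponding block of `np.cov` applied to the stacked vector.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000                                     # number of draws (illustrative)
X = rng.normal(size=(2, N))                  # each column is one draw of the 2-vector X
Y = np.vstack([X[0] + rng.normal(size=N),    # Y depends on X so that the
               rng.normal(size=N),           # cross-covariance is not trivially zero
               X[1] - X[0]])                 # shape (3, N)

# Replace E[.] by the sample mean over the N draws (1/N normalisation)
Xc = X - X.mean(axis=1, keepdims=True)
Yc = Y - Y.mean(axis=1, keepdims=True)
cov_XY = Xc @ Yc.T / N                       # outer-product form, shape (2, 3)

# np.cov on the stacked vector [X; Y] gives the joint matrix;
# bias=True divides by N to match the estimate above
joint = np.cov(np.vstack([X, Y]), bias=True)
print(np.allclose(cov_XY, joint[:2, 2:]))    # True: upper-right block is cov(X, Y)
```

The lower-left block `joint[2:, :2]` is the transpose of `cov_XY`, which is the rule $\operatorname {cov} (\mathbf {X} ,\mathbf {Y} )=\operatorname {cov} (\mathbf {Y} ,\mathbf {X} )^{T}$ listed under the properties.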

Terminology and notation

Some authors call the following $\mathbf {\Sigma } _{X}$ the covariance matrix:

{\begin{aligned}\mathbf {\Sigma } _{X}&:={\left[\operatorname {cov} (x_{i},x_{j})\right]}_{m\times m}\\&=\operatorname {\mathbf {cov} } (X,X)\end{aligned}}

This page, however, follows William Feller in calling $\mathbf {\Sigma } _{X}$ the variance of $X$ (variance of random vector), to distinguish it from $\operatorname {\mathbf {cov} } (X,Y)$. The reason is that

\operatorname {cov} (x_{i},x_{i})=\operatorname {E} [{(x_{i}-\mu _{i})}^{2}]=\operatorname {var} (x_{i})

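In other words, the diagonal of $\mathbf {\Sigma } _{X}$ consists of the variances of the random variables $x_{i}$. For this reason, some authors also call $\mathbf {\Sigma } _{X}=\operatorname {\mathbf {cov} } (X,X)$ the variance–covariance matrix.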
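A short numerical check of this point, assuming NumPy and synthetic data: the diagonal of the sample estimate of $\mathbf {\Sigma } _{X}$ coincides with the per-variable sample variances, provided both use the same 1/N normalisation.

```python
import numpy as np

rng = np.random.default_rng(1)
# 3 variables with different spreads, 500 draws each (illustrative data)
X = rng.normal(size=(3, 500)) * np.array([[1.0], [2.0], [0.5]])

sigma_X = np.cov(X, bias=True)      # estimate of Sigma_X = cov(X, X), 3 x 3, divides by N
variances = X.var(axis=1)           # var(x_i); np.var also divides by N by default

print(np.allclose(np.diag(sigma_X), variances))   # True
```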

Because variance is a measure of dispersion, some authors also loosely call $\mathbf {\Sigma } _{X}$ the dispersion matrix.

Properties

$\mathbf {\Sigma } =\operatorname {\mathbf {cov} } (X,X)$ has the following basic properties:

$\mathbf {\Sigma } =\mathrm {E} (\mathbf {X} \mathbf {X} ^{T})-\mathrm {E} (\mathbf {X} ){[\mathrm {E} (\mathbf {X} )]}^{T}$
$\mathbf {\Sigma }$ is symmetric and positive semidefinite.
$\operatorname {var} (\mathbf {a^{T}} \mathbf {X} )=\mathbf {a^{T}} \operatorname {var} (\mathbf {X} )\mathbf {a}$
$\mathbf {\Sigma } \geq 0$
$\operatorname {var} (\mathbf {AX} +\mathbf {a} )=\mathbf {A} \operatorname {var} (\mathbf {X} )\mathbf {A^{T}}$
$\operatorname {cov} (\mathbf {X} ,\mathbf {Y} )=\operatorname {cov} (\mathbf {Y} ,\mathbf {X} )^{T}$
$\operatorname {cov} (\mathbf {X_{1}} +\mathbf {X_{2}} ,\mathbf {Y} )=\operatorname {cov} (\mathbf {X_{1}} ,\mathbf {Y} )+\operatorname {cov} (\mathbf {X_{2}} ,\mathbf {Y} )$
If $m=n$ (that is, $\mathbf {X}$ and $\mathbf {Y}$ have the same dimension), then $\operatorname {var} (\mathbf {X} +\mathbf {Y} )=\operatorname {var} (\mathbf {X} )+\operatorname {cov} (\mathbf {X} ,\mathbf {Y} )+\operatorname {cov} (\mathbf {Y} ,\mathbf {X} )+\operatorname {var} (\mathbf {Y} )$
$\operatorname {cov} (\mathbf {AX} ,\mathbf {BX} )=\mathbf {A} \operatorname {cov} (\mathbf {X} ,\mathbf {X} )\mathbf {B} ^{T}$
If $\mathbf {X}$ and $\mathbf {Y}$ are independent, then $\operatorname {cov} (\mathbf {X} ,\mathbf {Y} )=0$
$\mathbf {\Sigma } =\mathbf {\Sigma } ^{T}$
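Most of the algebraic properties above also hold exactly for sample covariance matrices, as long as every estimate uses the same normalisation. The sketch below, assuming NumPy and random synthetic data, checks them numerically; `var` and `cov` are hypothetical helper names chosen to mirror the notation of the list.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 2000
X = rng.normal(size=(3, N))              # 3 components, N draws (columns)
X[2] += 0.8 * X[0]                       # make two components correlated

def var(Z):
    """Sample variance matrix var(Z) = cov(Z, Z), normalised by 1/N."""
    return np.cov(Z, bias=True)

def cov(Z, W):
    """Sample cross-covariance matrix cov(Z, W), normalised by 1/N."""
    Zc = Z - Z.mean(axis=1, keepdims=True)
    Wc = W - W.mean(axis=1, keepdims=True)
    return Zc @ Wc.T / Z.shape[1]

sigma = var(X)
a = rng.normal(size=3)                   # coefficient vector for a^T X
A = rng.normal(size=(2, 3))
B = rng.normal(size=(4, 3))
shift = rng.normal(size=(2, 1))          # constant vector added to AX
Y = rng.normal(size=(3, N))              # same dimension as X (m = n)
X2 = rng.normal(size=(3, N))

assert np.allclose(sigma, sigma.T)                               # symmetric
assert np.linalg.eigvalsh(sigma).min() >= -1e-12                 # positive semidefinite
assert np.isclose((a @ X).var(), a @ sigma @ a)                  # var(a^T X) = a^T var(X) a
assert np.allclose(var(A @ X + shift), A @ sigma @ A.T)          # var(AX + a) = A var(X) A^T
assert np.allclose(cov(X, Y), cov(Y, X).T)                       # transpose rule
assert np.allclose(cov(X + X2, Y), cov(X, Y) + cov(X2, Y))       # additivity
assert np.allclose(var(X + Y), var(X) + cov(X, Y) + cov(Y, X) + var(Y))
assert np.allclose(cov(A @ X, B @ X), A @ sigma @ B.T)           # cov(AX, BX) = A cov(X, X) B^T
print("all identities hold for the 1/N sample covariance")
```

The independence property is the exception: for independent draws the sample cross-covariance is only close to zero, with the gap shrinking as N grows.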

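Although the covariance matrix is simple, it is a powerful tool in many fields. From it one can derive a transformation matrix, namely the matrix of its eigenvectors, that completely decorrelates the data. Viewed differently, this amounts to finding an optimal basis in which the data can be represented compactly (for a complete proof, see Rayleigh quotient). In statistics this method is called principal component analysis (PCA); in image processing it is called the Karhunen–Loève transform (KL transform).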
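The decorrelating transformation can be sketched in a few lines, assuming NumPy. The synthetic data mimic the figure at the top of the page (standard deviation 3 along the diagonal, 1 across it); projecting the centred data onto the eigenvectors of the sample covariance matrix yields coordinates whose covariance matrix is diagonal.

```python
import numpy as np

rng = np.random.default_rng(3)
# Illustrative data resembling the figure: covariance [[5, 4], [4, 5]]
true_sigma = np.array([[5.0, 4.0], [4.0, 5.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_sigma, size=5000).T  # shape (2, N)

Xc = X - X.mean(axis=1, keepdims=True)
sigma = Xc @ Xc.T / X.shape[1]             # sample covariance matrix (1/N)

# Eigendecomposition Sigma = V diag(lam) V^T; columns of V are the principal axes
lam, V = np.linalg.eigh(sigma)
order = np.argsort(lam)[::-1]              # sort by decreasing variance
lam, V = lam[order], V[:, order]

Z = V.T @ Xc                               # coordinates in the eigenbasis (PCA scores)
sigma_Z = Z @ Z.T / Z.shape[1]             # equals diag(lam) up to rounding error

print(np.round(sigma_Z, 6))                # off-diagonal entries are ~0
print(np.sqrt(lam))                        # close to [3, 1] for this synthetic data
```

Keeping only the leading columns of `V` gives the compact representation mentioned above; dividing each row of `Z` by `np.sqrt(lam)` would additionally scale every component to unit variance.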