Matrix Norms and Spectral Normalization

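In "Spectral Normalization for Generative Adversarial Networks", spectral normalization is used to control the Lipschitz constant of the discriminator function $f$ in order to stabilize discriminator training in GANs. Compared with enforcing a Lipschitz constraint directly, this approach significantly reduces the computational cost of training and inference.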

Matrix Norms

Given a vector norm $\Vert\cdot\Vert_{\alpha}$ on $K^n$ and a vector norm $\Vert\cdot\Vert_{\beta}$ on $K^m$, an $m\times n$ matrix $A$ defines a linear operator $K^n\to K^m$, and the induced matrix norm is
$$
\begin{aligned}
\Vert A\Vert_{\alpha,\beta}=\sup\{\Vert Ax\Vert_{\beta}:x\in K^{n},\Vert x\Vert_{\alpha}\le1\}
\end{aligned}\tag{1}
$$
This norm measures how much the map $A$ can stretch a vector.

If the vector $p$-norm ($1\le p\le\infty$) is used on both $K^n$ and $K^{m}$, the corresponding matrix norm is
$$
\begin{aligned}
\Vert A\Vert_{p}=\sup\{\Vert Ax\Vert_p:x\in K^n,\Vert x\Vert_p\le1\}
\end{aligned}\tag{2}
$$
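As a small numerical check of definition (2), the sketch below compares NumPy's built-in induced norms for $p=1,2,\infty$ with a crude random-sampling lower bound on the supremum. The matrix `A`, the helper `sampled_p_norm`, and the sample count are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def sampled_p_norm(A, p, n_samples=20000, seed=0):
    """Lower-bound sup{||Ax||_p : ||x||_p <= 1} by sampling points with ||x||_p = 1."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((A.shape[1], n_samples))
    X /= np.linalg.norm(X, ord=p, axis=0, keepdims=True)  # scale each column to ||x||_p = 1
    return np.linalg.norm(A @ X, ord=p, axis=0).max()

A = np.array([[1.0, 2.0], [3.0, 4.0]])
results = {}
for p in (1, 2, np.inf):
    exact = np.linalg.norm(A, ord=p)   # induced matrix norm for p in {1, 2, inf}
    approx = sampled_p_norm(A, p)      # sampled value, never above the exact norm
    results[p] = (exact, approx)
    print(p, exact, approx)
```

Because sampling only ever evaluates feasible points, the sampled value approaches the exact norm from below; for $p=1$ and $p=\infty$ the exact values are the maximum absolute column sum and row sum of `A`.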

Geometric View

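The $p$-norm can be identified with its unit ball $V_{p,n}=\{x\in K^{n}:\Vert x\Vert_p\le1\}$ in $K^n$. Applying the linear map $A$ to this ball turns it into a distorted convex shape $AV_{p,n}\subset K^m$, and $\Vert A\Vert_p$ measures the longest radius of that shape, that is, the largest $p$-norm of any of its points.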
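The geometric picture can be sketched numerically for $p=2$ in the plane: map the unit circle through $A$ and measure the farthest point of the resulting ellipse. The matrix `A` and the grid resolution below are arbitrary choices made for illustration.

```python
import numpy as np

A = np.array([[2.0, 1.0], [0.0, 1.0]])

# Points on the boundary of the Euclidean unit ball V_{2,2} (the unit circle).
theta = np.linspace(0.0, 2.0 * np.pi, 100001)
circle = np.stack([np.cos(theta), np.sin(theta)])   # shape (2, N)

# Image of the circle under A: an ellipse, the boundary of A V_{2,2}.
ellipse = A @ circle
longest_radius = np.linalg.norm(ellipse, axis=0).max()

print(longest_radius)            # longest radius found on the grid
print(np.linalg.norm(A, ord=2))  # ||A||_2 as computed by NumPy
```

The two printed values should agree up to the discretization error of the angular grid.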

Lipschitz Norm

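For a function $f:X\to Y$ between metric spaces, the Lipschitz norm is
$$
\begin{aligned}
\Vert f\Vert_{Lip}:=\underset{x_1\neq x_2}{\sup}\frac{d_Y(f(x_1),f(x_2))}{d_X(x_1,x_2)}
\end{aligned}\tag{3}
$$
where $d_X$ and $d_Y$ are the distance metrics on $X$ and $Y$. The smallest $M$ such that $d_Y(f(x_1),f(x_2))\le M\,d_X(x_1,x_2)$ for all $x_1,x_2\in X$, which equals $\Vert f\Vert_{Lip}$, is the Lipschitz constant of $f$.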
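To make definition (3) concrete, the following sketch estimates the Lipschitz norm of $\tanh$ on the real line, with $d_X$ and $d_Y$ both taken as the absolute difference. The helper `lipschitz_lower_bound` and the grid are hypothetical; since only finitely many pairs are checked, the result is a lower bound on the supremum rather than its exact value.

```python
import numpy as np

def lipschitz_lower_bound(f, xs):
    """Largest ratio |f(x1) - f(x2)| / |x1 - x2| over all distinct pairs in xs."""
    x1, x2 = np.meshgrid(xs, xs, indexing="ij")
    mask = x1 != x2                      # exclude pairs with x1 == x2
    ratios = np.abs(f(x1[mask]) - f(x2[mask])) / np.abs(x1[mask] - x2[mask])
    return ratios.max()

xs = np.linspace(-3.0, 3.0, 601)
estimate = lipschitz_lower_bound(np.tanh, xs)
print(estimate)  # below, but close to, 1
```

The derivative of $\tanh$ is bounded by $1$ and attains that value at the origin, so the true Lipschitz norm is $1$ and the estimate approaches it as the grid is refined around $0$.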

Spectral Normalization

To control the Lipschitz constant of the discriminator function, the spectral norm of each layer $g:\mathbf{h}_{in}\to\mathbf{h}_{out}$ is constrained layer by layer. By definition, the Lipschitz norm $\Vert g\Vert_{Lip}$ equals $\sup_{\mathbf{h}}\sigma(\nabla g(\mathbf{h}))$, the supremum over $\mathbf{h}$ of the spectral norm $\sigma(\nabla g(\mathbf{h}))$ of the gradient $\nabla g(\mathbf{h})$. Here the spectral norm $\sigma(A)$ of a matrix $A$ is its $L_2$ matrix norm:
$$
\begin{aligned}
\sigma(A):=\underset{\mathbf{h}:\mathbf{h}\neq0}{\max}\frac{\Vert A\mathbf{h}\Vert_2}{\Vert \mathbf{h}\Vert_2}=\underset{\Vert \mathbf{h}\Vert_2\le1}{\max}\Vert A\mathbf{h}\Vert_2
\end{aligned}\tag{4}
$$
The spectral norm $\sigma(A)$ equals the largest singular value of $A$. Hence, for a linear layer $g(\mathbf{h})=W\mathbf{h}$, the norm is $\Vert g\Vert_{Lip}=\sup_{\mathbf{h}}\sigma(\nabla g(\mathbf{h}))=\sup_{\mathbf{h}}\sigma(W)=\sigma(W)$. If every activation function $a_l$ has Lipschitz norm $1$, then for a network $f$ that alternates linear layers $W^l$ with activations $a_l$, the inequality $\Vert g_1\circ g_2\Vert_{Lip}\le\Vert g_1\Vert_{Lip}\cdot\Vert g_2\Vert_{Lip}$ gives
$$
\begin{aligned}
\Vert f\Vert_{Lip}&\le\Vert(\mathbf{h}_L\mapsto W^{L+1}\mathbf{h}_{L})\Vert_{Lip}\cdot\Vert a_L\Vert_{Lip}\cdot\Vert(\mathbf{h}_{L-1}\mapsto W^{L}\mathbf{h}_{L-1})\Vert_{Lip} \\
&\quad\cdots\Vert a_1\Vert_{Lip}\cdot\Vert(\mathbf{h}_0\mapsto W^{1}\mathbf{h}_0)\Vert_{Lip}=\prod_{l=1}^{L+1}\Vert(\mathbf{h}_{l-1}\mapsto W^{l}\mathbf{h}_{l-1})\Vert_{Lip}=\prod_{l=1}^{L+1}\sigma(W^{l})
\end{aligned}\tag{5}
$$
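Bound (5) can be checked empirically on a toy network. The sketch below assumes a two-layer network with a ReLU activation (which has Lipschitz norm $1$) and random weights `W1`, `W2`; all names and sizes are illustrative. It compares difference quotients over random input pairs with the product of the spectral norms.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((8, 4))
W2 = rng.standard_normal((3, 8))

def f(h):
    """Two-layer network W2 @ relu(W1 @ h); each column of h is one input."""
    return W2 @ np.maximum(W1 @ h, 0.0)

def sigma(W):
    """Spectral norm: the largest singular value of W."""
    return np.linalg.svd(W, compute_uv=False)[0]

bound = sigma(W1) * sigma(W2)          # right-hand side of (5)

# Difference quotients over random input pairs: a lower bound on ||f||_Lip.
h1 = rng.standard_normal((4, 5000))
h2 = rng.standard_normal((4, 5000))
ratios = np.linalg.norm(f(h1) - f(h2), axis=0) / np.linalg.norm(h1 - h2, axis=0)

print(ratios.max(), "<=", bound)
```

The sampled ratios typically stay well below the bound, which reflects that (5) is an upper bound and not necessarily tight.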
Spectral normalization is therefore defined as
$$
\begin{aligned}
\bar{W}_{SN}(W):=\frac{W}{\sigma(W)}
\end{aligned}\tag{6}
$$
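That is, the weight matrix of each layer is normalized by its largest singular value, so that $\sigma(\bar{W}_{SN}(W))=1$ and, by inequality (5), the Lipschitz constant of $f$ is at most $1$.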
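A minimal sketch of equation (6) follows. Computing a full SVD for every layer can be costly, so this example estimates $\sigma(W)$ with power iteration; that choice, the iteration count, and the helper names `power_iteration_sigma` and `spectral_normalize` are assumptions made here for illustration and are not specified in the text above.

```python
import numpy as np

def power_iteration_sigma(W, n_iters=200, seed=0):
    """Estimate the largest singular value of W by alternating power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(W.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iters):
        u = W @ v                 # step towards the leading left singular vector
        u /= np.linalg.norm(u)
        v = W.T @ u               # step towards the leading right singular vector
        v /= np.linalg.norm(v)
    return float(u @ W @ v)       # u^T W v approximates sigma(W)

def spectral_normalize(W):
    """Equation (6): divide W by (an estimate of) its spectral norm."""
    return W / power_iteration_sigma(W)

rng = np.random.default_rng(1)
W = rng.standard_normal((5, 3))
W_sn = spectral_normalize(W)

print(power_iteration_sigma(W), np.linalg.norm(W, ord=2))  # the two estimates should agree closely
print(np.linalg.norm(W_sn, ord=2))                         # close to 1
```

Applying `spectral_normalize` to every weight matrix of a network makes each factor in (5) approximately $1$, up to the accuracy of the power-iteration estimate.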
