On Activation Regularization


Activation Regularization - serp.ai
A Gentle Introduction to Activation Regularization

\[\alpha L_{2}\left(m \circ h_{t}\right)\]

Here, \(m\) refers to the dropout mask used later in the model, \(\alpha\) is a scaling coefficient, and \(h_{t}\) is the output of the RNN at timestep \(t\). The \(L_{2}\) norm is used to calculate the magnitude of the activations, and the result is scaled by \(\alpha\). This encourages small activations, ultimately leading to better performance and generalization in the model.
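As a concrete reading of the formula, here is a minimal PyTorch-style sketch (not from any particular codebase; the layer sizes, `alpha`, and the helper `loss_with_ar` are illustrative assumptions). It adds \(\alpha L_{2}\left(m \circ h_{t}\right)\) to a language-model loss, reusing the dropped-out RNN output so the penalty sees the same mask \(m\), and follows the common practice of taking the mean of squared activations as the \(L_{2}\) term:

```python
import torch
import torch.nn as nn

alpha = 2.0                      # AR scaling coefficient (hypothetical value)
rnn = nn.LSTM(input_size=128, hidden_size=256, batch_first=True)
dropout = nn.Dropout(p=0.5)      # supplies the dropout mask m

def loss_with_ar(inputs, targets, decoder, criterion):
    h, _ = rnn(inputs)                 # h stacks the RNN outputs h_t over all timesteps
    h_dropped = dropout(h)             # m ∘ h_t: the same dropped output is fed onward
    logits = decoder(h_dropped)
    ce = criterion(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    ar = alpha * h_dropped.pow(2).mean()   # penalty on the magnitude of the masked activations
    return ce + ar
```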

A Gentle Introduction to Activation Regularization equates an activation with a value in a feature. At first I did not understand this, until I read the passage below:

These internal representations are tangible things. The output of a hidden layer within the network represents the features learned by the model at that point in the network.

My earlier understanding was too narrow: from the point of view of the next layer, a single post-activation value is indeed just an input; but if you take all the activation values a layer outputs as a whole, and view them with respect to the current layer, they are exactly that layer's feature map. Seen this way, Activation Regularization limits the magnitude of the values in the feature map.

Other names for activation regularization

activation regularization, activity regularization, representation regularization, sparse feature learning
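Keras, for instance, exposes this idea under the second name via the `activity_regularizer` argument, which penalizes a layer's output (its activations) rather than its weights (`kernel_regularizer`); the layer size and coefficient below are arbitrary:

```python
import tensorflow as tf

# Penalize the layer's *output* (its activations), not its weights.
layer = tf.keras.layers.Dense(
    64,
    activation="relu",
    activity_regularizer=tf.keras.regularizers.l1(1e-4),  # encourages sparse activations
)
```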

Why?

https://stats.stackexchange.com/questions/45643/why-l1-norm-for-sparse-models covers this from a numerical standpoint.

And from the derivative:
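A quick sketch of the derivative argument (my own summary, assuming a single activation \(a\) and penalty weight \(\lambda\)):

\[\frac{\partial}{\partial a}\,\lambda\lvert a\rvert = \lambda\operatorname{sign}(a), \qquad \frac{\partial}{\partial a}\,\lambda a^{2} = 2\lambda a\]

The \(L_{1}\) gradient has constant magnitude \(\lambda\), so the penalty keeps pushing an activation until it reaches exactly zero; the \(L_{2}\) gradient shrinks in proportion to \(a\), so small activations feel almost no pressure and end up small but rarely exactly zero. This is why \(L_{1}\) is associated with sparse activations and \(L_{2}\) with merely small ones.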

Since the latent code is itself an activation, is that why autoencoders tend to use activity regularization?
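This is at least how the classic sparse autoencoder is built: a sparsity penalty (L1, or a KL term toward a target activation rate) on the latent activations. A minimal L1-flavoured sketch, with layer sizes and the coefficient `beta` chosen purely for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(784, 64), nn.ReLU())
decoder = nn.Sequential(nn.Linear(64, 784), nn.Sigmoid())
beta = 1e-3                                # sparsity weight (hypothetical)

def sparse_ae_loss(x):
    z = encoder(x)                         # latent code = the bottleneck's activations
    x_hat = decoder(z)
    recon = F.mse_loss(x_hat, x)
    sparsity = beta * z.abs().mean()       # L1 activity regularization on the code
    return recon + sparsity
```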

What is the advantage of using AR?

For compression, a sparse matrix (many zero entries) helps the compression ratio in DCT-based coding, so if the latent code were entropy-coded with traditional methods, adding AR (which increases the sparsity of the activations, i.e. of the latent code) would make sense. But autoencoders are not used only for compression, so what is the reason for using AR there? And why does DCVC's loss function include no regularization term?
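On the compression point, a toy check of the intuition (entirely my own illustration, not from DCVC or any real codec): a quantized code with many zeros has lower empirical entropy, so an entropy coder needs fewer bits per symbol for it.

```python
import numpy as np

def entropy_bits_per_symbol(codes):
    """Empirical Shannon entropy of a discrete code, in bits per symbol."""
    _, counts = np.unique(codes, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
dense = np.round(rng.normal(0, 3, size=10_000)).astype(int)   # quantized code, few zeros
sparse = dense * (rng.random(10_000) < 0.2)                   # ~80% of entries zeroed
print(entropy_bits_per_symbol(dense))    # higher: more bits per symbol
print(entropy_bits_per_symbol(sparse))   # lower: sparsity pays off under entropy coding
```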