Initialization in DL
Category
LeCun
Uniform
It draws samples from a uniform distribution within [-limit, limit], where
limit = sqrt(3. / fan_in)
fan_in: the number of input units in the weight tensor.
Function in TensorFlow: tf.keras.initializers.LecunUniform
Normal
It draws samples from a truncated normal distribution centered on 0 with
stddev = sqrt(1. / fan_in)
fan_in: the number of input units in the weight tensor.
Function in TensorFlow: tf.keras.initializers.LecunNormal
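A minimal usage sketch (the (256, 128) weight shape is an arbitrary choice for illustration): sample both LeCun variants with the tf.keras initializer classes and compare the statistics with the formulas above.

import numpy as np
import tensorflow as tf

fan_in, fan_out = 256, 128  # arbitrary example shape

# LeCun uniform: samples lie in [-limit, limit] with limit = sqrt(3 / fan_in)
w_u = tf.keras.initializers.LecunUniform(seed=0)((fan_in, fan_out))
print(float(tf.reduce_max(tf.abs(w_u))), np.sqrt(3.0 / fan_in))  # both close to 0.108

# LeCun normal: truncated normal with stddev = sqrt(1 / fan_in)
w_n = tf.keras.initializers.LecunNormal(seed=0)((fan_in, fan_out))
print(float(tf.math.reduce_std(w_n)), np.sqrt(1.0 / fan_in))     # both close to 0.0625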
Xavier (or Glorot)
Uniform
It draws samples from a uniform distribution within [-limit, limit], where
limit = sqrt(6. / (fan_in + fan_out))
fan_in: the number of input units in the weight tensor.
fan_out: the number of output units in the weight tensor.
Function in TensorFlow: tf.keras.initializers.GlorotUniform
Normal
It draws samples from a truncated normal distribution centered on 0 with
stddev = sqrt(2. / (fan_in + fan_out))
Function in TensorFlow: tf.keras.initializers.GlorotNormal
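A short sketch of how these are usually attached to layers (the layer sizes are arbitrary assumptions); "glorot_uniform" is also the default kernel_initializer of tf.keras.layers.Dense.

import tensorflow as tf

# Glorot/Xavier initializers can be passed by string alias or as a class instance.
layer_u = tf.keras.layers.Dense(64, kernel_initializer="glorot_uniform")
layer_n = tf.keras.layers.Dense(64, kernel_initializer=tf.keras.initializers.GlorotNormal(seed=0))

layer_n.build(input_shape=(None, 128))            # fan_in = 128, fan_out = 64
print(float(tf.math.reduce_std(layer_n.kernel)))  # close to sqrt(2 / (128 + 64))
print((2.0 / (128 + 64)) ** 0.5)                  # ≈ 0.102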
He / MSRA initialization
Uniform
It draws samples from a uniform distribution within [-limit, limit], where
limit = sqrt(6. / fan_in)
fan_in: the number of input units in the weight tensor.
Function in TensorFlow: tf.keras.initializers.HeUniform
Normal
It draws samples from a truncated normal distribution centered on 0 with
stddev = sqrt(2. / fan_in)
fan_in: the number of input units in the weight tensor.
Function in TensorFlow: tf.keras.initializers.HeNormal
If you use ReLU / Leaky ReLU, this is the preferred initialization method.
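Since the note above recommends He initialization for ReLU / Leaky ReLU, a minimal sketch (the 784/256/10 layer sizes are arbitrary assumptions):

import tensorflow as tf

# He/MSRA initialization paired with ReLU activations.
dense1 = tf.keras.layers.Dense(
    256, activation="relu",
    kernel_initializer=tf.keras.initializers.HeNormal(seed=0))
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    dense1,
    tf.keras.layers.Dense(10, kernel_initializer="he_uniform"),
])
print(float(tf.math.reduce_std(dense1.kernel)))  # close to sqrt(2 / 784)
print((2.0 / 784) ** 0.5)                        # ≈ 0.0505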
Glorot_He
Uniform
Normal
RandomUniform
Function in TensorFlow: tf.keras.initializers.RandomUniform
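A minimal sketch (the bounds shown are the tf.keras defaults; the (256, 128) shape is arbitrary):

import tensorflow as tf

# Plain uniform initializer: the bounds are fixed, not derived from fan_in/fan_out.
init = tf.keras.initializers.RandomUniform(minval=-0.05, maxval=0.05, seed=0)
w = init(shape=(256, 128))
print(float(tf.reduce_min(w)), float(tf.reduce_max(w)))  # both within [-0.05, 0.05]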
TruncatedNormal
Function in TensorFlow: tf.keras.initializers.TruncatedNormal
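A minimal sketch (mean and stddev shown are the tf.keras defaults; the shape is arbitrary):

import tensorflow as tf

# Truncated normal: samples more than 2 stddevs from the mean are re-drawn.
init = tf.keras.initializers.TruncatedNormal(mean=0.0, stddev=0.05, seed=0)
w = init(shape=(256, 128))
print(float(tf.reduce_max(tf.abs(w))))  # never exceeds 2 * 0.05 = 0.1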
Orthogonal
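The entry above has no notes attached; as a small sketch (arbitrary shape), the tf.keras orthogonal initializer returns a matrix with orthonormal columns, which can be checked directly:

import tensorflow as tf

init = tf.keras.initializers.Orthogonal(gain=1.0, seed=0)
w = init(shape=(256, 128))   # more rows than columns -> orthonormal columns
gram = tf.transpose(w) @ w   # should be (close to) the identity matrix
print(float(tf.reduce_max(tf.abs(gram - tf.eye(128)))))  # ~0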
Template in TensorFlow
def init_weights():
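The template above is an empty stub; a minimal sketch of one way to fill it in (the (784, 256) shape and the choice of HeNormal are assumptions for illustration, not from the original note):

import tensorflow as tf

def init_weights(shape=(784, 256),
                 initializer=tf.keras.initializers.HeNormal(seed=0)):
    # Sample a weight matrix with the chosen scheme and wrap it in a Variable;
    # the bias is simply zero-initialized.
    w = tf.Variable(initializer(shape), name="w")
    b = tf.Variable(tf.zeros(shape[-1]), name="b")
    return w, b

w, b = init_weights()
print(w.shape, b.shape)  # (784, 256) (256,)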
Questions
Why do we need to break the symmetry of the network?
Symmetry here means that all hidden units in a given hidden layer are identical. If the network is symmetric, the hidden layer effectively has only one meaningful hidden unit (only one feature is learned). Ideally every hidden unit learns its own feature, which is why the symmetry must be broken.
Why can't all weights be initialized to 0?
If everything is initialized to 0, the network becomes symmetric (assuming no bias terms and no dropout): all hidden units compute the same output and receive the same gradient, so they stay identical after every update.
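A tiny numerical sketch of the symmetry argument (the shapes, the constant value 0.5, and tanh are arbitrary choices; with an all-zero init this particular tanh net with no biases additionally gets all-zero gradients, so nothing updates at all):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))   # batch of 4 examples, 3 input features
y = rng.normal(size=(4, 2))   # 2 targets

# Symmetric initialization: every weight gets the same constant value.
W1 = np.full((3, 5), 0.5)     # input -> 5 hidden units
W2 = np.full((5, 2), 0.5)     # hidden -> output

h = np.tanh(x @ W1)           # all 5 hidden units compute exactly the same value
out = h @ W2
grad_out = out - y            # gradient of 0.5 * squared error w.r.t. the output
grad_h = grad_out @ W2.T
grad_W1 = x.T @ (grad_h * (1.0 - h ** 2))

# Every column of grad_W1 is identical, so every hidden unit receives the same
# update and they can never become different: the symmetry is never broken.
print(np.allclose(h, h[:, [0]]))              # True
print(np.allclose(grad_W1, grad_W1[:, [0]]))  # True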