2017-02-14

CS231n Notes 2

Notes on the CS231n course, recording what I understood and what I didn't. Part 1 is here: CS231n Notes 1.

Lecture 9

Lecture 10

RNN

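The output of the output layer depends only on the current hidden state. The hidden layer can be thought of as the network's memory; in practice it only remembers the hidden states of the last few steps.
RNN weights are shared: the input-->hidden, hidden-->hidden, and hidden-->output weight matrices (U, W, and V respectively) are the same at every time step, so every step updates the same matrices.
An RNN does not need an input and an output at every step, but it must have a hidden layer, which captures information about the sequence.
RNNs are trained with BPTT (backpropagation through time), but BPTT cannot handle long-term dependencies (when the current output depends on a long stretch of earlier input), because the gradients may vanish.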
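To make the weight sharing concrete, here is a minimal numpy sketch of a vanilla RNN forward pass. It assumes the U, W, V naming above, leaves out bias terms, and picks tanh as the hidden nonlinearity; the sizes are arbitrary.

```python
import numpy as np

def rnn_forward(xs, h0, U, W, V):
    """Vanilla RNN forward pass over a whole sequence (no biases).

    The same U (input->hidden), W (hidden->hidden) and V (hidden->output)
    are reused at every time step: this is the weight sharing.
    """
    h = h0
    hs, ys = [], []
    for x in xs:                      # one time step per input vector
        h = np.tanh(U @ x + W @ h)    # new hidden state depends on the previous one
        hs.append(h)
        ys.append(V @ h)              # output depends only on the current hidden state
    return hs, ys

rng = np.random.default_rng(0)
D, H, O, T = 4, 8, 3, 5               # input, hidden, output sizes; sequence length
U = rng.normal(scale=0.1, size=(H, D))
W = rng.normal(scale=0.1, size=(H, H))
V = rng.normal(scale=0.1, size=(O, H))
xs = rng.normal(size=(T, D))
hs, ys = rnn_forward(xs, np.zeros(H), U, W, V)
print(len(hs), ys[-1].shape)  # 5 (3,)
```

Backpropagating through this loop multiplies by W (and by the tanh derivative) once per step, which is why gradients over long sequences tend to vanish.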

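Regularization is not commonly used in RNNs; whether to use it is a hyperparameter.
The LSTM cell state is updated additively, which keeps gradients from vanishing along the time axis. In that sense the structure resembles ResNet, but it also has a forget gate that decides what to drop from the cell state. The different variants perform about the same; some may do slightly worse or slightly better in particular cases.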

Image Caption

This is done with a CNN + RNN combination; the CNN extracts the image features.

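Concretely, remove the classifier at the end of the CNN and use the fc-layer features as input to the RNN (there are many ways to feed the features into the RNN). The forward-pass formula changes accordingly (this is only one of many possible formulations; see the video at 33:00).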
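One possible form of the modified recurrence, stated here as an assumption rather than a transcription of the lecture slide, adds an image term only at the first step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + W_ih v), where v is the CNN's fc feature. A numpy sketch, with all weight names hypothetical:

```python
import numpy as np

def caption_rnn_forward(v, word_vecs, Wxh, Whh, Wih, Why):
    """Image-conditioned RNN forward pass (all weight names are hypothetical).

    v         -- fc-layer feature of the image (CNN with its classifier removed)
    word_vecs -- embedded input words, one row per time step
    """
    h = np.zeros(Whh.shape[0])
    scores = []
    for t, x in enumerate(word_vecs):
        pre = Wxh @ x + Whh @ h
        if t == 0:
            pre = pre + Wih @ v          # the image enters only at the first step
        h = np.tanh(pre)
        scores.append(Why @ h)           # unnormalized scores over vocabulary + END
    return np.array(scores)

rng = np.random.default_rng(0)
F, E, H, K = 16, 6, 10, 7                # feature, embedding, hidden, output sizes
Wxh = rng.normal(scale=0.1, size=(H, E))
Whh = rng.normal(scale=0.1, size=(H, H))
Wih = rng.normal(scale=0.1, size=(H, F))
Why = rng.normal(scale=0.1, size=(K, H))
v = rng.normal(size=F)
words = rng.normal(size=(4, E))
scores = caption_rnn_forward(v, words, Wxh, Whh, Wih, Why)
print(scores.shape)  # (4, 7)
```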

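Usually the CNN features are plugged into only the first step of the RNN. They can also be plugged into other steps, but the results get worse.
I didn't understand the sampling at 34:34.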

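That is, the most likely word in one step's output becomes the input to the next step. The output y has dimension vocabulary size + 1, because one extra token (END) is needed.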
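A sketch of this greedy loop in numpy. The token ids, the input-only START id, and the weight names are hypothetical; h0 would normally come from the image features. Taking np.argmax is the "most likely" choice described above; sampling from the softmax is an alternative.

```python
import numpy as np

END = 0  # hypothetical id of the END token in the output vocabulary

def greedy_decode(h0, start_id, embed, Wxh, Whh, Why, max_len=20):
    """Greedy sampling: feed the most likely word back in as the next input.

    Why has one row per output token (the vocabulary plus END).
    start_id is an input-only START token that kicks off generation.
    """
    h, word, caption = h0, start_id, []
    for _ in range(max_len):
        h = np.tanh(Wxh @ embed[word] + Whh @ h)
        word = int(np.argmax(Why @ h))   # most likely next token
        if word == END:                  # END means the caption is finished
            break
        caption.append(word)
    return caption

rng = np.random.default_rng(1)
K, E, H = 12, 5, 8                       # outputs (11 words + END), embedding, hidden
START = K                                # input-only id, one past the output ids
embed = rng.normal(size=(K + 1, E))      # embeddings for every output id plus START
Wxh = rng.normal(scale=0.5, size=(H, E))
Whh = rng.normal(scale=0.5, size=(H, H))
Why = rng.normal(size=(K, H))
caption = greedy_decode(rng.normal(size=H), START, embed, Wxh, Whh, Why)
print(caption)  # a list of word ids; END is never included
```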
Does the CNN part still need to be trained during training?

It can be trained or left fixed. Training it works better, because the CNN can then learn which image features it should extract.

LSTM

One way to make an RNN more complex is to stack RNNs: the output of one RNN's hidden layer becomes the input to another RNN's hidden layer.
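A minimal numpy sketch of stacking, assuming vanilla tanh layers without biases: the whole hidden-state sequence of one layer is used as the input sequence of the layer above.

```python
import numpy as np

def stacked_rnn_forward(xs, layers):
    """Stacked (multi-layer) vanilla RNN.

    layers -- list of (Wx, Wh) pairs, bottom layer first.
    """
    seq = xs
    for Wx, Wh in layers:
        h = np.zeros(Wh.shape[0])
        out = []
        for x in seq:
            h = np.tanh(Wx @ x + Wh @ h)
            out.append(h)
        seq = np.array(out)          # this layer's hidden states feed the next layer
    return seq

rng = np.random.default_rng(0)
D, H, T = 3, 6, 4
layers = [(rng.normal(scale=0.3, size=(H, D)), rng.normal(scale=0.3, size=(H, H))),
          (rng.normal(scale=0.3, size=(H, H)), rng.normal(scale=0.3, size=(H, H)))]
top = stacked_rnn_forward(rng.normal(size=(T, D)), layers)
print(top.shape)  # (4, 6)
```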
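LSTM differs from a vanilla RNN in having a more complex update formula. In addition, each hidden layer keeps two state vectors, c and h, one more (c) than an RNN (48:24).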
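One LSTM step as a numpy sketch. Stacking the four gate pre-activations into one matrix in the order i, f, o, g is an assumed layout, not the only convention. The usage part checks the additive path: with the forget gate saturated open and the input gate closed, c passes through unchanged.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, Wx, Wh, b):
    """One LSTM step. Wx: (4H, D), Wh: (4H, H), b: (4H,), gate order i, f, o, g."""
    H = h_prev.shape[0]
    a = Wx @ x + Wh @ h_prev + b       # all four gate pre-activations at once
    i = sigmoid(a[:H])                 # input gate
    f = sigmoid(a[H:2 * H])            # forget gate
    o = sigmoid(a[2 * H:3 * H])        # output gate
    g = np.tanh(a[3 * H:])             # candidate values
    c = f * c_prev + i * g             # additive cell update
    h = o * np.tanh(c)                 # hidden state passed on
    return h, c

rng = np.random.default_rng(0)
D, H = 3, 4
x, h0, c0 = rng.normal(size=D), np.zeros(H), rng.normal(size=H)
h, c = lstm_step(x, h0, c0, rng.normal(size=(4 * H, D)), rng.normal(size=(4 * H, H)), np.zeros(4 * H))

b = np.zeros(4 * H)
b[H:2 * H] = 50.0                      # forget gate bias -> f ~ 1
b[:H] = -50.0                          # input gate bias  -> i ~ 0
_, c_kept = lstm_step(x, h0, c0, np.zeros((4 * H, D)), np.zeros((4 * H, H)), b)
print(np.allclose(c_kept, c0))  # True
```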
GRU

Lecture 11

Data Augmentation

Common methods:
1. Change all pixels
  - Conversion to grayscale
  - Shift
  - Horizontal flipping (mirror image)
2. Random crops
  - Take a patch of the image at a random scale and location (it must be resized for training)
3. Color jitter
  - Change contrast
  - More complex methods (9:28)
4. General theme
  - Batch normalization
Very useful for small datasets; it is worth using all the time.
Data augmentation is usually done at training time, because saving the processed images to disk would take a lot of disk space.
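A numpy sketch of on-the-fly augmentation covering a fixed-size random crop, a random horizontal flip, and contrast jitter. Images are assumed to be float arrays in [0, 1]; random-scale crops would also need a resize step, which is left out. The function names are hypothetical.

```python
import numpy as np

def random_crop_flip(img, crop, rng):
    """Random crop of side `crop` (assumed <= H, W) plus a 50% horizontal flip."""
    H, W = img.shape[:2]
    top = rng.integers(0, H - crop + 1)
    left = rng.integers(0, W - crop + 1)
    patch = img[top:top + crop, left:left + crop]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]                 # horizontal flip (mirror image)
    return patch

def random_contrast(img, rng, low=0.8, high=1.2):
    """Color jitter, contrast only: scale deviations from the mean brightness."""
    factor = rng.uniform(low, high)
    mean = img.mean()
    return np.clip((img - mean) * factor + mean, 0.0, 1.0)

def augmented_batches(images, crop, batch_size, rng):
    """Yield freshly augmented batches; nothing extra is written to disk."""
    order = rng.permutation(len(images))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield np.stack([random_contrast(random_crop_flip(images[i], crop, rng), rng)
                        for i in idx])

rng = np.random.default_rng(0)
images = rng.random((10, 32, 32, 3))           # 10 fake RGB images
batches = list(augmented_batches(images, crop=24, batch_size=4, rng=rng))
print([b.shape for b in batches])  # [(4, 24, 24, 3), (4, 24, 24, 3), (2, 24, 24, 3)]
```

Calling the generator again each epoch gives different crops and flips of the same images.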

Transfer Learning with CNNs

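Use a pretrained network to extract features from the training dataset, save them to disk, then load these features to train your new model.
- Small dataset: treat the CNN as a fixed feature extractor.

  Typical approach: take away the last layer, replace it with a linear classifier for the classes you care about, freeze the other layers, and retrain only that top layer (14:20). This amounts to training a linear classifier directly on the features produced by the CNN (with the features saved to disk beforehand).
- More data: fine-tune more of the last layers.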
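With a small training dataset, train fewer layers; with a larger one, train more. The layers being trained can be trained together, or you can first train only the final classifier layer (its parameters are randomly initialized) and, once it converges, train the intermediate layers.
Features from the lower layers, such as edges and colors, can be used directly in many CV tasks.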
I didn't understand the first question at 13:00: what does "dump to disk" during training mean?

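It means running the pretrained network once over the training data, saving the extracted features (the activations, not the W, b parameters) to disk, and then training the new classifier on those saved features instead of rerunning the network every epoch.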
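A small sketch of the "dump to disk" workflow. The frozen "pretrained CNN" here is a hypothetical stand-in (a fixed random projection plus ReLU), the data and labels are random, and the file name is made up; only the structure (extract once, save, reload, train a linear softmax classifier on top) mirrors the notes.

```python
import os
import tempfile
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a pretrained CNN with its last layer removed.
W_frozen = rng.normal(size=(32, 64))
def extract_features(images):
    return np.maximum(0, images @ W_frozen)        # frozen: never updated below

X = rng.normal(size=(200, 32))                     # 200 toy "images", 32 values each
y = rng.integers(0, 3, size=200)                   # 3 classes

# 1) Run the frozen network once and dump the features to disk.
path = os.path.join(tempfile.mkdtemp(), "features.npz")
np.savez(path, feats=extract_features(X), labels=y)

# 2) Training loads the cached features instead of re-running the network.
with np.load(path) as data:
    F, labels = data["feats"], data["labels"]

def softmax_loss_grad(Wc, F, labels):
    scores = F @ Wc
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    n = len(labels)
    loss = -np.log(probs[np.arange(n), labels]).mean()
    probs[np.arange(n), labels] -= 1               # d(loss)/d(scores), times n
    return loss, F.T @ probs / n

# 3) Train only the new linear classifier on top of the cached features.
Wc = np.zeros((F.shape[1], 3))
losses = []
for _ in range(200):
    loss, grad = softmax_loss_grad(Wc, F, labels)
    losses.append(loss)
    Wc -= 1e-3 * grad
print(round(float(losses[0]), 4), bool(losses[-1] < losses[0]))  # 1.0986 True
```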
How to handle convolutions: 35:50

All About Convolutions

I didn't understand the first two questions :(
Prefer stacking small filters over using a single large filter:
- fewer parameters
- more nonlinearity
- less computation
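A common way to quantify these points, assuming stride 1, C input and C output channels in every layer, and no biases: three stacked 3 x 3 layers see the same 7 x 7 region as one 7 x 7 layer, with 27C² weights instead of 49C² and three nonlinearities instead of one.

```python
def receptive_field(num_layers, k):
    """Receptive field of num_layers stacked k x k convs with stride 1."""
    return 1 + num_layers * (k - 1)

def conv_weights(k, channels):
    """Weights of one k x k conv with `channels` inputs and outputs (no bias)."""
    return k * k * channels * channels

C = 64
print(receptive_field(3, 3), receptive_field(1, 7))    # 7 7
print(3 * conv_weights(3, C), conv_weights(7, C))      # 110592 200704
```

With "same" padding, the multiply-adds per output pixel equal the weight count, so the compute ratio is also 27:49.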

Implementing Convolutions

im2col

It may use a lot of memory, but this is acceptable in practice, and it is fast. Recommended.
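A numpy sketch of im2col without padding: every receptive field becomes one column, so the whole convolution (cross-correlation, as CNNs compute it) turns into a single matrix multiply. Overlapping fields are copied into separate columns, which is where the extra memory goes.

```python
import numpy as np

def im2col(x, k, stride=1):
    """Lay every k x k receptive field of x (C, H, W) out as one column.

    Returns an array of shape (C*k*k, out_h*out_w) plus the output size.
    """
    C, H, W = x.shape
    out_h = (H - k) // stride + 1
    out_w = (W - k) // stride + 1
    cols = np.empty((C * k * k, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i * stride:i * stride + k, j * stride:j * stride + k]
            cols[:, i * out_w + j] = patch.ravel()
    return cols, out_h, out_w

def conv_im2col(x, w, stride=1):
    """Convolve x with filters w of shape (F, C, k, k) via one matrix multiply."""
    F, C, k, _ = w.shape
    cols, out_h, out_w = im2col(x, k, stride)
    out = w.reshape(F, -1) @ cols        # (F, C*k*k) @ (C*k*k, out_h*out_w)
    return out.reshape(F, out_h, out_w)

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 7, 7))           # 3 channels, 7x7 image
w = rng.normal(size=(4, 3, 3, 3))        # 4 filters of size 3x3
out = conv_im2col(x, w, stride=2)

# Direct sliding-window computation for comparison.
ref = np.array([[[np.sum(x[:, 2 * i:2 * i + 3, 2 * j:2 * j + 3] * w[f]) for j in range(3)]
                 for i in range(3)] for f in range(4)])
print(out.shape, np.allclose(out, ref))  # (4, 3, 3) True
```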
FFT: Fast Fourier Transform
- doesn't work well in practice (for 3 x 3 convolutions)
- doesn't handle stride well
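A 1D sketch of FFT-based convolution via the convolution theorem, using numpy's np.fft. It hints at why the method pays off mainly for large kernels: the transforms cost about the same however small the kernel is, while direct convolution with a 3-tap kernel is already cheap. A strided convolution would still compute the full output and then discard most of it.

```python
import numpy as np

def fft_conv1d(signal, kernel):
    """Full linear convolution computed in the frequency domain."""
    n = len(signal) + len(kernel) - 1      # length of the full convolution
    S = np.fft.rfft(signal, n)             # zero-pad to n to avoid circular wrap-around
    K = np.fft.rfft(kernel, n)
    return np.fft.irfft(S * K, n)          # pointwise product, then back

x = np.random.default_rng(0).normal(size=32)
k = np.array([1.0, -2.0, 1.0])
print(np.allclose(fft_conv1d(x, k), np.convolve(x, k)))  # True
```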
Fast algorithms
- Strassen's Algorithm
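Strassen's algorithm multiplies matrices using 7 block products instead of 8; since im2col turns convolution into a matrix multiply, a faster matrix multiply speeds up convolution too. A recursive numpy sketch, restricted (as an assumption) to square matrices whose size is a power of two, falling back to ordinary multiplication below a cutoff.

```python
import numpy as np

def strassen(A, B, leaf=64):
    """Strassen multiplication of square power-of-two matrices."""
    n = A.shape[0]
    if n <= leaf:
        return A @ B                     # small blocks: ordinary multiply
    m = n // 2
    A11, A12, A21, A22 = A[:m, :m], A[:m, m:], A[m:, :m], A[m:, m:]
    B11, B12, B21, B22 = B[:m, :m], B[:m, m:], B[m:, :m], B[m:, m:]
    # Seven recursive products instead of eight.
    M1 = strassen(A11 + A22, B11 + B22, leaf)
    M2 = strassen(A21 + A22, B11, leaf)
    M3 = strassen(A11, B12 - B22, leaf)
    M4 = strassen(A22, B21 - B11, leaf)
    M5 = strassen(A11 + A12, B22, leaf)
    M6 = strassen(A21 - A11, B11 + B12, leaf)
    M7 = strassen(A12 - A22, B21 + B22, leaf)
    C11 = M1 + M4 - M5 + M7
    C12 = M3 + M5
    C21 = M2 + M4
    C22 = M1 - M2 + M3 + M6
    return np.block([[C11, C12], [C21, C22]])

rng = np.random.default_rng(0)
A, B = rng.normal(size=(128, 128)), rng.normal(size=(128, 128))
print(np.allclose(strassen(A, B, leaf=32), A @ B))  # True
```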

Implementing Details

Use a GPU and use existing libraries: TF, Torch, and Caffe are all worth playing with; when you don't understand how something works, read the source code.

GPU-CPU communication is a bottleneck (slide at 1:04:20)
Disk is a bottleneck (slide at 1:05:50)

Lecture 13

Semantic segmentation and instance segmentation are usually tackled separately.

Semantic Segmentation

Don't care about instances.

Take a patch -> feed it to a CNN -> classify the center pixel of the patch.

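Do this for every pixel of the image. This is expensive, so use a trick similar to the one in object detection: run a fully convolutional network -> get a pixel heatmap (if there is downsampling, padding, and so on, the output may be smaller than the image and extra work is needed).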
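A toy numpy illustration of why the dense version is cheaper. A linear patch classifier stands in (hypothetically) for the CNN: scoring a patch around every pixel one by one gives the same heatmap as sliding the classifier over the whole padded image in one pass. With a real CNN the same idea, running it fully convolutionally, shares computation between overlapping patches.

```python
import numpy as np

def classify_patches_naive(img, w, b):
    """Score the P x P patch centered on every pixel, one call per pixel."""
    P = w.shape[0]                               # odd patch size assumed
    padded = np.pad(img, P // 2)                 # zero-pad so border pixels get full patches
    H, W = img.shape
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            patch = padded[i:i + P, j:j + P]     # patch centered on pixel (i, j)
            out[i, j] = np.sum(patch * w) + b
    return out

def classify_dense(img, w, b):
    """Same heatmap in one pass: the classifier becomes a convolution."""
    P = w.shape[0]
    padded = np.pad(img, P // 2)
    windows = np.lib.stride_tricks.sliding_window_view(padded, (P, P))  # (H, W, P, P) view
    return np.einsum("hwpq,pq->hw", windows, w) + b

rng = np.random.default_rng(0)
img = rng.normal(size=(12, 10))
w, b = rng.normal(size=(5, 5)), 0.1
naive = classify_patches_naive(img, w, b)
dense = classify_dense(img, w, b)
print(dense.shape, np.allclose(naive, dense))  # (12, 10) True
```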
Multi-scale
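1. resize the input image to different scales -> run the FCN -> upsample the outputs and stack them
2. merge similar pixels to form a segmentation tree
3. combine them together
Fully Convolutional Networks
- Still didn't understand this
Mostly done with conv and deconv layers (slide 79 of the deck)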
Refinement
Upsampling
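A numpy sketch of upsampling with a single-channel transposed convolution ("deconvolution"), no padding: each input value paints a scaled copy of the kernel into the output at stride spacing, and overlaps add up. With a 2 x 2 kernel of ones and stride 2 it reduces to nearest-neighbor upsampling.

```python
import numpy as np

def conv_transpose2d(x, w, stride=2):
    """x: (H, W) low-resolution map; w: (k, k) kernel.

    Output size: ((H - 1) * stride + k, (W - 1) * stride + k).
    """
    H, W = x.shape
    k = w.shape[0]
    out = np.zeros(((H - 1) * stride + k, (W - 1) * stride + k))
    for i in range(H):
        for j in range(W):
            # scatter a weighted copy of the kernel; overlapping regions are summed
            out[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * w
    return out

x = np.array([[1.0, 2.0], [3.0, 4.0]])
up = conv_transpose2d(x, np.ones((2, 2)), stride=2)
print(np.array_equal(up, np.kron(x, np.ones((2, 2)))))  # True
```

In a segmentation network the kernel is learned rather than fixed, so the upsampling can do better than copying values.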

Instance Segmentation

Distinguish different instances of the same kind of object.

Very similar to RNNs (didn't understand the part at 30:00).

Cascades: similar to Faster R-CNN
Very similar to object detection.

Attention Models

First came from machine translation
Soft attention vs. hard attention: which one is better? (49:17)
Didn't understand this or the Spatial Transformer part that follows.
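A numpy sketch of the difference between the two. Soft attention returns a softmax-weighted average over all locations, which is differentiable and can be trained with ordinary backprop; hard attention samples a single location, and that sampling step is not differentiable. The dot-product scoring and all names are assumptions; other scoring functions are used too.

```python
import numpy as np

def soft_attention(features, query):
    """features: (L, D), one vector per location; query: (D,), e.g. an RNN state."""
    scores = features @ query                # dot-product relevance of each location
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax over locations
    context = weights @ features             # weighted average of all locations
    return context, weights

def hard_attention(features, query, rng):
    """Sample ONE location according to the same weights."""
    _, weights = soft_attention(features, query)
    idx = rng.choice(len(features), p=weights)
    return features[idx], idx

rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 4))              # e.g. 6 image locations, 4-dim features
q = rng.normal(size=4)
ctx, wts = soft_attention(feats, q)
loc_vec, loc = hard_attention(feats, q, rng)
print(ctx.shape, round(float(wts.sum()), 6))  # (4,) 1.0
```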

Lecture 14

Optical Flow
Uses a tracker; in the lecture video it runs over 15 frames
Spatio-Temporal ConvNets
- 3D VGGNet
- Running on optical flow as the raw input works better.
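What makes a ConvNet "3D" is that its filters also extend over time. A single-channel, "valid" 3D convolution (cross-correlation) sketched in numpy; real 3D ConvNets stack many multi-channel layers. A temporal-difference kernel shows that such a filter responds to motion and gives zero on a static clip.

```python
import numpy as np

def conv3d_valid(video, kernel):
    """video: (T, H, W) frames; kernel: (kt, kh, kw), slid over time and space."""
    windows = np.lib.stride_tricks.sliding_window_view(video, kernel.shape)
    return np.einsum("thwabc,abc->thw", windows, kernel)

rng = np.random.default_rng(0)
clip = rng.normal(size=(16, 8, 8))                 # 16 frames of 8x8
print(conv3d_valid(clip, rng.normal(size=(3, 3, 3))).shape)  # (14, 6, 6)

# Temporal-difference kernel: frame[t + 1] - frame[t].
diff = np.zeros((2, 1, 1))
diff[0], diff[1] = -1.0, 1.0
static = np.repeat(clip[:1], 16, axis=0)           # the same frame 16 times
print(np.allclose(conv3d_valid(static, diff), 0))  # True: no motion, no response
```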
Long-Time Spatio-Temporal ConvNets

Use RNNs in some parts of the network.
22:35: summary so far
29:25: summary

Unsupervised Learning

Autoencoders

The encoder and decoder sometimes share weights (around 39:00). The encoder is often used to initialize part of a supervised network, but in practice this doesn't work very well.
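A minimal numpy sketch of weight sharing between encoder and decoder: the decoder uses the transpose of the encoder's matrix, so there is only one W to learn, and it collects gradient from both halves. The ReLU hidden layer, squared-error loss, learning rate, and toy data are assumptions for illustration.

```python
import numpy as np

class TiedAutoencoder:
    """One-hidden-layer autoencoder whose decoder reuses the encoder's W (transposed)."""

    def __init__(self, d_in, d_hidden, rng):
        self.W = rng.normal(scale=0.1, size=(d_hidden, d_in))   # the only weight matrix
        self.b_enc = np.zeros(d_hidden)
        self.b_dec = np.zeros(d_in)

    def encode(self, x):
        return np.maximum(0, x @ self.W.T + self.b_enc)          # ReLU hidden code

    def decode(self, z):
        return z @ self.W + self.b_dec                            # same W, used transposed

    def train_step(self, x, lr):
        """One gradient-descent step on the mean squared reconstruction error."""
        z_pre = x @ self.W.T + self.b_enc
        z = np.maximum(0, z_pre)
        x_hat = z @ self.W + self.b_dec
        loss = np.mean(np.sum((x_hat - x) ** 2, axis=1))
        d_xhat = 2 * (x_hat - x) / len(x)
        d_zpre = (d_xhat @ self.W.T) * (z_pre > 0)
        dW = z.T @ d_xhat + d_zpre.T @ x      # decoder path + encoder path
        self.W -= lr * dW
        self.b_dec -= lr * d_xhat.sum(axis=0)
        self.b_enc -= lr * d_zpre.sum(axis=0)
        return loss

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 8))   # 8-dim data on a 2-dim subspace
ae = TiedAutoencoder(d_in=8, d_hidden=4, rng=rng)
losses = [ae.train_step(X, lr=0.005) for _ in range(500)]
print(losses[-1] < losses[0])  # True
```

After training, `ae.encode` is the part that would be reused to initialize a supervised network.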

WatsonYang's Blog
