Chapter 10 Sequence Modeling: Recurrent and Recursive Nets
Lately I've been restless: I can't focus on reading, and I can't get my summaries written either. There seems to be a lot to do, but I can't tell what matters most. Ahhhh, when will this end!!!
I didn't read much of this chapter carefully, so this summary is quite thin; I'll gradually fill it in later -_-
First, the chapter gives the definition of an RNN:

Recurrent neural networks, or RNNs (Rumelhart et al., 1986a), are a family of neural networks for processing sequential data, specialized for processing a sequence of values x(1), . . . , x(τ).

In other words, there is a new input at every step, so from one angle an unrolled RNN can also be viewed as a kind of feedforward network.
The chapter also introduces the parameter sharing mechanism of RNNs: each member of the output is produced using the same update rule applied to the previous outputs. This recurrent formulation results in the sharing of parameters through a very deep computational graph.

Sharing parameters is key to RNNs. The book's example of "I went to Nepal in 1999" and "In 1999, I went to Nepal" illustrates why it matters: if the model had separate parameters for each position in the sequence, it would have to relearn the same fact (the year of the trip) at every position where it might appear; with shared parameters, what is learned at one position generalizes to all positions.
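To make the "same update rule at every step" idea concrete, here is a minimal sketch of an RNN forward pass in NumPy. The shapes and the tanh update h(t) = tanh(W h(t-1) + U x(t) + b) are my own illustrative choices, not something fixed by the quote above; the point is only that W, U, and b are reused, unchanged, at every time step.

```python
import numpy as np

rng = np.random.default_rng(0)

input_dim, hidden_dim, seq_len = 3, 4, 5
W = rng.normal(size=(hidden_dim, hidden_dim))  # hidden-to-hidden weights (shared across steps)
U = rng.normal(size=(hidden_dim, input_dim))   # input-to-hidden weights (shared across steps)
b = np.zeros(hidden_dim)                       # bias (shared across steps)

xs = rng.normal(size=(seq_len, input_dim))     # a sequence x(1), ..., x(tau)
h = np.zeros(hidden_dim)                       # initial hidden state h(0)

for x in xs:
    # The SAME parameters W, U, b are applied at every time step:
    # h(t) = tanh(W h(t-1) + U x(t) + b)
    h = np.tanh(W @ h + U @ x + b)

print(h.shape)
```

Because the loop body never indexes parameters by time step, the model's size is independent of the sequence length, which is exactly what lets one network handle sequences of any length τ.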