[Personal notes]: 2018

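Solving a problem comes down to three Steps

Step1. Model: Model the problem as a Neural Network, which enumerates the candidate functions (together they form a function set)
//i.e. decide the network's innate ability
Step2. Goodness of function: decide which Functions are good and which are bad
Step3. Pick the best function: Train to get the best Function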

--------------------------
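In Step2 there are many ways to judge how good a function is; the goal is always for the prediction to be as close to the target as possible
Suppose the target is y' and the predicted answer is y; the loss then represents the gap between the prediction and the target
There are many common ways to measure loss: Mean-Square-Error, Cross-Entropy, classification error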

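Straight to the conclusion:

Classification problems (during Training): use Cross-Entropy
Regression problems (during Training): use MSE
Metric for checking results (validation after Training): use classification error, usually on the validation / testing set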
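
To make the three measures concrete, here is a minimal sketch in plain Python. The function names and the sample numbers are made up for illustration and are not from any library. It shows why classification error is a poor training signal: two predictions that pick the same class look identical to it, while cross-entropy still tells them apart.

```python
import math

def mean_squared_error(targets, predictions):
    """Average of squared differences; the usual training loss for regression."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy between a one-hot target y' and predicted probabilities y."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target_probs, predicted_probs))

def classification_error(true_labels, predicted_labels):
    """Fraction of samples whose predicted label is wrong."""
    wrong = sum(1 for t, p in zip(true_labels, predicted_labels) if t != p)
    return wrong / len(true_labels)

# One sample whose true class is index 0, and two hypothetical predictions for it.
target = [1.0, 0.0, 0.0]
confident = [0.8, 0.1, 0.1]
barely = [0.4, 0.3, 0.3]

# Both predictions choose class 0, so classification error is 0 for both,
# but cross-entropy gives the more confident prediction a lower loss.
ce_confident = cross_entropy(target, confident)  # -log(0.8), about 0.223
ce_barely = cross_entropy(target, barely)        # -log(0.4), about 0.916
```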

The article below explains in plain terms when to use which measure as the loss
http://jackon.me/posts/why-use-cross-entropy-error-for-loss-function/

The search method used to find a set of function parameters that minimizes the loss is usually Gradient Descent
*Pick an initial value for w -- Random, or RBM pre-train (2006), though it later turned out not to make much difference

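Steps:

Initialize a set of weights, w; this gives a Loss value, L
Compute ∂L/∂w; a negative value means that increasing w makes the Loss smaller
w' = w - η * ∂L/∂w

η is called the learning rate: it sets how big each step is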
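
The update rule can be sketched on a one-parameter problem. The loss below, (w - 3)^2, is a hypothetical stand-in chosen only because its minimum (w = 3) is known in advance; a real network has many weights and a far more complicated loss.

```python
def loss(w):
    """Hypothetical loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def grad(w):
    """dL/dw for the loss above."""
    return 2.0 * (w - 3.0)

def gradient_descent(w, learning_rate=0.1, steps=100):
    """Repeat w' = w - eta * dL/dw for a fixed number of steps."""
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

# Start from an arbitrary initial value and walk downhill.
w_final = gradient_descent(w=0.0)
```

With this loss each step shrinks the distance to the minimum by a constant factor, so w_final ends up very close to 3. A learning rate that is too large (here, anything above 1.0) would make the steps overshoot and diverge instead.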

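A more efficient way to run Gradient Descent on a Neural Network: Backpropagation (it computes ∂L/∂w for every weight efficiently)

Tutorial: https://www.youtube.com/watch?v=ibJpTrp5mcE
Many toolkits available online already implement it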
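
As a rough illustration of what those toolkits do internally, here is a hand-written backward pass for a tiny hypothetical network (one input, one sigmoid hidden neuron, one linear output, squared-error loss). The network shape and the numbers are assumptions made for this sketch. The chain-rule gradients are compared against finite differences as a sanity check.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w1, w2, x, target):
    """h = sigmoid(w1 * x), y = w2 * h, loss = (y - target)^2."""
    h = sigmoid(w1 * x)
    y = w2 * h
    return h, y, (y - target) ** 2

def backward(w1, w2, x, target):
    """Apply the chain rule from the loss back to each weight."""
    h, y, _ = forward(w1, w2, x, target)
    dL_dy = 2.0 * (y - target)
    dL_dw2 = dL_dy * h              # y = w2 * h
    dL_dh = dL_dy * w2
    dL_dz = dL_dh * h * (1.0 - h)   # derivative of sigmoid is h * (1 - h)
    dL_dw1 = dL_dz * x              # z = w1 * x
    return dL_dw1, dL_dw2

def numerical_grad(w1, w2, x, target, eps=1e-6):
    """Central finite differences, used only to check the result above."""
    d1 = (forward(w1 + eps, w2, x, target)[2] - forward(w1 - eps, w2, x, target)[2]) / (2 * eps)
    d2 = (forward(w1, w2 + eps, x, target)[2] - forward(w1, w2 - eps, x, target)[2]) / (2 * eps)
    return d1, d2

bp = backward(0.5, -1.5, 2.0, 1.0)
num = numerical_grad(0.5, -1.5, 2.0, 1.0)
```

The backward pass reuses dL_dy for both weights, which is where the efficiency comes from: each intermediate derivative is computed once and shared by everything upstream of it.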

--------------------

Use Keras to Build a Network

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()  # declare a model
model.add(Dense(input_dim=28*28, units=500, activation='relu'))
# Dense: fully connected; a convolution layer can be used instead
# input_dim: the input dimension is 28*28
# units: 500 neurons
# activation: name of each neuron's activation function; ex: 'softplus', 'softsign', 'sigmoid', 'tanh', 'hard_sigmoid', 'linear', ...
# with relu => remember to normalize the input to 0~1; with sigmoid this is not needed
model.add(Dense(units=500, activation='relu'))
model.add(Dense(units=10, activation='softmax'))

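This ends up defining the following network: 28*28 inputs => Dense 500 (relu) => Dense 500 (relu) => Dense 10 (softmax)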

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])  # configuration

# 1) loss: for more loss functions, please refer to https://keras.io/losses/
# 2) optimizer: SGD, RMSprop, Adagrad, Adadelta, Adam, Adamax, Nadam

model.fit(x_train, y_train, batch_size=100, epochs=20)  # pick the best function

# Both x_train and y_train are numpy arrays
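
A sketch of how the two arrays could be prepared, assuming 28x28 grayscale images with pixel values 0..255 and integer class labels 0..9. The random images and the label list are stand-ins, not real MNIST data. The shapes match the network above: 784 inputs per sample and a one-hot target over 10 classes, which is what categorical cross-entropy expects.

```python
import numpy as np

# Stand-in data: 6 fake 28x28 images and their (made-up) digit labels.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(6, 28, 28))
labels = np.array([3, 0, 9, 1, 3, 7])

# Flatten each image to a 784-vector and scale pixel values into 0~1.
x_train = images.reshape(len(images), 28 * 28).astype("float32") / 255.0

# One-hot encode the labels: row i has a single 1 at column labels[i].
y_train = np.eye(10, dtype="float32")[labels]
```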

https://www.tensorflow.org/get_started/mnist/beginners

==========================

CNN and RNN

CNN is good at handling images, RNN is good at handling sequences

CNN (Convolutional Neural Network)
RNN (Recurrent Neural Network): gives the Neural Network memory

Why CNN for Image
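The reason is that Images have the following three properties

Property 1: A neuron doesn't have to see the whole image to discover the pattern
(i.e. it can connect to a small region with fewer parameters)
For example, usually only a small part of an image is the object we care about, and fully connected layers would need too many parameters
Property 2: The same pattern appears in different regions
The same parameters (the same neuron) can detect a "bird beak" at different positions, so there is no need to train a separate "beak detector" for each location
Property 3: Sub-sampling the pixels will not change the object
Shrinking the image does not change what the image shows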
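
The saving from the first two properties can be put in numbers. This is a small arithmetic sketch assuming a 28x28 single-channel image and one 3x3 filter with stride 1 and no padding (sizes chosen for illustration; bias terms are ignored).

```python
image_h, image_w = 28, 28
filter_h, filter_w = 3, 3

# A fully connected neuron sees every pixel.
weights_full = image_h * image_w      # 784

# Property 1: a neuron that only looks at a small region needs far fewer weights.
weights_local = filter_h * filter_w   # 9

# Number of positions the 3x3 window can take on the image.
positions = (image_h - filter_h + 1) * (image_w - filter_w + 1)   # 676

# Property 2: without sharing, every position would need its own 9 weights;
# with sharing, all positions reuse the same 9.
weights_unshared = positions * weights_local   # 6084
weights_shared = weights_local                 # 9
```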

Whole CNN

(Convolution => Max Pooling) repeat N times => Flatten => Fully Connected Feedforward network

Convolution handles property 1 and 2 above
Max Pooling handles property 3 above
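
To see how the image shrinks along this pipeline, the sketch below traces only the shapes, with no actual computation on pixels. It assumes 3x3 filters with stride 1 and no padding, 2x2 non-overlapping pooling, and hypothetical filter counts of 25 and 50 for two blocks. The helper names are made up for this note.

```python
def conv_output_size(size, filter_size):
    """Side length after a convolution with stride 1 and no padding."""
    return size - filter_size + 1

def pool_output_size(size, pool_size):
    """Side length after non-overlapping pooling."""
    return size // pool_size

def trace_cnn(size, channels_per_block, filter_size=3, pool_size=2):
    """Return the shape after every stage, plus the length of the flattened vector."""
    shapes = []
    for channels in channels_per_block:
        size = conv_output_size(size, filter_size)
        shapes.append(("conv", size, size, channels))
        size = pool_output_size(size, pool_size)
        shapes.append(("pool", size, size, channels))
    flat_length = size * size * channels_per_block[-1]
    return shapes, flat_length

# 28x28 input, two (Convolution => Max Pooling) blocks.
shapes, flat_length = trace_cnn(28, channels_per_block=[25, 50])
# shapes: conv 26x26x25, pool 13x13x25, conv 11x11x50, pool 5x5x50
# flat_length: 5 * 5 * 50 = 1250
```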

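Convolution: apply filters. For example, slide a 3x3 filter over a 6x6 image; the values in the filter are exactly the parameters the network "needs to learn".
Suppose the filter has already been learned: at each position, multiply the filter with the 3x3 patch under it element by element and sum, giving one number per position
..this is really just doing Property 2
A large number means that part of the original image is similar to the filter.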
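
A minimal sketch of that sliding operation in plain Python, assuming stride 1 and no padding. The 6x6 image and the diagonal 3x3 filter are illustrative values picked for this note, not learned parameters.

```python
def convolve2d(image, kernel):
    """Slide a square kernel over a square image and sum the elementwise products."""
    k = len(kernel)
    out_size = len(image) - k + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(k) for b in range(k))
            for j in range(out_size)
        ]
        for i in range(out_size)
    ]

image = [
    [1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 0],
]

# A filter that responds to a top-left to bottom-right diagonal.
diagonal_filter = [
    [1, -1, -1],
    [-1, 1, -1],
    [-1, -1, 1],
]

feature_map = convolve2d(image, diagonal_filter)
# feature_map:
# [[ 3, -1, -3, -1],
#  [-3,  1,  0, -3],
#  [-3, -3,  0,  1],
#  [ 3, -2, -2, -1]]
```

The largest value, 3, shows up at the top-left and again at the bottom-left of the feature map: the same diagonal pattern sits at two different places in the image and the same filter finds both.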

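Drawn as a Network, this is the same as different neurons sharing one set of weights (connections with the same weight are drawn in the same color)
Max Pooling: split the feature map into several subsets and keep the largest value in each, forming a smaller image; this is really just sub-sampling
Flatten: take the Max Pooling result, straighten it into a vector, and feed it to the next stage, the Fully Connected network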
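
The last two stages can be sketched the same way. The 4x4 feature map below is a made-up example of what a convolution stage might output, and the 2x2 non-overlapping pooling window is an assumption.

```python
def max_pool(feature_map, pool=2):
    """Split the map into pool x pool blocks and keep the largest value of each."""
    size = len(feature_map) // pool
    return [
        [
            max(feature_map[i * pool + a][j * pool + b] for a in range(pool) for b in range(pool))
            for j in range(size)
        ]
        for i in range(size)
    ]

def flatten(feature_map):
    """Straighten a 2D map into a 1D vector, row by row."""
    return [value for row in feature_map for value in row]

feature_map = [
    [3, -1, -3, -1],
    [-3, 1, 0, -3],
    [-3, -3, 0, 1],
    [3, -2, -2, -1],
]

pooled = max_pool(feature_map)   # [[3, 0], [3, 1]]
vector = flatten(pooled)         # [3, 0, 3, 1]
```

With several filters there would be one pooled map per filter, and the flattened vector would be the concatenation of all of them.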