
The mathematics behind neural networks is somewhat involved; working through the derivation once makes it much easier to grasp the implementation details later. This post mainly follows the blog of Denny Britz from Google Brain. If you are interested, I strongly recommend reading the original; here I simply re-derive the Neural Network in my own, more modest way.

This post is divided into the following parts:

  • Problem description
  • Network architecture and forward propagation
  • Derivatives of the loss function with respect to its variables
  • Derivatives for the basic three-layer network
  • Implementation

1. Problem description

Suppose we have a set of data points distributed as follows:


You can imagine that this dataset contains, say, patients' feature measurements, with the red and blue points representing male and female patients. Our goal is to build a classifier that can reasonably separate the two classes based on these features.
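
The original post does not show how the data was generated; a minimal sketch, assuming a two-class toy dataset such as scikit-learn's make_moons (the dataset used in Denny Britz's tutorial), could look like this:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons

np.random.seed(0)
X, y = make_moons(200, noise=0.20)   # 200 samples, 2 features, labels in {0, 1}
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Spectral)
plt.show()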

2. Network architecture and forward propagation


In the network, $x$ is the input and $\hat{y}$ is the predicted output (the class probabilities after the softmax transform). Each neuron applies an activation function $\sigma$ (which can be sigmoid, tanh, or ReLU), so the input is turned into the output through a sequence of transformations:

$$z_1 = x W_1 + b_1, \qquad a_1 = \sigma(z_1), \qquad z_2 = a_1 W_2 + b_2, \qquad \hat{y} = \mathrm{softmax}(z_2)$$
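
As a quick illustration of the shapes involved (a minimal sketch with assumed dimensions, not the final training code):

import numpy as np

N, D_in, H, D_out = 200, 2, 3, 2                     # example sizes only
x = np.random.randn(N, D_in)
W1, b1 = np.random.randn(D_in, H), np.zeros((1, H))
W2, b2 = np.random.randn(H, D_out), np.zeros((1, D_out))

z1 = x.dot(W1) + b1                                  # (N, H)
a1 = 1.0 / (1.0 + np.exp(-z1))                       # sigmoid activation, (N, H)
z2 = a1.dot(W2) + b2                                 # (N, D_out)
exp_z2 = np.exp(z2 - z2.max(axis=1, keepdims=True))
y_hat = exp_z2 / exp_z2.sum(axis=1, keepdims=True)   # softmax, each row sums to 1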

3. The loss function and its derivative

The loss function measures the discrepancy between the predicted probabilities and the true labels, and can be expressed with the cross entropy. If $N$ is the number of training samples and $C$ the number of classes,

$$L(y, \hat{y}) = -\frac{1}{N} \sum_{n \in N} \sum_{i \in C} y_{n,i} \log \hat{y}_{n,i}$$

The larger this value, the further the predictions are from the truth and the higher the uncertainty in the data. The goal is to minimize $L$ with gradient descent, so we constantly need the derivative of this function with respect to its variables. For a single sample, differentiating the cross entropy through the softmax gives

$$\frac{\partial L}{\partial z_{2,i}} = \hat{y}_i - y_i$$

where $y$ is one-hot encoded: it equals 1 only at the true class and is zero everywhere else. This property is used heavily to simplify the problem.

Written in matrix form, this is

$$\frac{\partial L}{\partial z_2} = \hat{y} - y$$
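
A tiny numeric illustration (assumed values): with a one-hot label only the predicted probability of the true class contributes to the loss, and the gradient with respect to the softmax input is simply the prediction minus the label.

import numpy as np

y_hat = np.array([0.7, 0.2, 0.1])    # predicted probabilities for one sample
y = np.array([1.0, 0.0, 0.0])        # one-hot label: class 0 is the true class

loss = -np.sum(y * np.log(y_hat))    # only -log(0.7) ~ 0.357 survives
grad = y_hat - y                     # dL/dz2 = [-0.3, 0.2, 0.1]
print(loss, grad)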

4. Derivatives for the basic three-layer network

Using the relation from Section 3, we can derive the gradients needed to minimize the objective $L$ with respect to each parameter (with $\circ$ denoting element-wise multiplication):

$$\delta_3 = \hat{y} - y$$

$$\delta_2 = \delta_3 W_2^{T} \circ \sigma'(z_1)$$

$$\frac{\partial L}{\partial W_2} = a_1^{T} \delta_3, \qquad \frac{\partial L}{\partial b_2} = \sum_{n} \delta_3$$

$$\frac{\partial L}{\partial W_1} = x^{T} \delta_2, \qquad \frac{\partial L}{\partial b_1} = \sum_{n} \delta_2$$
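
To convince ourselves that these formulas are correct, the analytic gradient can be compared against a finite-difference approximation. The following is a minimal sketch on assumed toy dimensions, not part of the original post:

import numpy as np

def sigmoid(z):
    return 1. / (1 + np.exp(-z))

def mean_loss(Xs, ys, W1s, b1s, W2s, b2s):
    # forward pass plus averaged cross-entropy loss
    a1 = sigmoid(Xs.dot(W1s) + b1s)
    z2 = a1.dot(W2s) + b2s
    p = np.exp(z2 - z2.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    return -np.mean(np.log(p[range(len(Xs)), ys]))

np.random.seed(1)
Xs = np.random.randn(5, 2)                 # 5 toy samples, 2 features
ys = np.array([0, 1, 1, 0, 1])
W1s, b1s = np.random.randn(2, 3), np.zeros((1, 3))
W2s, b2s = np.random.randn(3, 2), np.zeros((1, 2))

# analytic gradient of the mean loss with respect to W2, using the formulas above
a1 = sigmoid(Xs.dot(W1s) + b1s)
p = np.exp(a1.dot(W2s) + b2s)
p /= p.sum(axis=1, keepdims=True)
delta3 = p.copy()
delta3[range(len(Xs)), ys] -= 1
dW2 = a1.T.dot(delta3) / len(Xs)

# numerical gradient for a single entry of W2
eps = 1e-5
W2p, W2m = W2s.copy(), W2s.copy()
W2p[0, 0] += eps
W2m[0, 0] -= eps
num = (mean_loss(Xs, ys, W1s, b1s, W2p, b2s) - mean_loss(Xs, ys, W1s, b1s, W2m, b2s)) / (2 * eps)
print(dW2[0, 0], num)   # the two values should agree to several decimal places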

5. Implementation

Build the helper functions: the sigmoid function and its derivative, the softmax function, and a plotting helper.

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    return 1. / (1 + np.exp(-z))

def sigmoid_prime(z):
    # derivative of the sigmoid: sigmoid(z) * (1 - sigmoid(z))
    return sigmoid(z) * (1 - sigmoid(z))

def softmax(x):
    """
    Compute the softmax function for each row of the input x.
    """

    if len(x.shape) > 1:
        # subtract the row-wise max before exponentiating for numerical
        # stability (softmax is invariant to shifting its input)
        tmp = np.max(x, axis=1)
        x -= tmp.reshape((x.shape[0], 1))
        x = np.exp(x)
        tmp = np.sum(x, axis=1)
        x /= tmp.reshape((x.shape[0], 1))
    else:
        # same computation for a single vector
        tmp = np.max(x)
        x -= tmp
        x = np.exp(x)
        tmp = np.sum(x)
        x /= tmp
    return x
    
# Helper function to plot a decision boundary.

# If you don't fully understand this function don't worry, it just generates the contour plot below.

def plot_decision_boundary(pred_func):
    # Set min and max values and give it some padding

    x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
    y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
    h = 0.01
    # Generate a grid of points with distance h between them

    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    # Predict the function value for the whole grid

    Z = pred_func(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    # Plot the contour and training examples

    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Spectral)
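
A quick sanity check of the helpers (the input values here are just assumed examples):

print(softmax(np.array([[1.0, 2.0, 3.0]])).sum(axis=1))   # rows of softmax sum to 1 -> [1.]
z, eps = 0.3, 1e-6
print(sigmoid_prime(z), (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps))   # should agree closely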

Define the parameters used later for optimization (gradient descent).

num_examples = len(X)  # training set size
nn_input_dim = 2       # input layer dimensionality
nn_output_dim = 2      # output layer dimensionality

# Gradient descent parameters (I picked these by hand)
epsilon = 0.01     # learning rate for gradient descent
reg_lambda = 0.01  # regularization strength

Define the loss function. Note that because $y$ is one-hot encoded, only the term for the true class enters the error; all other terms are zero. So it is enough to pick out the predicted probability of the true class for each sample.

# Helper function to evaluate the total loss on the dataset

def calculate_loss(model):
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
    # Forward propagation to calculate our predictions

    z1 = X.dot(W1) + b1
    a1 = sigmoid(z1)
    z2 = a1.dot(W2) + b2
    exp_scores = np.exp(z2)
    y_hat = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    # Calculating the loss

    correct_logprobs = -np.log(y_hat[range(num_examples), y])
    data_loss = np.sum(correct_logprobs)
    # Add regularization term to loss (optional)

    data_loss += reg_lambda/2 * (np.sum(np.square(W1)) + np.sum(np.square(W2)))
    return 1./num_examples * data_loss
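
The indexing expression y_hat[range(num_examples), y] picks, for every row, the predicted probability of that row's true class. A small illustration with assumed values:

probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.6, 0.4]])
labels = np.array([0, 1, 1])
print(probs[range(3), labels])   # [0.9 0.8 0.4]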

The prediction function: given any set of input points, it predicts the class labels according to the given model.

# Helper function to predict an output (0 or 1)

def predict(model, x):
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
    # Forward propagation

    z1 = x.dot(W1) + b1
    a1 = sigmoid(z1)
    z2 = a1.dot(W2) + b2
    exp_scores = np.exp(z2)
    y_hat = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    return np.argmax(y_hat, axis=1)
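
A hypothetical usage example once a model has been trained with build_nn below (the point itself is just an assumed value):

pred = predict(model, np.array([[0.5, -0.2]]))   # -> array with a single 0 or 1 label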

The main routine has two parts: the forward pass that computes the predictions and the backward pass that computes the corrections to the parameters.

def build_nn(nn_hdim=50, num_passes=20000, print_loss=False):

    np.random.seed(0)
    
    print('num_examples:%d' % num_examples)
    
    W1 = np.random.randn(nn_input_dim, nn_hdim) / np.sqrt(nn_input_dim)
    b1 = np.random.randn(1, nn_hdim)
    W2 = np.random.randn(nn_hdim, nn_output_dim) / np.sqrt(nn_hdim)
    b2 = np.random.randn(1, nn_output_dim)

    for i in range(num_passes):
        # feedforward

        z1 = np.dot(X,W1) + b1 
        a1 = sigmoid(z1)
        z2 = np.dot(a1,W2) + b2
        y_hat = softmax(z2)

        # back-propagation

        # output-layer error: dL/dz2 = y_hat - y (y is one-hot encoded)
        delta3 = y_hat
        delta3[range(num_examples), y] -= 1

        # hidden-layer error: propagate back through W2 and the sigmoid
        delta2 = np.dot(delta3, W2.T) * sigmoid_prime(z1)

        # parameter gradients
        dW2 = np.dot(a1.T, delta3)
        db2 = np.sum(delta3, axis=0, keepdims=True)
        dW1 = np.dot(X.T, delta2)
        db1 = np.sum(delta2, axis=0, keepdims=True)
        
        # Add regularization terms (b1 and b2 don't have regularization terms)

        dW1 += reg_lambda * W1
        dW2 += reg_lambda * W2
        
        W1 += -epsilon * dW1
        b1 += -epsilon * db1
        W2 += -epsilon * dW2
        b2 += -epsilon * db2
        model = { 'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2}
        
        # Optionally print the loss.

        # This is expensive because it uses the whole dataset, so we don't want to do it too often.

        if print_loss and i % 1000 == 0:
            print "Loss after iteration %i: %f" %(i, calculate_loss(model))
    
    return model

Train the model and predict the classification results:

# Build a model with a 3-dimensional hidden layer

model = build_nn(3, num_passes=10000, print_loss=True)

# Plot the decision boundary

plot_decision_boundary(lambda x: predict(model, x))
plt.title("Decision Boundary for hidden layer size 3")
fit = predict(model, X) == y
print('Correct predictions: {} out of {} samples'.format(sum(fit), num_examples))
num_examples:200
Loss after iteration 0: 0.766292
Loss after iteration 1000: 0.086076
Loss after iteration 2000: 0.079339
Loss after iteration 3000: 0.078535
Loss after iteration 4000: 0.078305
Loss after iteration 5000: 0.078222
Loss after iteration 6000: 0.078188
Loss after iteration 7000: 0.078173
Loss after iteration 8000: 0.078166
Loss after iteration 9000: 0.078162
Correct predictions: 195 out of 200 samples
