
Suppose we have a dataset for the NBA's San Antonio Spurs that records the team's average daily practice time (hours/day) together with whether each game was won or lost, as in the table below.

| Date | Practice time (hr/day) | Result |
|------|------------------------|--------|
| 1/1  | 2                      | W      |
| 1/3  | 3                      | L      |
| 1/5  | 7                      | W      |
| 1/6  | 10                     | W      |
| 1/8  | 4                      | L      |
| 1/9  | 1                      | L      |

We consider only the effect of practice time on the game result. Practice time is a continuous variable, but the outcome is only W (1) or L (0). If we interpret the model learned by the machine as a probability, i.e., the probability $P$ of winning given the practice time $x^i$, then we can still build on the idea of linear regression.
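To make this concrete, the table can be encoded as (practice time, result) pairs with W mapped to 1 and L to 0; this is also the data layout expected by the `LogisticRegression` class further down (the variable name `spurs_data` is just for illustration):

```python
# Spurs table encoded for the model: practice hours as the single feature, W -> 1, L -> 0
spurs_data = [
    (2, 1),   # 1/1: 2 hours, W
    (3, 0),   # 1/3: 3 hours, L
    (7, 1),   # 1/5: 7 hours, W
    (10, 1),  # 1/6: 10 hours, W
    (4, 0),   # 1/8: 4 hours, L
    (1, 0),   # 1/9: 1 hour, L
]
```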

  1. If the model takes a linear form,
    $$
    \hat{ h_{\theta}} (x^i) = \theta_0 + \theta_1 x^i
    $$
    we find that, because $\hat{h_{\theta}}(x^i)$ is not constrained, its value can range over $(-\infty, \infty)$, which is unreasonable for a probability.

  2. If it takes a logarithmic form,
    $$
    \log \hat{ h_{\theta}} (x^i) = \theta_0 + \theta_1 x^i
    $$
    the logarithmic function is not defined on the negative half-plane (i.e., for $\hat{h_{\theta}}(x^i) < 0$), so this is not a good assumption either.

  3. If we instead use the following form,
    $$
    \log \left( \frac{\hat{h_{\theta}} (x^i) }{ 1 - \hat{h_{\theta}}(x^i) } \right) = \theta_0 + \theta_1 x^i
    $$
    written this way, as $\hat{h_{\theta}}(x^i)$ goes from 0 to 1 the left-hand side sweeps from $-\infty$ to $+\infty$, so the unconstrained right-hand side is no longer a problem.
    A little rearranging gives
    $$
    \hat{h_{\theta}}(x^i) = \frac{1}{1 + e^{-\left( \theta_0 + \theta_1 x^i\right)}} \tag{1}
    $$

In other words, once the learning machinery is written as in equation (1), we can feed in the data and obtain an optimal set of coefficients by an iterative method. These coefficients give the best estimate of the team's probability of winning. This function is called the logistic function.
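As a quick numerical check (a small sketch, not part of the original derivation), solving the log-odds form for $\hat{h_\theta}$ really does give equation (1), and the result always stays strictly between 0 and 1:

```python
import numpy as np

theta0, theta1 = -2.0, 0.6            # arbitrary example coefficients
x = np.linspace(-10.0, 10.0, 5)       # a few input values

z = theta0 + theta1 * x               # the linear part theta_0 + theta_1 * x
h = 1.0 / (1.0 + np.exp(-z))          # equation (1): the logistic function

# the log-odds of h recovers the linear part, and h is always in (0, 1)
assert np.allclose(np.log(h / (1.0 - h)), z)
assert np.all((h > 0.0) & (h < 1.0))
```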

If we write the cost function in the same way as in the least-squares problem,
$$
\min_{\vec{\theta}} \left( \sum_{i=1}^n \frac{1}{2} \| y^i - \hat{h_{\theta}} (x^i) \|^2 \right)
$$
then, because $\hat{h_{\theta}}$ is a highly nonlinear function, this cost is non-convex, which means gradient descent is not guaranteed to converge to the global minimum.
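The non-convexity can be seen in a toy sketch (my own illustration, using a single training point with $x = 1$, $y = 1$): the discrete second difference of the squared-error cost changes sign along $\theta$, so the cost surface has both concave and convex regions:

```python
import numpy as np

theta = np.linspace(-6.0, 6.0, 400)
sigma = 1.0 / (1.0 + np.exp(-theta))    # logistic hypothesis for a single point with x = 1
cost = 0.5 * (1.0 - sigma) ** 2         # squared-error cost for label y = 1

second_diff = np.diff(cost, 2)          # discrete curvature along theta
# both signs occur, so the squared-error cost is not convex in theta
assert (second_diff > 0).any() and (second_diff < 0).any()
```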
So we instead rewrite the cost function as follows (see Andrew Ng's machine learning lecture videos for the details),
$$
J_\theta = \sum_{i=1}^n \left[-y^i \log \hat{h_\theta}(x^i) - (1-y^i) \log \left( 1- \hat{h_\theta}(x^i) \right) \right]
$$
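As a sketch (the helper name and signature below are my own, not part of the class later in the post), this cross-entropy cost can be computed directly from equation (1), with `x` carrying a leading column of 1s and `theta` as a column vector:

```python
import numpy as np

def cross_entropy_cost(theta, x, y):
    """Cross-entropy cost J(theta); x has a leading column of 1s, y holds 0/1 labels."""
    h = 1.0 / (1.0 + np.exp(-np.dot(x, theta)))               # logistic hypothesis, equation (1)
    return np.sum(-y * np.log(h) - (1.0 - y) * np.log(1.0 - h))
```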
We then use the gradient descent iteration
$$
\theta_j \rightarrow \theta_j + \alpha \sum_{i} \left( y^i - \hat{h_\theta}(x^i) \right)x_j^i \tag{2}
$$
which can be organized into matrix form,
$$
\vec{\theta} \rightarrow \vec{\theta} + \alpha \tilde{x}^{T} \cdot \left( \vec{y}^{train} - \vec{h_\theta}(x)\right)
$$
Surprisingly, this update has exactly the same form as the one for linear regression; only the hypothesis $\hat{h_\theta}$ looks different.
We rewrite the earlier update function to use the logistic function.

def update(self,mini_batch,alpha):
        '''update theta with respect to different mini_batch '''
        xmini,ymini = zip(*mini_batch)
        xmini = np.array(xmini)
        ymini = np.array([[e] for e in ymini])
        htheta = 1/(1 + np.exp(-np.dot(xmini,self.theta))) ## hypothesis (h) :logistic function

        self.theta = self.theta + alpha*np.dot(xmini.T,(ymini - htheta))

The updated code:

## logistic regression


from __future__ import division
import random
import math
import copy
import numpy as np
import matplotlib.pyplot as plt

class LogisticRegression(object):

    def __init__(self,data):
        # theta = [theta_0,theta_1,...,theta_k]

        # data -> [([x00,x01,..,x0k],y0),([x10,x11,..,x1k],y1),...]

        xa = []
        ## keep data_tmp in a separate memory location so the caller's data is not modified

        data_tmp = copy.deepcopy(data)

        x,y = zip(*data_tmp)
        if type(x[0]) is not list:
            # single-variable case: wrap each scalar x in a list and prepend the bias term 1

            for e in x:
                xa.append([1,e])
            x = xa
            data_tmp = zip(x,y)
            print "single variable x, add 1 then bind it to a list --> [1,x]"

        else:
            # multi-variable case: prepend the bias term 1 to each feature list
            for e in x:
                e.insert(0,1)
        # print "data={},data={}".format(data,data)

        xarray = np.array(x)
        yarray = np.array([[e] for e in y])
        self.data = np.array(data_tmp, dtype=object)  # rows of ([1, x1, ..., xk], y) pairs
        self.x = xarray
        self.y = yarray
        # self.theta = np.array([[10],[-0.26],[0.27],[0.1]])

        self.theta = np.random.sample((len(xarray[0]),1))
        # self.htheta = 1/(1 + np.exp(-np.dot(self.x.T,self.theta))) ## hypothesis for logistic function


    
    def in_random_order(self):
        '''generator that yields the rows of the data in random order'''
        # np.random.shuffle is used because random.shuffle does not swap numpy rows correctly
        np.random.shuffle(self.data)
        for row in self.data:
            yield row

    def sgd(self,mini_batch_size,iter_no,tol,alpha):
        ''' 
        stochastic gradient descent
        
        update theta until iter > iter_no or error < tol
        mini_batch_size: number of training examples used for each update of theta
        alpha: learning rate
        iter_no: iteration numbers
        tol: tolerance
        '''
        error = 1
        iterno = 0

        data_size = len(self.data)
        while iterno < iter_no and error > tol:
            theta_origin = self.theta
            shuffled = list(self.in_random_order())  # consume the generator so the data is reshuffled each epoch
            mini_batches = [shuffled[k:k+mini_batch_size]
                        for k in range(0,data_size,mini_batch_size)]
            for mini_batch in mini_batches:                
                self.update(mini_batch,alpha) # return update theta

                
            error = math.sqrt(np.sum((self.theta - theta_origin)**2))
            print iterno,error
            iterno +=1


    def update(self,mini_batch,alpha):
        '''update theta with respect to different mini_batch '''
        xmini,ymini = zip(*mini_batch)
        xmini = np.array(xmini)
        ymini = np.array([[e] for e in ymini])
        htheta = 1/(1 + np.exp(-np.dot(xmini,self.theta))) ## hypothesis (h) :logistic function

        self.theta = self.theta + alpha*np.dot(xmini.T,(ymini - htheta))
        
        
    def predict(self,xdata):
        '''return the probability of the result'''
        htheta = 1/(1 + np.exp(-np.dot(xdata,self.theta)))
        return htheta

    def predict_bool(self,xdata):
        '''return 1 if the predicted probability exceeds 0.5, else 0'''
        htheta = 1/(1 + np.exp(-np.dot(xdata,self.theta)))

        return [1 if e>0.5 else 0 for e in htheta]

    def goodness(self):
        '''return the fraction of training examples classified correctly'''
        mypred = self.predict_bool(self.x)
        correct = 0
        yreal_list = [e[0] for e in self.y]
        
        for i,e in enumerate(mypred):
            if e == yreal_list[i]:
                correct += 1

        return correct/len(self.y)
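A minimal usage sketch of the class (the hyperparameters below are arbitrary, not tuned, and `spurs_data` repeats the encoding of the table at the top of the post): fit the Spurs data and estimate the win probability for a team that practices 5 hours a day.

```python
spurs_data = [(2, 1), (3, 0), (7, 1), (10, 1), (4, 0), (1, 0)]

model = LogisticRegression(spurs_data)
model.sgd(mini_batch_size=2, iter_no=5000, tol=1e-6, alpha=0.05)

print(model.predict([1, 5]))         # P(win) given 5 hours of practice per day
print(model.predict_bool(model.x))   # predicted W/L (1/0) for the training data
print(model.goodness())              # fraction of training games classified correctly
```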