
In linear regression problems, the outcome is often influenced not by a single variable but by a linear combination of several variables.

$$
h_{\theta}(x^i) = \theta_0 + \theta_1 x^i_1 + \dots + \theta_k x^i_k = \sum_{j=0}^{k} x_{ij} \theta_j \tag{1}
$$

where $x^i$ is the $i$-th training sample and the $\theta_j$ are the coefficients; the convention $x_{i0} = 1$ absorbs the intercept $\theta_0$ into the sum.
As before, we can define the cost function

$$
C = \frac{1}{2} \sum_i | y^i - h_\theta(x^i)|^2 \tag{2}
$$
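As a quick illustration (not part of the original post), the hypothesis (1) and the cost (2) can be evaluated with a few lines of numpy; the names `X`, `y`, `theta` are placeholders, and the three samples are the ones used in the experiment further below.

```python
import numpy as np

# three training samples: a leading 1 is prepended to each feature vector (x_{i0} = 1)
X = np.array([[1, 2, 3, 5],
              [1, 10, 53, 2],
              [1, 21, 0.5, 0.33]])
y = np.array([10, 22, 4])

theta = np.zeros(X.shape[1])      # [theta_0, theta_1, theta_2, theta_3]

h = X.dot(theta)                  # hypothesis (1) for every sample
C = 0.5 * np.sum((y - h) ** 2)    # cost function (2)
print(C)                          # 300.0 for theta = 0
```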

Minimizing the cost function (2) has a closed-form analytic solution. Here, however, we stay with the mini-batch gradient-descent idea and iterate
$$
\theta_j \rightarrow \theta_j + \alpha \sum_{i=1}^m \left( y^i - h_\theta(x^i) \right) x_{ij} = \theta_j + \alpha \sum_{i=1}^m x_{ij} \left( y^i - \sum_{j'=0}^k x_{ij'}\theta_{j'}\right)
$$

Note that the update above is equivalent to the following vector-matrix relation, where $\tilde{x}$ is the design matrix (one sample per row, with a leading column of ones) and $\vec{y}^{\,train}$ collects the targets:

$$
\vec \theta \rightarrow \vec \theta+ \alpha \tilde{x}^T \cdot \left( \vec{y}^{train} - \tilde{x}\cdot \vec{\theta} \right) \tag{3}
$$
With mini-batches the iteration becomes
$$
\vec \theta \rightarrow \vec \theta + \alpha\, \tilde{x}_{mini}^{T} \cdot \left( \vec{y}_{mini}^{\,train} - \tilde{x}_{mini} \cdot \vec{\theta} \right)
$$
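In numpy the update (3), and its mini-batch variant, is a single vectorised line. A minimal sketch, assuming `X_mini` already carries the bias column and `y_mini`, `theta` are column vectors (all names here are illustrative):

```python
import numpy as np

alpha = 1e-4                                   # learning rate (illustrative value)
X_mini = np.array([[1., 2., 3., 5.],
                   [1., 10., 53., 2.]])        # a mini-batch of the design matrix
y_mini = np.array([[10.], [22.]])              # matching targets as a column vector
theta = np.zeros((4, 1))

# theta <- theta + alpha * x^T (y - x theta), i.e. equation (3) on the mini-batch
theta = theta + alpha * X_mini.T.dot(y_mini - X_mini.dot(theta))
```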

We also notice that, as the dimensionality of the variables grows, nearly the same error level can be reached with completely different weight coefficients.


The figure above shows the learning result for the three training samples training_sets = [[(2,3,5),10], [(10,53,2),22], [(21,0.5,0.33),4]].

|              | Hand-written SGD (n=1)       | Library (LinearRegression) |
|--------------|------------------------------|----------------------------|
| Coefficients | [0.148245, 0.31930, 1.76998] | [-0.265, 0.2857, 0.05968]  |
| Intercept    | 0.16059                      | 9.37137                    |

P.S. The exact reason for this behaviour, and how to analyse whether the choice of variables is appropriate, still needs further study.


Code

# my own mini-batch gradient descent 

import random
import math
import copy
import numpy as np
import matplotlib.pyplot as plt

class GradientDescent(object):

    def __init__(self, data):
        # theta = [theta_0, theta_1, ..., theta_k]
        # data -> [([x00, x01, ..., x0k], y0), ([x10, x11, ..., x1k], y1), ...]

        # work on a copy so the caller's data is left untouched
        data_tmp = copy.deepcopy(data)
        x, y = zip(*data_tmp)

        if not isinstance(x[0], (list, tuple)):
            # single feature: wrap each x in a list and prepend the bias term 1
            x = [[1, e] for e in x]
            print("single variable x, prepend 1 and bind it to a list --> [1, x]")
        else:
            # several features: prepend the bias term 1 to each feature vector
            x = [[1] + list(e) for e in x]

        self.data = list(zip(x, y))           # [([1, x_1, ..., x_k], y), ...]
        self.x = np.array(x)                  # design matrix, one sample per row
        self.y = np.array([[e] for e in y])   # targets as a column vector
        # random initial guess for [theta_0, theta_1, ..., theta_k]
        self.theta = np.random.sample((len(x[0]), 1))

    def in_random_order(self):
        '''shuffle the training data in place so each epoch sees a new order'''
        random.shuffle(self.data)

    def fit(self, mini_batch_size, iter_no, tol, alpha):
        '''update theta until iterno >= iter_no or error < tol
        mini_batch_size: number of training samples used per update
        alpha: learning rate
        iter_no: maximum number of iterations (epochs)
        tol: tolerance on the change of theta between epochs
        '''
        error = 1
        iterno = 0
        data_size = len(self.data)

        while iterno < iter_no and error > tol:
            theta_origin = self.theta
            self.in_random_order()
            mini_batches = [self.data[k:k + mini_batch_size]
                            for k in range(0, data_size, mini_batch_size)]
            for mini_batch in mini_batches:
                self.update(mini_batch, alpha)  # update theta in place

            # how far theta moved during this epoch
            error = math.sqrt(np.sum((self.theta - theta_origin)**2))
            print(iterno, error)
            iterno += 1


    def update(self, mini_batch, alpha):
        '''one gradient step of theta for a given mini_batch'''
        xmini, ymini = zip(*mini_batch)
        xmini = np.array(xmini)
        ymini = np.array([[e] for e in ymini])

        # theta <- theta + alpha * x^T (y - x theta), cf. the mini-batch form of (3)
        self.theta = self.theta + alpha * \
            np.dot(xmini.T, ymini - np.dot(xmini, self.theta))
        
    def predict(self, xdata):
        '''xdata must already contain the leading bias column of ones'''
        return np.dot(xdata, self.theta)
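
For completeness, a usage sketch along the lines of the comparison table above, assuming the library column refers to scikit-learn's LinearRegression; the `fit` hyperparameters below are illustrative and not necessarily the ones used to produce the numbers in the table.

```python
from sklearn.linear_model import LinearRegression

training_sets = [[(2, 3, 5), 10], [(10, 53, 2), 22], [(21, 0.5, 0.33), 4]]

# hand-written mini-batch gradient descent (n = mini_batch_size = 1)
gd = GradientDescent(training_sets)
gd.fit(mini_batch_size=1, iter_no=5000, tol=1e-6, alpha=1e-4)
print("coefficients:", gd.theta[1:].ravel(), "intercept:", gd.theta[0, 0])

# library solution for comparison
X, y = zip(*training_sets)
reg = LinearRegression().fit(list(X), list(y))
print("coefficients:", reg.coef_, "intercept:", reg.intercept_)
```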
