[EECS 498] Assignment 3. Fully Connected Networks...(1)

Jinu_01 2025. 1. 6. 23:58

This post summarizes what I studied while taking the EECS 498 course from Michigan Univ.


https://jinwoo-jung.tistory.com/127

[EECS 498] Assignment 2. Two Layer Neural Network...(2)


The Two Layer Network from the previous assignment implemented the loss, gradient, and forward-pass computations directly for each specific layer, so it could not be modularized: it cannot handle an arbitrary number of layers with different hidden sizes, and writing out every configuration by hand would be very inefficient. In this assignment we therefore build the neural network out of modular layers.

 

After each implementation, we validate it by comparing the numerically computed gradient (the same technique used since the previous assignment) with the analytically computed gradient implemented by hand, and checking that the relative error is small.
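As a reference, the comparison can be done with a relative-error metric and a central-difference numeric gradient like the sketch below. This is a minimal, hypothetical helper pair; the assignment notebooks ship their own utilities for this, which may differ in detail.

import torch

def rel_error(x, y, eps=1e-10):
    """Elementwise maximum relative error between two tensors."""
    return ((x - y).abs() / (x.abs() + y.abs()).clamp(min=eps)).max().item()

def numeric_gradient(f, x, h=1e-6):
    """Central-difference numeric gradient of a scalar-valued function f at x.
    Assumes x is contiguous so that x.view(-1) shares storage with x."""
    grad = torch.zeros_like(x)
    x_flat, g_flat = x.view(-1), grad.view(-1)
    for i in range(x_flat.numel()):
        old = x_flat[i].item()
        x_flat[i] = old + h
        f_plus = f(x).item()
        x_flat[i] = old - h
        f_minus = f(x).item()
        x_flat[i] = old
        g_flat[i] = (f_plus - f_minus) / (2 * h)
    return grad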

 

Linear Layer

Linear Layer : forward

def forward(x, w, b):
    """
    Computes the forward pass for a linear (fully-connected) layer.
    The input x has shape (N, d_1, ..., d_k) and contains a minibatch of N
    examples, where each example x[i] has shape (d_1, ..., d_k). We will
    reshape each input into a vector of dimension D = d_1 * ... * d_k, and
    then transform it to an output vector of dimension M.
    Inputs:
    - x: A tensor containing input data, of shape (N, d_1, ..., d_k)
    - w: A tensor of weights, of shape (D, M)
    - b: A tensor of biases, of shape (M,)
    Returns a tuple of:
    - out: output, of shape (N, M)
    - cache: (x, w, b)
    """
    out = None
    ######################################################################
    # TODO: Implement the linear forward pass. Store the result in out.  #
    # You will need to reshape the input into rows.                      #
    ######################################################################
    # Flatten each example into a row of length D, then compute y = x_flat @ w + b
    out = x.reshape(x.shape[0], -1).mm(w) + b
    ######################################################################
    #                        END OF YOUR CODE                            #
    ######################################################################
    cache = (x, w, b)
    return out, cache

 

Since this is a linear layer, the forward pass computes $y = xw + b$. To carry out the matrix multiply between the input $x$ and the weight $w$, the input dimensions must be matched first: because $D = d_1 \times \cdots \times d_k$, each example is flattened with torch.reshape into a row vector of dimension $D$.
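For instance, with a small hypothetical input of shape (2, 3, 4) the flattening and output shapes work out as below (a quick shape check, not part of the assignment code):

import torch

N, d1, d2, M = 2, 3, 4, 5
x = torch.randn(N, d1, d2)          # minibatch of N examples, each of shape (d1, d2)
w = torch.randn(d1 * d2, M)         # weights of shape (D, M) with D = d1 * d2
b = torch.randn(M)

x_flat = x.reshape(x.shape[0], -1)  # (N, D) = (2, 12)
out = x_flat.mm(w) + b              # (N, M) = (2, 5); b broadcasts across the batch
print(x_flat.shape, out.shape)      # torch.Size([2, 12]) torch.Size([2, 5])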

 

Linear Layer : backward

def backward(dout, cache):
    """
    Computes the backward pass for a linear layer.
    Inputs:
    - dout: Upstream derivative, of shape (N, M)
    - cache: Tuple of:
      - x: Input data, of shape (N, d_1, ... d_k)
      - w: Weights, of shape (D, M)
      - b: Biases, of shape (M,)
    Returns a tuple of:
    - dx: Gradient with respect to x, of shape
      (N, d1, ..., d_k)
    - dw: Gradient with respect to w, of shape (D, M)
    - db: Gradient with respect to b, of shape (M,)
    """
    x, w, b = cache
    dx, dw, db = None, None, None
    ##################################################
    # TODO: Implement the linear backward pass.      #
    ##################################################
    # Linear layer: y = x_flat @ w + b
    db = dout.sum(dim=0)                          # sum the upstream gradient over the batch
    dw = x.reshape(x.shape[0], -1).t().mm(dout)   # (D, N) @ (N, M) -> (D, M)
    dx = dout.mm(w.t()).view(x.shape)             # (N, M) @ (M, D) -> (N, D), reshaped back to x's shape
    ##################################################
    #                END OF YOUR CODE                #
    ##################################################
    return dx, dw, db

 

Because this is a linear layer, the local gradient with respect to the bias is 1, so db simply passes the upstream gradient (dout) through, summed over the batch for each bias element. For dw the local gradient is the input $x$, so dw is the matrix multiply of the (flattened) $x$ with the upstream gradient. Conversely, for dx the local gradient is the weight $w$, so dx is the matrix multiply of the upstream gradient with $w$.

 

Keeping track of the shapes involved in each operation makes the matrix multiplies straightforward to implement.
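Putting the shapes together, the analytic gradients can be sanity-checked against the numeric gradient (hypothetical usage, assuming the Linear class wraps the forward/backward functions shown here, and reusing the rel_error and numeric_gradient helpers sketched earlier):

import torch

N, D, M = 4, 6, 3
x = torch.randn(N, 2, 3, dtype=torch.float64)   # D = 2 * 3 = 6
w = torch.randn(D, M, dtype=torch.float64)
b = torch.randn(M, dtype=torch.float64)
dout = torch.randn(N, M, dtype=torch.float64)

out, cache = Linear.forward(x, w, b)
dx, dw, db = Linear.backward(dout, cache)

# The numeric gradient of sum(out * dout) with respect to w should match dw
dw_num = numeric_gradient(lambda w_: (Linear.forward(x, w_, b)[0] * dout).sum(), w)
print(rel_error(dw, dw_num))   # expect a very small value, e.g. < 1e-8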

 

ReLU activation

ReLU activation : forward

def forward(x):
    """
    Computes the forward pass for a layer of rectified
    linear units (ReLUs).
    Input:
    - x: Input; a tensor of any shape
    Returns a tuple of:
    - out: Output, a tensor of the same shape as x
    - cache: x
    """
    out = None
    ###################################################
    # TODO: Implement the ReLU forward pass.          #
    # You should not change the input tensor with an  #
    # in-place operation.                             #
    ###################################################
    out = x.clone()
    out[out<0] = 0
    ###################################################
    #                 END OF YOUR CODE                #
    ###################################################
    cache = x
    return out, cache

 

For ReLU, the forward pass keeps the input only where it is positive. We therefore clone the input (to avoid modifying it in place), build a mask of the negative entries, and set those entries to 0.

 

def backward(dout, cache):
    """
    Computes the backward pass for a layer of rectified
    linear units (ReLUs).
    Input:
    - dout: Upstream derivatives, of any shape
    - cache: Input x, of same shape as dout
    Returns:
    - dx: Gradient with respect to x
    """
    dx, x = None, cache
    #####################################################
    # TODO: Implement the ReLU backward pass.           #
    # You should not change the input tensor with an    #
    # in-place operation.                               #
    #####################################################
    # Where x < 0 the local gradient is 0; elsewhere it is 1
    dx = dout.clone()
    dx[x < 0] = 0
    #####################################################
    #                  END OF YOUR CODE                 #
    #####################################################
    return dx

 

The derivative of ReLU is 1 wherever its input is positive, so there the upstream gradient (dout) is passed through unchanged. Where the input $x$ is negative the output is 0 and so is the gradient, so we build a mask x < 0 and set the corresponding entries of the upstream gradient to 0 to implement the backward pass.
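Equivalently, both passes can be written with a single boolean mask; the following is a minimal alternative sketch (not the assignment's reference implementation):

import torch

def relu_forward(x):
    # Keep positive entries, zero out the rest, without modifying x in place
    out = x * (x > 0)
    return out, x

def relu_backward(dout, x):
    # Local gradient is 1 where x > 0 and 0 elsewhere
    return dout * (x > 0)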

 

"Sandwich" layers

In practice a linear layer is usually followed by an activation function such as ReLU. Let's implement the forward and backward passes for this common pattern.

 

Linear_ReLU

class Linear_ReLU(object):

    @staticmethod
    def forward(x, w, b):
        """
        Convenience layer that performs a linear transform
        followed by a ReLU.

        Inputs:
        - x: Input to the linear layer
        - w, b: Weights for the linear layer
        Returns a tuple of:
        - out: Output from the ReLU
        - cache: Object to give to the backward pass
        """
        a, fc_cache = Linear.forward(x, w, b)
        out, relu_cache = ReLU.forward(a)
        cache = (fc_cache, relu_cache)
        return out, cache

    @staticmethod
    def backward(dout, cache):
        """
        Backward pass for the linear-relu convenience layer
        """
        fc_cache, relu_cache = cache
        da = ReLU.backward(dout, relu_cache)
        dx, dw, db = Linear.backward(da, fc_cache)
        return dx, dw, db

 

This is implemented simply by composing the forward and backward methods of the Linear and ReLU classes implemented above.

 

Two-layer network

Now let's implement a Two Layer Network using the modular layers built above.

def __init__(self, input_dim=3*32*32, hidden_dim=100, num_classes=10,
                 weight_scale=1e-3, reg=0.0,
                 dtype=torch.float32, device='cpu'):
        """
        Initialize a new network.
        Inputs:
        - input_dim: An integer giving the size of the input
        - hidden_dim: An integer giving the size of the hidden layer
        - num_classes: An integer giving the number of classes to classify
        - weight_scale: Scalar giving the standard deviation for random
          initialization of the weights.
        - reg: Scalar giving L2 regularization strength.
        - dtype: A torch data type object; all computations will be
          performed using this datatype. float is faster but less accurate,
          so you should use double for numeric gradient checking.
        - device: device to use for computation. 'cpu' or 'cuda'
        """
        self.params = {}
        self.reg = reg

        ###################################################################
        # TODO: Initialize the weights and biases of the two-layer net.   #
        # Weights should be initialized from a Gaussian centered at       #
        # 0.0 with standard deviation equal to weight_scale, and biases   #
        # should be initialized to zero. All weights and biases should    #
        # be stored in the dictionary self.params, with first layer       #
        # weights and biases using the keys 'W1' and 'b1' and second layer#
        # weights and biases using the keys 'W2' and 'b2'.                #
        ###################################################################

        self.params['W1'] = torch.zeros(input_dim, hidden_dim, dtype=dtype, device=device)
        self.params['W1'] += weight_scale * torch.randn(input_dim, hidden_dim, dtype=dtype, device=device)
        self.params['b1'] = torch.zeros(hidden_dim, dtype=dtype, device=device)

        self.params['W2'] = torch.zeros(hidden_dim, num_classes, dtype=dtype, device=device)
        self.params['W2'] += weight_scale * torch.randn(hidden_dim, num_classes, dtype=dtype, device=device)
        self.params['b2'] = torch.zeros(num_classes, dtype=dtype, device=device)

        ###############################################################
        #                            END OF YOUR CODE                 #
        ###############################################################

 

The weights and biases are initialized with the appropriate shapes: the weights are drawn from a Gaussian with mean 0 and standard deviation weight_scale, and the biases are initialized to zero. The dtype and device of each parameter are taken from the arguments passed when constructing the TwoLayerNet object.
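The intermediate zeros tensor is not strictly necessary; inside __init__, an equivalent and slightly more compact initialization (same distribution as above, shown here only as a sketch) would be:

# Equivalent initialization: N(0, weight_scale^2) weights, zero biases
self.params['W1'] = weight_scale * torch.randn(input_dim, hidden_dim, dtype=dtype, device=device)
self.params['b1'] = torch.zeros(hidden_dim, dtype=dtype, device=device)
self.params['W2'] = weight_scale * torch.randn(hidden_dim, num_classes, dtype=dtype, device=device)
self.params['b2'] = torch.zeros(num_classes, dtype=dtype, device=device)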

 

def loss(self, X, y=None):
        """
        Compute loss and gradient for a minibatch of data.

        Inputs:
        - X: Tensor of input data of shape (N, d_1, ..., d_k)
        - y: int64 Tensor of labels, of shape (N,). y[i] gives the
          label for X[i].

        Returns:
        If y is None, then run a test-time forward pass of the model
        and return:
        - scores: Tensor of shape (N, C) giving classification scores,
          where scores[i, c] is the classification score for X[i]
          and class c.
        If y is not None, then run a training-time forward and backward
        pass and return a tuple of:
        - loss: Scalar value giving the loss
        - grads: Dictionary with the same keys as self.params, mapping
          parameter names to gradients of the loss with respect to
          those parameters.
        """
        scores = None
        #############################################################
        # TODO: Implement the forward pass for the two-layer net,   #
        # computing the class scores for X and storing them in the  #
        # scores variable.                                          #
        #############################################################
        # First layer: Linear followed by ReLU
        out_LR, cache_LR = Linear_ReLU.forward(X, self.params['W1'], self.params['b1'])
        # Second layer: a plain Linear layer producing the class scores
        scores, cache_L = Linear.forward(out_LR, self.params['W2'], self.params['b2'])
        ##############################################################
        #                     END OF YOUR CODE                       #
        ##############################################################

        # If y is None then we are in test mode so just return scores
        if y is None:
            return scores

        loss, grads = 0, {}
        ###################################################################
        # TODO: Implement the backward pass for the two-layer net.        #
        # Store the loss in the loss variable and gradients in the grads  #
        # dictionary. Compute data loss using softmax, and make sure that #
        # grads[k] holds the gradients for self.params[k]. Don't forget   #
        # to add L2 regularization!                                       #
        #                                                                 #
        # NOTE: To ensure that your implementation matches ours and       #
        # you pass the automated tests, make sure that your L2            #
        # regularization does not include a factor of 0.5.                #
        ###################################################################

        # Compute the data loss and add the L2 regularization term
        loss, dout = softmax_loss(scores, y)
        loss += ((self.params['W1']**2).sum() + (self.params['W2']**2).sum()) * self.reg

        # Backpropagation
        # Linear.backward(upstream gradient, cache)
        dx, dw, db = Linear.backward(dout, cache_L)
        grads['W2'] = dw + 2 * self.reg * self.params['W2']
        grads['b2'] = db

        # The input x of the second layer is the output of the first layer,
        # so dx becomes the upstream gradient for Linear_ReLU.backward
        dx, dw, db = Linear_ReLU.backward(dx, cache_LR)
        grads['W1'] = dw + 2 * self.reg * self.params['W1']
        grads['b1'] = db

        ###################################################################
        #                     END OF YOUR CODE                            #
        ###################################################################

        return loss, grads

 

The forward and backward passes of the two-layer network are computed by composing the modular forward and backward functions implemented above.

 

    #############################################################
    # TODO: Implement the forward pass for the two-layer net,   #
    # computing the class scores for X and storing them in the  #
    # scores variable.                                          #
    #############################################################
    # First layer: Linear followed by ReLU
    out_LR, cache_LR = Linear_ReLU.forward(X, self.params['W1'], self.params['b1'])
    # Second layer: a plain Linear layer producing the class scores
    scores, cache_L = Linear.forward(out_LR, self.params['W2'], self.params['b2'])
    ##############################################################
    #                     END OF YOUR CODE                       #
    ##############################################################

 

The first layer is followed by ReLU, so it uses Linear_ReLU.forward; the second layer is not, so it uses the plain Linear class's forward.

 

    ###################################################################
    # TODO: Implement the backward pass for the two-layer net.        #
    # Store the loss in the loss variable and gradients in the grads  #
    # dictionary. Compute data loss using softmax, and make sure that #
    # grads[k] holds the gradients for self.params[k]. Don't forget   #
    # to add L2 regularization!                                       #
    #                                                                 #
    # NOTE: To ensure that your implementation matches ours and       #
    # you pass the automated tests, make sure that your L2            #
    # regularization does not include a factor of 0.5.                #
    ###################################################################

    # Compute the data loss and add the L2 regularization term
    loss, dout = softmax_loss(scores, y)
    loss += ((self.params['W1']**2).sum() + (self.params['W2']**2).sum()) * self.reg

    # Backpropagation
    # Linear.backward(upstream gradient, cache)
    dx, dw, db = Linear.backward(dout, cache_L)
    grads['W2'] = dw + 2 * self.reg * self.params['W2']
    grads['b2'] = db

    # The input x of the second layer is the output of the first layer,
    # so dx becomes the upstream gradient for Linear_ReLU.backward
    dx, dw, db = Linear_ReLU.backward(dx, cache_LR)
    grads['W1'] = dw + 2 * self.reg * self.params['W1']
    grads['b1'] = db

    ###################################################################
    #                     END OF YOUR CODE                            #
    ###################################################################

 

The softmax loss is computed from the scores produced by the forward pass, and the regularization term is then added to obtain the final loss.
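softmax_loss is provided by the assignment code; as a reference, a minimal sketch of what it computes (assuming the usual formulation: mean cross-entropy over the minibatch, plus its gradient with respect to the scores) looks roughly like this:

import torch

def softmax_loss_sketch(scores, y):
    """Sketch of the softmax loss. scores is (N, C); y is (N,) with class indices."""
    N = scores.shape[0]
    # Shift for numerical stability, then compute log-probabilities
    shifted = scores - scores.max(dim=1, keepdim=True).values
    log_probs = shifted - shifted.exp().sum(dim=1, keepdim=True).log()
    probs = log_probs.exp()
    # Mean negative log-likelihood of the correct classes
    loss = -log_probs[torch.arange(N), y].mean()
    # Gradient w.r.t. the scores: probs minus one-hot(y), averaged over the batch
    dscores = probs.clone()
    dscores[torch.arange(N), y] -= 1.0
    dscores /= N
    return loss, dscores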

 

The gradients are computed by backpropagation. The backward methods of the Linear and Linear_ReLU classes take two inputs, the upstream gradient and the cache of values saved during the forward pass, so the values computed in the forward pass are reused here. Note that the weight gradients must also include the contribution of the regularization term.
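The extra term added to each weight gradient comes from differentiating the regularization loss; since the assignment's L2 term carries no factor of 0.5, the derivative keeps a factor of 2:

$$\frac{\partial}{\partial W}\left(\lambda \sum_{i,j} W_{i,j}^{2}\right) = 2\lambda W$$

which is exactly the 2*self.reg*W added to dw above.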

 

Solver

The Solver class is responsible for training and evaluating a model. By separating the training logic from the model itself, it keeps the code modular, and while training a network it records the loss and accuracy so they can be analyzed afterwards.
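Conceptually, one epoch of what a Solver does looks roughly like the loop below. This is a simplified sketch with plain SGD and hypothetical names (X_train, batch_size, learning_rate); the real Solver additionally supports configurable update rules, learning-rate decay, accuracy tracking, and so on.

import torch

def train_one_epoch(model, X_train, y_train, batch_size=100, learning_rate=1e-3):
    """Simplified sketch of a Solver's inner loop: sample a minibatch,
    compute the loss and gradients, and take an SGD step on every parameter."""
    num_train = X_train.shape[0]
    loss_history = []
    for _ in range(num_train // batch_size):
        idx = torch.randint(num_train, (batch_size,))
        loss, grads = model.loss(X_train[idx], y_train[idx])
        loss_history.append(loss.item())
        for name, param in model.params.items():
            param -= learning_rate * grads[name]   # vanilla SGD update
    return loss_history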

def create_solver_instance(data_dict, dtype, device):
    model = TwoLayerNet(hidden_dim=200, dtype=dtype, device=device)
    #############################################################
    # TODO: Use a Solver instance to train a TwoLayerNet that   #
    # achieves at least 50% accuracy on the validation set.     #
    #############################################################
    solver = None

    solver = Solver(model, data_dict, device=device, num_epochs=100)

    ##############################################################
    #                    END OF YOUR CODE                        #
    ##############################################################
    return solver

 

create_solver_instance builds and returns a Solver object: the model is a TwoLayerNet, training uses the data_dict passed in, the device is the given device, and num_epochs is set to 100.

 

from fully_connected_networks import create_solver_instance

reset_seed(0)

# Create a solver instance that achieves 50% performance on the validation set
solver = create_solver_instance(data_dict=data_dict, dtype=torch.float64, device='cuda')
solver.train()

(Time 0.04 sec; Iteration 1 / 40000) loss: 2.302587
(Epoch 0 / 100) train acc: 0.111000; val_acc: 0.104500
(Time 0.44 sec; Iteration 11 / 40000) loss: 2.302548
(Time 0.49 sec; Iteration 21 / 40000) loss: 2.302528
(Time 0.54 sec; Iteration 31 / 40000) loss: 2.302652
(Time 0.58 sec; Iteration 41 / 40000) loss: 2.302491
(Time 0.63 sec; Iteration 51 / 40000) loss: 2.302103
(Time 0.69 sec; Iteration 61 / 40000) loss: 2.302732
(Time 0.74 sec; Iteration 71 / 40000) loss: 2.302635
(Time 0.79 sec; Iteration 81 / 40000) loss: 2.302163
(Time 0.84 sec; Iteration 91 / 40000) loss: 2.302191
(Time 0.90 sec; Iteration 101 / 40000) loss: 2.302482
...
(Time 196.39 sec; Iteration 39961 / 40000) loss: 0.886226
(Time 196.44 sec; Iteration 39971 / 40000) loss: 0.852981
(Time 196.49 sec; Iteration 39981 / 40000) loss: 1.122238
(Time 196.55 sec; Iteration 39991 / 40000) loss: 1.135639
(Epoch 100 / 100) train acc: 0.665000; val_acc: 0.532300

 

Training with a Solver object records the loss and the validation accuracy at each stage of training, as shown above.
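These records can then be visualized. Assuming the Solver exposes them as loss_history, train_acc_history, and val_acc_history (attribute names taken from the course's Solver; adjust if yours differ), a quick plot might look like:

import matplotlib.pyplot as plt

plt.subplot(2, 1, 1)
plt.plot(solver.loss_history, '.')
plt.title('Training loss')
plt.xlabel('Iteration')

plt.subplot(2, 1, 2)
plt.plot(solver.train_acc_history, '-o', label='train')
plt.plot(solver.val_acc_history, '-o', label='val')
plt.title('Accuracy')
plt.xlabel('Epoch')
plt.legend()

plt.gcf().set_size_inches(9, 8)
plt.show()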

 
