DeepLearning.ai Homework: (5-1) -- Recurrent Neural Networks (1)


title: 'DeepLearning.ai Homework: (5-1) -- Recurrent Neural Networks (1)'
id: dl-ai-5-1h1
tags:
  - homework
categories:
  - AI
  - Deep Learning
date: 2018-10-18 10:26:56

This week's assignment has three parts:

  • Build an RNN model by hand, step by step
  • Build a character-level language model to generate dinosaur names
  • Generate jazz music with an LSTM

Part 1: Building a Recurrent Neural Network - Step by Step

In this part, we build a basic RNN from scratch.

1 - Forward propagation for the basic Recurrent Neural Network

We start with forward propagation. To build the network, we first implement the computation performed by a single RNN cell:

RNN cell

  1. Compute the hidden state with tanh activation: $a^{\langle t \rangle} = \tanh(W_{aa} a^{\langle t-1 \rangle} + W_{ax} x^{\langle t \rangle} + b_a)$
  2. Using your new hidden state $a^{\langle t \rangle}$, compute the prediction $\hat{y}^{\langle t \rangle} = \mathrm{softmax}(W_{ya} a^{\langle t \rangle} + b_y)$. We provided you a function: softmax.
  3. Store $(a^{\langle t \rangle}, a^{\langle t-1 \rangle}, x^{\langle t \rangle}, parameters)$ in cache
  4. Return $a^{\langle t \rangle}$, $\hat{y}^{\langle t \rangle}$ and cache

We will vectorize over $m$ examples. Thus, $x^{\langle t \rangle}$ will have dimension $(n_x, m)$, and $a^{\langle t \rangle}$ will have dimension $(n_a, m)$.

# GRADED FUNCTION: rnn_cell_forward

def rnn_cell_forward(xt, a_prev, parameters):
    """
    Implements a single forward step of the RNN-cell as described in Figure (2)

    Arguments:
    xt -- your input data at timestep "t", numpy array of shape (n_x, m).
    a_prev -- Hidden state at timestep "t-1", numpy array of shape (n_a, m)
    parameters -- python dictionary containing:
                        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        ba --  Bias, numpy array of shape (n_a, 1)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)
    Returns:
    a_next -- next hidden state, of shape (n_a, m)
    yt_pred -- prediction at timestep "t", numpy array of shape (n_y, m)
    cache -- tuple of values needed for the backward pass, contains (a_next, a_prev, xt, parameters)
    """

    # Retrieve parameters from "parameters"
    Wax = parameters["Wax"]
    Waa = parameters["Waa"]
    Wya = parameters["Wya"]
    ba = parameters["ba"]
    by = parameters["by"]

    ### START CODE HERE ### (≈2 lines)
    # compute next activation state using the formula given above
    a_next = np.tanh(np.dot(Waa, a_prev) + np.dot(Wax, xt) + ba)
    # compute output of the current cell using the formula given above
    yt_pred = softmax(np.dot(Wya, a_next) + by)
    ### END CODE HERE ###

    # store values you need for backward propagation in cache
    cache = (a_next, a_prev, xt, parameters)

    return a_next, yt_pred, cache
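As a quick sanity check (my own snippet, not part of the graded notebook), we can call the cell on random toy inputs and confirm the output shapes; it assumes numpy is imported as np and that the course-provided softmax helper is in scope:

```python
import numpy as np

np.random.seed(1)
n_x, n_a, n_y, m = 3, 5, 2, 10            # toy dimensions
xt = np.random.randn(n_x, m)              # input at one time step
a_prev = np.random.randn(n_a, m)          # previous hidden state
parameters = {"Wax": np.random.randn(n_a, n_x),
              "Waa": np.random.randn(n_a, n_a),
              "Wya": np.random.randn(n_y, n_a),
              "ba": np.random.randn(n_a, 1),
              "by": np.random.randn(n_y, 1)}

a_next, yt_pred, cache = rnn_cell_forward(xt, a_prev, parameters)
print(a_next.shape)   # (5, 10) -- (n_a, m)
print(yt_pred.shape)  # (2, 10) -- (n_y, m)
```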

RNN forward pass

The idea is:

  • First initialize a and y_pred to zeros
  • Then set a_next = a0
  • Then loop over the T_x time steps, computing a, y, and the cache at each step
# GRADED FUNCTION: rnn_forward

def rnn_forward(x, a0, parameters):
    """
    Implement the forward propagation of the recurrent neural network described in Figure (3).

    Arguments:
    x -- Input data for every time-step, of shape (n_x, m, T_x).
    a0 -- Initial hidden state, of shape (n_a, m)
    parameters -- python dictionary containing:
                        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        ba --  Bias numpy array of shape (n_a, 1)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)

    Returns:
    a -- Hidden states for every time-step, numpy array of shape (n_a, m, T_x)
    y_pred -- Predictions for every time-step, numpy array of shape (n_y, m, T_x)
    caches -- tuple of values needed for the backward pass, contains (list of caches, x)
    """

    # Initialize "caches" which will contain the list of all caches
    caches = []

    # Retrieve dimensions from shapes of x and parameters["Wya"]
    n_x, m, T_x = x.shape
    n_y, n_a = parameters["Wya"].shape

    ### START CODE HERE ###

    # initialize "a" and "y" with zeros (≈2 lines)
    a = np.zeros((n_a, m, T_x))
    y_pred = np.zeros((n_y, m, T_x))

    # Initialize a_next (≈1 line)
    a_next = a0

    # loop over all time-steps
    for t in range(T_x):
        # Update next hidden state, compute the prediction, get the cache (≈1 line)
        a_next, yt_pred, cache = rnn_cell_forward(x[:, :, t], a_next, parameters)
        # Save the value of the new "next" hidden state in a (≈1 line)
        a[:, :, t] = a_next
        # Save the value of the prediction in y (≈1 line)
        y_pred[:, :, t] = yt_pred
        # Append "cache" to "caches" (≈1 line)
        caches.append(cache)

    ### END CODE HERE ###

    # store values needed for backward propagation in cache
    caches = (caches, x)

    return a, y_pred, caches
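A small shape check for rnn_forward (again my own sketch, with the same assumptions as above):

```python
np.random.seed(1)
n_x, n_a, n_y, m, T_x = 3, 5, 2, 10, 4
x = np.random.randn(n_x, m, T_x)          # a whole input sequence
a0 = np.random.randn(n_a, m)              # initial hidden state
parameters = {"Waa": np.random.randn(n_a, n_a),
              "Wax": np.random.randn(n_a, n_x),
              "Wya": np.random.randn(n_y, n_a),
              "ba": np.random.randn(n_a, 1),
              "by": np.random.randn(n_y, 1)}

a, y_pred, caches = rnn_forward(x, a0, parameters)
print(a.shape)         # (5, 10, 4) -- (n_a, m, T_x)
print(y_pred.shape)    # (2, 10, 4) -- (n_y, m, T_x)
print(len(caches[0]))  # 4, one cache per time step
```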

2 - Long Short-Term Memory (LSTM) network

Next, we build an LSTM network.

Forget gate:

Suppose we are reading words in a piece of text and want to use an LSTM to keep track of grammatical structure, such as whether the subject is singular or plural. If the subject changes from singular to plural, we need a way to get rid of the previously stored memory of the singular/plural state.

In an LSTM, the forget gate lets us do exactly that:

$$\Gamma_f^{\langle t \rangle} = \sigma(W_f[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_f)$$

Update gate:

Once we have forgotten that the subject under discussion is singular, we need a way to update the memory to reflect that the new subject is now plural:

$$\Gamma_u^{\langle t \rangle} = \sigma(W_u[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_u)$$

Combining the two gates, we can update the memory cell value:

$$\tilde{c}^{\langle t \rangle} = \tanh(W_c[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_c)$$

$$c^{\langle t \rangle} = \Gamma_f^{\langle t \rangle} * c^{\langle t-1 \rangle} + \Gamma_u^{\langle t \rangle} * \tilde{c}^{\langle t \rangle}$$

Output gate:

To decide the output, we use the following two formulas:

$$\Gamma_o^{\langle t \rangle} = \sigma(W_o[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_o)$$

$$a^{\langle t \rangle} = \Gamma_o^{\langle t \rangle} * \tanh(c^{\langle t \rangle})$$
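To get a feel for what these element-wise products do (a toy snippet of my own, not part of the assignment), note that a gate value near 0 suppresses its component while a value near 1 lets it through:

```python
import numpy as np

c = np.array([0.8, -1.2, 0.3])          # toy memory cell values
gamma_o = np.array([0.95, 0.05, 0.5])   # toy output-gate activations in (0, 1)

a = gamma_o * np.tanh(c)                # the gate scales how much of tanh(c) is exposed
print(a)                                # the second component is almost entirely suppressed
```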

LSTM cell

  • First stack $a^{\langle t-1 \rangle}$ and $x^{\langle t \rangle}$ into a single matrix $concat = \begin{bmatrix} a^{\langle t-1 \rangle} \\ x^{\langle t \rangle} \end{bmatrix}$
  • Compute the six formulas above
  • Then predict the output $y$
# GRADED FUNCTION: lstm_cell_forward

def lstm_cell_forward(xt, a_prev, c_prev, parameters):
    """
    Implement a single forward step of the LSTM-cell as described in Figure (4)

    Arguments:
    xt -- your input data at timestep "t", numpy array of shape (n_x, m).
    a_prev -- Hidden state at timestep "t-1", numpy array of shape (n_a, m)
    c_prev -- Memory state at timestep "t-1", numpy array of shape (n_a, m)
    parameters -- python dictionary containing:
                        Wf -- Weight matrix of the forget gate, numpy array of shape (n_a, n_a + n_x)
                        bf -- Bias of the forget gate, numpy array of shape (n_a, 1)
                        Wi -- Weight matrix of the update gate, numpy array of shape (n_a, n_a + n_x)
                        bi -- Bias of the update gate, numpy array of shape (n_a, 1)
                        Wc -- Weight matrix of the first "tanh", numpy array of shape (n_a, n_a + n_x)
                        bc -- Bias of the first "tanh", numpy array of shape (n_a, 1)
                        Wo -- Weight matrix of the output gate, numpy array of shape (n_a, n_a + n_x)
                        bo -- Bias of the output gate, numpy array of shape (n_a, 1)
                        Wy -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)

    Returns:
    a_next -- next hidden state, of shape (n_a, m)
    c_next -- next memory state, of shape (n_a, m)
    yt_pred -- prediction at timestep "t", numpy array of shape (n_y, m)
    cache -- tuple of values needed for the backward pass, contains (a_next, c_next, a_prev, c_prev, xt, parameters)

    Note: ft/it/ot stand for the forget/update/output gates, cct stands for the candidate value (c tilde),
          c stands for the memory value
    """

    # Retrieve parameters from "parameters"
    Wf = parameters["Wf"]
    bf = parameters["bf"]
    Wi = parameters["Wi"]
    bi = parameters["bi"]
    Wc = parameters["Wc"]
    bc = parameters["bc"]
    Wo = parameters["Wo"]
    bo = parameters["bo"]
    Wy = parameters["Wy"]
    by = parameters["by"]

    # Retrieve dimensions from shapes of xt and Wy
    n_x, m = xt.shape
    n_y, n_a = Wy.shape

    ### START CODE HERE ###
    # Concatenate a_prev and xt (≈3 lines)
    concat = np.zeros((n_x + n_a, m))
    concat[:n_a, :] = a_prev
    concat[n_a:, :] = xt

    # Compute values for ft, it, cct, c_next, ot, a_next using the formulas given figure (4) (≈6 lines)
    ft = sigmoid(np.dot(Wf, concat) + bf)
    it = sigmoid(np.dot(Wi, concat) + bi)
    cct = np.tanh(np.dot(Wc, concat) + bc)
    c_next = ft * c_prev + it * cct
    ot = sigmoid(np.dot(Wo, concat) + bo)
    a_next = ot * np.tanh(c_next)

    # Compute prediction of the LSTM cell (≈1 line)
    yt_pred = softmax(np.dot(Wy, a_next) + by)
    ### END CODE HERE ###

    # store values needed for backward propagation in cache
    cache = (a_next, c_next, a_prev, c_prev, ft, it, cct, ot, xt, parameters)

    return a_next, c_next, yt_pred, cache
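Another quick shape check (my own sketch; it assumes np plus the course-provided sigmoid and softmax helpers):

```python
np.random.seed(1)
n_x, n_a, n_y, m = 3, 5, 2, 10
xt = np.random.randn(n_x, m)
a_prev = np.random.randn(n_a, m)
c_prev = np.random.randn(n_a, m)
parameters = {"Wf": np.random.randn(n_a, n_a + n_x), "bf": np.random.randn(n_a, 1),
              "Wi": np.random.randn(n_a, n_a + n_x), "bi": np.random.randn(n_a, 1),
              "Wc": np.random.randn(n_a, n_a + n_x), "bc": np.random.randn(n_a, 1),
              "Wo": np.random.randn(n_a, n_a + n_x), "bo": np.random.randn(n_a, 1),
              "Wy": np.random.randn(n_y, n_a), "by": np.random.randn(n_y, 1)}

a_next, c_next, yt_pred, cache = lstm_cell_forward(xt, a_prev, c_prev, parameters)
print(a_next.shape, c_next.shape, yt_pred.shape)  # (5, 10) (5, 10) (2, 10)
```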

Forward pass for LSTM

# GRADED FUNCTION: lstm_forward

def lstm_forward(x, a0, parameters):
    """
    Implement the forward propagation of the recurrent neural network using an LSTM-cell described in Figure (3).

    Arguments:
    x -- Input data for every time-step, of shape (n_x, m, T_x).
    a0 -- Initial hidden state, of shape (n_a, m)
    parameters -- python dictionary containing:
                        Wf -- Weight matrix of the forget gate, numpy array of shape (n_a, n_a + n_x)
                        bf -- Bias of the forget gate, numpy array of shape (n_a, 1)
                        Wi -- Weight matrix of the update gate, numpy array of shape (n_a, n_a + n_x)
                        bi -- Bias of the update gate, numpy array of shape (n_a, 1)
                        Wc -- Weight matrix of the first "tanh", numpy array of shape (n_a, n_a + n_x)
                        bc -- Bias of the first "tanh", numpy array of shape (n_a, 1)
                        Wo -- Weight matrix of the output gate, numpy array of shape (n_a, n_a + n_x)
                        bo -- Bias of the output gate, numpy array of shape (n_a, 1)
                        Wy -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)

    Returns:
    a -- Hidden states for every time-step, numpy array of shape (n_a, m, T_x)
    y -- Predictions for every time-step, numpy array of shape (n_y, m, T_x)
    caches -- tuple of values needed for the backward pass, contains (list of all the caches, x)
    """

    # Initialize "caches", which will track the list of all the caches
    caches = []

    ### START CODE HERE ###
    # Retrieve dimensions from shapes of x and parameters['Wy'] (≈2 lines)
    n_x, m, T_x = x.shape
    n_y, n_a = parameters['Wy'].shape

    # initialize "a", "c" and "y" with zeros (≈3 lines)
    a = np.zeros((n_a, m, T_x))
    c = np.zeros((n_a, m, T_x))
    y = np.zeros((n_y, m, T_x))

    # Initialize a_next and c_next (≈2 lines)
    a_next = a0
    c_next = np.zeros((n_a, m))

    # loop over all time-steps
    for t in range(T_x):
        # Update next hidden state, next memory state, compute the prediction, get the cache (≈1 line)
        a_next, c_next, yt, cache = lstm_cell_forward(x[:, :, t], a_next, c_next, parameters)
        # Save the value of the new "next" hidden state in a (≈1 line)
        a[:, :, t] = a_next
        # Save the value of the prediction in y (≈1 line)
        y[:, :, t] = yt
        # Save the value of the next cell state (≈1 line)
        c[:, :, t] = c_next
        # Append the cache into caches (≈1 line)
        caches.append(cache)

    ### END CODE HERE ###

    # store values needed for backward propagation in cache
    caches = (caches, x)

    return a, y, c, caches
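And a shape check for the full-sequence version (my own sketch, same assumptions as the previous snippet):

```python
np.random.seed(1)
n_x, n_a, n_y, m, T_x = 3, 5, 2, 10, 7
x = np.random.randn(n_x, m, T_x)
a0 = np.random.randn(n_a, m)
parameters = {"Wf": np.random.randn(n_a, n_a + n_x), "bf": np.random.randn(n_a, 1),
              "Wi": np.random.randn(n_a, n_a + n_x), "bi": np.random.randn(n_a, 1),
              "Wc": np.random.randn(n_a, n_a + n_x), "bc": np.random.randn(n_a, 1),
              "Wo": np.random.randn(n_a, n_a + n_x), "bo": np.random.randn(n_a, 1),
              "Wy": np.random.randn(n_y, n_a), "by": np.random.randn(n_y, 1)}

a, y, c, caches = lstm_forward(x, a0, parameters)
print(a.shape)  # (5, 10, 7) -- hidden states for every time step
print(y.shape)  # (2, 10, 7) -- predictions for every time step
print(c.shape)  # (5, 10, 7) -- memory cell states for every time step
```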

3 - Backpropagation in recurrent neural networks

Next comes backpropagation through RNNs. In practice, deep learning frameworks compute this for us, so it is enough to read through it here; the formulas are fairly involved.
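The main step to notice in the RNN cell is the derivative of tanh: since $a^{\langle t \rangle} = \tanh(z^{\langle t \rangle})$ with $z^{\langle t \rangle} = W_{aa} a^{\langle t-1 \rangle} + W_{ax} x^{\langle t \rangle} + b_a$, the chain rule gives

$$\frac{\partial a^{\langle t \rangle}}{\partial z^{\langle t \rangle}} = 1 - \tanh^2(z^{\langle t \rangle}) = 1 - \big(a^{\langle t \rangle}\big)^2,$$

which is why the code below starts from `dtanh = (1 - a_next**2) * da_next`; the remaining gradients, such as $dW_{ax} = d\mathrm{tanh} \cdot x^{\langle t \rangle T}$ and $dx^{\langle t \rangle} = W_{ax}^T \cdot d\mathrm{tanh}$, then follow from standard matrix calculus.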

RNN backward pass

def rnn_cell_backward(da_next, cache):
    """
    Implements the backward pass for the RNN-cell (single time-step).

    Arguments:
    da_next -- Gradient of loss with respect to next hidden state
    cache -- python dictionary containing useful values (output of rnn_cell_forward())

    Returns:
    gradients -- python dictionary containing:
                        dxt -- Gradients of input data, of shape (n_x, m)
                        da_prev -- Gradients of previous hidden state, of shape (n_a, m)
                        dWax -- Gradients of input-to-hidden weights, of shape (n_a, n_x)
                        dWaa -- Gradients of hidden-to-hidden weights, of shape (n_a, n_a)
                        dba -- Gradients of bias vector, of shape (n_a, 1)
    """

    # Retrieve values from cache
    (a_next, a_prev, xt, parameters) = cache

    # Retrieve values from parameters
    Wax = parameters["Wax"]
    Waa = parameters["Waa"]
    Wya = parameters["Wya"]
    ba = parameters["ba"]
    by = parameters["by"]

    ### START CODE HERE ###
    # compute the gradient of tanh with respect to a_next (≈1 line)
    dtanh = (1 - a_next**2) * da_next

    # compute the gradient of the loss with respect to Wax (≈2 lines)
    dxt = np.dot(Wax.T, dtanh)
    dWax = np.dot(dtanh, xt.T)

    # compute the gradient with respect to Waa (≈2 lines)
    da_prev = np.dot(Waa.T, dtanh)
    dWaa = np.dot(dtanh, a_prev.T)

    # compute the gradient with respect to b (≈1 line)
    dba = np.sum(dtanh, keepdims=True, axis=-1)
    ### END CODE HERE ###

    # Store the gradients in a python dictionary
    gradients = {"dxt": dxt, "da_prev": da_prev, "dWax": dWax, "dWaa": dWaa, "dba": dba}

    return gradients
def rnn_backward(da, caches):
    """
    Implement the backward pass for a RNN over an entire sequence of input data.

    Arguments:
    da -- Upstream gradients of all hidden states, of shape (n_a, m, T_x)
    caches -- tuple containing information from the forward pass (rnn_forward)

    Returns:
    gradients -- python dictionary containing:
                        dx -- Gradient w.r.t. the input data, numpy-array of shape (n_x, m, T_x)
                        da0 -- Gradient w.r.t the initial hidden state, numpy-array of shape (n_a, m)
                        dWax -- Gradient w.r.t the input's weight matrix, numpy-array of shape (n_a, n_x)
                        dWaa -- Gradient w.r.t the hidden state's weight matrix, numpy-array of shape (n_a, n_a)
                        dba -- Gradient w.r.t the bias, of shape (n_a, 1)
    """

    ### START CODE HERE ###

    # Retrieve values from the first cache (t=1) of caches (≈2 lines)
    (caches, x) = caches
    (a1, a0, x1, parameters) = caches[0]

    # Retrieve dimensions from da's and x1's shapes (≈2 lines)
    n_a, m, T_x = da.shape
    n_x, m = x1.shape

    # initialize the gradients with the right sizes (≈6 lines)
    dx = np.zeros((n_x, m, T_x))
    dWax = np.zeros((n_a, n_x))
    dWaa = np.zeros((n_a, n_a))
    dba = np.zeros((n_a, 1))
    da0 = np.zeros((n_a, m))
    da_prevt = np.zeros((n_a, m))

    # Loop through all the time steps
    for t in reversed(range(T_x)):
        # Compute gradients at time step t. Choose wisely the "da_next" and the "cache" to use in the backward propagation step. (≈1 line)
        gradients = rnn_cell_backward(da[:, :, t] + da_prevt, caches[t])
        # Retrieve derivatives from gradients (≈1 line)
        dxt, da_prevt, dWaxt, dWaat, dbat = gradients["dxt"], gradients["da_prev"], gradients["dWax"], gradients["dWaa"], gradients["dba"]
        # Increment global derivatives w.r.t parameters by adding their derivative at time-step t (≈4 lines)
        dx[:, :, t] = dxt
        dWax += dWaxt
        dWaa += dWaat
        dba += dbat

    # Set da0 to the gradient of a which has been backpropagated through all time-steps (≈1 line)
    da0 = da_prevt
    ### END CODE HERE ###

    # Store the gradients in a python dictionary
    gradients = {"dx": dx, "da0": da0, "dWax": dWax, "dWaa": dWaa, "dba": dba}

    return gradients

LSTM backward pass

def lstm_cell_backward(da_next, dc_next, cache):
    """
    Implement the backward pass for the LSTM-cell (single time-step).

    Arguments:
    da_next -- Gradients of next hidden state, of shape (n_a, m)
    dc_next -- Gradients of next cell state, of shape (n_a, m)
    cache -- cache storing information from the forward pass

    Returns:
    gradients -- python dictionary containing:
                        dxt -- Gradient of input data at time-step t, of shape (n_x, m)
                        da_prev -- Gradient w.r.t. the previous hidden state, numpy array of shape (n_a, m)
                        dc_prev -- Gradient w.r.t. the previous memory state, of shape (n_a, m)
                        dWf -- Gradient w.r.t. the weight matrix of the forget gate, numpy array of shape (n_a, n_a + n_x)
                        dWi -- Gradient w.r.t. the weight matrix of the update gate, numpy array of shape (n_a, n_a + n_x)
                        dWc -- Gradient w.r.t. the weight matrix of the memory gate, numpy array of shape (n_a, n_a + n_x)
                        dWo -- Gradient w.r.t. the weight matrix of the output gate, numpy array of shape (n_a, n_a + n_x)
                        dbf -- Gradient w.r.t. biases of the forget gate, of shape (n_a, 1)
                        dbi -- Gradient w.r.t. biases of the update gate, of shape (n_a, 1)
                        dbc -- Gradient w.r.t. biases of the memory gate, of shape (n_a, 1)
                        dbo -- Gradient w.r.t. biases of the output gate, of shape (n_a, 1)
    """

    # Retrieve information from "cache"
    (a_next, c_next, a_prev, c_prev, ft, it, cct, ot, xt, parameters) = cache

    ### START CODE HERE ###
    # Retrieve dimensions from xt's and a_next's shape (≈2 lines)
    n_x, m = xt.shape
    n_a, m = a_next.shape

    # Compute gate-related derivatives; their values follow from equations (7) to (10) (≈4 lines)
    dot = da_next * np.tanh(c_next) * ot * (1 - ot)
    dcct = (dc_next * it + ot * (1 - np.square(np.tanh(c_next))) * it * da_next) * (1 - np.square(cct))
    dit = (dc_next * cct + ot * (1 - np.square(np.tanh(c_next))) * cct * da_next) * it * (1 - it)
    dft = (dc_next * c_prev + ot * (1 - np.square(np.tanh(c_next))) * c_prev * da_next) * ft * (1 - ft)

    # Compute parameters related derivatives. Use equations (11)-(14) (≈8 lines)
    dWf = np.dot(dft, np.concatenate((a_prev, xt), axis=0).T)
    dWi = np.dot(dit, np.concatenate((a_prev, xt), axis=0).T)
    dWc = np.dot(dcct, np.concatenate((a_prev, xt), axis=0).T)
    dWo = np.dot(dot, np.concatenate((a_prev, xt), axis=0).T)
    dbf = np.sum(dft, axis=1, keepdims=True)
    dbi = np.sum(dit, axis=1, keepdims=True)
    dbc = np.sum(dcct, axis=1, keepdims=True)
    dbo = np.sum(dot, axis=1, keepdims=True)

    # Compute derivatives w.r.t previous hidden state, previous memory state and input. Use equations (15)-(17). (≈3 lines)
    da_prev = np.dot(parameters['Wf'][:, :n_a].T, dft) + np.dot(parameters['Wi'][:, :n_a].T, dit) \
              + np.dot(parameters['Wc'][:, :n_a].T, dcct) + np.dot(parameters['Wo'][:, :n_a].T, dot)
    dc_prev = dc_next * ft + ot * (1 - np.square(np.tanh(c_next))) * ft * da_next
    dxt = np.dot(parameters['Wf'][:, n_a:].T, dft) + np.dot(parameters['Wi'][:, n_a:].T, dit) \
          + np.dot(parameters['Wc'][:, n_a:].T, dcct) + np.dot(parameters['Wo'][:, n_a:].T, dot)
    ### END CODE HERE ###

    # Save gradients in dictionary
    gradients = {"dxt": dxt, "da_prev": da_prev, "dc_prev": dc_prev,
                 "dWf": dWf, "dbf": dbf, "dWi": dWi, "dbi": dbi,
                 "dWc": dWc, "dbc": dbc, "dWo": dWo, "dbo": dbo}

    return gradients
def lstm_backward(da, caches):
    """
    Implement the backward pass for the RNN with LSTM-cell (over a whole sequence).

    Arguments:
    da -- Gradients w.r.t the hidden states, numpy-array of shape (n_a, m, T_x)
    caches -- cache storing information from the forward pass (lstm_forward)

    Returns:
    gradients -- python dictionary containing:
                        dx -- Gradient of inputs, of shape (n_x, m, T_x)
                        da0 -- Gradient w.r.t. the previous hidden state, numpy array of shape (n_a, m)
                        dWf -- Gradient w.r.t. the weight matrix of the forget gate, numpy array of shape (n_a, n_a + n_x)
                        dWi -- Gradient w.r.t. the weight matrix of the update gate, numpy array of shape (n_a, n_a + n_x)
                        dWc -- Gradient w.r.t. the weight matrix of the memory gate, numpy array of shape (n_a, n_a + n_x)
                        dWo -- Gradient w.r.t. the weight matrix of the output gate, numpy array of shape (n_a, n_a + n_x)
                        dbf -- Gradient w.r.t. biases of the forget gate, of shape (n_a, 1)
                        dbi -- Gradient w.r.t. biases of the update gate, of shape (n_a, 1)
                        dbc -- Gradient w.r.t. biases of the memory gate, of shape (n_a, 1)
                        dbo -- Gradient w.r.t. biases of the output gate, of shape (n_a, 1)
    """

    # Retrieve values from the first cache (t=1) of caches.
    (caches, x) = caches
    (a1, c1, a0, c0, f1, i1, cc1, o1, x1, parameters) = caches[0]

    ### START CODE HERE ###
    # Retrieve dimensions from da's and x1's shapes (≈2 lines)
    n_a, m, T_x = da.shape
    n_x, m = x1.shape

    # initialize the gradients with the right sizes (≈12 lines)
    dx = np.zeros((n_x, m, T_x))
    da0 = np.zeros((n_a, m))
    da_prevt = np.zeros((n_a, m))
    dc_prevt = np.zeros((n_a, m))
    dWf = np.zeros((n_a, n_a + n_x))
    dWi = np.zeros((n_a, n_a + n_x))
    dWc = np.zeros((n_a, n_a + n_x))
    dWo = np.zeros((n_a, n_a + n_x))
    dbf = np.zeros((n_a, 1))
    dbi = np.zeros((n_a, 1))
    dbc = np.zeros((n_a, 1))
    dbo = np.zeros((n_a, 1))

    # loop back over the whole sequence
    for t in reversed(range(T_x)):
        # Compute all gradients using lstm_cell_backward
        gradients = lstm_cell_backward(da[:, :, t] + da_prevt, dc_prevt, caches[t])
        # Propagate the recurrent gradients on a and c to the previous time step
        da_prevt = gradients['da_prev']
        dc_prevt = gradients['dc_prev']
        # Store or add the gradient to the parameters' previous step's gradient
        dx[:, :, t] = gradients['dxt']
        dWf = dWf + gradients['dWf']
        dWi = dWi + gradients['dWi']
        dWc = dWc + gradients['dWc']
        dWo = dWo + gradients['dWo']
        dbf = dbf + gradients['dbf']
        dbi = dbi + gradients['dbi']
        dbc = dbc + gradients['dbc']
        dbo = dbo + gradients['dbo']

    # Set the first activation's gradient to the backpropagated gradient da_prev.
    da0 = gradients['da_prev']
    ### END CODE HERE ###

    # Store the gradients in a python dictionary
    gradients = {"dx": dx, "da0": da0, "dWf": dWf, "dbf": dbf, "dWi": dWi, "dbi": dbi,
                 "dWc": dWc, "dbc": dbc, "dWo": dWo, "dbo": dbo}

    return gradients

