Now let's dive under the hood to see how the RNN actually works. The RNN takes in the first step of our sequence and uses it to learn the first hidden state. That hidden state, which holds information from t_0, is then passed along to the next hidden state together with the next step in the sequence. Now h_1, our second hidden state, has information from both t_0 (via the previous hidden state) and t_1. We then pass that hidden state's information, again, h_1 carrying t_0 and t_1, on to h_2 along with the next part of our sequence, t_2, and now h_2 has information from t_0, t_1, as well as t_2. We continue that process until finally the hidden state has information from all of our inputs and we can make a prediction of the next value in our sequence, again outputting just a single value. As we've been discussing, generally speaking our inputs are historical time steps of a time series, here using five time steps, and the output is meant to be the series' very next step. To forecast multiple steps into the future, we add the predicted output as the next input to then predict one step further. Here we use the first four steps to predict the value at t_5, and then we pass in that predicted t_5 to ultimately predict t_6. Now, to get into the actual math: with recurrent neural nets we have three trainable weight matrices, U, V, and W, and we'll discuss how each is used in the slides to come. We discussed just before the h_i, the hidden states at each time step i, which we'll compute using our weight matrices; they are how we store the memory of each part of our sequence.
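The recurrence described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the full RNN math yet: `update_hidden` and `predict` here are hypothetical stand-ins for the U, V, W computations covered next. The point is that each hidden state is built from the previous one plus the current input, so it accumulates information from every earlier step, and that multi-step forecasting feeds a prediction back in as the next input.

```python
import numpy as np

def update_hidden(h_prev, x_t):
    # Stand-in for the real update, sigmoid(U @ x_t + V @ h_prev)
    return np.tanh(h_prev + x_t)

def predict(h):
    # Stand-in for the real output, W @ h
    return float(np.sum(h))

series = [0.1, 0.2, 0.3, 0.4, 0.5]  # historical time steps
h = np.zeros(1)                      # initial hidden state
for x_t in series:
    h = update_hidden(h, x_t)        # h now "remembers" t_0..t_i

t_5 = predict(h)                     # forecast the very next step

# To forecast further out, feed the prediction back in as the next input:
h = update_hidden(h, t_5)
t_6 = predict(h)
```

The same loop body runs at every time step, which is exactly the "recurrent" part of the name.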
We have our sigmoid activation function to ensure some non-linearity in our deep learning network, and the weight matrices are applied as linear transformations that lead into that sigmoid activation function: a linear transformation going into the nonlinear activation function. How do all of these connect? First, the hidden state h_i is composed of the dot product of U and the current input, plus the dot product of V and the hidden state at the prior time step. These are all linear transformations so far, and we pass them through the sigmoid activation function to get that non-linearity. Then the dot product of W and the hidden state gives us the output. Technically, there's an output at each time step; it's just that we don't care about those, as we are not trying to predict those time steps. Those extra outputs could be useful, though, if we do want to build a deeper network, just something to note on the side. Then again, as a reminder, the sigmoid activation function takes the form e^x / (e^x + 1) and outputs values between zero and one. That's why our output t_out, which would be our t_5 here, is not actually passed through an activation function, since we'll probably want some continuous value that's not bounded between zero and one. Now, what we've been showing so far has been an unrolled version of the recurrent neural net. Oftentimes, though, we may want to see it as the cycle we have here to the right. The reason we want to show this as a cycle is to emphasize that the same U and the same V are applied repeatedly to sequentially update the hidden states, using the previous hidden state and the new input at each time step.
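The forward pass just described can be written out directly with NumPy. This is a sketch under assumed shapes (one input feature, four hidden units, random untrained weights); the equations themselves follow the text: h_i = sigmoid(U x_i + V h_(i-1)), with the output W h left unactivated so it can take any continuous value.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 4
U = rng.normal(size=(hidden, 1))        # input  -> hidden
V = rng.normal(size=(hidden, hidden))   # hidden -> hidden (reused each step)
W = rng.normal(size=(1, hidden))        # hidden -> output

def sigmoid(z):
    # e^z / (e^z + 1), equivalently 1 / (1 + e^-z); outputs in (0, 1)
    return np.exp(z) / (np.exp(z) + 1.0)

series = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
h = np.zeros((hidden, 1))                # initial hidden state
for x_i in series:
    h = sigmoid(U * x_i + V @ h)         # same U and V at every step

t_out = (W @ h).item()                   # continuous, unbounded prediction
```

Note that only the final line involves W, and it skips the sigmoid, matching the point above that the output should not be squashed into (0, 1).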
Again, the output equals the dot product of W and the final hidden state, and each hidden state equals the sigmoid of the dot product of U and the input at t_i, plus the dot product of V and h_(i-1). Again, we use the sigmoid as we run through this cycle, but always with the same matrices U and V throughout. How do we obtain the weight matrices U, V, and W? When we train a recurrent neural net, we find the weights via the backpropagation algorithm. In backpropagation, we repeatedly process the training data, updating the weights in order to minimize some cost function. For time series forecasting, a typical cost function would be the mean squared error, or some similar metric that compares t_out against the true next time step: how far off were we? Intuitively, we find values of U, V, and W that cause our predicted outputs t_out to be as close as possible to the true target values, that next step in our sequence. Now, there are limitations to simple recurrent neural networks. Basic recurrent neural networks often struggle when processing long input sequences; mathematically, RNNs have trouble capturing long-term dependencies over many time steps. This is a problem because time series sequences are often hundreds of steps or more, and we won't be able to learn those longer-term patterns. To address this, long short-term memory networks, or LSTMs, can help mitigate these issues with a better memory system. With that, we'll talk about how to leverage LSTMs for time series modeling in the upcoming video. I'll see you there.
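The long-sequence limitation can be seen in miniature with a quick calculation. This is a simplified illustration, not full backpropagation through time (it ignores the V weight factor in the chain rule), but it shows the core issue: each time step contributes a sigmoid derivative, which is at most 0.25, so the gradient signal from early inputs shrinks geometrically with sequence length.

```python
import numpy as np

def sigmoid(z):
    return np.exp(z) / (np.exp(z) + 1.0)

# Chain one sigmoid derivative per time step, as backprop through
# time would, and watch the product collapse over a long sequence.
h = 0.5
grad = 1.0
for step in range(100):                        # a 100-step sequence
    local = sigmoid(h) * (1.0 - sigmoid(h))    # sigmoid derivative <= 0.25
    grad *= local                              # chain rule across steps
    h = sigmoid(h)

print(grad)  # vanishingly small: early steps barely influence the loss
```

This "vanishing gradient" is exactly the kind of problem the LSTM's gated memory cell, covered next, is designed to mitigate.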