Welcome to our notebook on using deep learning for time series modeling. In this section, we're going to revisit what we covered in the lecture and figure out how we can actually leverage deep learning, using Keras and Python, to do time series modeling. First things first, we're going to import the necessary libraries. What I especially want to point out here are the LSTM and SimpleRNN layers. We're going to use those cells as we create our sequential models to actually model our time series. Now, we're going to start off with simple recurrent neural networks, but first we want to introduce you to the data we'll be working with. We're going to be looking at particulate matter (PM) readings within Beijing across different districts, such as Dongsi. Excuse me if I mess up any of the pronunciations, but the idea is that for each district you have a value across time. You also have the temperature and other variables that you may want to work in. We only work with one variable here, but if you want to work with more variables, feel free to pull those in as well. Then we have the year, month, day, and even the hour for each one of these values. So we're going to plot out just the temperature over time, and that's every hour for all of 2015. We can see how it rises and falls with the seasons. The next thing we're going to do is interpolate any missing data. There is some missing data, as we see here with those null values. We use the interpolate function to fill in those values. Since this is sequential data, ordered in time, interpolation asks: given the values observed just before and after a gap, what's a good estimate to fill in? There are different options, and you can come up with more complex ways of filling in those values.
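As a minimal sketch of the interpolation step, here is a tiny hourly series with gaps (the values are made up for illustration; the notebook applies the same call to its PM columns):

```python
import numpy as np
import pandas as pd

# A small stand-in series with missing readings.
s = pd.Series([10.0, np.nan, np.nan, 16.0, 18.0])

# Linear interpolation (the pandas default) fills each gap on a straight
# line between the surrounding observed values.
filled = s.interpolate()  # method='linear' by default
print(filled.tolist())  # [10.0, 12.0, 14.0, 16.0, 18.0]
```

The two missing values land evenly spaced between 10 and 16, which is exactly the "connect the last known value to the next known value" behavior described above.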
By default, it's just going to use linear interpolation, and that's what we're going to use here. The next thing we're going to want is a column that tells us the date. If you recall everything we've been working with so far with time series in Pandas, we probably also want to set that as our index. To do that, we define a function that takes in a row from our DataFrame, pulls out that row's year, month, day, and hour (recall from up here which columns we're pulling from), and creates a datetime object from them. Then we create our new column by applying that function to each row; axis=1 ensures it's applied row by row. Then we set that new date column as our index. Now I'm going to skip ahead one cell just to see that this is now our index. If we look at a single column, which shows us the actual values as well as the indices, we see that the index is now a datetime object with both the date and the hour. If we plot that out, since that's the index, our x-axis is now the actual time values. The next thing we're going to want to do is look at certain windows of time, specifically a certain number of days. So we're going to create a function that pulls out the last n days, and then we're going to leverage that function to also plot those days. To do that, again just for a single column, we say: what's the column name? We pass that into our DataFrame. Then we take the number of days and multiply it by 24, since we're working with hourly data; there's a new row for every single hour, so multiplying by 24 gives us a full day.
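The two steps just described can be sketched as follows. This assumes a DataFrame with integer year, month, day, and hour columns like the notebook's; the tiny example frame and its values here are made up:

```python
import datetime
import pandas as pd

# Stand-in data with the same column layout as the notebook's DataFrame.
df = pd.DataFrame({
    'year': [2015, 2015], 'month': [1, 1],
    'day': [1, 1], 'hour': [0, 1],
    'PM_Dongsi': [20.0, 25.0],
})

def row_to_datetime(row):
    # Build one datetime object from the row's year/month/day/hour fields.
    return datetime.datetime(int(row['year']), int(row['month']),
                             int(row['day']), int(row['hour']))

# axis=1 applies the function row by row; the result becomes our index.
df['date'] = df.apply(row_to_datetime, axis=1)
df = df.set_index('date')

def get_n_last_days(df, col_name, n_days):
    # Hourly data: the last n days are the last n_days * 24 rows.
    return df[col_name][-n_days * 24:]
```

With the index set this way, any plot of a column automatically gets the datetimes on the x-axis.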
So if we want three days, we want from index negative 72 through to the end. Then we can leverage that get_n_last_days function to plot just those last days; the x-axis will be the indices, which are our datetime objects, and the y-axis will be whatever series, whatever column, we pass through. From there, we're just adding a title that gives the number of days, plus an x label and a y label. We can plot the last n days, and we see here a plot of the last 42 days for PM_Dongsi. We have this review question: which components that we've learned about in previous lessons actually appear to be present in our time series here? If we look at it, there appears to be a periodic component. That periodic component, by the way, doesn't need to be seasonal as in throughout the year; it could be hourly throughout the day, with certain peaks and valleys within the day, which is just something to note as well. We also see a bit of autocorrelation structure: if the PM starts to go up, it continues to go up for a certain amount of time, and the same when trending down. Now, our goal is to train a simple RNN on our PM_Dongsi time series. But to do that, we need our data in a specific format to work with Keras, and that format is a three-dimensional NumPy array: the number of samples we want to put through; for each of those samples, the number of time steps; and then the number of features. Even if we only have one feature, we still have to create that third dimension to say that it's just one feature. How does that work? We're going to create this get_keras_format function, which takes a two-dimensional array, just the number of samples by the number of time steps per sample, and reshapes it.
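A minimal sketch of that reshape, assuming the input is already a 2-D array of (num_samples, num_timesteps):

```python
import numpy as np

def get_keras_format(series_2d):
    # (num_samples, num_timesteps) -> (num_samples, num_timesteps, 1),
    # the trailing 1 being the single-feature dimension Keras expects.
    series_2d = np.asarray(series_2d)
    return series_2d.reshape(series_2d.shape[0], series_2d.shape[1], 1)

windows = np.arange(10).reshape(2, 5)   # 2 samples, 5 time steps each
x = get_keras_format(windows)
print(x.shape)  # (2, 5, 1)
```

If you had more features, that trailing dimension would become the feature count instead of 1.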
So the original array is just those two dimensions, and then we add on this third dimension of one. Now, if you wanted more features, you could change this to two or however many features you want, or pass the number of features in as an argument to the function; we're only working with one feature here, so we keep this at one. Now, the main work goes into getting the train and test data. What we eventually want is a bunch of different samples in our training set that are a certain length. If you recall from our lecture, we were working with sequences of length five. We want a bunch of samples of sequences of length five, and then our y value is just going to be the following value, the sixth. So you take the first five values and predict the sixth; then, say, values 3 through 7 and predict the eighth; then values 6 through 10 and predict the eleventh; and so on and so forth. Let's walk through how we actually come up with that training data. First, we're just going to be working with the last n days; we see here that we have the option to pass in the number of days we want to work with. Then our training set runs all the way through to the start of our test hours; we're holding out a small set of test hours as our holdout set. Our test set is then those test hours through to the end of our series. We start off with train_x and train_y as empty lists. Then we loop i from zero through to the number of samples we can make; we subtract input_hours because i is the starting point of each window, and, if you look quickly here, each window runs from that starting point through to i plus input_hours, so the last window has to still fit before the end of our series.
Then the sample gap says how many hours we skip before starting the next sequence. Going back to my example of five values per sequence: if we set sample_gap equal to three, we take values 0 through 4 as our first five values, then skip ahead three, and take values 3 through 7 as our next input sample. That's the reason for the sample gap: we're not just moving one step at a time, but creating sequences with a bit less overlap. At each i, we append the slice from i through i plus input_hours, so starting at zero that's values 0 through 4. Recall that in Python, slicing up to a certain index doesn't actually include that index. Then our train_y, the very next step that we're trying to predict, is just that very next value: train[i + input_hours]. These two will always match up. Then we need to change the shape. At this point we have a list of sequences; say we end up with 20 different sequences, each of length five, then we're working with a two-dimensional array of shape number-of-samples by input_hours. We have to apply the Keras format and add on that third dimension that we have here. Our train_y is just our output variable; we make that into an array rather than a list. Finally, to actually test how we're doing, we want an initial x value: that comes from our test set, and it has to be the same size as the sequences we've been working with, since we've been training our model with, say again, input_hours equal to five.
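Putting the windowing logic just described together, here is a sketch of the train/test builder. The function name and argument names are taken from the description above, but the exact signature in the notebook may differ slightly; treat this as a reconstruction, not the notebook's literal code:

```python
import numpy as np

def get_keras_format(series_2d):
    # (num_samples, num_timesteps) -> (num_samples, num_timesteps, 1)
    series_2d = np.asarray(series_2d)
    return series_2d.reshape(series_2d.shape[0], series_2d.shape[1], 1)

def get_train_test(values, input_hours, test_hours, sample_gap):
    # Hold out the last test_hours values; train on everything before them.
    train = values[:-test_hours]
    test = values[-test_hours:]

    train_x, train_y = [], []
    # Slide a window of length input_hours over the training data,
    # stepping sample_gap hours at a time; the target is the next value.
    for i in range(0, len(train) - input_hours, sample_gap):
        train_x.append(train[i:i + input_hours])
        train_y.append(train[i + input_hours])

    train_x = get_keras_format(np.array(train_x))
    train_y = np.array(train_y)

    # The test seed is the first input_hours values of the holdout set;
    # test_y is everything after that seed.
    test_x_init = test[:input_hours]
    test_y = test[input_hours:]
    return train_x, test_x_init, train_y, test_y
```

For example, on 100 hourly values with input_hours=5, test_hours=24, and sample_gap=3, the training portion has 76 values and yields 24 windows of shape (5, 1) each.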
To pass it through our model, our initial test set has to be a sequence of five. If you recall from the lecture, we pass in that sequence of five, predict the sixth value, then use that sixth value to help predict the seventh, and so on and so forth. Then test_y is just test[input_hours:] through to the end; that's the remaining values from our test set. Then we return train_x, test_x_init, train_y, and test_y so that we can use all of these values. Now, that was a lot of talking through everything we have there, so let's look at what our function actually accomplishes. We're going to specify that we want 56 days, our input hours are going to be 12, so we're working with sequences of length 12, and our test hours are going to be 24, so we'll test on 24 hours at the end. We run this and, skipping to here, we can see our training input has those three dimensions: 436 samples, each with 12 different time steps, and then the number of features, which is just one. Let's pull out just one sample to make this a bit clearer. We see here that we have a sequence of length 12, and its shape is 12 by 1, the one being the number of features. Then we have our training output shape, which is what we're trying to predict: for each of these sequences, the 13th step is the corresponding entry of train_y. So for the first sequence, we use these 12 values to try to predict the 13th value, train_y[0], which is equal to 17. Then we have our test input shape, which is our initial test seed. Recall that our test hours are 24: half of that, the first 12 values, goes into the initial test seed, and the test output shape that we're trying to predict is the next 12.
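The seed-then-predict-forward idea can be sketched like this. The actual RNN isn't built until the next video, so `model` here is a hypothetical stand-in for anything with a Keras-style `predict` method; the window-sliding logic is the part being illustrated:

```python
import numpy as np

def forecast(model, test_x_init, n_steps):
    # Start from the seed window, predict one step, append the prediction,
    # and slide the window forward so the model always sees a full window.
    window = list(np.asarray(test_x_init).ravel())
    preds = []
    for _ in range(n_steps):
        # Reshape to (1 sample, num_timesteps, 1 feature) for the model.
        x = np.array(window).reshape(1, len(window), 1)
        next_val = float(model.predict(x)[0, 0])
        preds.append(next_val)
        # Drop the oldest value and append the newest prediction.
        window = window[1:] + [next_val]
    return preds
```

Each predicted value feeds back in as an input for the next step, which is exactly why the initial test seed has to match the training window length.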
We're going to use this to pass in our initial input from our test set, and then sequentially keep predicting the very next value. Now we have that all set up, so I'm going to stop the video here. In the next video, we're going to leverage the training and test sets that we've now created, actually fit a simple RNN, and see how well we can perform using these deep learning techniques on time series. I'll see you there.