Hello.
Today we will use an LSTM to predict the stock price of Samsung Electronics.
You don't need a large dataset, so feel free to follow along.
Below is the original post.
If you want to understand in detail how an LSTM works, I recommend the blog below.
dgkim5360.tistory.com/entry/understanding-long-short-term-memory-lstm-kr
Understanding Long Short-Term Memory (LSTM)
This post is a Korean translation of an article Christopher Olah wrote in August 2015. It explains the concept of recurrent neural networks in an accessible way and, with good illustrations, helps the reader understand the theory behind LSTM, a groundbreaking model among them ..
dgkim5360.tistory.com
1. Libraries
import numpy as np
import pandas as pd
import pandas_datareader.data as pdr
import matplotlib.pyplot as plt
import datetime
import torch
import torch.nn as nn
from torch.autograd import Variable
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
No module named 'pandas_datareader'
If pandas is installed but you still see the message above, install the package with pip install pandas_datareader.
[Python] pandas_datareader error message: FutureWarning: pandas.util.testing is deprecated. Use the functions in the pu
Hello. When using pandas_datareader for data processing, you may see an error message like the one below. FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at..
coding-yoon.tistory.com
It probably used to be installed automatically along with pandas, but it now seems to have been split off into a separate package.
2. Loading Samsung Electronics Stock Data
start = (2000, 1, 1)  # January 1, 2000
start = datetime.datetime(*start)
end = datetime.date.today()  # today
# Load Samsung Electronics from Yahoo
df = pdr.DataReader('005930.KS', 'yahoo', start, end)
df.head(5)
df.tail(5)
df.Close.plot(grid=True)
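Looking at the Samsung Electronics closing price from 2000 to 2020 in a single chart, it has gone absolutely wild. Maybe it is not too late to ride this trend.
Let's go, 100,000-won Samsung Electronics!!
If you want to try other stocks as well, I recommend looking them up on Yahoo Finance.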
Yahoo Finance - Stock Market Live, Quotes, Business & Finance News
At Yahoo Finance, you get free stock quotes, up-to-date news, portfolio management resources, international market data, social interaction and mortgage rates that help you manage your financial life.
finance.yahoo.com
To check how well the trained model performs, the data above (currently about 5,296 rows) is split into Train (the data to learn from), rows 0 through 4499, and Test (the data for performance testing), rows 4500 through 5295.
Today we will train on roughly the data before the yellow line and predict from the yellow line onward.
I am curious whether the model can predict the points where the price falls and rises.
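The split described above is purely positional: the first 4,500 rows are used for training and the rest for testing, with no shuffling, so the test rows all lie later in time than the training rows. A minimal sketch of that idea, assuming only the row counts quoted above (the helper name chronological_split is made up for this illustration):

```python
import numpy as np

def chronological_split(values, n_train):
    """Split a time-ordered array into train/test parts without shuffling."""
    values = np.asarray(values)
    if not 0 < n_train < len(values):
        raise ValueError("n_train must be between 1 and len(values) - 1")
    return values[:n_train], values[n_train:]

# Stand-in for the roughly 5,296 daily rows mentioned above (row indices only).
rows = np.arange(5296)
train_rows, test_rows = chronological_split(rows, n_train=4500)

print(len(train_rows), train_rows[0], train_rows[-1])  # 4500 0 4499
print(len(test_rows), test_rows[0], test_rows[-1])     # 796 4500 5295
```

Because the number of rows grows every trading day, the test set size will differ from 796 when the data is downloaded again.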
3. Preparing the Dataset
"""
I don't know stocks well, so please treat this as a rough reference.
Open: opening price
High: daily high
Low: daily low
Close: closing price
Volume: trading volume
Adj Close: closing price adjusted for stock splits, dividends, distributions, etc.
The key point is that trading volume (Volume) is dropped from the data,
and Adj Close is used as the y data. (I think Close would work as well.)
"""
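X = df.drop(columns='Volume')  # 5 remaining columns, which still include 'Adj Close'
y = df.iloc[:, 5:6]  # column 5 is 'Adj Close'; the 5:6 slice keeps it 2-D for the scaler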
print(X)
print(y)
"""
Normalize the data so that training goes well.
StandardScaler: rescales each feature to zero mean and unit variance.
MinMaxScaler: rescales each feature so that its maximum and minimum become 1 and 0.
"""
from sklearn.preprocessing import StandardScaler, MinMaxScaler
mm = MinMaxScaler()
ss = StandardScaler()
X_ss = ss.fit_transform(X)
y_mm = mm.fit_transform(y)
# Features (X): first 4,500 rows for training, the rest for testing
X_train = X_ss[:4500, :]
X_test = X_ss[4500:, :]
# Targets (y): split at the same index
"""
( The test targets are not strictly required, but they let you measure how close the predictions are to the actual data.
Use a regression metric such as sklearn.metrics.mean_squared_error for that; accuracy_score only applies to classification. )
"""
y_train = y_mm[:4500, :]
y_test = y_mm[4500:, :]
print("Training Shape", X_train.shape, y_train.shape)
print("Testing Shape", X_test.shape, y_test.shape)
"""
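A torch Variable exposes three attributes:
data, grad, and grad_fn. They are worth looking up and studying.
(Since PyTorch 0.4, Variable has been merged into Tensor, so the wrapper is optional.)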
"""
X_train_tensors = Variable(torch.Tensor(X_train))
X_test_tensors = Variable(torch.Tensor(X_test))
y_train_tensors = Variable(torch.Tensor(y_train))
y_test_tensors = Variable(torch.Tensor(y_test))
X_train_tensors_final = torch.reshape(X_train_tensors, (X_train_tensors.shape[0], 1, X_train_tensors.shape[1]))
X_test_tensors_final = torch.reshape(X_test_tensors, (X_test_tensors.shape[0], 1, X_test_tensors.shape[1]))
print("Training Shape", X_train_tensors_final.shape, y_train_tensors.shape)
print("Testing Shape", X_test_tensors_final.shape, y_test_tensors.shape)
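To make the preprocessing above concrete, here is a small NumPy-only sketch of what the two scalers compute and of the reshape to (samples, 1, features). It is an illustration on made-up numbers, not the scikit-learn implementation itself; the variable names (X_toy, X_std, X_mm, X_seq) are hypothetical.

```python
import numpy as np

# Toy feature matrix: 4 rows (days) x 2 columns (features); values are made up.
X_toy = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 30.0],
                  [4.0, 40.0]])

# Standardization: per-column zero mean and unit variance (population std).
X_std = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

# Min-max scaling: per-column minimum -> 0, maximum -> 1.
col_min, col_max = X_toy.min(axis=0), X_toy.max(axis=0)
X_mm = (X_toy - col_min) / (col_max - col_min)

# Inverting min-max scaling recovers the original units (e.g. prices).
X_back = X_mm * (col_max - col_min) + col_min

# Add a sequence axis of length 1: (samples, features) -> (samples, 1, features).
X_seq = X_std.reshape(X_std.shape[0], 1, X_std.shape[1])

print(X_std.mean(axis=0), X_std.std(axis=0))
print(X_mm[0], X_mm[-1])
print(X_seq.shape)  # (4, 1, 2)
```

The inverse step is the same idea the post relies on later, when mm.inverse_transform turns the scaled predictions back into prices.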
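4. Preparing the GPU (if you don't have one, just run on the CPU)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") # device
print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else "cpu") # guarded so CPU-only machines don't raise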
5. Building the LSTM Network
class LSTM1(nn.Module):
    def __init__(self, num_classes, input_size, hidden_size, num_layers, seq_length):
        super(LSTM1, self).__init__()
        self.num_classes = num_classes #number of output values
        self.num_layers = num_layers #number of stacked LSTM layers
        self.input_size = input_size #number of input features
        self.hidden_size = hidden_size #size of the hidden state
        self.seq_length = seq_length #sequence length
        self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
                            num_layers=num_layers, batch_first=True) #lstm
        self.fc_1 = nn.Linear(hidden_size, 128) #fully connected 1
        self.fc = nn.Linear(128, num_classes) #fully connected last layer
        self.relu = nn.ReLU()

    def forward(self, x):
        h_0 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_size)).to(device) #initial hidden state
        c_0 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_size)).to(device) #initial cell state
        # Propagate input through LSTM
        output, (hn, cn) = self.lstm(x, (h_0, c_0)) #lstm with input, hidden, and cell state
        hn = hn.view(-1, self.hidden_size) #flatten the final hidden state for the dense layers
        out = self.relu(hn)
        out = self.fc_1(out) #first Dense
        out = self.relu(out) #relu
        out = self.fc(out) #Final Output
        return out
The code above looks complicated, but if you go through it piece by piece, it is a network with very few operations.
Although this is time-series data, the sequence length is 1 and there is only one LSTM layer to keep things simple, so training finishes very quickly. The author of the original post probably kept it this simple so that it can be followed easily even on a CPU.
The post below describes how to use an RNN in PyTorch, but the usage is the same for an LSTM.
Once you understand the basic principle, you can follow along easily.
[Deep Learning] RNN with PyTorch (basic RNN structure and usage)
Today we will look at RNNs through PyTorch. https://www.youtube.com/watch?v=bPRfnlG6dtU&t=2674s If you are not familiar with the basic structure of an RNN, I recommend watching the link above. Let's check RNN in the PyTorch documentation. ht..
coding-yoon.tistory.com
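As a rough check on the claim that this is a very small network, the number of learnable parameters can be counted by hand. The sketch below assumes the hyperparameters this post uses (input_size=5, hidden_size=2, num_layers=1) and the standard LSTM layout of four gates, each with an input weight matrix, a hidden weight matrix, and two bias vectors, which is the layout torch.nn.LSTM uses. The helper names are made up for this illustration.

```python
def lstm_param_count(input_size, hidden_size, num_layers=1):
    """Count learnable parameters of a unidirectional LSTM with biases."""
    total = 0
    for layer in range(num_layers):
        # Layers after the first take the previous layer's hidden state as input.
        layer_input = input_size if layer == 0 else hidden_size
        total += 4 * (hidden_size * layer_input    # input-to-hidden weights
                      + hidden_size * hidden_size  # hidden-to-hidden weights
                      + 2 * hidden_size)           # two bias vectors
    return total

def linear_param_count(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features

lstm_params = lstm_param_count(input_size=5, hidden_size=2, num_layers=1)
fc1_params = linear_param_count(2, 128)   # fc_1: hidden_size -> 128
fc_params = linear_param_count(128, 1)    # fc: 128 -> num_classes
total_params = lstm_params + fc1_params + fc_params

print(lstm_params, fc1_params, fc_params, total_params)  # 72 384 129 585
```

Under these assumptions the LSTM itself holds only 72 of the 585 parameters; most of the weights sit in the first fully connected layer.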
5. Configuring the Network Parameters
num_epochs = 30000 #30000 epochs
learning_rate = 0.00001 #0.00001 lr
input_size = 5 #number of features
hidden_size = 2 #number of features in hidden state
num_layers = 1 #number of stacked lstm layers
num_classes = 1 #number of output classes
lstm1 = LSTM1(num_classes, input_size, hidden_size, num_layers, X_train_tensors_final.shape[1]).to(device)
loss_function = torch.nn.MSELoss() # mean-squared error for regression
optimizer = torch.optim.Adam(lstm1.parameters(), lr=learning_rate) # adam optimizer
6. Training
for epoch in range(num_epochs):
    outputs = lstm1(X_train_tensors_final.to(device)) #forward pass
    optimizer.zero_grad() #reset the gradients left over from the previous step
    # compute the loss
    loss = loss_function(outputs, y_train_tensors.to(device))
    loss.backward() #backpropagation: compute gradients of the loss
    optimizer.step() #update the parameters using those gradients
    if epoch % 100 == 0:
        print("Epoch: %d, loss: %1.5f" % (epoch, loss.item()))
7. Predicting
df_X_ss = ss.transform(df.drop(columns='Volume'))
df_y_mm = mm.transform(df.iloc[:, 5:6])
df_X_ss = Variable(torch.Tensor(df_X_ss)) #converting to Tensors
df_y_mm = Variable(torch.Tensor(df_y_mm))
#reshaping the dataset
df_X_ss = torch.reshape(df_X_ss, (df_X_ss.shape[0], 1, df_X_ss.shape[1]))
train_predict = lstm1(df_X_ss.to(device))#forward pass
data_predict = train_predict.data.detach().cpu().numpy() #numpy conversion
dataY_plot = df_y_mm.data.numpy()
data_predict = mm.inverse_transform(data_predict) #reverse transformation
dataY_plot = mm.inverse_transform(dataY_plot)
plt.figure(figsize=(10,6)) #plotting
plt.axvline(x=4500, c='r', linestyle='--') #size of the training set
plt.plot(dataY_plot, label='Actual Data') #actual plot
plt.plot(data_predict, label='Predicted Data') #predicted plot
plt.title('Time-Series Prediction')
plt.legend()
plt.show()
Everything after the red line is the model's prediction, and it seems to come out reasonably close.
But it seems that even AI could not foresee the pandemic.
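"Reasonably close" can also be expressed as a number. One common choice for regression is the root-mean-squared error (RMSE), computed separately for the rows before and after the split. The sketch below uses made-up prices; in the post the inputs would be the inverse-transformed arrays dataY_plot and data_predict with a split index of 4500, which is an assumption about how one might apply it rather than something the original post does.

```python
import numpy as np

def rmse(actual, predicted):
    """Root-mean-squared error between two equally sized arrays."""
    actual = np.asarray(actual, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    if actual.shape != predicted.shape:
        raise ValueError("actual and predicted must have the same number of elements")
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Made-up prices for illustration only.
actual_toy = np.array([100.0, 102.0, 101.0, 105.0, 107.0, 110.0])
predicted_toy = np.array([99.0, 103.0, 100.0, 104.0, 109.0, 107.0])
split = 4  # hypothetical split index; the post uses 4500

train_rmse = rmse(actual_toy[:split], predicted_toy[:split])
test_rmse = rmse(actual_toy[split:], predicted_toy[split:])

print("train RMSE:", train_rmse)
print("test RMSE:", round(test_rmse, 4))
```

A test RMSE much larger than the training RMSE suggests the model fits the training period better than the period it has not seen.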