Hello.

 

์˜ค๋Š˜์€ LSTM์„ ์ด์šฉํ•ด์„œ ์‚ผ์„ฑ์ „์ž ์ฃผ๊ฐ€๋ฅผ ์˜ˆ์ธกํ•ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. 

 

ํฐ Dataset์€ ๋”ฐ๋กœ ํ•„์š”ํ•˜์ง€ ์•Š์œผ๋‹ˆ ๋ถ€๋‹ด ๊ฐ–์ง€ ์•Š๊ณ  ํ•˜์‹œ๋ฉด ๋  ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค. 

 

 

The original article this post follows is here:

cnvrg.io/pytorch-lstm/

 

LSTM์ด ์–ด๋–ป๊ฒŒ ๋™์ž‘์„ ํ•˜๋Š”์ง€ ์ž์„ธํžˆ ์•„์‹œ๊ณ  ์‹ถ์œผ์‹œ๋ฉด ์•„๋ž˜ ๋ธ”๋กœ๊ทธ๋ฅผ ์ถ”์ฒœ๋“œ๋ฆฝ๋‹ˆ๋‹ค.

dgkim5360.tistory.com/entry/understanding-long-short-term-memory-lstm-kr

 

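For intuition, here is a minimal sketch of the computation a single LSTM time step performs, written as plain PyTorch. The function and weight names are my own for illustration; nn.LSTM implements the same equations internally with its own parameter layout.

import torch

def lstm_step(x_t, h_prev, c_prev, W_x, W_h, b):
    # W_x: (4*hidden, input), W_h: (4*hidden, hidden), b: (4*hidden,)
    gates = W_x @ x_t + W_h @ h_prev + b
    i, f, g, o = gates.chunk(4)               # input, forget, cell-candidate, output gates
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.tanh(g)                         # candidate values for the cell state
    c_t = f * c_prev + i * g                  # forget part of the old cell state, add new candidates
    h_t = o * torch.tanh(c_t)                 # hidden state: a filtered view of the cell state
    return h_t, c_t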

 

 

1. ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ

import numpy as np
import pandas as pd
import pandas_datareader.data as pdr
import matplotlib.pyplot as plt

import datetime

import torch
import torch.nn as nn
from torch.autograd import Variable  # deprecated since PyTorch 0.4 (now an alias for Tensor); kept to match the original post

import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

 

ModuleNotFoundError: No module named 'pandas_datareader'

Even if pandas is installed, you may see the message above; in that case install the package with pip install pandas_datareader.

 

coding-yoon.tistory.com/56

 

[ํŒŒ์ด์ฌ ์‘์šฉ] pandas_datareader์˜ error๋ฌธ : FutureWarning: pandas.util.testing is deprecated. Use the functions in the pu

์•ˆ๋…•ํ•˜์„ธ์š”. pandas_datareader์„ ์ด์šฉํ•ด์„œ ๋ฐ์ดํ„ฐ ์ฒ˜๋ฆฌ๋ฅผ ํ•˜๊ธฐ ์œ„ํ•ด ์•„๋ž˜ ๋ฌธ๊ตฌ์ฒ˜๋Ÿผ ์—๋Ÿฌ๋ฌธ์ด ๋œจ๋Š” ๊ฒฝ์šฐ๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at..

coding-yoon.tistory.com

It used to be installed automatically along with pandas, but it seems to have been split off into a completely separate package now.

 

 

2. Loading the Samsung Electronics Stock Data

start = (2000, 1, 1)  # January 1, 2000
start = datetime.datetime(*start)
end = datetime.date.today()  # today

# load Samsung Electronics (ticker 005930.KS) from Yahoo Finance
df = pdr.DataReader('005930.KS', 'yahoo', start, end)
df.head(5)
df.tail(5)
df.Close.plot(grid=True)

 

head(5): the first 5 rows

 

tail(5): the last 5 rows

Samsung Electronics closing price, 2000–2020

์‚ผ์„ฑ ์ „์ž ์ข…๊ฐ€๋ฅผ 2000๋…„๋ถ€ํ„ฐ 2020๋…„์œผ๋กœ ํ•œ ๋ฒˆ์— ๋ณด๋‹ˆ ๋ฏธ์ณ ๋‚  ๋›ฐ๋„ค์š”. ์ง€๊ธˆ์ด๋ผ๋„ ์ด ํ๋ฆ„์„ ํƒ€์•ผ ํ•˜์ง€ ์•Š์„๊นŒ์š”.

์‹ญ ๋งŒ์ „์ž ๊ฐ€์ž!!

 

ํ˜น์‹œ ๋‹ค๋ฅธ ์ฃผ์‹๋„ ํ•˜๊ณ  ์‹ถ์œผ์‹œ๋ฉด  ์•ผํ›„ ํŒŒ์ด๋‚ธ์‹œ์—์„œ ์ฐพ์•„๋ณด์‹œ๋Š” ๊ฒƒ๋„ ์ถ”์ฒœ๋“œ๋ฆฝ๋‹ˆ๋‹ค.

finance.yahoo.com/

 


๊ทธ๋ฆฌ๊ณ  ํ•™์Šต๋œ ๋ชจ๋ธ์ด ์„ฑ๋Šฅ์„ ํ™•์ธํ•˜๊ธฐ ์œ„ํ•ด์„œ ์œ„ ๋ฐ์ดํ„ฐ(ํ˜„์žฌ ์•ฝ 5296๊ฐœ)๋ฅผ Train(ํ•™์Šตํ•˜๊ณ ์ž ํ•˜๋Š” ๋ฐ์ดํ„ฐ)๋ฅผ 0๋ถ€ํ„ฐ 4499๊นŒ์ง€, Test(์„ฑ๋Šฅ ํ…Œ์ŠคํŠธํ•˜๋Š” ๋ฐ์ดํ„ฐ)๋Š” 4500๋ถ€ํ„ฐ 5295๊ฐœ ๊นŒ์ง€ ๋ฐ์ดํ„ฐ๋กœ ๋ถ„๋ฅ˜ํ•ฉ๋‹ˆ๋‹ค.

 

์˜ค๋Š˜์ž ๋Œ€๋žต, ๋…ธ๋ž€์ƒ‰ ์„  ์ •๋„๊นŒ์ง€ ๋ฐ์ดํ„ฐ๋ฅผ ๊ฐ€์ง€๊ณ  ํ•™์Šต์„ ํ•˜๊ณ , ๋…ธ๋ž€์ƒ‰ ์„  ์ดํ›„๋ถ€ํ„ฐ ์˜ˆ์ธก์„ ํ•  ๊ฒƒ์ž…๋‹ˆ๋‹ค. 

๊ณผ์—ฐ ๋‚ด๋ ค๊ฐ€๊ณ  ์˜ฌ๋ผ๊ฐ€๋Š” ํฌ์ธํŠธ๋ฅผ ์ž˜ ์˜ˆ์ธกํ•  ์ˆ˜ ์žˆ์„์ง€ ๊ถ๊ธˆํ•ฉ๋‹ˆ๋‹ค.

 

3. ๋ฐ์ดํ„ฐ์…‹ ์ค€๋น„ํ•˜๊ธฐ

"""
์ €๋„ ์ฃผ์‹์„ ์ž˜ ๋ชจ๋ฅด๊ธฐ ๋•Œ๋ฌธ์— ์ฐธ๊ณ ํ•ด์ฃผ์‹œ๋ฉด ์ข‹์„ ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค. 
open ์‹œ๊ฐ€
high ๊ณ ๊ฐ€
low ์ €๊ฐ€
close ์ข…๊ฐ€
volume ๊ฑฐ๋ž˜๋Ÿ‰
Adj Close ์ฃผ์‹์˜ ๋ถ„ํ• , ๋ฐฐ๋‹น, ๋ฐฐ๋ถ„ ๋“ฑ์„ ๊ณ ๋ คํ•ด ์กฐ์ •ํ•œ ์ข…๊ฐ€

ํ™•์‹คํ•œ๊ฑด ๊ฑฐ๋ž˜๋Ÿ‰(Volume)์€ ๋ฐ์ดํ„ฐ์—์„œ ์ œํ•˜๋Š” ๊ฒƒ์ด ์ค‘์š”ํ•˜๊ณ , 
Y ๋ฐ์ดํ„ฐ๋ฅผ Adj Close๋กœ ์ •ํ•ฉ๋‹ˆ๋‹ค. (์ข…๊ฐ€๋กœ ํ•ด๋„ ๋œ๋‹ค๊ณ  ์ƒ๊ฐํ•ฉ๋‹ˆ๋‹ค.)

"""
X = df.drop(columns='Volume')
y = df.iloc[:, 5:6]  # the 'Adj Close' column

print(X)
print(y)

"""
ํ•™์Šต์ด ์ž˜๋˜๊ธฐ ์œ„ํ•ด ๋ฐ์ดํ„ฐ ์ •๊ทœํ™” 
StandardScaler	๊ฐ ํŠน์ง•์˜ ํ‰๊ท ์„ 0, ๋ถ„์‚ฐ์„ 1์ด ๋˜๋„๋ก ๋ณ€๊ฒฝ
MinMaxScaler	์ตœ๋Œ€/์ตœ์†Œ๊ฐ’์ด ๊ฐ๊ฐ 1, 0์ด ๋˜๋„๋ก ๋ณ€๊ฒฝ
"""

from sklearn.preprocessing import StandardScaler, MinMaxScaler
mm = MinMaxScaler()
ss = StandardScaler()

X_ss = ss.fit_transform(X)
y_mm = mm.fit_transform(y) 

# Train data: rows 0 to 4499
X_train = X_ss[:4500, :]
y_train = y_mm[:4500, :]

# Test data: rows 4500 to the end
"""
(Strictly speaking a test set isn't required, but it lets you quantify how close
the predictions come to the actual data with a metric from sklearn.metrics --
for a regression task like this, mean_squared_error; accuracy_score only
applies to classification.)
"""
X_test = X_ss[4500:, :]
y_test = y_mm[4500:, :]

print("Training Shape", X_train.shape, y_train.shape)
print("Testing Shape", X_test.shape, y_test.shape) 

These are still NumPy arrays: the model cannot be trained on them in this form.

"""
torch Variable์—๋Š” 3๊ฐœ์˜ ํ˜•ํƒœ๊ฐ€ ์žˆ๋‹ค. 
data, grad, grad_fn ํ•œ ๋ฒˆ ๊ตฌ๊ธ€์— ์ฐพ์•„์„œ ๊ณต๋ถ€ํ•ด๋ณด๊ธธ ๋ฐ”๋ž๋‹ˆ๋‹ค. 
"""
X_train_tensors = Variable(torch.Tensor(X_train))
X_test_tensors = Variable(torch.Tensor(X_test))

y_train_tensors = Variable(torch.Tensor(y_train))
y_test_tensors = Variable(torch.Tensor(y_test))

X_train_tensors_final = torch.reshape(X_train_tensors,   (X_train_tensors.shape[0], 1, X_train_tensors.shape[1]))
X_test_tensors_final = torch.reshape(X_test_tensors,  (X_test_tensors.shape[0], 1, X_test_tensors.shape[1])) 

print("Training Shape", X_train_tensors_final.shape, y_train_tensors.shape)
print("Testing Shape", X_test_tensors_final.shape, y_test_tensors.shape) 

We convert everything to Torch tensors to get the data into a form the model can train on.
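The reshape above produces the (batch, seq_len, features) layout that nn.LSTM expects when batch_first=True. A quick sanity check of that convention, with an arbitrary hidden size of 2:

lstm_demo = nn.LSTM(input_size=5, hidden_size=2, num_layers=1, batch_first=True)
dummy = torch.zeros(4500, 1, 5)          # (batch, seq_len=1, features)
out, (hn, cn) = lstm_demo(dummy)
print(out.shape, hn.shape)               # torch.Size([4500, 1, 2]) torch.Size([1, 4500, 2])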

 

 

4. Preparing the GPU (if you don't have one, running on the CPU works fine)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")  # device
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # only valid when a GPU is present

 

 

5. Building the LSTM Network

class LSTM1(nn.Module):
  def __init__(self, num_classes, input_size, hidden_size, num_layers, seq_length):
    super(LSTM1, self).__init__()
    self.num_classes = num_classes #number of classes
    self.num_layers = num_layers #number of layers
    self.input_size = input_size #input size
    self.hidden_size = hidden_size #hidden state
    self.seq_length = seq_length #sequence length
 
    self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
                      num_layers=num_layers, batch_first=True) #lstm
    self.fc_1 =  nn.Linear(hidden_size, 128) #fully connected 1
    self.fc = nn.Linear(128, num_classes) #fully connected last layer

    self.relu = nn.ReLU() 

  def forward(self,x):
    h_0 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_size)).to(device) #hidden state
    c_0 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_size)).to(device) #internal state   
    # Propagate input through LSTM

    output, (hn, cn) = self.lstm(x, (h_0, c_0)) #lstm with input, hidden, and internal state
   
    hn = hn.view(-1, self.hidden_size) #reshaping the data for Dense layer next
    out = self.relu(hn)
    out = self.fc_1(out) #first Dense
    out = self.relu(out) #relu
    out = self.fc(out) #Final Output
   
    return out 
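Before training, it can help to push a dummy batch through the class to confirm the output shape. This check is my own addition, not part of the original post (it assumes the device from step 4 is already defined):

model_check = LSTM1(num_classes=1, input_size=5, hidden_size=2, num_layers=1, seq_length=1).to(device)
with torch.no_grad():
    print(model_check(torch.zeros(8, 1, 5).to(device)).shape)  # expect torch.Size([8, 1])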

 

 

The code above looks complicated, but if you go through it piece by piece, it's actually a network with very little computation.

์‹œ๊ณ„์—ด ๋ฐ์ดํ„ฐ์ด์ง€๋งŒ, ๊ฐ„๋‹จํ•œ ๊ตฌ์„ฑ์„ ์œ„ํ•ด Sequence Length๋„ 1์ด๊ณ , LSTM Layer๋„ 1์ด๊ธฐ ๋•Œ๋ฌธ์— ๊ต‰์žฅํžˆ ๋นจ๋ฆฌ ๋๋‚ฉ๋‹ˆ๋‹ค. ์•„๋งˆ ๋ณธ๋ฌธ ์ž‘์„ฑ์ž๊ฐ€ CPUํ™˜๊ฒฝ์—์„œ๋„ ์‰ฝ๊ฒŒ ๋”ฐ๋ผ ํ•  ์ˆ˜ ์žˆ๊ฒŒ ๊ฐ„๋‹จํ•˜๊ฒŒ ์ž‘์„ฑํ•œ ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค. 

 

Below is a post I wrote on using an RNN with PyTorch, but the usage is the same for LSTM.

Once you understand the basic mechanics, it's easy to follow along.

coding-yoon.tistory.com/55

 

[๋”ฅ๋Ÿฌ๋‹] RNN with PyTorch ( RNN ๊ธฐ๋ณธ ๊ตฌ์กฐ, ์‚ฌ์šฉ ๋ฐฉ๋ฒ• )

์˜ค๋Š˜์€ Pytorch๋ฅผ ํ†ตํ•ด RNN์„ ์•Œ์•„๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. https://www.youtube.com/watch?v=bPRfnlG6dtU&t=2674s RNN์˜ ๊ธฐ๋ณธ๊ตฌ์กฐ๋ฅผ ๋ชจ๋ฅด์‹œ๋ฉด ์œ„ ๋งํฌ๋ฅผ ๋ณด์‹œ๋Š”๊ฑธ ์ถ”์ฒœ๋“œ๋ฆฝ๋‹ˆ๋‹ค. Pytorch document์— RNN์„ ํ™•์ธํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ht..

coding-yoon.tistory.com

 

 

5. ๋„คํŠธ์›Œํฌ ํŒŒ๋ผ๋ฏธํ„ฐ ๊ตฌ์„ฑํ•˜๊ธฐ 

num_epochs = 30000  # number of training iterations over the full batch
learning_rate = 0.00001  # learning rate (1e-5)

input_size = 5  # number of input features (Open, High, Low, Close, Adj Close)
hidden_size = 2  # number of features in the hidden state
num_layers = 1  # number of stacked LSTM layers

num_classes = 1  # number of output values
lstm1 = LSTM1(num_classes, input_size, hidden_size, num_layers, X_train_tensors_final.shape[1]).to(device)

loss_function = torch.nn.MSELoss()    # mean-squared error for regression
optimizer = torch.optim.Adam(lstm1.parameters(), lr=learning_rate)  # adam optimizer

 

7. Training

for epoch in range(num_epochs):
  outputs = lstm1(X_train_tensors_final.to(device))  # forward pass

  optimizer.zero_grad()  # clear the gradients from the previous step

  # compute the loss
  loss = loss_function(outputs, y_train_tensors.to(device))

  loss.backward()  # backpropagate the loss

  optimizer.step()  # update the weights
  if epoch % 100 == 0:
    print("Epoch: %d, loss: %1.5f" % (epoch, loss.item()))

 

8. Predicting

df_X_ss = ss.transform(df.drop(columns='Volume'))  # scale the full series with the already-fitted scalers
df_y_mm = mm.transform(df.iloc[:, 5:6])

df_X_ss = Variable(torch.Tensor(df_X_ss))  # converting to Tensors
df_y_mm = Variable(torch.Tensor(df_y_mm))
# reshaping the dataset to (batch, seq_len=1, features)
df_X_ss = torch.reshape(df_X_ss, (df_X_ss.shape[0], 1, df_X_ss.shape[1]))
train_predict = lstm1(df_X_ss.to(device))  # forward pass over the whole series
data_predict = train_predict.detach().cpu().numpy()  # numpy conversion
dataY_plot = df_y_mm.numpy()

data_predict = mm.inverse_transform(data_predict)  # reverse transformation, back to won
dataY_plot = mm.inverse_transform(dataY_plot)
plt.figure(figsize=(10,6))  # plotting
plt.axvline(x=4500, c='r', linestyle='--')  # end of the training set

plt.plot(dataY_plot, label='Actual Data')  # actual plot
plt.plot(data_predict, label='Predicted Data')  # predicted plot
plt.title('Time-Series Prediction')
plt.legend()
plt.show()

 

 

Everything to the right of the red line is the model's prediction, and it seems to track the actual data reasonably well.
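"Reasonably well" can also be quantified. A quick check of the error on the test span only (rows 4500 onward), in won since we already applied the inverse transform:

rmse = np.sqrt(np.mean((data_predict[4500:] - dataY_plot[4500:]) ** 2))
print(f"test RMSE: {rmse:,.0f} KRW")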

 

ํ•˜์ง€๋งŒ ์ธ๊ณต์ง€๋Šฅ์ด๋ผ๋„ ํŒ”๋งŒ ์ „์ž๋Š” ์˜ˆ์ƒํ•˜์ง€ ๋ชปํ–ˆ๋‚˜ ๋ด…๋‹ˆ๋‹ค. 
