현재 폴더 경로
각 폴더에는 아래와 같이 전처리 되지 않은 CSV가 존재한다.
[JS01-210210_222834_p9_고정, JS02-210210_222841_p9고정, JS03-210210_222859_p9_고정]과 같이 데이터 프레임을 병합하고, 분석에 알맞게 전처리를 하는 과정을 보여준다.
아래는 실험 데이터로 중간에 빈 데이터가 보이고, 전처리 과정이 필요한 원본 데이터이다.
이제부터 Pandas를 통해 데이터 분석에 알맞게 전처리를 할 것이다. (위 preprocessing data가 최종 목표 데이터)
* Section 1 *
0. 모든 CSV를 불러온다.
1. Null/NaN(결손) 데이터를 제거한다.
ex) (df : dataframe) df.isna.sum() 를 통해서 결손데이터를 파악할 수 있다. ( df.info()도 확인가능하다. )
2. 필요없는 열(bootms)을 삭제한다.
3. Column 이름을 변경한다. (" aCnt" ▶"aCnt", " RSSI" ▶ "RSSI")
( " aCnt"와 "aCnt"의 차이는 띄어쓰기이다. df.columns 를 통해 정확한 column name을 작성한다. 띄어쓰기 구별 必)
4. 1001 이상 aCnt 행을 삭제한다.
5. aCnt를 Group 내어 평균한다.
6. CSV 데이터로 저장한다.
import pandas as pd
import os
# 모든 CSV 경로를 변수에 저장한다..
list_folder_name = ['JS01', 'JS02', 'JS03']
list_csv_folder = [os.path.join(os.getcwd(), file_name) for file_name in list_folder_name]
list_p = []
list_path = []
for i in list_csv_folder:
_, _, file_names = next(os.walk(i))
for n, j in enumerate(file_names):
list_path.append(os.path.join(i,j))
print(list_path[n])
# result
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_222834_p9_고정.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_224020.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_225507.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_225821.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_230150.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_230314.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_230540.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_230845.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_222834_p9_고정.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_224020.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_225507.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_225821.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_230150.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_230314.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_230540.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_230845.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_222834_p9_고정.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_224020.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_225507.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_225821.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_230150.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_230314.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_230540.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_230845.csv
# 파일 별로 1번 ~ 6번 과정을 반복한다.
# 0. 모든 CSV를 불러온다.
for i in range(len(list_path)):
print(list_path[i])
# 0_1. CSV를 불러온다.
csv_data = pd.read_csv(list_path[i])
# 1. Null/NaN(결손) 데이터를 제거한다.
csv_data.dropna(inplace=True)
# 2. 필요없는 열(bootms)을 삭제한다.
csv_data.rename(columns={" aCnt":"aCnt", " RSSI":"RSSI"}, inplace=True)
# 3. Column 이름을 변경한다. (" aCnt" ▶"aCnt", " RSSI" ▶ "RSSI")
csv_data.drop(columns=['bootms'], inplace=True)
# 4. 1001 이상 aCnt 행을 삭제한다.
csv_data.query('aCnt<=1000', inplace=True)
# 5. aCnt를 Group 내어 평균한다.
csv_data_group = csv_data.groupby(['aCnt']).mean().round(1)
# 6. CSV 데이터로 저장한다.
csv_data_group.to_csv(list_path[i], mode='w', index=True)
1차적으로 필요없는 열 제거, 결손 데이터 제거 등 필수 작업은 끝이 났다.
이제 분석을 하기 위해 Dataframe 병합, 분석에 필요한 열 추가를 할 것이다.
* Section 2 *
import pandas as pd
import numpy as np
import os
list_folder_name = ['JS01', 'JS02', 'JS03']
list_csv_folder = [os.path.join(os.getcwd(), file_name) for file_name in list_folder_name]
list_path = []
for i in list_csv_folder:
_, _, file_names = next(os.walk(i))
for j in file_names:
list_path.append(os.path.join(i,j))
# result
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_222834_p9_고정.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_224020.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_225507.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_225821.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_230150.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_230314.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_230540.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_230845.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_222834_p9_고정.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_224020.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_225507.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_225821.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_230150.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_230314.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_230540.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_230845.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_222834_p9_고정.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_224020.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_225507.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_225821.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_230150.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_230314.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_230540.csv
c:\Users\mangnani\Desktop\Develop\VS Code\Python37\[티스토리]__Pandas전처리\JS01\210210_230845.csv
np_path = np.array(list_path) # (24)
np_path = np_path.reshape(len(list_folder_name), -1) # (3, 8)
np_path_trans = np_path.T # Transpose (8, 3)
print(np_path_trans)
# result
"""
[['c:\\Users\\mangnani\\Desktop\\Develop\\VS Code\\Python37\\[티스토리]__Pandas전처리\\JS01\\210210_222834_p9_고정.csv'
'c:\\Users\\mangnani\\Desktop\\Develop\\VS Code\\Python37\\[티스토리]__Pandas전처리\\JS02\\210210_222841_p9고정.csv'
'c:\\Users\\mangnani\\Desktop\\Develop\\VS Code\\Python37\\[티스토리]__Pandas전처리\\JS03\\210210_222859_p9_고정.csv']
['c:\\Users\\mangnani\\Desktop\\Develop\\VS Code\\Python37\\[티스토리]__Pandas전처리\\JS01\\210210_224020.csv'
'c:\\Users\\mangnani\\Desktop\\Develop\\VS Code\\Python37\\[티스토리]__Pandas전처리\\JS02\\210210_224045.csv'
'c:\\Users\\mangnani\\Desktop\\Develop\\VS Code\\Python37\\[티스토리]__Pandas전처리\\JS03\\210210_224041.csv']
['c:\\Users\\mangnani\\Desktop\\Develop\\VS Code\\Python37\\[티스토리]__Pandas전처리\\JS01\\210210_225507.csv'
'c:\\Users\\mangnani\\Desktop\\Develop\\VS Code\\Python37\\[티스토리]__Pandas전처리\\JS02\\210210_225511.csv'
'c:\\Users\\mangnani\\Desktop\\Develop\\VS Code\\Python37\\[티스토리]__Pandas전처리\\JS03\\210210_225519.csv']
['c:\\Users\\mangnani\\Desktop\\Develop\\VS Code\\Python37\\[티스토리]__Pandas전처리\\JS01\\210210_225821.csv'
'c:\\Users\\mangnani\\Desktop\\Develop\\VS Code\\Python37\\[티스토리]__Pandas전처리\\JS02\\210210_225828.csv'
'c:\\Users\\mangnani\\Desktop\\Develop\\VS Code\\Python37\\[티스토리]__Pandas전처리\\JS03\\210210_225837.csv']
['c:\\Users\\mangnani\\Desktop\\Develop\\VS Code\\Python37\\[티스토리]__Pandas전처리\\JS01\\210210_230150.csv'
'c:\\Users\\mangnani\\Desktop\\Develop\\VS Code\\Python37\\[티스토리]__Pandas전처리\\JS02\\210210_230159.csv'
'c:\\Users\\mangnani\\Desktop\\Develop\\VS Code\\Python37\\[티스토리]__Pandas전처리\\JS03\\210210_230155.csv']
['c:\\Users\\mangnani\\Desktop\\Develop\\VS Code\\Python37\\[티스토리]__Pandas전처리\\JS01\\210210_230314.csv'
'c:\\Users\\mangnani\\Desktop\\Develop\\VS Code\\Python37\\[티스토리]__Pandas전처리\\JS02\\210210_230325.csv'
'c:\\Users\\mangnani\\Desktop\\Develop\\VS Code\\Python37\\[티스토리]__Pandas전처리\\JS03\\210210_230329.csv']
['c:\\Users\\mangnani\\Desktop\\Develop\\VS Code\\Python37\\[티스토리]__Pandas전처리\\JS01\\210210_230540.csv'
'c:\\Users\\mangnani\\Desktop\\Develop\\VS Code\\Python37\\[티스토리]__Pandas전처리\\JS02\\210210_230607.csv'
'c:\\Users\\mangnani\\Desktop\\Develop\\VS Code\\Python37\\[티스토리]__Pandas전처리\\JS03\\210210_230601.csv']
['c:\\Users\\mangnani\\Desktop\\Develop\\VS Code\\Python37\\[티스토리]__Pandas전처리\\JS01\\210210_230845.csv'
'c:\\Users\\mangnani\\Desktop\\Develop\\VS Code\\Python37\\[티스토리]__Pandas전처리\\JS02\\210210_230850.csv'
'c:\\Users\\mangnani\\Desktop\\Develop\\VS Code\\Python37\\[티스토리]__Pandas전처리\\JS03\\210210_230903.csv']]
"""
df_1 = pd.read_csv(np_path_trans[file_index][0])
df_2 = pd.read_csv(np_path_trans[file_index][1])
df_3 = pd.read_csv(np_path_trans[file_index][2])
print(df_1, df_2, df_3, sep="\n")
# result
aCnt RSSI
0 0.0 -73.2
1 1.0 -71.5
2 2.0 -72.5
3 3.0 -72.0
4 4.0 -73.0
.. ... ...
988 996.0 -73.3
989 997.0 -73.3
990 998.0 -74.0
991 999.0 -72.7
992 1000.0 -73.0
[993 rows x 2 columns]
aCnt RSSI
0 0.0 -85.0
1 1.0 -84.0
2 2.0 -85.0
3 4.0 -81.0
4 5.0 -82.0
.. ... ...
741 996.0 -84.0
742 997.0 -83.0
743 998.0 -81.5
744 999.0 -80.0
745 1000.0 -81.5
[746 rows x 2 columns]
aCnt RSSI
0 0.0 -80.5
1 1.0 -82.0
2 2.0 -81.0
3 3.0 -85.0
4 4.0 -81.0
.. ... ...
759 982.0 -83.0
760 984.0 -83.0
761 990.0 -87.0
762 992.0 -86.0
763 999.0 -85.0
[764 rows x 2 columns]
데이터 프레임을 자세히 보면 세 개의 df 행의 갯수가 다름을 알 수 있다. ( 이는 데이터 특성상 aCnt가 없는 것을 의미한다.)
각각 데이터 프레임 aCnt의 교집합을 찾아 병합할 것이다.
df_tmp = pd.merge(df_1, df_2, on='aCnt', how='inner')
df_rssi = pd.merge(df_tmp, df_3, on='aCnt', how='inner')
df_rssi
# result
aCnt RSSI_x RSSI_y RSSI
0 0.0 -73.2 -85.0 -80.5
1 1.0 -71.5 -84.0 -82.0
2 2.0 -72.5 -85.0 -81.0
3 4.0 -73.0 -81.0 -81.0
4 5.0 -73.0 -82.0 -83.5
... ... ... ... ...
586 982.0 -73.3 -83.3 -83.0
587 984.0 -73.3 -81.0 -83.0
588 990.0 -73.0 -83.0 -87.0
589 992.0 -73.7 -82.5 -86.0
590 999.0 -72.7 -80.0 -85.0
591 rows × 4 columns
Colum name을 알맞게 변경한다.
df_rssi.rename(columns={"RSSI_x":"RSSI_1", "RSSI_y":"RSSI_2", "RSSI":"RSSI_3"}, inplace=True)
df_rssi
# result
aCnt RSSI_1 RSSI_2 RSSI_3
0 0.0 -73.2 -85.0 -80.5
1 1.0 -71.5 -84.0 -82.0
2 2.0 -72.5 -85.0 -81.0
3 4.0 -73.0 -81.0 -81.0
4 5.0 -73.0 -82.0 -83.5
... ... ... ... ...
586 982.0 -73.3 -83.3 -83.0
587 984.0 -73.3 -81.0 -83.0
588 990.0 -73.0 -83.0 -87.0
589 992.0 -73.7 -82.5 -86.0
590 999.0 -72.7 -80.0 -85.0
591 rows × 4 columns
RSSI를 통한 Distance를 추가할 것이다. ( lambda 함수 사용 )
def lambda_distance(rssi, tp=-64):
ratio = rssi*1.0/tp
if ratio < 1.0:
return pow(ratio, 10)
else:
distance = (0.89976)*pow(ratio,7.7095) + 0.111
return distance
df_rssi['distance_1'] = df_rssi['RSSI_1'].apply(lambda rssi:lambda_distance(rssi))
df_rssi['distance_2'] = df_rssi['RSSI_2'].apply(lambda rssi:lambda_distance(rssi))
df_rssi['distance_3'] = df_rssi['RSSI_3'].apply(lambda rssi:lambda_distance(rssi))
df_rssi
# result
aCnt RSSI_1 RSSI_2 RSSI_3 distance_1 distance_2 distance_3
0 0.0 -73.2 -85.0 -80.5 2.645139 8.132164 5.384701
1 1.0 -71.5 -84.0 -82.0 2.225242 7.432724 6.191371
2 2.0 -72.5 -85.0 -81.0 2.464196 8.132164 5.642557
3 4.0 -73.0 -81.0 -81.0 2.592246 5.642557 5.642557
4 5.0 -73.0 -82.0 -83.5 2.592246 6.191371 7.103364
... ... ... ... ... ... ... ...
586 982.0 -73.3 -83.3 -83.0 2.671951 6.975277 6.786975
587 984.0 -73.3 -81.0 -83.0 2.671951 5.642557 6.786975
588 990.0 -73.0 -83.0 -87.0 2.592246 6.786975 9.707345
589 992.0 -73.7 -82.5 -86.0 2.781686 6.483119 8.889049
590 999.0 -72.7 -80.0 -85.0 2.514708 5.137370 8.132164
591 rows × 7 columns
df_rssi.to_csv(csv_name, mode='w', index=False)
'🐍 Python > Project' 카테고리의 다른 글
네이버 블로그 자동 공감 누르기 (0) | 2023.10.02 |
---|---|
네이버 블로그 서로 이웃 추가 프로그램 ver1.0, ver2.0 (1) | 2023.10.02 |
[파이썬 프로젝트] Python Struct (feat.c언어) (1) | 2021.09.09 |
[파이썬 프로젝트] Python CSV 이어서 저장하기, header 붙이기 (0) | 2021.09.07 |