Problem 1) Build an NB model using the weatherAUS.csv file
Step 1> Remove all rows containing NaN values
Step 2> Exclude columns 1, 2, 8, 10, 11, 22, 23
Step 3> Variable selection: y variable: RainTomorrow, x variables: the remaining variables (16)
Step 4> Split into train/test sets at a 7:3 ratio
Step 5> Build a GaussianNB model
Step 6> Evaluate the model: accuracy, confusion matrix, classification_report
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
data = pd.read_csv('C:/ITWILL/4_Python-II/data/weatherAUS.csv')
print(data.head())
print(data.info())
Step 1> Remove all rows containing NaN values
data = data.dropna()
print(data.head())
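With no arguments, dropna() removes every row that contains at least one NaN; a minimal sketch of that behavior on a toy frame (the column names here are made up for illustration):

```python
import numpy as np
import pandas as pd

# toy frame with one row holding a NaN
df = pd.DataFrame({'a': [1.0, np.nan, 3.0],
                   'b': [4.0, 5.0, 6.0]})

clean = df.dropna()   # drops any row containing at least one NaN
print(clean.shape)    # (2, 2) - the NaN row is gone
```
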
Step 2> Exclude columns 1, 2, 8, 10, 11, 22, 23
cols = list(data.columns)  # all column names
colnames = []  # columns to keep
for i in range(24):
    if i not in [0, 1, 7, 9, 10, 21, 22]:  # skip the excluded columns
        colnames.append(cols[i])
new_data = data[colnames]
print(new_data.info()) # x+y
new_data.RainTomorrow.value_counts()
No 13426
Yes 3952
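The index loop above can also be written in one step with DataFrame.drop; a minimal sketch on a stand-in frame (the real weatherAUS data has 24 columns, here stubbed with hypothetical names c0..c23):

```python
import pandas as pd

# stand-in frame with 24 columns, like the weatherAUS data
df = pd.DataFrame({f'c{i}': [i] for i in range(24)})

# drop columns by position - the same indices excluded in the loop above
drop_idx = [0, 1, 7, 9, 10, 21, 22]
new_df = df.drop(columns=df.columns[drop_idx])
print(new_df.shape)  # (1, 17) - 24 columns minus the 7 dropped
```
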
Step 3> Variable selection: y variable: RainTomorrow, x variables: the remaining variables (16)
cols = list(new_data.columns)
cols
y = new_data[cols[-1]]
X = new_data[cols[:-1]]
X.shape # (17378, 16)
Step 4> Split into train/test sets at a 7:3 ratio
X_train,X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, random_state=123)
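Since RainTomorrow is imbalanced (No 13426 vs Yes 3952), passing stratify=y would keep the class ratio equal in both splits. The split above does not use it; this is a suggested addition, sketched here on toy labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# imbalanced toy labels: 80% class 0, 20% class 1
y = np.array([0] * 80 + [1] * 20)
X = np.arange(100).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=123, stratify=y)

# the 30-sample test split keeps the same 20% minority ratio
print((y_te == 1).mean())  # 0.2
```
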
Step 5> Build a GaussianNB model
model = GaussianNB().fit(X=X_train, y=y_train)
Step 6> Evaluate the model: accuracy, confusion matrix, classification_report
y_pred = model.predict(X=X_test)  # predicted class labels
y_true = y_test
1) confusion_matrix
con_mat = confusion_matrix(y_true, y_pred)
con_mat
array([[3397, 658],
[ 349, 810]], dtype=int64)
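The accuracy reported next can be verified directly from this confusion matrix: correct predictions sit on the diagonal, so accuracy is the trace divided by the total count.

```python
import numpy as np

con_mat = np.array([[3397, 658],
                    [349, 810]])

# accuracy = correct (diagonal) / total = 4207 / 5214
acc = np.trace(con_mat) / con_mat.sum()
print(acc)  # 0.8068661296509397 - matches accuracy_score below
```
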
2) accuracy_score
acc = accuracy_score(y_true, y_pred)
acc # 0.8068661296509397
3) classification_report
report = classification_report(y_true, y_pred)
print(report)
              precision    recall  f1-score   support

          No       0.91      0.84      0.87      4055
         Yes       0.55      0.70      0.62      1159

    accuracy                           0.81      5214
   macro avg       0.73      0.77      0.74      5214
weighted avg       0.83      0.81      0.81      5214
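The per-class numbers in the report can be recomputed by hand from the confusion matrix; here are precision and recall for the Yes class (treating Yes as the positive class):

```python
# from the confusion matrix: rows = actual, columns = predicted
tp = 810   # actual Yes, predicted Yes
fp = 658   # actual No,  predicted Yes
fn = 349   # actual Yes, predicted No

precision = tp / (tp + fp)   # 810 / 1468
recall = tp / (tp + fn)      # 810 / 1159
print(round(precision, 2), round(recall, 2))  # 0.55 0.7
```

These match the Yes row of the classification report (0.55 precision, 0.70 recall).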
Problem 2) SVM
- Linear SVM, nonlinear SVM
- Hyperparameters: kernel, C, gamma
from sklearn.svm import SVC #svm model
from sklearn.datasets import load_iris #dataset
from sklearn.model_selection import train_test_split #dataset split
from sklearn.metrics import accuracy_score  # evaluation
1. dataset load
X, y = load_iris(return_X_y= True)
X.shape  # (150, 4)
2. train/test split
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, random_state=123)
3. Nonlinear SVM model
model = SVC(C=1.0, kernel='rbf', gamma='scale').fit(X=X_train, y=y_train)
model evaluation
y_pred = model.predict(X = X_test)
acc = accuracy_score(y_test, y_pred)
print('rbf accuracy =', acc)
4. Linear SVM: for linearly separable data (data without noise)
model2 = SVC(C=1.0, kernel='linear', gamma='scale').fit(X=X_train, y=y_train)
model evaluation
y_pred = model2.predict(X = X_test)
acc = accuracy_score(y_test, y_pred)
print('linear accuracy =', acc)
5. GridSearch model : best parameters
from sklearn.model_selection import GridSearchCV #best parameters
params = {'kernel': ['rbf', 'linear'],
          'C': [0.01, 0.1, 1.0, 10, 100],
          'gamma': ['scale', 'auto']}  # parameter grid defined as a dict
# cv=5 : 5-fold cross-validation
grid_model = GridSearchCV(model, param_grid=params,
                          scoring='accuracy', cv=5, n_jobs=-1)
grid_model = grid_model.fit(X, y)
print('best score =', grid_model.best_score_)  # best score = 0.9800000000000001
print('best parameters =', grid_model.best_params_)
'''
best score = 0.9800000000000001
best parameters = {'C': 1.0, 'gamma': 'scale', 'kernel': 'linear'}
'''
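After fitting, GridSearchCV refits the best parameter combination on all the data and exposes it as best_estimator_, so the tuned model can be reused directly; a self-contained sketch of that workflow on the same iris data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

params = {'kernel': ['rbf', 'linear'],
          'C': [0.01, 0.1, 1.0, 10, 100],
          'gamma': ['scale', 'auto']}

grid_model = GridSearchCV(SVC(), param_grid=params,
                          scoring='accuracy', cv=5, n_jobs=-1).fit(X, y)

# best_estimator_ is the best model refit on all of X, y
best = grid_model.best_estimator_
print(grid_model.best_params_)
print(best.score(X, y))  # accuracy of the refit model on the full data
```
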