Problem 1) Build an NB (Naive Bayes) model from the weatherAUS.csv file
Step 1> Delete every row that contains a NaN value
Step 2> Exclude columns 1, 2, 8, 10, 11, 22, 23
Step 3> Variable selection: y variable: RainTomorrow, x variables: the remaining 16 variables
Step 4> Split the data into train/test sets at a 7:3 ratio
Step 5> Create a GaussianNB model
Step 6> Evaluate the model: accuracy, confusion matrix, classification_report

 

import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split 
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

data = pd.read_csv('C:/ITWILL/4_Python-II/data/weatherAUS.csv')
print(data.head())
print(data.info())


Step 1> Delete every row that contains a NaN value

data=data.dropna()
print(data.head())
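By default `dropna()` uses `how='any'`, so a row is removed as soon as any single column is missing; `isna().sum()` shows beforehand how many values each column is missing. A minimal sketch on a small hypothetical frame (the values are made up, not taken from weatherAUS.csv):

```python
import numpy as np
import pandas as pd

# hypothetical toy frame with a few missing values (not real weatherAUS data)
df = pd.DataFrame({'MinTemp': [13.4, np.nan, 9.7, 17.5],
                   'Rainfall': [0.6, 0.0, np.nan, 1.0],
                   'RainTomorrow': ['No', 'No', 'Yes', None]})

print(df.isna().sum())  # number of missing values per column

# how='any' (the default): drop a row if any of its columns is missing
clean = df.dropna()
print(len(df), '->', len(clean))  # 4 -> 1
```

Because only complete rows survive, the row count can shrink sharply on a dataset with many sparse columns.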


Step 2> Exclude columns 1, 2, 8, 10, 11, 22, 23 (1-based; 0-based indices 0, 1, 7, 9, 10, 21, 22 in the code)

cols = list(data.columns) # all column names
colnames = [] # columns to keep

for i in range(len(cols)) :
    if i not in [0,1,7,9,10,21,22] : # skip the excluded columns (0-based indices)
        colnames.append(cols[i]) 

new_data = data[colnames]
print(new_data.info()) # x (16 features) + y (RainTomorrow)
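The same positional exclusion can be done in one call with `DataFrame.drop` by indexing `df.columns` with the 0-based positions. A minimal sketch on a hypothetical 24-column frame standing in for weatherAUS.csv (column names `c0`…`c23` are placeholders):

```python
import pandas as pd

# hypothetical 24-column frame standing in for weatherAUS.csv
df = pd.DataFrame([list(range(24))], columns=[f'c{i}' for i in range(24)])

drop_pos = [1, 2, 8, 10, 11, 22, 23]   # 1-based positions from the task
drop_idx = [p - 1 for p in drop_pos]   # -> [0, 1, 7, 9, 10, 21, 22]

# positional drop in a single call
new_df = df.drop(columns=df.columns[drop_idx])

# equivalent to the explicit loop used in the post
loop_cols = [c for i, c in enumerate(df.columns) if i not in drop_idx]
print(new_df.shape)                       # (1, 17)
print(list(new_df.columns) == loop_cols)  # True
```

Converting the 1-based positions from the task statement explicitly avoids off-by-one mistakes.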

new_data.RainTomorrow.value_counts()

No     13426
Yes     3952

 


Step 3> Variable selection: y variable: RainTomorrow, x variables: the remaining 16 variables

cols = list(new_data.columns)
cols 
y = new_data[cols[-1]]
X = new_data[cols[:-1]]
X.shape # (17378, 16)
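Selecting y by position (`cols[-1]`) works only because RainTomorrow happens to be the last column. Selecting by name does not depend on column order; a minimal sketch with a hypothetical two-row stand-in for `new_data`:

```python
import pandas as pd

# hypothetical stand-in for new_data: feature columns plus the target
new_data = pd.DataFrame({'MinTemp': [13.4, 7.4],
                         'MaxTemp': [22.9, 25.1],
                         'RainTomorrow': ['No', 'Yes']})

# select the target and the features by name rather than by position
y = new_data['RainTomorrow']
X = new_data.drop(columns='RainTomorrow')
print(X.shape)  # (2, 2)
```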

 


Step 4> Split the data into train/test sets at a 7:3 ratio

X_train,X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123)
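RainTomorrow is imbalanced (13426 No vs 3952 Yes, roughly 23% Yes). The split above does not stratify; as an optional variant (not what the post uses), passing `stratify=y` to `train_test_split` keeps the class ratio identical in both parts. A sketch on hypothetical labels with a similar imbalance:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# hypothetical labels with an imbalance similar to RainTomorrow (23% 'Yes')
y = np.array(['No'] * 770 + ['Yes'] * 230)
X = np.arange(len(y)).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=123, stratify=y)

print(len(y_te), (y_te == 'Yes').sum())  # 300 69
print((y_tr == 'Yes').mean(), (y_te == 'Yes').mean())
```

Without `stratify`, the Yes share in each part only matches the overall share approximately.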



Step 5> Create a GaussianNB model

model = GaussianNB().fit(X=X_train, y=y_train)
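GaussianNB assumes each feature is normally distributed within each class and independent of the others given the class, then predicts the class c that maximizes log P(c) + Σ_j log N(x_j | μ_cj, σ²_cj). The sketch below reimplements that rule on synthetic data (not weatherAUS) and checks it against sklearn; the small variance epsilon mirrors sklearn's default `var_smoothing=1e-9` times the largest feature variance. The helper `gaussian_nb_predict` is hypothetical, written only for illustration.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# synthetic two-class data with different means (illustration only)
X = np.vstack([rng.normal(0.0, 1.0, size=(60, 3)),
               rng.normal(1.5, 1.2, size=(40, 3))])
y = np.array(['No'] * 60 + ['Yes'] * 40)

model = GaussianNB().fit(X, y)

def gaussian_nb_predict(X_train, y_train, X_new, var_smoothing=1e-9):
    """Hypothetical re-implementation of the Gaussian NB decision rule."""
    classes = np.unique(y_train)
    # small epsilon added to every variance for numerical stability
    eps = var_smoothing * np.var(X_train, axis=0).max()
    scores = []
    for c in classes:
        Xc = X_train[y_train == c]
        prior = len(Xc) / len(X_train)   # P(c)
        mu = Xc.mean(axis=0)             # per-feature class mean
        var = Xc.var(axis=0) + eps       # per-feature class variance
        # sum over features of log N(x_j | mu_j, var_j)
        log_lik = (-0.5 * np.sum(np.log(2 * np.pi * var))
                   - 0.5 * np.sum((X_new - mu) ** 2 / var, axis=1))
        scores.append(np.log(prior) + log_lik)
    return classes[np.argmax(np.column_stack(scores), axis=1)]

manual = gaussian_nb_predict(X, y, X)
print((manual == model.predict(X)).all())  # True
```

The "naive" independence assumption is what lets the per-feature log densities simply be summed.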


  
Step 6> Evaluate the model: accuracy, confusion matrix, classification_report

y_pred = model.predict(X = X_test) # predicted classes
y_true = y_test


1) confusion_matrix

con_mat = confusion_matrix(y_true, y_pred)
con_mat

array([[3397,  658],
       [ 349,  810]], dtype=int64)
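In `confusion_matrix` the rows are the true classes and the columns the predicted classes, both in sorted label order (No, Yes). Read that way, the matrix above says 3397 dry days were predicted dry, 658 dry days were predicted as rain, 349 rainy days were missed, and 810 rainy days were caught. A tiny example makes the layout concrete:

```python
from sklearn.metrics import confusion_matrix

y_true = ['No', 'Yes', 'Yes', 'No']
y_pred = ['No', 'Yes', 'No', 'No']

# rows = true class, columns = predicted class, in label order ['No', 'Yes']
cm = confusion_matrix(y_true, y_pred, labels=['No', 'Yes'])
print(cm)
# [[2 0]
#  [1 1]]
```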

2) accuracy_score

acc = accuracy_score(y_true, y_pred)      
acc # 0.8068661296509397
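Accuracy is just the diagonal of the confusion matrix divided by the number of test samples. Recomputing it from the matrix reported above reproduces the same value:

```python
import numpy as np

# confusion matrix from above (rows = true No/Yes, columns = predicted No/Yes)
con_mat = np.array([[3397, 658],
                    [349, 810]])

# accuracy = correctly classified samples (diagonal) / all samples
acc = np.trace(con_mat) / con_mat.sum()
print(np.trace(con_mat), con_mat.sum(), acc)  # 4207 5214 0.806866...
```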


3) classification_report

report = classification_report(y_true, y_pred)
print(report)

              precision    recall  f1-score   support

          No       0.91      0.84      0.87      4055
         Yes       0.55      0.70      0.62      1159

    accuracy                           0.81      5214
   macro avg       0.73      0.77      0.74      5214
weighted avg       0.83      0.81      0.81      5214
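Every number in the report can be derived from the confusion matrix: precision divides the diagonal by column totals (predicted counts), recall divides it by row totals (support), F1 is their harmonic mean, "macro avg" is the plain mean over classes and "weighted avg" weights each class by its support. A sketch recomputing the report above:

```python
import numpy as np

# confusion matrix from above (rows = true, columns = predicted)
con_mat = np.array([[3397, 658],
                    [349, 810]])
labels = ['No', 'Yes']

tp = np.diag(con_mat)                 # correct predictions per class
precision = tp / con_mat.sum(axis=0)  # / predicted count (column totals)
recall = tp / con_mat.sum(axis=1)     # / true count (row totals = support)
f1 = 2 * precision * recall / (precision + recall)
support = con_mat.sum(axis=1)

for name, p, r, f, s in zip(labels, precision, recall, f1, support):
    print(f'{name:>3}  {p:.2f}  {r:.2f}  {f:.2f}  {s}')

macro = np.round([precision.mean(), recall.mean(), f1.mean()], 2)
w = support / support.sum()
weighted = np.round([precision @ w, recall @ w, f1 @ w], 2)
print('macro   ', macro)     # [0.73 0.77 0.74]
print('weighted', weighted)  # [0.83 0.81 0.81]
```

The low precision for Yes (0.55) comes from the 658 false rain alarms, while its recall (0.70) reflects the 349 missed rainy days.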

 

 

 

 

 

Problem 2) SVM
- Linear SVM, nonlinear SVM
- Hyperparameters: kernel, C, gamma
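C sets the penalty for misclassified training points (larger C means a narrower margin and fewer tolerated errors), while gamma only applies to kernels such as RBF, K(x, x') = exp(-gamma·||x - x'||²). In sklearn's `SVC`, `gamma='scale'` means 1 / (n_features · X.var()) and `gamma='auto'` means 1 / n_features. A sketch on iris checking the RBF formula against `rbf_kernel` and confirming that passing the computed number reproduces `gamma='scale'`:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# gamma='scale' -> 1 / (n_features * X.var());  gamma='auto' -> 1 / n_features
gamma_scale = 1.0 / (X.shape[1] * X.var())
gamma_auto = 1.0 / X.shape[1]

# RBF kernel computed by hand: exp(-gamma * squared Euclidean distance)
sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
K_manual = np.exp(-gamma_scale * sq_dist)
print(np.allclose(K_manual, rbf_kernel(X, gamma=gamma_scale)))  # True

# passing the numeric value gives the same model as gamma='scale'
pred_scale = SVC(kernel='rbf', gamma='scale').fit(X, y).predict(X)
pred_value = SVC(kernel='rbf', gamma=gamma_scale).fit(X, y).predict(X)
print((pred_scale == pred_value).all())  # True
```

A larger gamma makes the kernel decay faster with distance, so the decision boundary bends more tightly around individual training points.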

 

from sklearn.svm import SVC #svm model 
from sklearn.datasets import load_iris #dataset 
from sklearn.model_selection import train_test_split #dataset split
from sklearn.metrics import accuracy_score #evaluation



1. dataset load 

X, y = load_iris(return_X_y= True)
X.shape #(150, 4)
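`load_iris` returns 150 samples with 4 features, while the shape (569, 30) belongs to `load_breast_cancer`. The sketch checks both shapes and the test-set size that `test_size=0.3` produces for each (the test count is rounded up, ceil(0.3·n)), which helps when comparing accuracy values across runs:

```python
from sklearn.datasets import load_breast_cancer, load_iris
from sklearn.model_selection import train_test_split

X_iris, _ = load_iris(return_X_y=True)
X_bc, _ = load_breast_cancer(return_X_y=True)
print(X_iris.shape, X_bc.shape)  # (150, 4) (569, 30)

# test_size=0.3 -> ceil(0.3 * n_samples) test rows
_, X_te_iris = train_test_split(X_iris, test_size=0.3, random_state=123)
_, X_te_bc = train_test_split(X_bc, test_size=0.3, random_state=123)
print(len(X_te_iris), len(X_te_bc))  # 45 171
```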



2. train/test split 

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123)



3. Nonlinear SVM model 

model = SVC(C=1.0, kernel='rbf', gamma='scale').fit(X=X_train, y=y_train)


model ํ‰๊ฐ€ 

y_pred = model.predict(X = X_test)

acc = accuracy_score(y_test, y_pred)
print('rbf accuracy =',acc) #accuracy = 0.9005847953216374



4. Linear SVM: for linearly separable data (data without noise) 

model2 = SVC(C=1.0, kernel='linear', gamma='scale').fit(X=X_train, y=y_train)


model ํ‰๊ฐ€ 

y_pred = model2.predict(X = X_test)

acc = accuracy_score(y_test, y_pred)
print('linear accuracy =',acc) #accuracy = 0.9707602339181286
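The linear kernel K(x, x') = ⟨x, x'⟩ has no gamma term, so the `gamma='scale'` passed to `model2` has no effect. A quick check on the same iris split, fitting with several gamma values:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123)

# gamma is not used by the linear kernel, so predictions should not change
preds = [SVC(C=1.0, kernel='linear', gamma=g).fit(X_train, y_train).predict(X_test)
         for g in ['scale', 'auto', 0.5]]
print(all((p == preds[0]).all() for p in preds))  # True
```

One consequence for a grid search: combinations that differ only in gamma are duplicates when `kernel='linear'`. `GridSearchCV` also accepts a list of dicts for `param_grid`, which lets gamma be searched only together with `'rbf'`.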

 


5. GridSearch model : best parameters 

from sklearn.model_selection import GridSearchCV #best parameters

params = {'kernel' : ['rbf', 'linear'],
          'C' : [0.01, 0.1, 1.0, 10, 100],
          'gamma': ['scale', 'auto']} #define the parameter grid as a dict


cv = 5 : 5-fold cross-validation 

grid_model = GridSearchCV(model, param_grid=params, 
                          scoring='accuracy',cv=5, n_jobs=-1)
grid_model = grid_model.fit(X, y)

print('best score =', grid_model.best_score_) #best score = 0.980000000000000
print('best parameters =', grid_model.best_params_)
'''
best score = 0.9800000000000001
best parameters = {'C': 1.0, 'gamma': 'scale', 'kernel': 'linear'}
'''
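With `cv=5` and a classifier, `GridSearchCV` evaluates each of the 2 × 5 × 2 = 20 parameter combinations on 5 stratified folds (100 fits), and `best_score_` is the mean fold accuracy of the winning combination. The sketch reproduces that number by hand with `StratifiedKFold`; it passes a fresh `SVC()` instead of the fitted `model`, which is equivalent because `GridSearchCV` clones the estimator anyway.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

params = {'kernel': ['rbf', 'linear'],
          'C': [0.01, 0.1, 1.0, 10, 100],
          'gamma': ['scale', 'auto']}
grid = GridSearchCV(SVC(), param_grid=params, scoring='accuracy', cv=5).fit(X, y)

# an integer cv on a classifier means StratifiedKFold(n_splits=5) without shuffling
skf = StratifiedKFold(n_splits=5)
best = SVC(**grid.best_params_)
fold_scores = []
for train_idx, test_idx in skf.split(X, y):
    est = clone(best).fit(X[train_idx], y[train_idx])
    fold_scores.append(est.score(X[test_idx], y[test_idx]))  # accuracy per fold

print(grid.best_score_, np.mean(fold_scores))  # the two values agree
print(len(grid.cv_results_['params']))         # 20 combinations
```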
