Problem 1) Build an NB model using the weatherAUS.csv file
Step 1> Remove all rows containing NaN values
Step 2> Exclude columns 1, 2, 8, 10, 11, 22, 23
Step 3> Variable selection: y variable: RainTomorrow, x variables: the remaining variables (16)
Step 4> Split into train/test sets at a 7:3 ratio
Step 5> Build a GaussianNB model
Step 6> Evaluate the model: accuracy, confusion matrix, classification_report
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
data = pd.read_csv('C:/ITWILL/4_Python-II/data/weatherAUS.csv')
print(data.head())
print(data.info())
Step 1> Remove all rows containing NaN values
data = data.dropna()
print(data.head())
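With no arguments, dropna() removes every row that contains at least one NaN; a minimal sketch of that behavior on a toy frame (the column names here are made up for illustration):

```python
import numpy as np
import pandas as pd

# toy frame with one row holding a NaN
df = pd.DataFrame({'a': [1.0, np.nan, 3.0],
                   'b': [4.0, 5.0, 6.0]})

clean = df.dropna()   # drops any row containing at least one NaN
print(clean.shape)    # (2, 2) - the NaN row is gone
```
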
Step 2> Exclude columns 1, 2, 8, 10, 11, 22, 23
cols = list(data.columns)  # all column names
colnames = []  # columns to keep
for i in range(24):
    if i not in [0, 1, 7, 9, 10, 21, 22]:  # skip the excluded columns
        colnames.append(cols[i])
new_data = data[colnames]
print(new_data.info()) # x+y
new_data.RainTomorrow.value_counts()
No 13426
Yes 3952
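The index loop above can also be written in one step with DataFrame.drop; a minimal sketch on a stand-in frame (the real weatherAUS data has 24 columns, here stubbed with hypothetical names c0..c23):

```python
import pandas as pd

# stand-in frame with 24 columns, like the weatherAUS data
df = pd.DataFrame({f'c{i}': [i] for i in range(24)})

# drop columns by position - the same indices excluded in the loop above
drop_idx = [0, 1, 7, 9, 10, 21, 22]
new_df = df.drop(columns=df.columns[drop_idx])
print(new_df.shape)  # (1, 17) - 24 columns minus the 7 dropped
```
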
Step 3> Variable selection: y variable: RainTomorrow, x variables: the remaining variables (16)
cols = list(new_data.columns)
cols
y = new_data[cols[-1]]
X = new_data[cols[:-1]]
X.shape # (17378, 16)
Step 4> Split into train/test sets at a 7:3 ratio
X_train,X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, random_state=123)
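Since RainTomorrow is imbalanced (No 13426 vs Yes 3952), passing stratify=y would keep the class ratio equal in both splits. The split above does not use it; this is a suggested addition, sketched here on toy labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# imbalanced toy labels: 80% class 0, 20% class 1
y = np.array([0] * 80 + [1] * 20)
X = np.arange(100).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=123, stratify=y)

# the 30-sample test split keeps the same 20% minority ratio
print((y_te == 1).mean())  # 0.2
```
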
Step 5> Build a GaussianNB model
model = GaussianNB().fit(X=X_train, y=y_train)
Step 6> Evaluate the model: accuracy, confusion matrix, classification_report
y_pred = model.predict(X=X_test)  # predicted class labels
y_true = y_test
1) confusion_matrix
con_mat = confusion_matrix(y_true, y_pred)
con_mat
array([[3397, 658],
[ 349, 810]], dtype=int64)
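The accuracy reported next can be verified directly from this confusion matrix: correct predictions sit on the diagonal, so accuracy is the trace divided by the total count.

```python
import numpy as np

con_mat = np.array([[3397, 658],
                    [349, 810]])

# accuracy = correct (diagonal) / total = 4207 / 5214
acc = np.trace(con_mat) / con_mat.sum()
print(acc)  # 0.8068661296509397 - matches accuracy_score below
```
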
2) accuracy_score
acc = accuracy_score(y_true, y_pred)
acc # 0.8068661296509397
3) classification_report
report = classification_report(y_true, y_pred)
print(report)
              precision    recall  f1-score   support

          No       0.91      0.84      0.87      4055
         Yes       0.55      0.70      0.62      1159

    accuracy                           0.81      5214
   macro avg       0.73      0.77      0.74      5214
weighted avg       0.83      0.81      0.81      5214
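The per-class numbers in the report can be recomputed by hand from the confusion matrix; here are precision and recall for the Yes class (treating Yes as the positive class):

```python
# from the confusion matrix: rows = actual, columns = predicted
tp = 810   # actual Yes, predicted Yes
fp = 658   # actual No,  predicted Yes
fn = 349   # actual Yes, predicted No

precision = tp / (tp + fp)   # 810 / 1468
recall = tp / (tp + fn)      # 810 / 1159
print(round(precision, 2), round(recall, 2))  # 0.55 0.7
```

These match the Yes row of the classification report (0.55 precision, 0.70 recall).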
Problem 2) SVM
- Linear SVM, nonlinear SVM
- Hyperparameters: kernel, C, gamma
from sklearn.svm import SVC #svm model
from sklearn.datasets import load_iris #dataset
from sklearn.model_selection import train_test_split #dataset split
from sklearn.metrics import accuracy_score  # evaluation
1. dataset load
X, y = load_iris(return_X_y= True)
X.shape  # (150, 4)
2. train/test split
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, random_state=123)
3. Nonlinear SVM model
model = SVC(C=1.0, kernel='rbf', gamma='scale').fit(X=X_train, y=y_train)
model evaluation
y_pred = model.predict(X = X_test)
acc = accuracy_score(y_test, y_pred)
print('rbf accuracy =', acc)
4. Linear SVM: for linearly separable data (data without noise)
model2 = SVC(C=1.0, kernel='linear', gamma='scale').fit(X=X_train, y=y_train)
model evaluation
y_pred = model2.predict(X = X_test)
acc = accuracy_score(y_test, y_pred)
print('linear accuracy =', acc)
5. GridSearch model : best parameters
from sklearn.model_selection import GridSearchCV #best parameters
params = {'kernel': ['rbf', 'linear'],
          'C': [0.01, 0.1, 1.0, 10, 100],
          'gamma': ['scale', 'auto']}  # parameter grid defined as a dict
# cv=5 : 5-fold cross-validation
grid_model = GridSearchCV(model, param_grid=params,
                          scoring='accuracy', cv=5, n_jobs=-1)
grid_model = grid_model.fit(X, y)
print('best score =', grid_model.best_score_)  # best score = 0.9800000000000001
print('best parameters =', grid_model.best_params_)
'''
best score = 0.9800000000000001
best parameters = {'C': 1.0, 'gamma': 'scale', 'kernel': 'linear'}
'''
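After fitting, GridSearchCV refits the best parameter combination on all the data and exposes it as best_estimator_, so the tuned model can be reused directly; a self-contained sketch of that workflow on the same iris data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

params = {'kernel': ['rbf', 'linear'],
          'C': [0.01, 0.1, 1.0, 10, 100],
          'gamma': ['scale', 'auto']}

grid_model = GridSearchCV(SVC(), param_grid=params,
                          scoring='accuracy', cv=5, n_jobs=-1).fit(X, y)

# best_estimator_ is the best model refit on all of X, y
best = grid_model.best_estimator_
print(grid_model.best_params_)
print(best.score(X, y))  # accuracy of the refit model on the full data
```
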