๊ฐœ์ธ๊ณต๋ถ€/Python

105. ํŒŒ์ด๋„ ํ”„๋กœ์ ํŠธ (11)Random Forest, Dicision Tree๋ชจ๋ธ ๋งŒ๋“ค๊ธฐ

LEE_BOMB 2022. 1. 14. 20:15

0. ๋ชจ๋“ˆ ์ž„ํฌํŠธ

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import sklearn.metrics as mt
import matplotlib.pyplot as plt
import seaborn as sns



1. ๋ฐ์ดํ„ฐ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ

movie = pd.read_csv(r'๊ฒฝ๋กœ๋ช…\ํŒŒ์ผ๋ช….csv')
movie.info()
movie.isna().sum() #๊ฒฐ์ธก์น˜ ํ™•์ธ
#movie = movie.dropna(axis=0) #๊ฒฐ์ธก์น˜ ํ–‰ ์‚ญ์ œ
movie



2. ํƒ€๊ฒŸ(y)๋ณ€์ˆ˜, x๋ณ€์ˆ˜ ์ง€์ •

y = movie["์ฒœ๋งŒ๊ด€๊ฐ๋ŒํŒŒ์—ฌ๋ถ€"]
#y = y.to_numpy().ravel() # 1์ฐจ์› ๋ฒกํ„ฐ ํ˜•ํƒœ๋กœ ์ถœ๋ ฅํ•˜๊ธฐ ์œ„ํ•ด ravel ์‚ฌ์šฉ
y

x = movie.drop(columns=['์ฒœ๋งŒ๊ด€๊ฐ๋ŒํŒŒ์—ฌ๋ถ€'])
x



3. ํ›ˆ๋ จ์…‹, ๊ฒ€์ •์…‹ ๋‚˜๋ˆ„๊ธฐ

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=123)



4. RandomForest ๊ตฌํ˜„

model = RandomForestClassifier(random_state=123)
model.fit(x_train, y_train)



5. ๊ฒฐ๊ณผ ํ™•์ธ

y_pred = model.predict(x_test)
print('Train_Accuracy: ', model.score(x_train, y_train),'\n')

accuracy = mt.accuracy_score(y_test, y_pred)
recall = mt.recall_score(y_test, y_pred)
precision = mt.precision_score(y_test, y_pred)
f1_score = mt.f1_score(y_test, y_pred)
matrix = mt.confusion_matrix(y_test, y_pred)

print('Accuracy: ', format(accuracy,'.2f'))
print('Recall: ', format(recall,'.2f'))
print('Precision: ', format(precision,'.2f'))
print('F1_score: ', format(f1_score,'.2f'))
print('Confusion Matrix:','\n', matrix)

Train_Accuracy:  1.0

Accuracy:  0.93
Recall:  0.93
Precision:  0.93
F1_score:  0.93
Confusion Matrix: 
 [[14  1]
 [ 1 14]]

   
 6. ์‹œ๊ฐํ™”

ftr_importances_values = model.feature_importances_
ftr_importances = pd.Series(ftr_importances_values, index=x_train.columns)
ftr_top = ftr_importances.sort_values(ascending=False)

plt.rc("font", family = "Malgun Gothic")
sns.set(font="Malgun Gothic", 
        rc={"axes.unicode_minus":False}, style="white")
plt.figure(figsize=(8,6))
plt.title('Feature Importances')
sns.barplot(x=ftr_top, y=ftr_top.index, palette="Set3")
sns.despine(left=True, bottom=True)
   
plt.show()

 

 

 

 

 

 

 

#0. ๋ชจ๋“ˆ ์ž„ํฌํŠธ
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
import sklearn.metrics as mt
from sklearn.tree import export_graphviz #์‹œ๊ฐํ™”
from graphviz import Source #์‹œ๊ฐํ™”


1. ๋ฐ์ดํ„ฐ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ

movie = pd.read_csv(r'C:\Users\KIM YOON\Desktop\21.12.27~22.01.07 ํ”„๋กœ์ ํŠธ\๋ณ€์ˆ˜์ตœ์ข…(์ƒ๊ด€๊ด€๊ณ„๋†’์Œ).csv')
movie.info()
movie.isna().sum() #๊ฒฐ์ธก์น˜ ํ™•์ธ
#movie = movie.dropna(axis=0) # ๊ฒฐ์ธก์น˜ ํ–‰ ์‚ญ์ œ
movie



2. ํƒ€๊ฒŸ(y)๋ณ€์ˆ˜ ์ง€์ •

y = movie["์ฒœ๋งŒ๊ด€๊ฐ๋ŒํŒŒ์—ฌ๋ถ€"]

#y = y.to_numpy().ravel() #1์ฐจ์› ๋ฒกํ„ฐ ํ˜•ํƒœ๋กœ ์ถœ๋ ฅํ•˜๊ธฐ ์œ„ํ•ด ravel ์‚ฌ์šฉ
y

x = movie.drop(columns=['์ฒœ๋งŒ๊ด€๊ฐ๋ŒํŒŒ์—ฌ๋ถ€'])
x



3. ํ›ˆ๋ จ์…‹, ๊ฒ€์ •์…‹ ๋‚˜๋ˆ„๊ธฐ

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=123)



4. Dicision Tree ๋ชจ๋ธ ๊ตฌํ˜„

model = DecisionTreeClassifier(random_state=123)
model.fit(x_train, y_train)



5. ๊ฒฐ๊ณผ ํ™•์ธ

y_pred = model.predict(x_test)

print('Train_Accuracy: ', model.score(x_train, y_train),'\n')

accuracy = mt.accuracy_score(y_test, y_pred)
recall = mt.recall_score(y_test, y_pred)
precision = mt.precision_score(y_test, y_pred)
f1_score = mt.f1_score(y_test, y_pred)
matrix = mt.confusion_matrix(y_test, y_pred)

print('Accuracy: ', format(accuracy,'.2f'))
print('Recall: ', format(recall,'.2f'))
print('Precision: ', format(precision,'.2f'))
print('F1_score: ', format(f1_score,'.2f'))
print('Confusion Matrix:','\n', matrix)

Train_Accuracy:  1.0 

Accuracy:  0.90
Recall:  0.93
Precision:  0.88
F1_score:  0.90
Confusion Matrix: 
 [[13  2]
 [ 1 14]]


7. ์‹œ๊ฐํ™”
* ํŠธ๋ฆฌ ์‹œ๊ฐํ™”๋ฅผ ์œ„ํ•œ export_graphviz ๋ชจ๋“ˆ ์ž„ํฌํŠธ (์ด๋ฅผ ์œ„ํ•ด graphviz ๋ณ„๋„ ์„ค์น˜ ํ•„์š”)

x_list = list(x.columns)

y_list = list(y.drop_duplicates(inplace=False))
x_list = ['screen', 'cost', 'naver_score', 'watcha_score', 'news', 'tv', 'preview', 'google_t', 'director', 'actor']

graph = export_graphviz(model, 
                        out_file='tree_graph.dot', 
                        feature_names=x_list,
                        class_names=str(y_list),
                        filled=True, #์ƒ‰์น 
                        rounded=True, #๋ชจ์„œ๋ฆฌ ๋‘ฅ๊ธ€๊ฒŒ
                        special_characters=True, #ํŠน์ˆ˜๋ฌธ์ž
                        impurity=True) #GINI ๊ณ„์ˆ˜


dot file read

file = open("tree_graph.dot")
dot_graph = file.read()


dot file ์‹œ๊ฐํ™”

Source(dot_graph)

 

 

 

 

์ฐธ๊ณ 

graphviz ์„ค์น˜/์‹คํ–‰ ์—๋Ÿฌ https://free-eunb.tistory.com/14