๊ฐœ์ธ๊ณต๋ถ€/Tensorflow

86. TensorFlow Classification Practice Problems

LEE_BOMB 2021. 12. 21. 22:17
๋ฌธ1) bmi.csv ๋ฐ์ดํ„ฐ์…‹์„ ์ด์šฉํ•˜์—ฌ ๋‹ค์Œ๊ณผ ๊ฐ™์ด sigmoid classifier์˜ ๋ชจ๋ธ์„ ์ƒ์„ฑํ•˜์‹œ์˜ค. 
์กฐ๊ฑด1> bmi.csv ๋ฐ์ดํ„ฐ์…‹ 
-> x๋ณ€์ˆ˜ : 1,2๋ฒˆ์งธ ์นผ๋Ÿผ(height, weight) 
-> y๋ณ€์ˆ˜ : 3๋ฒˆ์งธ ์นผ๋Ÿผ(label)
์กฐ๊ฑด2> ๋”ฅ๋Ÿฌ๋‹ ์ตœ์ ํ™” ์•Œ๊ณ ๋ฆฌ์ฆ˜ : Adam
์กฐ๊ฑด3> learning rage = 0.01    
์กฐ๊ฑด4> ๋ฐ˜๋ณตํ•™์Šต : 2,000๋ฒˆ, 200 step ๋‹จ์œ„๋กœ loss ์ถœ๋ ฅ 
์กฐ๊ฑด5> ์ตœ์ ํ™” ๋ชจ๋ธ ํ…Œ์ŠคํŠธ :  ๋ถ„๋ฅ˜์ •ํ™•๋„(Accuracy report) ์ถœ๋ ฅ 
   
<์ถœ๋ ฅ๊ฒฐ๊ณผ>
step = 200 , loss = 0.532565
step = 400 , loss = 0.41763392
step = 600 , loss = 0.34404162
step = 800 , loss = 0.29450226
step = 1000 , loss = 0.25899038
step = 1200 , loss = 0.23218009
step = 1400 , loss = 0.2111086
step = 1600 , loss = 0.19401966
step = 1800 , loss = 0.17981105
step = 2000 , loss = 0.16775638
========================================
accuracy= 0.9894053767712886  


import tensorflow as tf 
from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import minmax_scale # normalize x variables
import numpy as np
import pandas as pd


# CSV file load

bmi = pd.read_csv('C:/ITWILL/5_Tensorflow/data/bmi.csv')
print(bmi.info())
bmi['label'].value_counts()

# normal    7677
# fat       7425
# thin      4898

subset ์ƒ์„ฑ : label์—์„œ normal, fat ์ถ”์ถœ 

bmi = bmi[bmi.label.isin(['normal','fat'])]
print(bmi.head())

bmi['label'].value_counts()

# normal    7677 -> 0
# fat       7425 -> 1
# thin      4898 -> removed

์นผ๋Ÿผ ์ถ”์ถœ 

col = list(bmi.columns)
print(col)


x,y ๋ณ€์ˆ˜ ์ถ”์ถœ 

x_data = bmi[col[:2]] # x variables (columns 1-2)
y_data = bmi[col[2]] # y variable (column 3)
y_data # dtype: object


๋ฐ์ดํ„ฐ ์ „์ฒ˜๋ฆฌ : label ๋”๋ฏธ๋ณ€์ˆ˜ ๋ณ€ํ™˜(normal -> 0, fat -> 1)

map_data = {'normal': 0, 'fat': 1}
y_data = y_data.map(map_data) # dict mapping
print(y_data) # 0/1


x_data ์ •๊ทœํ™” ํ•จ์ˆ˜ 

x_data = minmax_scale(x_data)
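`minmax_scale` rescales each column to the [0, 1] range independently. A minimal NumPy sketch of the same computation (the height/weight rows below are made-up):

```python
import numpy as np

x = np.array([[140., 40.],
              [180., 80.],
              [200., 100.]])  # hypothetical height/weight rows

# column-wise min-max scaling, as minmax_scale does
scaled = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))
print(scaled[:, 0])  # first column mapped into [0, 1]
```

Each column's minimum maps to 0 and its maximum to 1, so height and weight end up on the same scale.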


numpy ๊ฐ์ฒด ๋ณ€ํ™˜ 

x_data = np.array(x_data)
y_data = np.transpose(np.array([y_data])) #(1, 15102) -> (15102, 1)

print(x_data.shape) #(15102, 2)
print(y_data.shape) #(15102, 1)
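Wrapping the 1-D label series in a list and transposing is one way to get a column vector; `reshape(-1, 1)` produces the same result. A small sketch:

```python
import numpy as np

y = np.array([0, 1, 1, 0])           # 1-D labels, shape (4,)
col = np.transpose(np.array([y]))    # (1, 4) -> (4, 1)
alt = y.reshape(-1, 1)               # equivalent column vector

print(col.shape)  # (4, 1)
```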

* X,Y ๋ฐ์ดํ„ฐ ์ „์ฒ˜๋ฆฌ ์™„๋ฃŒ 


1. X,Y ๋ณ€์ˆ˜ ์ •์˜   

X = tf.constant(x_data, tf.float32) 
Y = tf.constant(y_data, tf.float32)



2. w,b ๋ณ€์ˆ˜ ์ •์˜ : ์ดˆ๊ธฐ๊ฐ’(์ •๊ทœ๋ถ„ํฌ ๋‚œ์ˆ˜ )

w = tf.Variable(tf.random.normal([2, 1])) #[์ž…๋ ฅ์ˆ˜,์ถœ๋ ฅ์ˆ˜]
b = tf.Variable(tf.random.normal([1])) #[์ถœ๋ ฅ์ˆ˜]



3. ํšŒ๊ท€๋ฐฉ์ •์‹ 

def linear_model(X) : #train, test
    y_pred = tf.linalg.matmul(X, w) + b 
    return y_pred #2์ฐจ์›



# 4. Apply the sigmoid activation function

def sig_fn(X):
    y_pred = linear_model(X)
    sig = tf.nn.sigmoid(y_pred) 
    return sig



5. ์†์‹ค ํ•จ์ˆ˜ ์ •์˜ : ์†์‹ค๊ณ„์‚ฐ์‹ ์ˆ˜์ • 

def loss_fn() : #์ธ์ˆ˜ ์—†์Œ 
    sig = sig_fn(X) 
    loss = -tf.reduce_mean(Y*tf.math.log(sig)+(1-Y)*tf.math.log(1-sig))
    return loss
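The expression inside `loss_fn` is the binary cross-entropy averaged over all samples. A quick NumPy check with made-up sigmoid outputs shows how it rewards confident correct predictions and penalizes wrong ones:

```python
import numpy as np

y = np.array([1., 0., 1.])        # true labels
p = np.array([0.9, 0.2, 0.6])     # hypothetical sigmoid outputs

loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
print(round(loss, 4))  # ~0.2798
```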



6. ์ตœ์ ํ™” ๊ฐ์ฒด : learning_rate= 0.01

optimizer = tf.optimizers.Adam(learning_rate= 0.01)



7. ๋ฐ˜๋ณตํ•™์Šต : ๋ฐ˜๋ณตํ•™์Šต : 2,000๋ฒˆ, 200 step ๋‹จ์œ„๋กœ loss ์ถœ๋ ฅ 

for step in range(2000) :
    #model ์ตœ์ ํ™” -> w, b ์—…๋ฐ์ดํŠธ 
    optimizer.minimize(loss=loss_fn, var_list=[w, b])
    
    if (step+1) % 200 == 0 :
        print('step : ', (step+1), ", loss val = ", loss_fn().numpy())
            
print('='*30)
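Each call to `optimizer.minimize` performs one gradient step, so loss is logged every 200 calls. The same update-then-log pattern, sketched in plain Python on the toy problem of minimizing (w - 3)^2 (all names here are illustrative):

```python
w = 0.0
lr = 0.1
for step in range(100):
    grad = 2 * (w - 3)       # gradient of (w - 3)**2
    w -= lr * grad           # one update, like optimizer.minimize
    if (step + 1) % 25 == 0:
        print('step =', step + 1, ', loss =', (w - 3) ** 2)
```

The loss printed at each checkpoint shrinks toward 0 as w converges to the minimum at 3.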



8. model ์ตœ์ ํ™” ํ…Œ์ŠคํŠธ

y_pred = tf.cast(sig_fn(X) > 0.5, tf.float32)

acc = accuracy_score(Y, y_pred)
print('accuracy =', acc)

report = classification_report(Y, y_pred)
print(report)
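The `> 0.5` threshold converts sigmoid probabilities into hard 0/1 labels (note the comparison is strict, so exactly 0.5 maps to 0). The same idea in NumPy:

```python
import numpy as np

probs = np.array([0.1, 0.7, 0.5, 0.95])   # hypothetical sigmoid outputs
pred = (probs > 0.5).astype(np.float32)
print(pred)  # [0. 1. 0. 1.]
```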


๋ฌธ2) bmi.csv ๋ฐ์ดํ„ฐ์…‹์„ ์ด์šฉํ•˜์—ฌ ๋‹ค์Œ๊ณผ ๊ฐ™์ด softmax classifier ๋ชจ๋ธ์„ ์ƒ์„ฑํ•˜์‹œ์˜ค. 
์กฐ๊ฑด1> bmi.csv ๋ฐ์ดํ„ฐ์…‹ 
-> x๋ณ€์ˆ˜ : height, weight ์นผ๋Ÿผ 
-> y๋ณ€์ˆ˜ : label(3๊ฐœ ๋ฒ”์ฃผ) ์นผ๋Ÿผ
์กฐ๊ฑด2> ๋”ฅ๋Ÿฌ๋‹ ์ตœ์ ํ™” ์•Œ๊ณ ๋ฆฌ์ฆ˜ : Adam
์กฐ๊ฑด3> learning rage : 0.001 or 0.005 ์„ ํƒ(๋ถ„๋ฅ˜์ •ํ™•๋„ ๋†’์€๊ฒƒ) 
์กฐ๊ฑด4> ๋ฐ˜๋ณตํ•™์Šต, step ๋‹จ์œ„๋กœ loss : <์ถœ๋ ฅ๊ฒฐ๊ณผ> ์ฐธ๊ณ  
์กฐ๊ฑด5> ๋ถ„๋ฅ˜์ •ํ™•๋„ ์ถœ๋ ฅ
์กฐ๊ฑด6> ์˜ˆ์ธก์น˜์™€ ์ •๋‹ต 15๊ฐœ ์ถœ๋ ฅ   
    
<์ถœ๋ ฅ ๊ฒฐ๊ณผ>
step = 500 , loss = 0.44498476
step = 1000 , loss = 0.34861678
step = 1500 , loss = 0.28995454
step = 2000 , loss = 0.24887484
step = 2500 , loss = 0.2177721
step = 3000 , loss = 0.19313334
step = 3500 , loss = 0.17303815
step = 4000 , loss = 0.15629826
step = 4500 , loss = 0.1421249
step = 5000 , loss = 0.12996733
========================================
accuracy = 0.9769
========================================
y_pred :  [0 0 1 1 1 1 0 2 0 2 1 2 1 0 2]
y_true :  [0 0 1 1 1 1 0 2 0 2 1 2 1 0 2]  
========================================


import tensorflow as tf # TF 2.x
from sklearn.preprocessing import minmax_scale # normalize x data (0~1)
from sklearn.metrics import accuracy_score
import numpy as np
import pandas as pd
 
bmi = pd.read_csv('C:/ITWILL/5_Tensorflow/data/bmi.csv')
print(bmi.info())


์นผ๋Ÿผ ์ถ”์ถœ 

col = list(bmi.columns)
print(col)


x,y ๋ณ€์ˆ˜ ์ถ”์ถœ 

x_data = bmi[col[:2]] #x๋ณ€์ˆ˜


x_data ์ •๊ทœํ™” 

x_data = minmax_scale(x_data)


# One-hot encode the label

label_map = {"thin": [1,0,0], "normal": [0,1,0], "fat": [0,0,1]}
bmi["label"] = bmi["label"].apply(lambda x : np.array(label_map[x]))

y_data = list(bmi["label"]) # nested list: [[1,0,0], [1,0,0], ...]
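The `label_map` dictionary builds the one-hot rows by hand. For string labels, `pd.get_dummies` produces the same matrix; a small sketch with the column order fixed to match `label_map`:

```python
import numpy as np
import pandas as pd

labels = pd.Series(['thin', 'normal', 'fat', 'fat'])
onehot = pd.get_dummies(labels)[['thin', 'normal', 'fat']]  # enforce column order
y = onehot.to_numpy().astype(float)
print(y[0])  # thin -> [1. 0. 0.]
```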


numpy ๊ฐ์ฒด ๋ณ€ํ™˜ 

x_data = np.array(x_data)
y_data = np.array(y_data)

* X,Y ๋ฐ์ดํ„ฐ ์ „์ฒ˜๋ฆฌ ์™„๋ฃŒ


1. X,Y๋ณ€์ˆ˜ ์ •์˜ : ๊ณต๊ธ‰ํ˜• ๋ณ€์ˆ˜ 

X = tf.constant(x_data, tf.float32) #[?, 2]
Y = tf.constant(y_data, tf.float32) #[?, 3]



2. w,b ๋ณ€์ˆ˜ ์ •์˜ 

w = tf.Variable(tf.random.normal([2, 3])) #[์ž…๋ ฅ์ˆ˜, ์ถœ๋ ฅ์ˆ˜]
b = tf.Variable(tf.zeros([3])) #[์ถœ๋ ฅ์ˆ˜]



3. ํšŒ๊ท€๋ฐฉ์ •์‹ 

def linear_model(X) : #train, test
    y_pred = tf.matmul(X, w) + b  #ํ–‰๋ ฌ๊ณฑ : [None,3]*[3,1]=[None,1]
    return y_pred



# 4. Apply the softmax activation function

def soft_fn(X):
    y_pred = linear_model(X)
    soft = tf.nn.softmax(y_pred)
    return soft



5. ์†์‹ค ํ•จ์ˆ˜ ์ •์˜ : ์†์‹ค๊ณ„์‚ฐ์‹ ์ˆ˜์ • 

def loss_fn() : #์ธ์ˆ˜ ์—†์Œ 
    soft = soft_fn(X) #ํ›ˆ๋ จ์…‹ -> ์˜ˆ์ธก์น˜ : ํšŒ๊ท€๋ฐฉ์ •์‹  
    loss = -tf.reduce_mean(Y*tf.math.log(soft)+(1-Y)*tf.math.log(1-soft))
    return loss
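The formula above applies the binary cross-entropy to every class position. The conventional categorical cross-entropy keeps only the true-class term; both shrink as `soft` approaches `Y`. A NumPy comparison with made-up values:

```python
import numpy as np

Y = np.array([[1., 0., 0.],
              [0., 0., 1.]])              # one-hot targets
soft = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.3, 0.6]])        # hypothetical softmax outputs

# elementwise form used above (treats each class as a binary problem)
loss_elem = -np.mean(Y * np.log(soft) + (1 - Y) * np.log(1 - soft))

# standard categorical cross-entropy: average of -log(true-class probability)
loss_ce = -np.mean(np.sum(Y * np.log(soft), axis=1))
print(round(loss_ce, 4))  # ~0.4338
```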



6. ์ตœ์ ํ™” ๊ฐ์ฒด 

optimizer = tf.optimizers.Adam(lr=0.005)



7. ๋ฐ˜๋ณตํ•™์Šต 

for step in range(5000) : 
    #์˜ค์ฐจ์ œ๊ณฑํ‰๊ท  ์ตœ์ ํ™” : ์†์‹ค๊ฐ’ ์ตœ์†Œํ™” -> [a, b] ๊ฐฑ์‹ (update)
    optimizer.minimize(loss_fn, var_list=[w, b]) #(์†์‹ค๊ฐ’, ์ˆ˜์ • ๋Œ€์ƒ)
    
    #500๋ฐฐ์ˆ˜ ๋‹จ์œ„ ์ถœ๋ ฅ 
    if (step+1) % 500 == 0 :
        print("step =", (step+1), ", loss =", loss_fn().numpy())



8. ์ตœ์ ํ™”๋œ model ๊ฒ€์ • 

soft_re = soft_fn(X).numpy()

y_pred = tf.argmax(soft_re, 1) # demension : 2d
y_true = tf.argmax(Y, 1) # demension : 2d

acc = accuracy_score(y_true, y_pred)
print("="*40)
print('accuracy =', acc) # accuracy = 0.98
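`tf.argmax(..., 1)` reduces each probability row to the index of its largest entry, turning the (n, 3) softmax output into an (n,) vector of class labels. The NumPy equivalent:

```python
import numpy as np

soft = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.3, 0.6]])   # hypothetical softmax outputs
pred = np.argmax(soft, axis=1)       # index of the max per row
print(pred)  # [0 2]
```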


# y_true vs y_pred

print("="*40) 
print('y_pred : ', y_pred.numpy()[:15])
print('y_true : ', y_true.numpy()[:15])


๋ฌธ3) ๋‹ค์Œ digits ๋ฐ์ดํ„ฐ์…‹์„ ์ด์šฉํ•˜์—ฌ ๋‹คํ•ญ๋ถ„๋ฅ˜๊ธฐ๋ฅผ ์ž‘์„ฑํ•˜์‹œ์˜ค.
<์กฐ๊ฑด1> digits ๋ฐ์ดํ„ฐ์…‹์˜ ํŠน์„ฑ์„ ๋ณด๊ณ  ์ „์ฒ˜๋ฆฌ/๊ณต๊ธ‰data ์ƒ์„ฑ  
<์กฐ๊ฑด2> ์•„๋ž˜ <์ถœ๋ ฅ๊ฒฐ๊ณผ>๋ฅผ ์ฐธ๊ณ ํ•˜์—ฌ ํ•™์Šต์œจ๊ณผ ๋ฐ˜๋ณตํ•™์Šต ์ ์šฉ
<์กฐ๊ฑด3> epoch์— ๋”ฐ๋ฅธ loss value ์‹œ๊ฐํ™” : ์ด๋ฏธ์ง€ํŒŒ์ผ ์ฐธ๊ณ  (exam03_lossValue.png)

<์ถœ๋ ฅ๊ฒฐ๊ณผ>
step = 200 , loss = 0.06003735238669643
step = 400 , loss = 0.02922042555340125
step = 600 , loss = 0.01916724251850193
step = 800 , loss = 0.01418028865527556
step = 1000 , loss = 0.011102086315873883
step = 1200 , loss = 0.008942419709185086
step = 1400 , loss = 0.007311927138572721
step = 1600 , loss = 0.006023632246639046
step = 1800 , loss = 0.004981346240771604
step = 2000 , loss = 0.004163072611802871
========================================
accuracy = 0.9648148148148148


import tensorflow as tf #ver 2.0
from sklearn.preprocessing import OneHotEncoder #y data -> one hot
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_digits

digits ๋ฐ์ดํ„ฐ์…‹ : ์ˆซ์ž ํ•„๊ธฐ์ฒด ์ด๋ฏธ์ง€ -> ์ˆซ์ž ์˜ˆ์ธก(0~9)

• ํƒ€๊ฒŸ ๋ณ€์ˆ˜ : y
 - 0 ~ 9 : 10์ง„์ˆ˜ ์ •์ˆ˜ 
• ํŠน์ง• ๋ณ€์ˆ˜(64ํ”ฝ์…€) : X 
 - 0๋ถ€ํ„ฐ 9๊นŒ์ง€์˜ ์ˆซ์ž๋ฅผ ์†์œผ๋กœ ์“ด ์ด๋ฏธ์ง€ ๋ฐ์ดํ„ฐ
 - ๊ฐ ์ด๋ฏธ์ง€๋Š” 0๋ถ€ํ„ฐ 15๊นŒ์ง€์˜ 16๊ฐœ ๋ช…์•”์„ ๊ฐ€์ง€๋Š” 8x8=64ํ”ฝ์…€ ํ•ด์ƒ๋„์˜ ํ‘๋ฐฑ ์ด๋ฏธ์ง€

digits = load_digits() # load the dataset

X = digits.data # X variables
y = digits.target # y variable
print(X.shape) # (1797, 64) : 64 = 8x8
print(y.shape) # (1797,)



# 1. Split the digits dataset

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123)



2. ์ „์ฒ˜๋ฆฌ : X๋ณ€์ˆ˜ ์ •๊ทœํ™”, Y๋ณ€์ˆ˜ one-hot encoding  

print(X.max()) #16.0
x_train_nor, x_test_nor = x_train / 16.0, x_test / 16.0


# reshape

y_train = y_train.reshape([-1, 1])
y_test = y_test.reshape([-1, 1])


# one-hot encoding

obj = OneHotEncoder()
y_train = obj.fit_transform(y_train).toarray()
y_train.shape #(1257, 10)

y_test = obj.transform(y_test).toarray() # reuse the encoder fitted on the training labels
y_test.shape #(540, 10)

print(y_train.dtype) #float64
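For integer labels 0..k-1, the OneHotEncoder output can also be produced by indexing an identity matrix; a sketch with 3 classes (use `np.eye(10)` for the 10 digit classes):

```python
import numpy as np

y = np.array([0, 1, 2, 1])
onehot = np.eye(3)[y]   # row i of the identity matrix is the one-hot vector for class i
print(onehot[3])  # [0. 1. 0.]
```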



3. ๊ณต๊ธ‰ data : 

x_train = x_train_nor
x_test = x_test_nor
x_train.shape #(1257, 64)
x_test.shape #(540, 64)



4. w, b ๋ณ€์ˆ˜ ์ •์˜ 

w = tf.Variable(tf.random.normal([64, 10], dtype=tf.float64)) #[์ž…๋ ฅ์ˆ˜, ์ถœ๋ ฅ์ˆ˜]
b = tf.Variable(tf.random.normal([10], dtype=tf.float64)) #[์ถœ๋ ฅ์ˆ˜]
#type ์ผ์น˜



5. ํšŒ๊ท€๋ฐฉ์ •์‹ 

def linear_model(X) : #train, test
    y_pred = tf.matmul(X, w) + b  #ํ–‰๋ ฌ๊ณฑ : [None,10]*[10,1]=[None,1]
    return y_pred



# 6. Apply the softmax activation function

def soft_fn(X):
    y_pred = linear_model(X)
    soft = tf.nn.softmax(y_pred)
    return soft



7. ์†์‹ค ํ•จ์ˆ˜ ์ •์˜ : ์†์‹ค๊ณ„์‚ฐ์‹ ์ˆ˜์ • 

def loss_fn() : #์ธ์ˆ˜ ์—†์Œ 
    soft = soft_fn(x_train) #ํ›ˆ๋ จ์…‹ -> ์˜ˆ์ธก์น˜ : ํšŒ๊ท€๋ฐฉ์ •์‹  
    loss = -tf.reduce_mean(y_train*tf.math.log(soft)+(1-y_train)*tf.math.log(1-soft))
    return loss



8. ์ตœ์ ํ™” ๊ฐ์ฒด 

optimizer = tf.optimizers.Adam(lr=0.01)



9. ๋ฐ˜๋ณตํ•™์Šต 

loss_val = []
for step in range(2000) : 
    #์˜ค์ฐจ์ œ๊ณฑํ‰๊ท  ์ตœ์ ํ™” : ์†์‹ค๊ฐ’ ์ตœ์†Œํ™” -> [a, b] ๊ฐฑ์‹ (update)
    optimizer.minimize(loss_fn, var_list=[w, b]) #(์†์‹ค๊ฐ’, ์ˆ˜์ • ๋Œ€์ƒ)
    
    #100๋ฐฐ์ˆ˜ ๋‹จ์œ„ ์ถœ๋ ฅ 
    if (step+1) % 200 == 0 :
        print("step =", (step+1), ", loss =", loss_fn().numpy())
    loss_val.append(loss_fn().numpy())


    
10. ์ ์ ํ™”๋œ model ๊ฒ€์ฆ 

soft_re = soft_fn(x_test).numpy()

y_pred = tf.argmax(soft_re, 1) #demension : 2d
y_true = tf.argmax(y_test, 1) #demension : 2d

acc = accuracy_score(y_true, y_pred)
print('accuracy =', acc) #accuracy = 0.98



# 11. Plot loss value vs epochs

import matplotlib.pyplot as plt
plt.plot(loss_val, 'r--')
plt.show()