LEE_BOMB 2021. 10. 22. 16:07

01. Association Analysis Overview

A rule or condition that expresses how often events occur together

An undirected data mining technique that finds association rules between events in a database

In marketing, it explores the relationships among the items in a customer's shopping basket

Pattern analysis by unsupervised learning, with no y variable

A method for finding associations (relationships) between events (e.g., diapers and beer). Example) Market basket analysis : the contents of a basket are called a transaction, and the analysis examines the associations within transactions

Analysis procedure : transaction records -> item observation -> rule discovery

 

 

Related fields

: Large marts, department stores, online shopping mall sellers -> product recommendations for customers

1. Which products do customers buy together?

2. Which other products do customers who buy ramen mainly buy?

Applications

: Based on the analysis of questions like the ones above, offer customers

1) Product information mailings

2) Package product sales planned through telemarketing

3) Product display (shelf placement) in the mart

 

 

Evaluation measures for association rules

1. Support : the probability that product A and product B are purchased together. Formula for the support of A->B

-> number of transactions containing both A and B / total number of transactions

2. Confidence : the conditional probability that product B is purchased given that product A is purchased

Formula for the confidence of A->B : the direction of the rule matters -> conditional probability

-> number of transactions containing both A and B / number of transactions containing A

3. Lift : the strength of the association between product A and product B, taking support and confidence into account together

-> confidence / proportion of transactions containing B

Lift > 1 : the two items are positively associated (bread and butter)

Lift = 1 : the events are close to independent (snacks and pepper)

Lift < 1 : the two items are negatively associated (diarrhea medicine and constipation medicine)

Confidence : among the transactions that contain A, the proportion that also contain B (conditional probability)
Formula for the confidence of A->B
-> number of transactions containing both A and B / number of transactions containing A

Lift : a measure of how far the items deviate from independence
Lift formula
-> confidence / proportion of transactions containing B
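
The three measures can also be written as probabilities, which makes the "Lift = 1" case easier to read. In this sketch, N is the total number of transactions and n(X) is the number of transactions that contain every item in X. Both symbols are notation assumed here for illustration, not taken from the original notes.

```latex
\begin{align*}
\mathrm{support}(A \to B)    &= P(A \cap B) = \frac{n(A \cup B)}{N} \\
\mathrm{confidence}(A \to B) &= P(B \mid A) = \frac{n(A \cup B)}{n(A)} \\
\mathrm{lift}(A \to B)       &= \frac{P(B \mid A)}{P(B)} = \frac{P(A \cap B)}{P(A)\,P(B)}
\end{align*}
```

If buying A and buying B are independent, then P(A ∩ B) = P(A) P(B), so the lift is exactly 1. Values above or below 1 show how far the pair departs from independence.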

 

 

 

 

[Practice]
<Support and confidence example>
t1 : ramen,beer,milk
t2 : ramen,meat,milk
t3 : ramen,fruit,meat
t4 : meat,beer,milk
t5 : ramen,meat,milk
t6 : fruit,milk


A -> B                  Support        Confidence     Lift
beer -> meat            1/6 = 0.167    1/2 = 0.5      0.5 / (4/6) = 0.75
ramen,milk -> beer      1/6 = 0.167    1/3 = 0.333    (1/3) / (2/6) = 1
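
The table can be checked by hand with a few lines of base R. This is a minimal sketch that does not use arules. The item names are written in English, and `support()` and `rule_metrics()` are hypothetical helper names introduced only for this illustration.

```r
# The six example transactions, with item names written in English.
transactions <- list(
  t1 = c("ramen", "beer", "milk"),
  t2 = c("ramen", "meat", "milk"),
  t3 = c("ramen", "fruit", "meat"),
  t4 = c("meat", "beer", "milk"),
  t5 = c("ramen", "meat", "milk"),
  t6 = c("fruit", "milk")
)

# Share of transactions that contain every item in `itemset`.
support <- function(itemset) {
  mean(vapply(transactions, function(tr) all(itemset %in% tr), logical(1)))
}

# Support, confidence and lift of the rule lhs -> rhs (hypothetical helper).
rule_metrics <- function(lhs, rhs) {
  supp <- support(c(lhs, rhs))
  conf <- supp / support(lhs)   # P(rhs | lhs)
  lift <- conf / support(rhs)   # confidence relative to the base rate of rhs
  c(support = supp, confidence = conf, lift = lift)
}

m1 <- rule_metrics("beer", "meat")
m2 <- rule_metrics(c("ramen", "milk"), "beer")
print(round(rbind("beer -> meat" = m1, "ramen,milk -> beer" = m2), 3))
```

The first rule has lift 0.75 (beer buyers pick meat less often than the average customer), and the second has lift 1 (no association either way).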

Package for association rule analysis

install.packages("arules") # association rules

Provides read.transactions(), apriori(), and the Adult data set

library(arules) # provides the read.transactions() function


1) Create a transaction object (from a file)
setwd("C:/ITWILL/2_Rwork/data")
# read.transactions() : converts a plain-text file into a transactions object
tran <- read.transactions("tran.txt", format="basket", sep=",")
tran
#6 transactions (rows) and
#5 items (columns)

2) View the transaction data
inspect(tran)

3) Discover (generate) rules - set the support and confidence thresholds (e.g., 0.1)
Format) apriori(transaction data, parameter=list(supp, conf))
Evaluation measures for association rules - support and confidence

rule1 <- apriori(tran, parameter = list(supp=0.3, conf=0.1)) # 16 rules
rule2 <- apriori(tran, parameter = list(supp=0.1, conf=0.1)) # 35 rules
inspect(rule1) # view the rules
inspect(rule2) # view the rules

 

Support, confidence, and maxlen arguments

help("apriori") # defaults: support 0.1, confidence 0.8, maxlen 10
rule <- apriori(tran)  # 6 rule(s)
rule <- apriori(tran, parameter = list(supp=0.1, conf=0.8, maxlen=10)) # same as the defaults
inspect(rule)
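
To see where the rule counts come from, the thresholds can be applied by brute force in base R. This is only a sketch of the counting, not the Apriori algorithm itself. It assumes that rules have a single item on the right-hand side and that an empty left-hand side is allowed, which is how the counts in the comments appear to be produced. `count_rules()` is a hypothetical helper, and the item names are written in English.

```r
# The six example transactions (item names in English).
transactions <- list(
  c("ramen", "beer", "milk"), c("ramen", "meat", "milk"),
  c("ramen", "fruit", "meat"), c("meat", "beer", "milk"),
  c("ramen", "meat", "milk"), c("fruit", "milk")
)
items <- sort(unique(unlist(transactions)))

# Share of transactions containing every item in `itemset`
# (an empty itemset is contained in every transaction, so its support is 1).
support <- function(itemset) {
  mean(vapply(transactions, function(tr) all(itemset %in% tr), logical(1)))
}

# Count rules lhs -> rhs with one item on the rhs that pass both thresholds.
count_rules <- function(min_supp, min_conf, maxlen = 10) {
  total <- 0
  for (k in seq_len(min(maxlen, length(items)))) {
    for (itemset in combn(items, k, simplify = FALSE)) {
      supp <- support(itemset)
      if (supp < min_supp) next            # itemset is not frequent enough
      for (rhs in itemset) {
        lhs <- setdiff(itemset, rhs)       # may be empty
        if (supp / support(lhs) >= min_conf) total <- total + 1
      }
    }
  }
  total
}

print(count_rules(0.3, 0.1))  # 16
print(count_rules(0.1, 0.1))  # 35
print(count_rules(0.1, 0.8))  # 6
```

Lowering the support threshold admits rarer itemsets and so more rules. Raising the confidence threshold keeps only rules whose right-hand side almost always accompanies the left-hand side.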

  

 


2. Creating a transaction object
Format) read.transactions(file, format=c("basket", "single"), sep = NULL, cols=NULL, rm.duplicates=FALSE, encoding="unknown")


file : file name
format : format of the data set (basket or single)
-> single : each line has two columns, a transaction ID and one item, so items are matched to transactions by ID
-> basket : each line is one transaction made up of several items, with no transaction ID
sep : item separator
cols : for single, the two columns that hold the transaction ID and the item (e.g., cols=c(1,2)); omitted for basket (when there is no transaction ID)
rm.duplicates : remove duplicate items within a transaction
encoding : file encoding
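
The difference between the two formats is easier to see on a small table. The sketch below uses base R only and made-up rows (the `tid` and `item` values are hypothetical, not the contents of the demo files). It turns single-format rows into basket-format lines.

```r
# Single format: one (transaction ID, item) pair per row. Values are made up.
single <- data.frame(
  tid  = c(1, 1, 2, 2, 2, 3),
  item = c("ramen", "milk", "ramen", "meat", "milk", "fruit"),
  stringsAsFactors = FALSE
)

# Group the items by transaction ID to get one basket per transaction.
baskets <- split(single$item, single$tid)

# Basket format: one line per transaction, items separated by sep = ",".
basket_lines <- vapply(baskets, paste, character(1), collapse = ",")
print(unname(basket_lines))
# "ramen,milk" "ramen,meat,milk" "fruit"
```

With format="single" the transaction ID column does the grouping that `split()` does here, which is why `cols` has to name both the ID column and the item column.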

(1) Creating a single-format transaction object
Read the demo data - when sep is omitted, whitespace is the separator; for single format, specify cols
format = "single" : items are linked to a transaction by its transaction id

setwd("C:/ITWILL/2_Rwork/data")
stran <- read.transactions("demo_single",format="single",cols=c(1,2)) 
inspect(stran)

 


[Practice 2] Creating a transaction object from data with duplicates

stran2 <- read.transactions("single_format.csv", format="single", sep=",", 
                           cols=c(1,2), rm.duplicates=TRUE)
stran2

summary(stran2) # descriptive statistics for the 248 transactions


View the transactions

inspect(stran2) # check the 248 transactions


Discover rules

astran2 <- apriori(stran2) # same as supp=0.1, conf=0.8
#astran2 <- apriori(stran2, parameter = list(supp=0.1, conf=0.8))
astran2 # set of 102 rules
attributes(astran2)
inspect(astran2)


Sort by lift, highest first

inspect(sort(astran2, by="lift"))


(2) Importing basket-format transaction data

btran <- read.transactions("demo_basket",format="basket",sep=",") 
inspect(btran) # view the transaction data

 





3. Visualizing association rules (using the Adult data set)

[Adult data set]
Consists of about 32,000 observations and 15 variables.
The dependent variable indicates whether annual personal income
is above 50,000 dollars. When the data set is read as transactions
data, it has 48,842 rows and 115 items.

data(Adult) # load the built-in data set provided by arules
str(Adult) # Formal class 'transactions', 48842 rows
Adult

attributes(Adult) # view the variables and categories of the transactions

 

Summary statistics

summary(Adult)

6137 association rules found with confidence 80% and support 10%

ar1 <- apriori(Adult, parameter = list(supp=0.1, conf=0.8))
ar2 <- apriori(Adult, parameter = list(supp=0.2)) # raise support
ar3 <- apriori(Adult, parameter = list(supp=0.2, conf=0.95)) # raise confidence
ar4 <- apriori(Adult, parameter = list(supp=0.3, conf=0.95)) # raise support
ar5 <- apriori(Adult, parameter = list(supp=0.35, conf=0.95)) # raise support
ar6 <- apriori(Adult, parameter = list(supp=0.4, conf=0.95)) # raise support


View the results

inspect(head(ar6)) # head() returns the first 6 rules -> apply inspect()


Top 6 rules sorted by confidence in descending order

inspect(head(sort(ar6, decreasing=TRUE, by="confidence")))


Top 6 rules sorted by lift in descending order

inspect(head(sort(ar6, by="lift")))


Package for visualizing association rules

install.packages("arulesViz") 
library(arulesViz) # package for plotting rules objects

plot(ar4) # scatter plot: x = support, y = confidence, shading = lift
plot(ar5, method="graph") # network graph of the association rules


* For each association rule, the related items are grouped together and drawn as a network


 

 

 

4. Grocery store data example

library(arules)

# load the transactions data
data("Groceries")  # load the grocery store data
str(Groceries) # Formal class 'transactions' [package "arules"] with 4 slots
Groceries #9835 transactions (rows) and 169 items (columns)


Format) object@slots

Groceries@data

rules <- apriori(Groceries, parameter=list(supp=0.001, conf=0.8))
rules # 410 rules

inspect(rules)


View the item frequencies on the left-hand side (LHS) -> right-hand side (RHS) of the rules

plot(rules, method="grouped")


Generate rules with a maximum length of 3

rules <- apriori(Groceries, parameter=list(supp=0.001, conf=0.80, maxlen=3))
inspect(rules) # 29 rules


Sort the rules by confidence in descending order

rules <- sort(rules, decreasing=TRUE, by="confidence")
inspect(rules) 

library(arulesViz) # package for plotting rules objects
plot(rules, method="graph", control=list(type="items"))



 

 

 

 

Creating subsets by keyword
1. rhs = 'whole milk'

wmilk = subset(rules, rhs %in% 'whole milk')
wmilk # 18 rules
plot(wmilk, method="graph")

High-support items : items likely to be bought together, ex. herbs, rolls/buns, cheese+hamburger meat
High-lift items : strongly related items, ex. rice+sugar

2. rhs = 'other vegetables'

ovege = subset(rules, rhs %in% 'other vegetables')
ovege # 10 rules
plot(ovege, method='graph')

Items with high support + lift : meats, shopping bags, rice+yogurt

3. lhs with several items (yogurt, rice)
# %in% keeps rules whose lhs contains at least one of the items; use %ain% to require all of them

yog_rice = subset(rules, lhs %in% c('yogurt','rice'))
yog_rice # 6 rules
inspect(yog_rice)


4. Partial string match

milk = subset(rules, rhs %pin% 'milk')
milk # 18 rules
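
The three matching modes used in subset() can be imitated in base R on a plain list of item sets. This is only an analogy under stated assumptions: `lhs_list` holds made-up left-hand sides, and `any()`, `all()`, and `grepl()` stand in for the any-match, all-match, and partial-match behavior of the arules operators `%in%`, `%ain%`, and `%pin%`.

```r
# Made-up left-hand sides of four rules (hypothetical data).
lhs_list <- list(
  c("yogurt", "rice"),
  c("yogurt", "butter"),
  c("whole milk", "rice"),
  c("curd")
)
wanted <- c("yogurt", "rice")

# Any-match (like %in%): at least one wanted item appears.
match_any <- vapply(lhs_list, function(x) any(wanted %in% x), logical(1))

# All-match (like %ain%): every wanted item appears.
match_all <- vapply(lhs_list, function(x) all(wanted %in% x), logical(1))

# Partial match (like %pin%): some item name contains the substring "milk".
match_partial <- vapply(lhs_list,
                        function(x) any(grepl("milk", x, fixed = TRUE)),
                        logical(1))

print(rbind(match_any, match_all, match_partial))
```

Only the first item set passes the all-match test, while the first three pass the any-match test, so a filter meant to find "yogurt and rice together" needs the all-match form.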