01. Using the tranExam.csv file, create a transaction object in single format from columns 1 and 2 only, with duplicate transactions removed.
(File path: tranExam.csv)

Step 1: Create and inspect the transaction object
Step 2: Check the frequency of each item
Step 3: Generate rules using the parameters (supp=0.3, conf=0.1)
Step 4: View the association rule results

 

setwd("C:/ITWILL/2_Rwork/data")



 

 

Step 1: Create and inspect the transaction object

library(arules)
# format="single": column 1 is the transaction ID, column 2 is the item
tranExam <- read.transactions("tranExam.csv", format="single", 
                              sep=",", cols=c(1,2), rm.duplicates=TRUE)


Step 2: Check the frequency of each item using summary()

summary(tranExam)

# 5 rows (elements/itemsets/transactions) and   <- number of transactions
# 4 columns (items) and a density of 0.6        <- number of items
#
# most frequent items:
#       1       2       3       4 (Other)
#       4       3       3       2       0
inspect(tranExam)
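As a complement to summary(), per-item counts can also be read directly with itemFrequency() from arules. The sketch below is only an illustration: it uses a small hypothetical in-memory transaction list (the names toy and counts are assumptions, and the data is not the contents of tranExam.csv).

```r
library(arules)

# Hypothetical toy data, NOT the contents of tranExam.csv
toy <- as(list(c("a", "b"), c("a", "c"), c("a", "b", "c"), c("b")),
          "transactions")

# Absolute counts: number of transactions that contain each item
counts <- itemFrequency(toy, type = "absolute")
print(sort(counts, decreasing = TRUE))

# Relative frequency: the support of each single item
print(itemFrequency(toy, type = "relative"))
```

The same two calls should work on tranExam once it has been read in Step 1.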
 

Step 3: Generate rules using the parameters (supp=0.3, conf=0.1)

rules = apriori(tranExam, parameter = list(supp=0.3, conf=0.1)) 
rules # set of 12 rules


Step 4: View the association rule results using inspect()

inspect(rules)
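The support, confidence and lift columns that inspect() prints can be reproduced by hand. The following is a minimal base-R sketch on hypothetical toy transactions (tx and the helper supp are illustrative names, not part of arules, and the data is not tranExam.csv); it computes the three measures for the rule {a} => {b}.

```r
# Hypothetical toy transactions, NOT the contents of tranExam.csv
tx <- list(c("a", "b"), c("a", "c"), c("a", "b", "c"), c("b"))

# Support: share of transactions that contain every item in `items`
supp <- function(items) {
  mean(sapply(tx, function(tr) all(items %in% tr)))
}

# Measures for the rule {a} => {b}
support_ab    <- supp(c("a", "b"))          # P(a and b)
confidence_ab <- support_ab / supp("a")     # P(b | a)
lift_ab       <- confidence_ab / supp("b")  # P(b | a) / P(b)

print(c(support = support_ab, confidence = confidence_ab, lift = lift_ab))
```

A lift below 1, as in this toy case, suggests the two items occur together less often than they would if they were independent.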

 





02. Using the Adult dataset, perform association analysis following the steps below.


Step 1: Generate association rules with minimum support=0.5 and minimum confidence=0.9

library(arules)     # provides apriori() and the Adult dataset
library(arulesViz)  # provides plot() methods for rules
data(Adult)
rules = apriori(Adult, parameter = list(supp=0.5, conf=0.9)) # 52 rule(s)


Step 2: Sort the results by lift and check the top 10 rules

inspect(head(sort(rules, by='lift'), 10))


Step 3: Visualize the association analysis results by the frequency of LHS and RHS

plot(rules, method="grouped")


Step 4: Visualize the association analysis results as an association network

plot(rules, method="graph")


Step 5: Interpret the central items of the network
Consequent (rhs): capital-loss=None, capital-gain=None
Antecedent (lhs): race=White, workclass=Private, sex=Male, etc.

 

 




03. Using the Adult dataset, perform association analysis following the steps below.

Step 1: Generate association rules with support=0.3 and confidence=0.95

rules = apriori(Adult, parameter = list(supp=0.3, conf=0.95)) # 124 rule(s)



Step 2: Build a subset of only the rules whose left-hand side item is White, and visualize it

white = subset(rules, lhs %in% 'race=White')
white # set of 46 rules 
plot(white, method='graph')


Step 3: Build a subset of the rules whose left-hand side item is White or United-States, and visualize it

white_usa = subset(rules, lhs %in% c('race=White', 'native-country=United-States'))
white_usa # set of 76 rules 
plot(white_usa, method='graph')
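The %in% operator used above keeps a rule when its lhs contains any of the listed items ("White or United-States"). For comparison, arules also provides %ain%, which requires all listed items to be present. The sketch below contrasts the two; the variable names either and both are illustrative, and rule counts are left unstated because they may vary with the package version.

```r
library(arules)
data(Adult)

# Same parameters as Step 1; verbose = FALSE only silences the progress log
rules <- apriori(Adult,
                 parameter = list(supp = 0.3, conf = 0.95),
                 control = list(verbose = FALSE))

items <- c("race=White", "native-country=United-States")

# %in%  : lhs contains at least one of the items (OR)
either <- subset(rules, lhs %in% items)

# %ain% : lhs contains all of the items (AND)
both <- subset(rules, lhs %ain% items)

# Every rule matched by %ain% is also matched by %in%
print(c(either = length(either), both = length(both)))
```

Choosing %ain% instead of %in% would answer a stricter question (White and United-States together) than the one posed in this step.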


Step 4: Build a subset of the rules whose right-hand side item contains the word 'Husband', and visualize it

husband = subset(rules, rhs %pin% 'Husband')
husband # set of 12 rules 
plot(husband, method='graph')

 
