T-distribution?

A probability distribution used in place of the normal distribution when the sample is small (fewer than 30 observations).

Relation to the normal distribution: the larger the sample size, the closer the shape is to the normal distribution.

* The larger the degrees of freedom (df), the closer it gets to the normal distribution.

* Z-distribution: the probability distribution with the shape of the normal distribution, used when the sample is sufficiently large (n > 30).
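
The convergence toward the normal distribution can be checked numerically. The following is a minimal sketch using base R's qt() and qnorm(); the df values are arbitrary choices made for illustration, not values from the original post.

```r
# Two-tailed 5% critical values of the t-distribution for growing df
dfs <- c(5, 9, 30, 100, 1000)      # illustrative df values (assumed)
t_crit <- qt(0.975, df = dfs)      # upper 2.5% point of the t-distribution
z_crit <- qnorm(0.975)             # upper 2.5% point of the standard normal

print(data.frame(df = dfs, t_crit = round(t_crit, 3)))
print(round(z_crit, 3))
```

The t critical values shrink toward the normal critical value as df increases, which is why the two distributions are treated as interchangeable for large samples.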

 

 

T-test

Used when the population is normally distributed and the population variance (σ²) is unknown.

The sample standard deviation is used to estimate/test the population mean (using the t-table).

Basic assumptions: normality, homogeneity of variance (these decide between a parametric and a nonparametric method)

Methods: one-sample t-test, independent-samples t-test, paired-samples t-test

Parametric test (normal) | Basic assumption | Nonparametric test (non-normal)
One-sample t-test | Normality | Wilcoxon signed rank test
Independent-samples t-test | Homogeneity of variance | Wilcoxon rank sum test, Mann-Whitney U-test
Paired-samples t-test | Normality | Wilcoxon signed rank test

* Nonparametric test: a test used when the basic assumptions are not met or when there is little data.

* In R, all of these use the wilcox.test() function.

 

 

T-test statistic
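
For reference, the one-sample t statistic is usually written as below. The symbols are notation chosen here as an assumption (the original figure is not available): the sample mean is \bar{x}, the hypothesized population mean is \mu_0, the sample standard deviation is s, and the sample size is n.

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad df = n - 1
```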

 

 

T-table

 

* One-tailed test: t statistic (absolute value) > critical value: reject the null hypothesis

 

 

Two-tailed test?

Two-tailed critical value: use α/2 = 0.025 (2.5%) in each tail, with df = n - 1 (when the population is normally distributed)

In a two-tailed test, reject the null hypothesis if the absolute value of the test statistic t exceeds the critical value.

 

ex. With sample size n = 10 and significance level (α) = 0.05, what are the two-tailed critical value and the test decision?

Test statistic: t = 2.4154, df = 9 (t-table: critical value = 2.262)

 

* Hypothesis test: t value (2.4154) > critical value (2.262): reject the null hypothesis
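
The table lookup in this example can be reproduced in R. This is a small sketch with base R's qt() and pt(); the variable names are made up for illustration.

```r
# Worked check of the example: n = 10, alpha = 0.05, two-tailed
n <- 10
alpha <- 0.05
t_stat <- 2.4154                          # test statistic given in the example

t_crit <- qt(1 - alpha / 2, df = n - 1)   # critical value from the t-distribution
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

print(round(t_crit, 3))                   # the t-table value, 2.262
print(abs(t_stat) > t_crit)               # TRUE means: reject the null hypothesis
print(p_value)                            # two-tailed p-value for the same decision
```

Comparing |t| with the critical value and comparing the p-value with α are two views of the same decision rule.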

 

 

 

 

02. One-sample t-test?

Tests whether the sample mean differs from the population mean (μ).
Basic assumption: normal distribution (normality test)

Basic (null) hypothesis: there is no difference from the population mean.

 

 

Purpose of the one-sample t-test

It is used to test the population mean from a sample when the sample has fewer than 30 observations and the population mean is known but the population variance is not. With 30 or more observations, normality is taken to hold by the central limit theorem; with fewer than 30, a normality test checks whether the sample satisfies normality. If normality holds, use a parametric test; if not, use a nonparametric test.

 

 

One-sample t-test analysis procedure

Load the practice file > preprocess the data > descriptive statistics (mean) > normal distribution (basic assumption: normality test)

> YES: t.test() / NO: wilcox.test() > analyze the test statistic
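
The procedure above can be sketched as a small helper. The function name one_sample_test and the simulated data are hypothetical, introduced only for illustration; the post's own CSV file is not used here.

```r
# Sketch of the decision flow: normality check, then t.test() or wilcox.test()
one_sample_test <- function(x, mu, alpha = 0.05) {
  x <- x[!is.na(x)]                            # drop missing values first
  normal_ok <- shapiro.test(x)$p.value >= alpha
  if (normal_ok) {
    t.test(x, mu = mu)                         # parametric test
  } else {
    wilcox.test(x, mu = mu)                    # nonparametric test
  }
}

set.seed(1)                                    # simulated data (assumed)
x <- c(rnorm(25, mean = 5.5, sd = 0.4), NA)
res <- one_sample_test(x, mu = 5.2)
print(res$method)                              # which test was chosen
print(res$p.value)
```

Both branches return an object of class "htest", so the statistic and p-value can be read the same way regardless of which test ran.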

 

 

One-sample t-test hypotheses

<Research hypothesis>
Research hypothesis (H1): There is a difference in mean usage time between domestically produced laptops and laptops produced by company A.
Null hypothesis (H0): There is no difference in mean usage time between domestically produced laptops and laptops produced by company A.

<Research setting> Given that the mean usage time of domestically produced laptops is known to be 5.2 hours, 150 laptops from company A are selected at random and tested to determine whether their mean usage time differs.

1. Load the practice file

setwd("c:/ITWILL/2_Rwork/data")
data <- read.csv("one_sample.csv", header=TRUE)
str(data) # 150
head(data)
x <- data$time
head(x)


2. Descriptive statistics: compute the mean

summary(x) # 41 NAs
mean(x) # NA
mean(x, na.rm=TRUE) # mean excluding NAs (method 1)
x <- na.omit(x) # drop NAs, then take the mean (method 2)
mean(x)


3. Normality test
* Normality test on the random variable x
* Null hypothesis (H0): there is no difference from the normal distribution.

shapiro.test(x) # run the normality test


4. Hypothesis test - parametric/nonparametric
Normal distribution (parametric test) -> t.test()
Non-normal distribution (nonparametric test) -> wilcox.test()

1) Two-tailed test - compare the cleaned data with 5.2 hours

t.test(x, mu=5.2, alternative="two.sided", conf.level=0.95)

 

2) Test of a directional alternative hypothesis

t.test(x, mu=5.2, alternative="greater", conf.level=0.95)
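
To connect t.test() with the t statistic, the value can be computed by hand and compared. This is a sketch on simulated usage times (an assumption, not the one_sample.csv data).

```r
set.seed(42)                                   # simulated usage times (assumed data)
x <- rnorm(40, mean = 5.5, sd = 0.8)
mu0 <- 5.2

# t statistic by hand: (sample mean - mu0) / standard error
t_manual <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))

res <- t.test(x, mu = mu0, alternative = "two.sided", conf.level = 0.95)
print(c(manual = t_manual, t.test = unname(res$statistic)))
print(res$p.value)     # two-sided p-value
print(res$conf.int)    # 95% confidence interval for the mean
```

The returned object also carries the degrees of freedom in res$parameter, which is n - 1 for the one-sample test.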

 

 

[Exercise] 01. The mean height of all second-year middle school girls nationwide is known to be 148.5 cm. From the 500 second-year students of middle school A, select 10% (50 students) as a sample, compute the sample mean height, and test whether it differs from the population mean. (One-sample t-test)

Step 1: Load the dataset

setwd('C:/ITWILL/2_Rwork/data')

stheight <- read.csv("student_height.csv")
stheight
height <- stheight$height
head(height)


Step 2: Descriptive statistics / check for missing values

length(height) #50
summary(height) #mean 149.4


If there are missing values

x = na.omit(height)
mean(x)

 

Step 3: Normality test - basic assumption

shapiro.test(x) #p-value = 0.0001853 < 0.05: reject the null hypothesis
hist(x)

* The histogram is skewed to the left relative to the mean (skewness)


Step 4: Hypothesis test - two-tailed test: nonparametric test

wilcox.test(x, mu=148.5) #use wilcox.test() when the data are not normally distributed

* V = 826, p-value = 0.067 > 0.05

[Interpretation] The null hypothesis cannot be rejected: there is no significant difference from the population mean.

 

 

 

 

 

03. Independent-samples t-test (different populations)

Tests the difference in means of samples drawn from mutually independent populations.

Basic assumption: the two groups have the same distribution (homogeneity-of-variance test)

Basic (null) hypothesis: there is no difference in means between the two groups.

ex. Whether or not satisfaction with beverage A differs between men and women

 

 

Independent-samples t-test analysis procedure

Load the practice file > preprocess the data > create the two-group subsets > descriptive statistics (mean) > homogeneity (basic assumption: homogeneity-of-variance test)

> YES: t.test() / NO: wilcox.test() > analyze the test statistic
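
The procedure above can be sketched as a helper function. The name two_sample_test and the simulated groups are hypothetical. One assumption to note: var.equal = TRUE is passed here to request the pooled-variance t-test, whereas calling t.test() without it runs Welch's test, which is R's default.

```r
# Sketch: variance check with var.test(), then t.test() or wilcox.test()
two_sample_test <- function(a, b, alpha = 0.05) {
  a <- a[!is.na(a)]
  b <- b[!is.na(b)]
  equal_var <- var.test(a, b)$p.value >= alpha   # H0: the variances are equal
  if (equal_var) {
    t.test(a, b, var.equal = TRUE)               # pooled-variance t-test
  } else {
    wilcox.test(a, b)                            # rank-based alternative
  }
}

set.seed(7)                                      # simulated scores (assumed data)
g1 <- rnorm(30, mean = 5.5, sd = 1)
g2 <- rnorm(35, mean = 6.0, sd = 1)
res <- two_sample_test(g1, g2)
print(res$method)
print(res$p.value)
```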

 

 

Independent-samples t-test

<Research hypothesis>
Research hypothesis (H1): There is a difference in mean practical exam scores between the two groups by teaching method.
Null hypothesis (H0): There is no difference in mean practical exam scores between the two groups by teaching method.

<Research setting> An IT training center applied a presentation-based (PT) teaching method and a live-coding teaching method for one month, then gave a practical exam to 150 trainees from each group. Test whether the mean practical exam scores of the two groups differ.

1. Load the practice file

data <- read.csv("two_sample.csv")
data 
head(data) # check the 4 variables
summary(data) # score - NA's: 73


2. Create the two-group subsets (data cleaning, preprocessing)
result <- subset(data, !is.na(score), c(method, score)) # rows with a non-missing score

dataset <- data[c('method', 'score')] # all rows; var.test() and t.test() drop NAs themselves
table(dataset$method)


3. Split the data
1) Split by teaching method

method1 <- subset(dataset, method==1)
method2 <- subset(dataset, method==2)


2) Extract the scores for each teaching method

method1_score <- method1$score
method2_score <- method2$score


3) Descriptive statistics

length(method1_score) # 150
length(method2_score) # 150


4. Homogeneity-of-variance test: test the difference in variance between the two groups

var.test(method1_score, method2_score) #F = 1.2158, num df = 108, denom df = 117, p-value = 0.3002

* Null hypothesis: the two groups have equal variances.
* Homogeneous: t.test()
* Not homogeneous: wilcox.test()

5. Hypothesis test - difference in means between the two groups

t.test(method1_score, method2_score, alternative="two.sided", conf.level=0.95)
#t = -2.0547, df = 218.19, p-value = 0.0411

* -1.96 < t < 1.96 is the acceptance region, so t = -2.0547 falls in the rejection region and the p-value (0.0411) is just below the 0.05 significance level.

Directional research hypothesis (rejected): method 1 > method 2

t.test(method1_score, method2_score, alternative="greater", conf.level=0.95)


Directional research hypothesis (accepted): method 1 < method 2

t.test(method1_score, method2_score, alternative="less", conf.level=0.95)
#p-value = 0.02055 < 0.05
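
The one-sided p-value reported here (0.02055) is half of the two-sided one (0.0411). The sketch below checks that relationship on simulated scores (an assumption, not the two_sample.csv data).

```r
set.seed(3)                                   # simulated scores (assumed data)
m1 <- rnorm(100, mean = 5.4, sd = 1.1)
m2 <- rnorm(110, mean = 5.8, sd = 1.0)

p_two  <- t.test(m1, m2, alternative = "two.sided")$p.value
p_less <- t.test(m1, m2, alternative = "less")$p.value
p_grtr <- t.test(m1, m2, alternative = "greater")$p.value

print(c(two.sided = p_two, less = p_less, greater = p_grtr))

# The smaller one-sided p-value is half the two-sided p-value,
# and the two one-sided p-values sum to 1.
print(all.equal(min(p_less, p_grtr), p_two / 2))
print(all.equal(p_less + p_grtr, 1))
```

This is why only one of the two directional hypotheses can be supported for a given dataset: the direction that matches the sign of t gets the small p-value, the other gets a p-value close to 1.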

 

 

 

[Exercise] Test whether exam scores differ by teaching method. (Independent-samples t-test)
Condition 1) Variables: method: teaching method, score: exam score
Condition 2) Model: teaching method (nominal scale = categorical variable)  ->  exam score (ratio scale = continuous variable)
What difference in exam scores does the teaching method make?
Condition 3) Preprocessing: handle missing values by replacing them with the mean
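
Condition 3 mentions replacing missing values with the mean. The sketch below shows that next to simple removal; the score vector is hypothetical and not taken from twomethod.csv.

```r
# Hypothetical scores with missing values (assumed data)
score <- c(12, 15, NA, 20, 18, NA, 25)

# Option 1: drop the missing values
score_dropped <- as.numeric(na.omit(score))

# Option 2: replace each NA with the mean of the observed values
score_imputed <- ifelse(is.na(score), mean(score, na.rm = TRUE), score)

print(score_dropped)
print(score_imputed)
```

Mean imputation keeps the sample size and the mean unchanged but shrinks the variance, which can affect the variance test that follows.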

Step 1. Load the practice file

Data <- read.csv("twomethod.csv", header=TRUE)
head(Data) # check the 3 variables -> id method score


Step 2. Create the two-group subsets

unique(Data$method) # 1 2

* Frequency functions: table vs unique (unique prints the distinct categories without duplicates)

Select variables -> create the subset

data_df <- Data[c('method', 'score')]
data_df


Step 3. Split the data
1) Split by group (teaching method)

method1 = subset(data_df, method == 1)
method2 = subset(data_df, method == 2)
dim(method1) # 24 observations, 2 variables
dim(method2) # 39 observations, 2 variables

2) Extract the exam scores for each teaching method

score1 = method1$score
score2 = method2$score


Step 4: Test the shape of the distributions
Do the scores drawn from the two populations differ in spread? = homogeneity-of-variance test. It uses a continuous variable.

var.test(score1, score2) #p-value = 0.8494


Step 5: Hypothesis test

t.test(score1, score2)
#t = -5.6056 (compare its absolute value), df = 43.705, p-value = 1.303e-06 (= 0.000001303)

* t is far outside the acceptance region, so the null hypothesis is rejected.

 

 

 

 

 

04. Paired-samples t-test

Paired-samples t-test (same population)?

Measures the same population twice and tests the difference between the before and after means.

Basic assumption: the before-after differences are normally distributed (normality test)

Basic (null) hypothesis: there is no difference in means between the two measurements.

ex. Whether or not body weight differs before and after taking diet product A

 

 

1. Load the practice file

getwd()
setwd("c:/ITWILL/2_Rwork/data")
data <- read.csv("paired_sample.csv", header=TRUE)
head(data)


2. Create the two-group subsets
1) Data cleaning
result <- subset(data, !is.na(after), c(before,after)) # rows with a non-missing after score

dataset <- data[ c('before',  'after')]
dataset


2) Separate before and after

before <- dataset$before # score before applying the teaching method
after <- dataset$after # score after applying the teaching method
before; after


3) Descriptive statistics

length(before) # 100
length(after) # 100
mean(before) # 5.145
mean(after, na.rm = TRUE) # 6.220833


3. Normality test: diff = after - before

diff = after-before
shapiro.test(diff) #p-value = 0.05705 >= 0.05: normality can be assumed

* Normal distribution: t.test()
* Non-normal distribution: wilcox.test()

4. Hypothesis test

t.test(before, after, paired=TRUE) # p-value < 2.2e-16: reject the null hypothesis


Directional research hypothesis: before > after

t.test(before, after, paired=TRUE, alternative="greater", conf.level=0.95)


Directional research hypothesis: before < after

t.test(before, after, paired=TRUE, alternative="less", conf.level=0.95)
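
The paired test is equivalent to a one-sample t-test on the differences, which is why normality is checked on the differences rather than on each measurement. A sketch on simulated before/after scores (an assumption, not the paired_sample.csv data):

```r
set.seed(11)                                   # simulated scores (assumed data)
before <- rnorm(50, mean = 5.1, sd = 1)
after  <- before + rnorm(50, mean = 1.0, sd = 0.8)

paired_res <- t.test(before, after, paired = TRUE)
diff_res   <- t.test(before - after, mu = 0)   # one-sample test on the differences

print(unname(paired_res$statistic))
print(unname(diff_res$statistic))
print(all.equal(paired_res$p.value, diff_res$p.value))
```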

 

 

 

 

 

05. F-distribution and F-test

A test built on the chi-square distribution (the larger the degrees of freedom (df), the closer the distribution gets to symmetric)

 

 

F-test (= analysis of variance)?

Used when the populations are normally distributed and the population variance (σ²) is unknown; it compares three or more populations by testing whether the variances are the same or different.

Estimates the ratio of the variances (using the F-table).

Basic assumption: the groups have the same distribution (homogeneity-of-variance test)

Basic (null) hypothesis: there is no difference among the group means.

* Methods: one-way ANOVA, two-way ANOVA, multivariate ANOVA

* Test method: analysis of variance. Parametric (normal): one-way ANOVA; nonparametric (non-normal): Kruskal-Wallis test

 

 

F-test methods

Type | Number of variables | Example
One-way ANOVA | Independent variables: 1, dependent variables: 1 | Comparing scores by teaching method. Independent variable (categorical): method 1, method 2, method 3. Dependent variable (continuous): score
Two-way ANOVA | Independent variables: 2, dependent variables: 1 | Purchases of shopping-mall customers by age group (30s, 40s, 50s) and time of day (morning/afternoon). Independent variables (categorical): age group, time of day. Dependent variable (continuous): purchases
Multivariate ANOVA | Independent variables: 1 or 2, dependent variables: 2 | Purchases of shopping-mall customers by age group (30s, 40s, 50s) and time of day (morning/afternoon). Independent variables (categorical): age group, time of day. Dependent variable (continuous): purchases

 

 

 

 

 

 

6. Analysis of Variance (ANOVA)

ANOVA (different populations)? Tests the difference in means among three or more mutually independent populations.

Basic assumption: the groups have the same distribution (homogeneity-of-variance test)

Basic (null) hypothesis: there is no difference in means among the groups.

Alternative hypothesis: at least one group mean differs.

ex. Whether or not satisfaction with beverage A differs by age group (20s, 30s, 40s)

 

 

ANOVA procedure

Load the practice file > preprocess the data > create the three-group subsets > descriptive statistics (mean) > homogeneity (basic assumption: equal-variance test with bartlett.test()) > YES: aov() / NO: kruskal.test() > post-hoc test
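
The branching step of this procedure can be sketched as a helper. The function name anova_flow is hypothetical, and the built-in iris data is used as a stand-in for the practice file.

```r
# Sketch: Bartlett test first, then aov() or kruskal.test()
anova_flow <- function(formula, data, alpha = 0.05) {
  homogeneous <- bartlett.test(formula, data = data)$p.value >= alpha
  if (homogeneous) {
    list(test = "aov", fit = aov(formula, data = data))
  } else {
    list(test = "kruskal", fit = kruskal.test(formula, data = data))
  }
}

res1 <- anova_flow(Sepal.Width ~ Species, iris)    # variances look homogeneous
res2 <- anova_flow(Petal.Length ~ Species, iris)   # variances clearly differ
print(res1$test)
print(res2$test)
```

When the aov() branch is taken, the fitted object can be passed on to summary() and TukeyHSD() for the remaining steps.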

 

 

ANOVA

<Research hypothesis>
Research hypothesis (H1): There is a difference in mean practical exam scores among the three groups by teaching method.
Null hypothesis (H0): There is no difference in mean practical exam scores among the three groups by teaching method.

<Research setting> Three teaching methods were applied for one month, and a practical exam was given to 50 trainees from each group. Test whether the mean practical exam scores of the three groups differ.

 

1. Load the file

data <- read.csv("three_sample.csv")
data


2. Data cleaning/preprocessing - remove NAs and outliers

data <- subset(data, !is.na(score), c(method, score)) 
data # method, score

(1) Use charts - view outliers (analyze how the data are distributed)

plot(data$score) # check outliers on the chart: values of 50 or more, and negative values
barplot(data$score) # bar chart
mean(data$score) # 14.45

(2) Remove outliers - drop scores above the mean (14)

length(data$score) #91
data2 <- subset(data, score <= 14) # drop values above 14
length(data2$score) #88 (3 removed)

(3) View the cleaned data

x <- data2$score
boxplot(x) # check for outliers
plot(x)


3. Create the group labels
* method: 1: method1, 2: method2, 3: method3

data2$method2[data2$method==1] <- "method1" 
data2$method2[data2$method==2] <- "method2"
data2$method2[data2$method==3] <- "method3"

table(data2$method2) # frequency by teaching method

 

4. Equal-variance test: homogeneity test

#bartlett.test(dependent variable ~ independent variable) # independent variable (three groups)
bartlett.test(score ~ method2, data=data2)
# p-value = 0.1905 >= 0.05

* Null hypothesis: the variances of the groups are homogeneous.
* [Interpretation] The p-value is greater than the 0.05 significance level, so the null hypothesis cannot be rejected.

* Homogeneous: aov() - Analysis of Variance
* Not homogeneous: kruskal.test()

 

5. ANOVA (comparing the means of two or more groups is called analysis of variance)
* aov(dependent variable ~ independent variable, data=data set)
* Null hypothesis: there is no difference in means among the groups.

result <- aov(score ~ method2, data=data2)

* The p-value of an aov() result is shown by calling summary() on it.

summary(result)

* The larger the F statistic, the smaller the p-value (the support for the null hypothesis).
* Pr(>F) = 9.39e-14 < 0.05: the null hypothesis is rejected at a significant level.
[Interpretation] At least one group shows a difference in means.
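
The link between the F statistic and the p-value can be checked with pf(). This sketch uses the built-in iris data as a stand-in for the practice file (an assumption) and reads the values from the ANOVA table returned by summary().

```r
fit <- aov(Sepal.Width ~ Species, data = iris)   # stand-in data (assumed)
tab <- summary(fit)[[1]]                         # ANOVA table as a data frame

f_value <- tab[["F value"]][1]
df1 <- tab[["Df"]][1]                            # between-group df
df2 <- tab[["Df"]][2]                            # residual df

# p-value = upper-tail area of the F-distribution beyond the observed F
p_manual <- pf(f_value, df1, df2, lower.tail = FALSE)
print(c(F = f_value, p_manual = p_manual, p_table = tab[["Pr(>F)"]][1]))

# The p-value shrinks as F grows
print(pf(c(1, 3, 10, 50), df1, df2, lower.tail = FALSE))
```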

6. Post-hoc test: test the differences between specific groups

TukeyHSD(result)

                     diff        lwr        upr     p adj
method2-method1  2.612903  1.9424342  3.2833723 0.0000000 : largest difference in mean score
method3-method1  1.422903  0.7705979  2.0752085 0.0000040 : difference
method3-method2 -1.190000 -1.8656509 -0.5143491 0.0001911 : difference
* diff : difference between the two group means; lwr and upr are the lower and upper bounds of its 95% confidence interval
* p adj : adjusted p-value (= significance probability)
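
That the diff column is simply a difference of group means can be verified directly. The sketch below uses the built-in iris data as a stand-in for the practice file (an assumption).

```r
fit <- aov(Sepal.Width ~ Species, data = iris)   # stand-in data (assumed)
tuk <- TukeyHSD(fit)$Species                     # matrix: diff, lwr, upr, p adj

group_means <- tapply(iris$Sepal.Width, iris$Species, mean)
diff_manual <- group_means["versicolor"] - group_means["setosa"]

print(tuk["versicolor-setosa", ])
print(unname(diff_manual))                       # same as the diff column
```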

Interpreting with a chart

plot (TukeyHSD(result))

 

[Interpretation] None of the three confidence intervals contains 0 (the center line), so every pair of groups differs in mean.

 

 

 


Group statistics: used in ANOVA post-hoc analysis

install.packages('dplyr')

library(dplyr) # %>% operator, group_by(), summarize()

 

Format) df %>% group_by(categorical_variable) %>% summarize(var_name = function(column_name))
* function: sum, mean, median, sd, var, min, max, etc.
* Warning messages can be ignored.
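
If dplyr is not installed, the same group-wise summary can be sketched in base R with aggregate(). The data frame below is hypothetical and only stands in for data2.

```r
# Hypothetical data frame standing in for data2 (assumed data)
df <- data.frame(
  method2 = c("method1", "method1", "method2", "method2", "method3", "method3"),
  score   = c(4, 5, 7, 6, 5, 6)
)

# Group-wise mean of score, one row per level of method2
avg_by_group <- aggregate(score ~ method2, data = df, FUN = mean)
names(avg_by_group)[2] <- "avg"
print(avg_by_group)
```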

Mean score by teaching method

data2 %>% group_by(method2) %>% summarize(avg = mean(score))

  method2   avg
  <chr>   <dbl>
1 method1  4.19
2 method2  6.8
3 method3  5.61
* method2 (6.8) - method1 (4.19) = 2.612903

 

* Mean score by teaching method using subset

names(data2) # "method"  "score"   "method2"
method1 = subset(data2, method2 == 'method1')
method2 = subset(data2, method2 == 'method2')
method3 = subset(data2, method2 == 'method3')
mean(method1$score) #4.187097
mean(method2$score) #6.8
mean(method3$score) #5.61




Nonparametric test: applied to iris

str(iris)
table(iris$Species) # check the 3 groups

* Species: independent variable (categorical)
* Sepal.Length, Sepal.Width, Petal.Length, Petal.Width: dependent variables (continuous)

1. Equal-variance test

bartlett.test(Sepal.Width ~ Species, data=iris) #p-value = 0.3515: parametric test
bartlett.test(Petal.Length ~ Species, data=iris) #p-value = 9.229e-13: nonparametric test


2. ANOVA: nonparametric test

model = kruskal.test(Petal.Length ~ Species, data=iris)
model #p-value < 2.2e-16

[Interpretation] At least one group differs.


3. Post-hoc test (mean sepal length by species)
#* TukeyHSD(model) raises an error

iris %>% group_by(Species) %>% summarise(avg=mean(Sepal.Length))

  Species      avg
  <fct>      <dbl>
1 setosa      5.01
2 versicolor  5.94
3 virginica   6.59
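
Since TukeyHSD() does not accept a kruskal.test() result, one possible nonparametric follow-up is pairwise Wilcoxon rank sum tests with a multiplicity correction. This is a swapped-in technique, not the original author's method; the sketch uses pairwise.wilcox.test() from the stats package, and the Holm adjustment is an assumed choice.

```r
# Pairwise Wilcoxon rank sum tests between species, Holm-adjusted p-values
pw <- pairwise.wilcox.test(iris$Petal.Length, iris$Species,
                           p.adjust.method = "holm",
                           exact = FALSE)   # normal approximation, since the data contain ties
print(pw$p.value)                           # lower-triangular matrix of adjusted p-values
```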

 

 




[Example] Purchases of shopping-mall customers by age group (20s, 30s, 40s) and time of day (morning/afternoon)
Dependent variable: purchase quantity (continuous)
Independent variable 1: age group (categorical)
Independent variable 2: time of day (categorical)

1. Create the dataset: uniform distribution

age = round(runif(100, min=20, max=49)) # no seed is set, so the values and the results below differ on every run
age

time = round(runif(100, min=0, max=1))
time # 0: morning, 1: afternoon

buy = round(runif(100,min=1, max=10))
buy

datas = data.frame(age, time, buy)
datas
str(datas)


Recode the age-group variable

datas$age2[datas$age <= 29] = 20 # age 29 or under -> 20
datas$age2[datas$age > 29 & datas$age <= 39] = 30
datas$age2[datas$age > 39] = 40
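
The same recoding can be sketched with cut(), which bins a numeric vector in one call. The seed and the variable names age2_cut and age2_idx are assumptions added for illustration.

```r
set.seed(5)                                    # seed added for reproducibility (assumed)
age <- round(runif(100, min = 20, max = 49))

# Bins: (-Inf, 29], (29, 39], (39, Inf), labelled 20, 30, 40
age2_cut <- cut(age, breaks = c(-Inf, 29, 39, Inf), labels = c("20", "30", "40"))

# Index-assignment version, as in the steps above, for comparison
age2_idx <- rep(NA_real_, length(age))
age2_idx[age <= 29] <- 20
age2_idx[age > 29 & age <= 39] <- 30
age2_idx[age > 39] <- 40

print(table(age2_cut))
print(all(as.character(age2_cut) == as.character(age2_idx)))
```

cut() returns a factor directly, so the separate as.factor() conversion is not needed for a variable recoded this way.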


Independent variables: convert to factors

datas$age2 = as.factor(datas$age2)
datas$time = as.factor(datas$time)
str(datas)


2. Equal-variance test

bartlett.test(buy ~ age2, data=datas) #p-value = 0.6989
bartlett.test(buy~time, data=datas) #p-value = 0.6989


3. ANOVA: two-way ANOVA

model = aov(buy~age2 + time, data=datas)


4. Interpreting the ANOVA result

summary(model)

             Df Sum Sq Mean Sq F value Pr(>F)  
 age2         2    9.6    4.78   0.682 0.5080  : no difference by age group
 time         1   40.4   40.36   5.757 0.0184 * : difference by time of day
 Residuals   96  673.0    7.01

5. Post-hoc test

TukeyHSD(model)

 $age2
           diff       lwr       upr     p adj
 30-20 -0.2013889 -1.848509 1.4457314 0.9543943 : no difference in purchase quantity between the 30s and the 20s
 40-20 -0.7181572 -2.280358 0.8440436 0.5198682
 40-30 -0.5167683 -2.003562 0.9700254 0.6869840
 
 $time
       diff       lwr      upr     p adj
 1-0 1.265588 0.2142492 2.316926 0.0188252

plot(TukeyHSD(model))

 

library(dplyr)
datas %>% group_by(age2) %>% summarise(mean(buy))

  age2       `mean(buy)`
  <fct>         <dbl>
 1 20           5.89
 2 30           5.69
 3 40           5.17
