๋ฐ์ดํ„ฐ๋ถ„์„๊ฐ€ ๊ณผ์ •/R

DAY21. Principal Component Analysis, Factor Analysis

LEE_BOMB 2021. 10. 14. 17:39
Principal Component Analysis

01. Eigenvalues and Eigenvectors

For a square matrix A (n, n), a nonzero vector v and a constant λ that satisfy Av = λv are called an eigenvector (v) and an eigenvalue (λ, a scalar) of A.

 

* Square matrix? A matrix with the same number of rows and columns

* Eigenvalue? A constant (scalar) value λ (lambda)

A value that summarizes the square matrix A (n, n) (a characteristic value of the matrix)

It describes a characteristic of the matrix's dimensions (columns: variables)

Its magnitude indicates the strength of that characteristic

* Eigenvector? A nonzero vector corresponding to an eigenvalue

* Applications: PCA (principal component analysis), SVD (singular value decomposition), pseudo-inverse, systems of linear equations, etc.

 

 

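Linear combination and linear transformation

Linear transformation? Multiplying the n-dimensional matrix A by a vector v (forming linear combinations) transforms v into another vector, Av; each element of Av reduces one row of A to a single value

Matrix product in R: Av = A %*% v

[Example] Y = a1X1 + a2X2 + … + anXn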

y = 1*3 + 2*0 = 3

y = 1*8 + 2*(-1) = 6
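The two dot products above can be reproduced in R. The matrix A itself is not written out in the post; A with rows (3, 0) and (8, -1) and v = (1, 2) is an assumption reconstructed from those two lines.

```r
# Assumed matrix, reconstructed from the example arithmetic:
#   row 1 = (3, 0), row 2 = (8, -1); matrix() fills column by column
A <- matrix(c(3, 8, 0, -1), nrow = 2)
v <- c(1, 2)

# Each element of Av is one row of A combined linearly with v
Av <- A %*% v
print(Av)  # 3 and 6, matching the two lines above

# The same values written as explicit weighted sums
y1 <- 1 * 3 + 2 * 0
y2 <- 1 * 8 + 2 * (-1)
print(c(y1, y2))
```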

 

 

Eigenvalues and eigenvectors

With λ = 3, the vector Av produced by the linear transformation above is exactly λ times v, so the following holds:

λ = 3 (an eigenvalue of matrix A)

Av == λv (v: the eigenvector of A for λ)

* If linearly transforming v by A gives a constant (λ) multiple of v, then v is an eigenvector of A and λ is the corresponding eigenvalue.
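A quick check of Av = λv in R, again using the matrix A reconstructed from the example (an assumption; the post does not print A). `eigen()` returns all eigenvalues and eigenvectors at once, with each eigenvector normalized to unit length.

```r
# Assumed matrix from the worked example: rows (3, 0) and (8, -1)
A <- matrix(c(3, 8, 0, -1), nrow = 2)
v <- c(1, 2)
lambda <- 3

# Av equals lambda * v, so v is an eigenvector of A for lambda = 3
lhs <- as.vector(A %*% v)
rhs <- lambda * v
print(all.equal(lhs, rhs))  # TRUE

# eigen() finds every eigenvalue/eigenvector pair
e <- eigen(A)
print(e$values)  # 3 -1

# The first eigenvector is v rescaled to unit length (its sign may differ)
u <- e$vectors[, 1]
print(all.equal(abs(u), v / sqrt(sum(v^2))))
```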

 

 

Relationship between eigenvalues and eigenvectors

Eigenvector (v): a vector whose direction stays the same (or is exactly reversed when λ < 0) while only its length changes

Eigenvalue (λ): the factor (scale) by which the eigenvector's length changes

02. Principal Component Analysis vs. Factor Analysis

Similarities
• Variable reduction: correlated variables are merged through linear combinations
• Data pattern exploration: the components/factors help you understand the characteristics of the variables
• Preliminary analysis for other methods: when the independent variables of a regression model are multicollinear, the highly correlated variables are reduced to components/factors
• Input variables: the components/factors serve as input variables for regression, cluster analysis, time-series analysis, etc.

Differences
Criterion for merging variables
• PCA: merges variables based on numerical correlation
• Factor analysis: merges variables based on conceptual/logical relatedness
Purpose of the reduction and number of new variables
• PCA: exploratory; usually 2-3 components
• Factor analysis: confirmatory; the given number of higher-level factors
Importance of the new variables
• PCA: the first principal component carries the most variance (the most important variable)
• Factor analysis: all factors are of equal importance

03. Principal Component Analysis

Examines whether multivariate data are numerically correlated and merges the correlated variables into 'principal components'

Simplifies the information shared by correlated variables through a common linear transformation of all the variables

Usually about 2-3 components are extracted; the first principal component has the largest variance (explanatory power), i.e., it is the most important

Components are retained until their cumulative variance reaches about 85% or more

 

 

Multivariate data analysis

Multivariate data? Data containing two or more variables that are correlated with each other

PCA is one of the methods for analyzing multivariate data

 

 

Why PCA is needed

1) Curse of dimensionality: model performance degrades as the number of dimensions (= number of variables) grows

2) Multicollinearity: when one independent variable increases, another independent variable increases or decreases along with it (this distorts regression results)

3) Overfitting: the error decreases on the training data but increases on real (unseen) data

4) Performance degradation: storage space and processing time grow unnecessarily during modeling

 

 

Linear transformation

A weighted combination of several variables

An operation that multiplies n-dimensional information by weights a to reduce it to a single dimension
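A minimal sketch of reducing n variables to one dimension with weights a. The data values and weights are made up purely for illustration; in PCA the weights would be the entries of an eigenvector.

```r
# Hypothetical data: 3 observations (rows) x 3 variables (columns)
X <- matrix(c(2, 4, 1,
              3, 1, 5,
              0, 2, 2), nrow = 3, byrow = TRUE)
# Hypothetical weights a1, a2, a3
a <- c(0.5, 0.3, 0.2)

# Y = a1*X1 + a2*X2 + a3*X3 for every observation at once
Y <- X %*% a
print(Y)  # a single column: 3 variables reduced to 1 dimension

# The first observation, written out by hand
y_first <- 0.5 * 2 + 0.3 * 4 + 0.2 * 1
print(y_first)
```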

 

 

๋ฐ์ดํ„ฐ ํ‘œ์ค€ํ™”

์ฃผ์„ฑ๋ถ„๋ถ„์„์€ ์ธก์ • ๋‹จ์œ„์— ๋”ฐ๋ผ์„œ ๋ถ„์‚ฐ์ด ํฌ๊ฒŒ ๋‹ฌ๋ผ์ง

ํ‘œ์ค€ํ™” ํ•˜๋Š” ๊ฒฝ์šฐ ํ‘œ์ค€ํ™” ํ•˜์ง€ ์•Š๋Š” ๊ฒฝ์šฐ
์ธก์ • ๋‹จ์œ„๊ฐ€ ๋‹ค๋ฅธ ๊ฒฝ์šฐ
์ƒ๊ด€ํ–‰๋ ฌ๋กœ๋ถ€ํ„ฐ ์‹œ์ž‘ํ•˜๋Š” ์ฃผ์„ฑ๋ถ„๋ถ„์„
์ž๋ฃŒ์˜ ๋‹จ์œ„๊ฐ€ ๋™์ผํ•œ ๊ฒฝ์šฐ
๋ถ„์‚ฐ๊ณต๋ถ„์‚ฐ ํ–‰๋ ฌ๋กœ๋ถ€ํ„ฐ ์‹œ์ž‘ํ•˜๋Š” ์ฃผ์„ฑ๋ถ„๋ถ„์„
ex. ๋ณ€์ˆ˜ ์ค‘ ํ•˜๋‚˜๋Š” cm, ๋‹ค๋ฅธ ํ•˜๋‚˜๋Š” kg์ธ ๊ฒฝ์šฐ
ex. ๋‹ค๋ฅธ ํ•œ ๋ณ€์ˆ˜๋Š” 1์ž๋ฆฟ์ˆ˜, ๋‹ค๋ฅธ ๋ณ€์ˆ˜๋Š” 3์ž๋ฆฟ์ˆ˜์ธ ๊ฒฝ์šฐ
๋ณ€์ˆ˜์˜ ๋‹จ์œ„ ๊ทธ๋Œ€๋กœ, ๋ณ€๋™ ๊ทธ๋Œ€๋กœ ์‚ฌ์šฉํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๋ฐ์ดํ„ฐ์™€ ๋ชจ์ง‘๋‹จ์˜ ํŠน์„ฑ์„ ์ž˜ ๋“œ๋Ÿฌ๋‚ผ ์ˆ˜ ์žˆ๋‹ค.

 

 

PCA steps

Understand the data: identify the relationships between variables with correlation analysis

Extract weights: extract the eigenvalues and eigenvectors that characterize the correlation matrix

Reduce dimensions: run PCA on the standardized dataset

Decide components: determine the number of components and create input variables

 

[Practice]
Goal: run PCA on the characteristics of 6 subjects -> identify similar subjects
Variables: 10 characteristics of each of the 6 subjects, scored on a 5-point scale
s1: natural science, s2: physical chemistry
s3: humanities & social science, s4: journalism & broadcasting
s5: applied mathematics, s6: inferential statistics
s1 <- c(1, 2, 1, 2, 3, 4, 2, 3, 4, 5)
s2 <- c(1, 3, 1, 2, 3, 4, 2, 4, 3, 4)
s3 <- c(2, 3, 2, 3, 2, 3, 5, 3, 4, 2)
s4 <- c(2, 4, 2, 3, 2, 3, 5, 3, 4, 1)
s5 <- c(4, 5, 4, 5, 2, 1, 5, 2, 4, 3)
s6 <- c(4, 3, 4, 4, 2, 1, 5, 2, 4, 2)
name <-1:10


Getting the dataset: option 1 iris, option 2 data.frame (used here)

dataset <- data.frame(s1, s2, s3, s4, s5, s6)
dataset
str(dataset) # 'data.frame': 10 obs. of  6 variables:



[Step 1] Understand the data: correlation analysis of the variables

cor(dataset) # correlation matrix (a square matrix)

         s1          s2         s3         s4         s5         s6
s1  1.00000000  0.86692145 0.05847768 -0.1595953 -0.5504588 -0.6262758
s2  0.86692145  1.00000000 0.06745441 -0.0240123 -0.6349581 -0.7968892
s3  0.05847768  0.06745441 1.00000000  0.9239433  0.3506967  0.4428759
s4 -0.15959528 -0.02401230 0.92394333  1.0000000  0.4207582  0.4399890
s5 -0.55045878 -0.63495808 0.35069667  0.4207582  1.0000000  0.8733514
s6 -0.62627585 -0.79688923 0.44287589  0.4399890  0.8733514  1.0000000
* s1:s2, s3:s4, and s5:s6 are highly correlated


[Step 2] Extract weights: eigenvalues and eigenvectors (see chap12_1_Eigenvalue)
Eigenvalues (λ)

en <- eigen(cor(dataset)) # correlation matrix -> eigenvalues, eigenvectors
names(en) # "values"  "vectors"

en$values # $values: the eigenvalues (one per variable)

# 3.44393944 1.88761725 0.43123968 0.19932073 0.02624961 0.01163331
* An eigenvalue is a specific constant derived from a matrix (here, the correlation matrix)

Sum of all eigenvalues = total variance

sum(en$values) # 6 (for a correlation matrix this equals the number of variables)


Proportion of each component = eigenvalue / total variance

en$values / sum(en$values)

# 0.573989906 0.314602874 0.071873280 0.033220121 0.004374934 0.001938885
0.573989906 + 0.314602874  # 0.8885928
0.573989906 + 0.314602874 + 0.071873280 # 0.9604661
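The additions above can be automated with `cumsum()`. A minimal sketch that rebuilds the practice dataset and picks the smallest number of components whose cumulative proportion reaches the 85% rule of thumb used in this post.

```r
s1 <- c(1, 2, 1, 2, 3, 4, 2, 3, 4, 5)
s2 <- c(1, 3, 1, 2, 3, 4, 2, 4, 3, 4)
s3 <- c(2, 3, 2, 3, 2, 3, 5, 3, 4, 2)
s4 <- c(2, 4, 2, 3, 2, 3, 5, 3, 4, 1)
s5 <- c(4, 5, 4, 5, 2, 1, 5, 2, 4, 3)
s6 <- c(4, 3, 4, 4, 2, 1, 5, 2, 4, 2)
dataset <- data.frame(s1, s2, s3, s4, s5, s6)

en <- eigen(cor(dataset))
prop <- en$values / sum(en$values)   # proportion of variance per component
cum_prop <- cumsum(prop)             # cumulative proportion
print(round(cum_prop, 4))

# Smallest k whose cumulative proportion is at least 85%
k <- which(cum_prop >= 0.85)[1]
print(k)  # 2 with the values shown above (0.574 + 0.315 = 0.889)
```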

en$vectors # eigenvectors

           [,1]         [,2]        [,3]       [,4]        [,5]        [,6]
[1,] -0.4062499 -0.351093036  0.63460534  0.3149622  0.45699508  0.03041553
[2,] -0.4319311 -0.400526644  0.11564711 -0.4422216 -0.57042232  0.34452594
[3,]  0.2542077 -0.628807884 -0.06984072  0.3339036 -0.35389906 -0.54622817
[4,]  0.3017115 -0.566028650 -0.37734321 -0.2468016  0.50326085  0.36333366
[5,]  0.4763815  0.008436692  0.58035475 -0.6016209  0.05643527 -0.26654314
[6,]  0.5155637  0.021286661  0.31595023  0.4133867 -0.28995329  0.61559319

plot(en$values, type="o") # plot the eigenvalues (choose at the elbow point)


[Step 3] Dimension reduction: PCA
Linear combination of the observations with the eigenvectors of the correlation matrix: PCA is performed using the correlation matrix of dataset

PCA <- prcomp(dataset, center = TRUE, scale. = TRUE) # use when the variables have different units
# center = TRUE, scale. = TRUE -> mean = 0, variance = 1

 

PCA # standard deviations of the components and the rotation matrix

Standard deviations (1, .., p=6): each standard deviation is the square root of an eigenvalue
1.8557854 1.3739058 0.6566884 0.4464535 0.1620173 0.1078578

sqrt(3.44393944) # 1.855785


# Rotation (n x k) = (6 x 6): the eigenvectors, possibly with their signs flipped
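A minimal check of the two claims above on the practice data: the component variances (`sdev^2`) equal the eigenvalues of the correlation matrix, and the rotation columns equal the eigenvectors up to sign, so absolute values are compared.

```r
s1 <- c(1, 2, 1, 2, 3, 4, 2, 3, 4, 5)
s2 <- c(1, 3, 1, 2, 3, 4, 2, 4, 3, 4)
s3 <- c(2, 3, 2, 3, 2, 3, 5, 3, 4, 2)
s4 <- c(2, 4, 2, 3, 2, 3, 5, 3, 4, 1)
s5 <- c(4, 5, 4, 5, 2, 1, 5, 2, 4, 3)
s6 <- c(4, 3, 4, 4, 2, 1, 5, 2, 4, 2)
dataset <- data.frame(s1, s2, s3, s4, s5, s6)

en  <- eigen(cor(dataset))
PCA <- prcomp(dataset, center = TRUE, scale. = TRUE)

# Component variances equal the eigenvalues of the correlation matrix
print(all.equal(PCA$sdev^2, en$values))

# Rotation columns equal the eigenvectors up to sign
print(all.equal(abs(unname(PCA$rotation)), abs(en$vectors)))

# Per column: +1 where the signs agree, -1 where prcomp flipped the sign
print(sign(colSums(PCA$rotation * en$vectors)))
```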

Attributes of the PCA object

names(PCA) # "sdev"     "rotation" "center"   "scale"    "x"   
# sdev, rotation, and x are the most important


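1) Standard deviation of each component (its square is the component's variance)

PCA$sdev # 1.8557854 1.3739058 0.6566884 0.4464535 0.1620173 0.1078578


2) Rotation (loadings) of each component

PCA$rotation


3) x: principal component scores

str(PCA$x) # num [1:10, 1:6] -> [observations, components]

PCA$x

PCA$x[,1:3] # scores of the first three components; wrap in round(..., 2) for 2 decimal places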

#         PC1        PC2        PC3
#[1,] -1.2195113 -2.0459190  0.2058941
#[2,] -0.8619728  0.4960040  0.0733061
#[3,] -1.2195113 -2.0459190  0.2058941
#[4,] -1.3831687 -0.3387559 -0.3876923
#[5,]  1.5989191 -0.7852011  0.3581558
#[6,]  2.5004922  0.9502748  0.8197018
#[7,] -2.7991446  1.8549340  0.1375841
#[8,]  1.4637914  0.6653457  0.6438323
#[9,] -0.5785476  1.6426768 -0.6461737
#[10,]  2.4986537 -0.3934403 -1.4105023
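The scores in `PCA$x` are the linear combinations described in Step 3: the standardized observations multiplied by the rotation matrix. A minimal check on the practice data:

```r
s1 <- c(1, 2, 1, 2, 3, 4, 2, 3, 4, 5)
s2 <- c(1, 3, 1, 2, 3, 4, 2, 4, 3, 4)
s3 <- c(2, 3, 2, 3, 2, 3, 5, 3, 4, 2)
s4 <- c(2, 4, 2, 3, 2, 3, 5, 3, 4, 1)
s5 <- c(4, 5, 4, 5, 2, 1, 5, 2, 4, 3)
s6 <- c(4, 3, 4, 4, 2, 1, 5, 2, 4, 2)
dataset <- data.frame(s1, s2, s3, s4, s5, s6)

PCA <- prcomp(dataset, center = TRUE, scale. = TRUE)

Z <- scale(dataset)              # each column: mean 0, sd 1
scores <- Z %*% PCA$rotation     # linear combination for every observation

print(all.equal(unname(scores), unname(PCA$x)))

# PC1 score of observation 1, written as an explicit weighted sum
pc1_obs1 <- sum(Z[1, ] * PCA$rotation[, 1])
print(pc1_obs1)  # same as PCA$x[1, 1]
```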


[Step 4] Decide the number of components

summary(PCA) # summary of the PCA results

#Importance of components:
#                         PC1    PC2     PC3     PC4     PC5     PC6
#Standard deviation     1.856 1.3739 0.65669 0.44645 0.16202 0.10786
#Proportion of Variance 0.574 0.3146 0.07187 0.03322 0.00437 0.00194
#Cumulative Proportion  0.574 0.8886 0.96047 0.99369 0.99806 1.00000

* Standard deviation: the standard deviation of each component
* Proportion of Variance: the share of the total variance explained by each component
* Cumulative Proportion: the running total of those shares (keep components up to the first one reaching 85% or more)


* Scree plot: a graph for choosing the number of components (Elbow point: keep the components before the curve flattens out)

plot(PCA, type="l",  sub = "Scree Plot") #3
biplot(PCA) # biplot

* Shows the relationship between the original variables and the principal components (information about the reduced dimensions)
* The arrows reflect the correlation between each original variable and the PCs; the more parallel an arrow is to a PC axis, the more that variable contributes to that PC

3D visualization: plotting three principal components

install.packages('scatterplot3d')
library(scatterplot3d)


Extract the scores of the three components

PC1 <- PCA$x[,1]
PC2 <- PCA$x[,2]
PC3 <- PCA$x[,3]


# Usage: scatterplot3d(x = bottom axis, y = right-side axis, z = left-side axis, type = 'p')  # type = 'p': basic scatter plot

d3 <- scatterplot3d(PC1, PC2, PC3)


Rotation values (eigenvectors) of each component

rotation1 <- PCA$rotation[,1]
rotation2 <- PCA$rotation[,2]
rotation3 <- PCA$rotation[,3] 
d3$points3d(rotation1, rotation2, rotation3, bg='red',pch=21, cex=2, type='h')

 

[Step 5] Create input variables

str(PCA$x[,1:3]) # matrix -> data.frame
new_dataset <- data.frame(PCA$x[,1:3])

str(new_dataset) #'data.frame': 10 obs. of  3 variables:

names(new_dataset) <- c('app_science','soc_science','net_science')

new_dataset

# app_science soc_science net_science
#1   -1.2195113  -2.0459190   0.2058941
#2   -0.8619728   0.4960040   0.0733061
#3   -1.2195113  -2.0459190   0.2058941
#4   -1.3831687  -0.3387559  -0.3876923
#5    1.5989191  -0.7852011   0.3581558
#6    2.5004922   0.9502748   0.8197018
#7   -2.7991446   1.8549340   0.1375841
#8    1.4637914   0.6653457   0.6438323
#9   -0.5785476   1.6426768  -0.6461737
#10   2.4986537  -0.3934403  -1.4105023
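Principal component scores are mutually uncorrelated, which is what makes them convenient input variables for later models. A minimal sketch that checks this and then feeds `new_dataset` to k-means clustering; k-means and the choice of 3 clusters are ours, purely for illustration.

```r
s1 <- c(1, 2, 1, 2, 3, 4, 2, 3, 4, 5)
s2 <- c(1, 3, 1, 2, 3, 4, 2, 4, 3, 4)
s3 <- c(2, 3, 2, 3, 2, 3, 5, 3, 4, 2)
s4 <- c(2, 4, 2, 3, 2, 3, 5, 3, 4, 1)
s5 <- c(4, 5, 4, 5, 2, 1, 5, 2, 4, 3)
s6 <- c(4, 3, 4, 4, 2, 1, 5, 2, 4, 2)
dataset <- data.frame(s1, s2, s3, s4, s5, s6)

PCA <- prcomp(dataset, center = TRUE, scale. = TRUE)
new_dataset <- data.frame(PCA$x[, 1:3])
names(new_dataset) <- c('app_science', 'soc_science', 'net_science')

# Off-diagonal correlations between the component scores are (numerically) zero
r <- cor(new_dataset)
print(round(r, 10))

# Use the three components as input variables for clustering
set.seed(1)
km <- kmeans(new_dataset, centers = 3, nstart = 10)
print(km$cluster)
```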

 

 

 

 

 

 

04. Factor Analysis

Merges variables that are conceptually/logically similar in topic into latent factors (validity analysis)

A statistical technique that condenses variables into common dimensions (variable reduction)

There is no limit on the number of factors extracted

The number of factors is judged adequate when the cumulative explained variance reaches 85% or more



Confirmatory factor analysis: tests whether items expected in advance to group together actually do (the approach used most often)

Exploratory factor analysis: analyzes the data without assuming in advance which variables should be grouped together

 

 

์š”์ธ๋ถ„์„ ์ „์ œ์กฐ๊ฑด

ํ•˜์œ„์š”์ธ์œผ๋กœ ๊ตฌ์„ฑ๋˜๋Š” ๋ฐ์ดํ„ฐ ์…‹์ด ์ค€๋น„๋˜์–ด ์žˆ์–ด์•ผ ํ•œ๋‹ค.

๋ถ„์„์— ์‚ฌ์šฉ๋˜๋Š” ๋ณ€์ˆ˜๋Š” ๋“ฑ๊ฐ„์ฒ™๋„๋‚˜ ๋น„์œจ์ฒ™๋„์ด์—ฌ์•ผ ํ•˜๋ฉฐ, ํ‘œ๋ณธ์˜ ํฌ๊ธฐ๋Š” ์ตœ์†Œ 30~50๊ฐœ ์ด์ƒ์ด ๋ฐ”๋žŒ์งํ•˜๋‹ค. (์ค‘์‹ฌ๊ทนํ•œ์ •๋ฆฌ)

์š”์ธ๋ถ„์„์€ ์ƒ๊ด€๊ด€๊ณ„๊ฐ€ ๋†’์€ ๋ณ€์ˆ˜๋“ค๋ผ๋ฆฌ ๊ทธ๋ฃนํ™”ํ•˜๋Š” ๊ฒƒ์ด๋ฏ€๋กœ ๋ณ€์ˆ˜๋“ค ๊ฐ„์˜ ์ƒ๊ด€๊ด€๊ณ„๊ฐ€ ๋งค์šฐ ๋‚ฎ๋‹ค๋ฉด(๋ณดํ†ต ±3 ์ดํ•˜) ๊ทธ ์ž๋ฃŒ๋Š” ์š”์ธ ๋ถ„์„์— ์ ํ•ฉํ•˜์ง€ ์•Š๋‹ค.

 

 

์š”์ธ๋ถ„์„์˜ ๋ชฉ์ 

1) ์ธก์ •๋„๊ตฌ ํƒ€๋‹น์„ฑ ๊ฒ€์ฆ : ๋ณ€์ธ๋“ค์ด ๋™์ผํ•œ ์š”์ธ์œผ๋กœ ๋ฌถ์ด๋Š”์ง€?

2) ์ž๋ฃŒ ์š”์•ฝ : ๋ณ€์ธ์„ ๋ช‡ ๊ฐœ์˜ ๊ณตํ†ต๋œ ๋ณ€์ธ์œผ๋กœ ๋ฌถ์Œ(์ฐจ์› ์ถ•์†Œ)

3) ๋ณ€์ธ ๊ตฌ์กฐ ํŒŒ์•… : ๋ณ€์ธ๋“ค์˜ ์ƒํ˜ธ๊ด€๊ณ„ ํŒŒ์•…(๋…๋ฆฝ์„ฑ ๋“ฑ)

4) ๋ถˆํ•„์š”ํ•œ ๋ณ€์ธ ์ œ๊ฑฐ : ์ค‘์š”๋„๊ฐ€ ๋–จ์–ด์ง„ ๋ณ€์ˆ˜ ์ œ๊ฑฐ

 

 

์š”์ธ๋ถ„์„ ๋‹จ๊ณ„

1) ๋ฐ์ดํ„ฐ ํŠน์„ฑ ํŒŒ์•… : ์ƒ๊ด€๋ถ„์„์œผ๋กœ ๋ณ€์ˆ˜ ๊ฐ„ ๊ด€๊ณ„ ํŒŒ์•…

2) ์š”์ธ ์ˆ˜ ๊ฒฐ์ • : ์ƒ๊ด€๊ณ„์ˆ˜ ํ–‰๋ ฌ์— ๋Œ€ํ•œ ๊ณ ์œ ๊ฐ’์œผ๋กœ ์š”์ธ์ˆ˜ ๊ฒฐ์ •

3) ์š”์ธ๋ถ„์„ : ํ‘œ์ค€ํ™”๋œ ๋ฐ์ดํ„ฐ์…‹ ๋Œ€์ƒ์œผ๋กœ ์š”์ธ๋ถ„์„

4) ์š”์ธํŒ์ • : ์š”์ธ ๊ฐœ์ˆ˜ ํŒ์ • ๋ฐ ์ž…๋ ฅ๋ณ€์ˆ˜ ์ƒ์„ฑ

 

 

[Practice]

Goal: find common factors in the scores (characteristics) of 6 subjects and merge the subjects

Variables: scores for 6 subjects (out of 5 points = 5-point scale)
s1: natural science, s2: physical chemistry
s3: humanities & social science, s4: journalism & broadcasting
s5: applied mathematics, s6: inferential statistics
s1 <- c(1, 2, 1, 2, 3, 4, 2, 3, 4, 5)
s2 <- c(1, 3, 1, 2, 3, 4, 2, 4, 3, 4)
s3 <- c(2, 3, 2, 3, 2, 3, 5, 3, 4, 2)
s4 <- c(2, 4, 2, 3, 2, 3, 5, 3, 4, 1)
s5 <- c(4, 5, 4, 5, 2, 1, 5, 2, 4, 3)
s6 <- c(4, 3, 4, 4, 2, 1, 5, 2, 4, 2)
name <-1:10 

subject <- data.frame(s1, s2, s3, s4, s5, s6)
subject
str(subject)


[Step 1] Understand the data

cor(subject)


[Step 2] Decide the number of factors using eigenvalues

en <- eigen(cor(subject)) # correlation matrix -> eigenvalues
names(en) #"values" "vectors"
plot(en$values, type="o") # plot the eigenvalues (elbow point: steep drop over 1-3, flattens from 4)


[Step 3] Factor analysis
Specify 3 factors

FA <- factanal(scale(subject), factors = 3, # use the standardized dataset; factors: number of factors
               rotation = "varimax", # rotation method ("varimax", "promax", "none")
               scores="regression") # method for computing factor scores

* Factor rotation: rotates the factor axes so that each variable clearly belongs to one factor instead of overlapping several

FA

* Uniquenesses: the share of each variable's variance not explained by the common factors; 0.5 or less is generally preferred
* Loadings: each variable's loadings on Factor1, Factor2, and Factor3 (a loading with an absolute value of 0.4 or more, i.e., outside -0.4 to +0.4, is considered meaningful)

                Factor1 Factor2 Factor3
SS loadings      2.122   2.031   1.486 : sum of squared loadings for each factor
Proportion Var   0.354   0.339   0.248 : proportion of variance explained by each factor
Cumulative Var   0.354   0.692   0.940 : cumulative proportion of variance (0.940 is the final cumulative proportion)
* Information loss: 1 - 0.940 (cumulative proportion) = 0.06 (about 6% of the information in the data is lost)
* The larger the information loss, the less meaningful the factor analysis.
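The three summary rows can be recomputed from the loadings matrix: SS loadings are the column sums of squared loadings, and dividing by the number of variables gives the proportion of variance. A minimal sketch on the practice data, reusing the same `factanal()` call as above:

```r
s1 <- c(1, 2, 1, 2, 3, 4, 2, 3, 4, 5)
s2 <- c(1, 3, 1, 2, 3, 4, 2, 4, 3, 4)
s3 <- c(2, 3, 2, 3, 2, 3, 5, 3, 4, 2)
s4 <- c(2, 4, 2, 3, 2, 3, 5, 3, 4, 1)
s5 <- c(4, 5, 4, 5, 2, 1, 5, 2, 4, 3)
s6 <- c(4, 3, 4, 4, 2, 1, 5, 2, 4, 2)
subject <- data.frame(s1, s2, s3, s4, s5, s6)

FA <- factanal(scale(subject), factors = 3,
               rotation = "varimax", scores = "regression")
L <- unclass(FA$loadings)              # plain numeric matrix (6 x 3)

ss_loadings <- colSums(L^2)            # "SS loadings"
prop_var    <- ss_loadings / nrow(L)   # "Proportion Var"
cum_var     <- cumsum(prop_var)        # "Cumulative Var"
info_loss   <- 1 - cum_var[length(cum_var)]

print(round(rbind(ss_loadings, prop_var, cum_var), 3))
print(round(info_loss, 3))
```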

[Step 4] Interpret the factors
View the factor loadings

names(FA) # loadings, uniquenesses, scores, ... (12 elements)
load <- FA$loadings
load


์š”์ธ๋ถ€ํ•˜๋Ÿ‰ 0.4 ์ด์ƒ, ์†Œ์ˆ˜์  2์ž๋ฆฌ ํ‘œ๊ธฐ 

print(load, digits = 2, cutoff=0.4)

[Interpretation] Factor1: s5, s6 (same sign) / Factor2: s3, s4 / Factor3: s1, s2

Show all factor loadings, including those hidden by the cutoff

print(load, cutoff=0) # display every loadings
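Instead of reading the printed table, each variable can be assigned to the factor on which it has the largest absolute loading. A minimal sketch on the practice data; given the interpretation above, s1/s2, s3/s4, and s5/s6 should each end up sharing one factor.

```r
s1 <- c(1, 2, 1, 2, 3, 4, 2, 3, 4, 5)
s2 <- c(1, 3, 1, 2, 3, 4, 2, 4, 3, 4)
s3 <- c(2, 3, 2, 3, 2, 3, 5, 3, 4, 2)
s4 <- c(2, 4, 2, 3, 2, 3, 5, 3, 4, 1)
s5 <- c(4, 5, 4, 5, 2, 1, 5, 2, 4, 3)
s6 <- c(4, 3, 4, 4, 2, 1, 5, 2, 4, 2)
subject <- data.frame(s1, s2, s3, s4, s5, s6)

FA <- factanal(scale(subject), factors = 3,
               rotation = "varimax", scores = "regression")
L <- unclass(FA$loadings)

# Index of the factor with the largest absolute loading, per variable
main_factor <- apply(abs(L), 1, which.max)
print(main_factor)

# Variables grouped by their main factor
print(split(rownames(L), colnames(L)[main_factor]))
```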



[Visualizing factor loadings together with factor scores]
Three or more factors cannot be shown at once in a 2D scatter plot

1) Visualizing Factor1 and Factor2
Factor score matrix

plot(FA$scores[,c(1:2)], main="Factor1 vs Factor2 factor scores")


Label each observation (rownames mapping)

text(FA$scores[,1], FA$scores[,2], 
     labels = name, cex = 0.7, pos = 3, col = "blue")


์š”์ธ์ ์žฌ๋Ÿ‰ plotting

points(FA$loadings[,c(1:2)], pch=19, col = "red")

text(FA$loadings[,1], FA$loadings[,2], 
     labels = rownames(FA$loadings), 
     cex = 0.8, pos = 3, col = "red")


2) Visualizing Factor1 and Factor3
Factor score matrix

plot(FA$scores[,c(1,3)], main="Factor1 vs Factor3 factor scores")


Label each observation (rownames mapping)

text(FA$scores[,1], FA$scores[,3], 
     labels = name, cex = 0.7, pos = 3, col = "blue")


์š”์ธ์ ์žฌ๋Ÿ‰ plotting

points(FA$loadings[,c(1,3)], pch=19, col = "red")

* Label the Factor1 and Factor3 loadings

text(FA$loadings[,1], FA$loadings[,3], 
     labels = rownames(FA$loadings), 
     cex = 0.8, pos = 3, col = "red")


3D visualization: 3 factors

library(scatterplot3d)

Factor1 <- FA$scores[,1]
Factor2 <- FA$scores[,2]
Factor3 <- FA$scores[,3]

# Usage: scatterplot3d(x = bottom axis, y = right-side axis, z = left-side axis, type = 'p')  # type = 'p': basic scatter plot

d3 <- scatterplot3d(Factor1, Factor2, Factor3)


์š”์ธ์ ์žฌ๋Ÿ‰ ํ‘œ์‹œ 

loadings1 <- FA$loadings[,1]
loadings2 <- FA$loadings[,2]
loadings3 <- FA$loadings[,3] 
d3$points3d(loadings1, loadings2, loadings3, bg='red',pch=21, cex=2, type='h')


์š”์ธ๋ถ„์„ : ํ™•์ธ์  
ํ™•์ธ์  ์š”์ธ๋ถ„์„์œผ๋กœ ์ฃผ๋กœ ์„ค๋ฌธ์ง€ ํƒ€๋‹น์„ฑ ๋ถ„์„์—์„œ ์ด์šฉ๋œ๋‹ค.  
์ž˜๋ชป ๋ถ„๋ฅ˜๋œ ์š”์ธ์ธ ๋ฐœ๊ฒฌ๋œ ๊ฒฝ์šฐ ํ•ด๋‹น ๋ณ€์ˆ˜๋ฅผ ์ œ๊ฑฐํ•œ ํ›„ new dataset์„ ์ƒ์„ฑํ•œ๋‹ค. 

[๋‹จ๊ณ„1] ๋ฐ์ดํ„ฐ ๊ฐ€์ ธ์˜ค๊ธฐ 

install.packages('memisc') #spss(file)์„ R(dataset)์œผ๋กœ ๋ณ€ํ™˜ํ•ด์„œ ์‚ฌ์šฉ ๊ฐ€๋Šฅ
library(memisc)

setwd('C:\\ITWILL\\2_Rwork\\data')
data.spss <- as.data.set(spss.system.file('drinking_water.sav'))
data.spss
drinking_water <- data.spss[1:11]
drinking_water
drinking_water_df <- as.data.frame(drinking_water) 
str(drinking_water_df)

Familiarity: q1, q2, q3, q4
Appropriateness: q5, q6, q7
Satisfaction: q8, q9, q10, q11

drinking_water_df = read.csv('drinking_water.csv') # alternatively, read the CSV file


[Step 2] Check the number of factors with eigenvalues [skip if the number of factors is already known]

en <- eigen(cor(drinking_water_df))
plot(en$values, type="o")


[Step 3] Factor analysis

fact.result <- factanal(drinking_water_df, factors = 3, 
                        rotation = "varimax",
                        scores = "regression")
fact.result


[Step 4] Interpret the factors
View the factor loadings

load <- fact.result$loadings


์š”์ธ๋ถ€ํ•˜๋Ÿ‰ 0.4 ์ด์ƒ, ์†Œ์ˆ˜์  2์ž๋ฆฌ ํ‘œ๊ธฐ 

print(load, digits = 2, cutoff=0.4)


Show all factor loadings, including those hidden by the cutoff

print(load, cutoff=0) # display every loadings


์ž˜๋ชป ๋ถ„๋ฅ˜๋œ ๋ณ€์ˆ˜(Q4) ๋ฐœ๊ฒฌ/์ œ๊ฑฐ

new_dw = drinking_water_df[-4] #Q4์ œ๊ฑฐ
dim(new_dw) #๋ณ€์ˆ˜ 11๊ฐœ->10๊ฐœ ํ™•์ธ


์š”์ธ๋ถ„์„ (์žฌํ™•์ธ)

fact.result <- factanal(new_dw, factors = 3, 
                        rotation = "varimax",
                        scores = "regression")
fact.result
names(fact.result2)
load = fact.result2$loadings


์š”์ธ๋ถ€ํ•˜๋Ÿ‰ 0.4์ด์ƒ, ์†Œ์ˆ˜์  2์ž๋ฆฌ ํฌ๊ธฐ

print(load, digits=2, cutoff=0.4)