Semi-project 03: Additional analysis

LEE_BOMB 2021. 10. 30. 22:38

5. Additional analysis 2
Are the number of Korean medicine clinics, nursing hospitals, and public health centers in each region, the regional income level, and the urban ratio related to the share of elderly people in the population?

Extract the needed information (by hospital type) with SQL, then load the datasets

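library(readxl)  # provides read_excel()

hane = read_excel("HANE.xlsx")    # Korean medicine clinics
yo = read_excel("YOYANG.xlsx")    # nursing hospitals
zin = read_excel("ZINRYO.xlsx")   # public health clinics
zi = read_excel("ZISO.xlsx")      # public health sub-centers
head(hane, n=10)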


Convert to data.frame

hane = as.data.frame(hane)
yo = as.data.frame(yo)
zin = as.data.frame(zin)
zi = as.data.frame(zi)


Merge the datasets

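# total is the region-level table built in the earlier steps
# all=TRUE keeps every region, even one missing from a table (full outer join)
plus = merge(total, hane, by="area", all=TRUE)
plus = merge(plus, yo, by="area", all=TRUE)
plus = merge(plus, zin, by="area", all=TRUE)
plus = merge(plus, zi, by="area", all=TRUE)
head(plus)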


Replace NA with 0

plus[is.na(plus)]=0
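
A full outer merge creates NA wherever a region has no row in one of the facility tables, which is why the zero fill follows the merge. The sketch below shows this on toy data (the values are made up, not the project's data). It fills only the count column, which is an assumption about what is wanted: plus[is.na(plus)]=0 would also turn a missing outcome value into 0.

```r
# Toy data: hypothetical values, not the project's data
total <- data.frame(area = c("A", "B", "C"),
                    pop_old_be = c(12.5, 20.1, 30.4))
hane  <- data.frame(area = c("A", "B"),
                    hane = c(40, 15))

# Full outer join on "area": region C has no clinic row, so hane is NA there
plus <- merge(total, hane, by = "area", all = TRUE)

# Fill NA only in the facility-count column; other columns stay untouched
plus$hane[is.na(plus$hane)] <- 0

print(plus)
```

With four facility tables, the same line can be repeated per count column (hane, yoyang, zinryo, ziso).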


Correlation analysis

cor(plus$hane, plus$pop_old_be) #-0.5251034
cor(plus$yoyang, plus$pop_old_be) #-0.3692181
cor(plus$zinryo, plus$pop_old_be) #0.5914815
cor(plus$ziso, plus$pop_old_be) #0.6068566

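[Interpretation] Korean medicine clinics and nursing facilities are negatively correlated with the elderly population ratio (pop_old_be), while public health clinics and public health sub-centers are positively correlated with it.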
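
cor() returns only the coefficient. To see whether a correlation differs significantly from zero, base R's cor.test() reports a p-value and a confidence interval as well. A minimal sketch on simulated data (x and y are hypothetical stand-ins for a facility count and pop_old_be):

```r
# Simulated data: hypothetical stand-ins, not the project's columns
set.seed(1)
x <- rnorm(50)
y <- 0.6 * x + rnorm(50)

# Pearson correlation test: estimate, p-value and 95% confidence interval
res <- cor.test(x, y)

res$estimate   # same value as cor(x, y)
res$p.value    # p-value for H0: true correlation is 0
res$conf.int   # 95% confidence interval for the correlation
```

On the real data the call would look like cor.test(plus$hane, plus$pop_old_be).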

Multiple regression analysis

plus_model = lm(pop_old_be ~ hane+yoyang+zinryo+ziso, data = plus)
summary(plus_model)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 20.578706   0.870577  23.638  < 2e-16 ***
hane        -0.037452   0.009087  -4.122 5.07e-05 ***
yoyang      -0.255789   0.088803  -2.880 0.004305 ** 
zinryo       0.162956   0.117034   1.392 0.165005    
ziso         0.620955   0.183089   3.392 0.000804 ***
Residual standard error: 6.361 on 258 degrees of freedom
Multiple R-squared:  0.4787, Adjusted R-squared:  0.4706 
F-statistic: 59.23 on 4 and 258 DF,  p-value: < 2.2e-16
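[Interpretation] The F-test p-value is below 0.05, so the model is statistically significant, but the adjusted R-squared is only about 0.47, meaning the model explains roughly 47% of the variance in the elderly population ratio. Among the predictors, only zinryo is not significant at the 5% level (p = 0.165).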
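
The numbers used in this kind of interpretation can also be read from the summary object instead of the printed output. A minimal sketch on simulated data (toy, x1, x2 and y are hypothetical names, not the project's columns):

```r
# Simulated data: y depends on x1 only; x2 is pure noise
set.seed(42)
toy <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
toy$y <- 1 + 2 * toy$x1 + rnorm(100)

fit <- lm(y ~ x1 + x2, data = toy)
s <- summary(fit)

# Explanatory power
s$r.squared
s$adj.r.squared

# Per-coefficient p-values, taken from the coefficient matrix
coef_p <- coef(s)[, "Pr(>|t|)"]

# Names of the terms that are not significant at the 5% level
names(coef_p)[coef_p >= 0.05]
```

Applied to the model above, summary(plus_model) would take the place of summary(fit).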





6. Additional analysis 2
Then do the regional income level and the urban ratio affect the number of hospitals?

Load the data

setwd('C:/ITWILL/2_Rwork/00/SemiProject')
ect = read_excel("ect.xlsx")
head(ect, n=10)


Multiple regression analysis

da = lm(count ~ ect$money + ect$city_be, data = ect)

summary(da)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  566.06761  122.95994   4.604 6.93e-06 ***
ect$money     -7.11020    3.57345  -1.990   0.0478 *  
ect$city_be    0.03499    0.56642   0.062   0.9508    
Residual standard error: 333.2 on 226 degrees of freedom
Multiple R-squared:  0.018, Adjusted R-squared:  0.009307 
F-statistic: 2.071 on 2 and 226 DF,  p-value: 0.1284
[Interpretation] The overall F-test p-value (0.1284) is above 0.05, so the regression model as a whole is not statistically significant; only the money coefficient falls just under the 5% level (p = 0.0478). With an R-squared close to 0 (0.018), the model cannot be said to predict the number of hospitals well.
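
Whether the model as a whole is significant is decided by the p-value on the F-statistic line, not by the per-coefficient p-values. summary() prints that p-value but does not store it directly, so it has to be computed from the stored F statistic with pf(). A minimal sketch: the first part uses simulated data (toy, x1, x2 and y are hypothetical names), and the last line plugs in the F value and degrees of freedom reported in the output above.

```r
# Simulated data: hypothetical, the predictors are unrelated to y
set.seed(7)
toy <- data.frame(x1 = rnorm(60), x2 = rnorm(60), y = rnorm(60))
fit <- lm(y ~ x1 + x2, data = toy)

# fstatistic is a named vector: value, numdf, dendf
f <- summary(fit)$fstatistic

# Upper-tail probability of the F distribution = overall model p-value
p_model <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
p_model

# Same calculation with the values reported above (F = 2.071 on 2 and 226 DF)
p_doc <- pf(2.071, 2, 226, lower.tail = FALSE)
p_doc   # about 0.128, matching the printed p-value of 0.1284
```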