Semi-project 03: Additional Analysis
5. Additional Analysis 2
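Are the regional numbers of Korean medicine clinics, nursing hospitals, and public health facilities, together with regional income level and urban ratio, related to the proportion of elderly population?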
Extract the required information (by hospital type) with SQL, then load the datasets
library(readxl)  # read_excel() comes from the readxl package
hane = read_excel("HANE.xlsx")
yo = read_excel("YOYANG.xlsx")
zin = read_excel("ZINRYO.xlsx")
zi = read_excel("ZISO.xlsx")
head(hane, n=10)
Convert to data frames (read_excel() returns tibbles)
hane = as.data.frame(hane)
yo = as.data.frame(yo)
zin = as.data.frame(zin)
zi = as.data.frame(zi)
Merge the datasets by area (all=T keeps areas that are missing from some tables)
plus = merge(total, hane, by="area", all=T)
plus = merge(plus, yo, by="area", all=T)
plus = merge(plus, zin, by="area", all=T)
plus = merge(plus, zi, by="area", all=T)
head(plus)
Replace NA with 0
plus[is.na(plus)]=0
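A minimal, self-contained sketch of the merge-and-fill step above, using small made-up data frames (the `toy_` tables, their `area` values, and counts are hypothetical, not the project data). `Reduce()` folds `merge(..., all = TRUE)` over the list, which is equivalent to the four chained `merge()` calls. One design choice differs on purpose: `plus[is.na(plus)]=0` also turns a missing `pop_old_be` into 0 for any area absent from `total`, so this sketch fills only the facility-count columns.

```r
# Hypothetical toy tables standing in for total, hane, yo, zin, zi
toy_total <- data.frame(area = c("A", "B", "C"), pop_old_be = c(15.2, 22.8, 30.1))
toy_hane  <- data.frame(area = c("A", "B"), hane = c(40, 12))
toy_yo    <- data.frame(area = c("A", "C"), yoyang = c(5, 2))
toy_zin   <- data.frame(area = c("B", "C"), zinryo = c(3, 9))
toy_zi    <- data.frame(area = c("C"), ziso = c(4))

# Full outer join on "area" across all tables (same as chained merge(..., all = TRUE))
toy_plus <- Reduce(function(x, y) merge(x, y, by = "area", all = TRUE),
                   list(toy_total, toy_hane, toy_yo, toy_zin, toy_zi))

# A missing facility count is treated as zero facilities, but only in the count
# columns, so the outcome column keeps NA instead of a misleading 0
count_cols <- c("hane", "yoyang", "zinryo", "ziso")
toy_plus[count_cols][is.na(toy_plus[count_cols])] <- 0
print(toy_plus)
```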
Correlation analysis
cor(plus$hane, plus$pop_old_be) #-0.5251034
cor(plus$yoyang, plus$pop_old_be) #-0.3692181
cor(plus$zinryo, plus$pop_old_be) #0.5914815
cor(plus$ziso, plus$pop_old_be) #0.6068566
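[Interpretation] Korean medicine clinics (hane, r ≈ -0.53) and nursing hospitals (yoyang, r ≈ -0.37) are negatively correlated with the elderly population ratio, while community health posts (zinryo, r ≈ 0.59) and public health sub-centers (ziso, r ≈ 0.61) are positively correlated.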
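The four `cor()` calls above can be collapsed into one: `cor()` accepts a data frame of predictors plus the outcome vector and returns all coefficients at once, and `cor.test()` adds a p-value for a single pair. A minimal sketch on simulated data (the `sim` table and its effect sizes are made up for illustration and do not reproduce the coefficients above):

```r
set.seed(1)
n <- 100
# Simulated, hypothetical stand-ins for the facility counts
sim <- data.frame(hane = rpois(n, 30), yoyang = rpois(n, 5),
                  zinryo = rpois(n, 4), ziso = rpois(n, 3))
# Outcome built with a negative dependence on hane and a positive one on ziso
sim$pop_old_be <- 30 - 0.5 * sim$hane + 1.5 * sim$ziso + rnorm(n, sd = 2)

# Correlation of every predictor with the outcome in a single call (4 x 1 matrix)
r <- cor(sim[c("hane", "yoyang", "zinryo", "ziso")], sim$pop_old_be)
print(round(r, 3))

# Pearson correlation test for one pair: estimate plus p-value
ct <- cor.test(sim$hane, sim$pop_old_be)
print(c(r = unname(ct$estimate), p = ct$p.value))
```

On the real data the same pattern would be `cor(plus[c("hane", "yoyang", "zinryo", "ziso")], plus$pop_old_be)`.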
Multiple regression analysis
plus_model = lm(pop_old_be ~ hane+yoyang+zinryo+ziso, data = plus)
summary(plus_model)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 20.578706 0.870577 23.638 < 2e-16 ***
hane -0.037452 0.009087 -4.122 5.07e-05 ***
yoyang -0.255789 0.088803 -2.880 0.004305 **
zinryo 0.162956 0.117034 1.392 0.165005
ziso 0.620955 0.183089 3.392 0.000804 ***
Residual standard error: 6.361 on 258 degrees of freedom
Multiple R-squared: 0.4787, Adjusted R-squared: 0.4706
F-statistic: 59.23 on 4 and 258 DF, p-value: < 2.2e-16
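[Interpretation] The overall F-test p-value (< 2.2e-16) is below 0.05, so the model is statistically significant. Individually, hane, yoyang, and ziso are significant, while zinryo is not (p = 0.165). The R-squared (explanatory power) is about 0.47, meaning the four predictors explain roughly 47% of the variation in the elderly population ratio.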
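The overall F-statistic in `summary()` tests all slopes jointly and can be recomputed from R-squared alone: F = (R² / k) / ((1 - R²) / df_resid), where k is the number of predictors and df_resid the residual degrees of freedom. A quick check against the output above (k = 4, 258 residual DF), with `pf()` giving the upper-tail p-value; the helper name `f_from_r2` is just for illustration:

```r
# Recompute the overall F-test of a linear model from its R-squared
f_from_r2 <- function(r2, k, df_resid) {
  f <- (r2 / k) / ((1 - r2) / df_resid)
  p <- pf(f, df1 = k, df2 = df_resid, lower.tail = FALSE)  # upper-tail p-value
  c(F = f, p = p)
}

# Values reported for plus_model: Multiple R-squared 0.4787, 4 predictors, 258 residual DF
res <- f_from_r2(0.4787, k = 4, df_resid = 258)
print(res)
```

With the fitted model itself, `summary(plus_model)$fstatistic` returns the same three numbers (value, numdf, dendf), so `f <- summary(plus_model)$fstatistic; pf(f[1], f[2], f[3], lower.tail = FALSE)` gives the p-value directly.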
6. Additional Analysis 3
Then, do regional income level and urban ratio affect the number of hospitals?
Load data
setwd('C:/ITWILL/2_Rwork/00/SemiProject')
ect = read_excel("ect.xlsx")
head(ect, n=10)
Multiple regression analysis
da = lm(count ~ money + city_be, data = ect)  # columns are found in ect via data =, no ect$ needed
summary(da)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 566.06761 122.95994 4.604 6.93e-06 ***
money -7.11020 3.57345 -1.990 0.0478 *
city_be 0.03499 0.56642 0.062 0.9508
Residual standard error: 333.2 on 226 degrees of freedom
Multiple R-squared: 0.018, Adjusted R-squared: 0.009307
F-statistic: 2.071 on 2 and 226 DF, p-value: 0.1284
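[Interpretation] The overall F-test p-value is 0.1284, which is above 0.05, so the regression model as a whole is not statistically significant. Individually, money is only marginally significant (p = 0.0478) and city_be is not significant (p = 0.9508). R-squared is 0.018, close to 0, so the model explains less than 2% of the variation in count and has essentially no predictive power.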
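The overall p-value can be checked from the printed F-statistic alone. This sketch uses only numbers from the summary(da) output (F = 2.071 on 2 and 226 DF) and confirms that the model-level p-value sits above 0.05:

```r
# Numbers copied from summary(da): F-statistic 2.071 on 2 and 226 DF
f_stat <- 2.071
df1 <- 2
df2 <- 226

# Upper-tail probability of the F distribution = overall model p-value
p_overall <- pf(f_stat, df1, df2, lower.tail = FALSE)
print(round(p_overall, 4))

# Significance decision at the 5% level
print(p_overall < 0.05)  # FALSE: the model as a whole is not significant
```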