01. Formulate hypotheses for analyzing the association between education level (education) and smoking rate (smoking), then test them. [Test of independence]

Null hypothesis: Education level and smoking rate are not associated (they are independent).
Research hypothesis: Education level and smoking rate are associated (they are not independent).

 

<Step 1> Import the file

setwd("c:/ITWILL/2_Rwork/data")
smoke = read.csv("smoke.csv", header=TRUE)


View the variables

head(smoke) # variables: education, smoking


<Step 2> Change the coding: recode the variables (for readability)
education (independent variable): 1: college graduate, 2: high school graduate, 3: middle school graduate
smoking (dependent variable): 1: heavy smoker, 2: moderate smoker, 3: non-smoker

Recode the education variable into education2

smoke$education2[smoke$education == 1] = '1:college'
smoke$education2[smoke$education == 2] = '2:high school'
smoke$education2[smoke$education == 3] = '3:middle school'
table(smoke$education2)


Recode the smoking variable into smoking2

smoke$smoking2[smoke$smoking == 1] = "1:heavy smoker"
smoke$smoking2[smoke$smoking == 2] = "2:moderate smoker"
smoke$smoking2[smoke$smoking == 3] = "3:non-smoker"
table(smoke$smoking2)
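
As an alternative to assigning one label per line, the same recoding can be sketched with factor(), which maps every numeric code to its label in one call and fixes the level order. The data frame `toy` below is a hypothetical stand-in for smoke.csv (the real file is not reproduced here), and the English labels are chosen for illustration only.

```r
# Hypothetical toy data standing in for smoke.csv (assumption: codes 1-3 as described above)
toy <- data.frame(education = c(1, 2, 3, 2, 1),
                  smoking   = c(3, 2, 1, 1, 3))

# factor() attaches a label to each numeric code and keeps the levels in code order
toy$education2 <- factor(toy$education, levels = 1:3,
                         labels = c("1:college", "2:high school", "3:middle school"))
toy$smoking2 <- factor(toy$smoking, levels = 1:3,
                       labels = c("1:heavy smoker", "2:moderate smoker", "3:non-smoker"))

# Levels with zero observations still appear in the table, which keeps it 3 x 3
table(toy$education2, toy$smoking2)
```

A factor also keeps empty categories visible in later tables, which the character-vector approach does not.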


<Step 3> Build the cross-tabulation (using the table function)

table(smoke$education2, smoke$smoking2)


<Step 4> Test of independence (using the CrossTable function)

library(gmodels) # CrossTable() comes from the gmodels package
CrossTable(x=smoke$education2, y=smoke$smoking2, chisq = TRUE)

Chi^2 =  18.91092     d.f. =  4     p =  0.0008182573 

<Step 5> Interpret the test result
* p-value < alpha: the p-value (about 0.0008) is below the significance level alpha (e.g., 0.05), so the null hypothesis is rejected and the two variables can be regarded as associated.
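
The decision rule can be checked by hand from the reported statistic. The sketch below recomputes the p-value from Chi^2 = 18.91092 with 4 degrees of freedom using the upper tail of the chi-square distribution. The significance level alpha = 0.05 is an assumption; the source does not state one.

```r
chi2  <- 18.91092  # statistic reported by CrossTable
df    <- 4         # (3 - 1) * (3 - 1) for a 3 x 3 table
alpha <- 0.05      # assumed significance level (not given in the source)

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(chi2, df = df, lower.tail = FALSE)
p_value

# TRUE means the null hypothesis of independence is rejected at level alpha
p_value < alpha
```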

 

 

 

 

02. Analyze the association between age (age3) and job position (position) step by step. [Test of independence]
Null hypothesis: Age and position are not associated.
Alternative hypothesis: Age and position are associated.


[Step 1] Import the file

data = read.csv("cleanData.csv")
head(data)


[Step 2] Select the variables

x = data$position # rows: the position variable
y = data$age3 # columns: the recoded age variable


[Step 3] Inspect the relationship between the variables with a scatter plot, using plot(x, y)

plot(x, y) # check the association between the two variables
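
If position and age3 are stored as category codes, the points of a scatter plot fall on top of each other, so a contingency table or mosaic plot may show the relationship more clearly. The sketch below uses a hypothetical data frame `toy` in place of cleanData.csv, whose actual values are not shown here. The code ranges (1-5 and 1-3) are assumptions.

```r
# Hypothetical stand-in for cleanData.csv (assumed codes: position 1-5, age3 1-3)
toy <- data.frame(
  position = c(1, 1, 2, 2, 3, 3, 4, 5, 5, 5),
  age3     = c(3, 3, 3, 2, 2, 2, 1, 1, 1, 1)
)

# Cross-tabulate the two categorical variables
tab <- table(position = toy$position, age3 = toy$age3)
print(tab)

# Tile areas are proportional to cell counts, so uneven tiles hint at an association
mosaicplot(tab, main = "position by age3 (toy data)")
```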


[Step 4] Test of independence

chisq.test(x=x, y=y)

X-squared = 287.9, df = 8, p-value < 2.2e-16

[Step 5] Interpret the test result
The p-value (< 2.2e-16) is far below any conventional significance level, so the null hypothesis is rejected.
Age and position are therefore significantly associated.

 

 

 

 

03. Test, step by step, whether the degree of response differs by job type. [Test of homogeneity]


[Step 1] Import the file

response = read.csv("response.csv")
head(response) # view the variables

job response

[Step 2] Change the coding
Recode the job column: 1: student, 2: office worker, 3: homemaker
Recode the response column: 1: no response, 2: low, 3: high

Recode the job variable into job2

response$job2[response$job==1] = '1:student'
response$job2[response$job==2] = '2:office worker'
response$job2[response$job==3] = '3:homemaker'


Recode the response variable into response2

response$response2[response$response==1] = '1:no response'
response$response2[response$response==2] = '2:low'
response$response2[response$response==3] = '3:high'


[Step 3] Build the cross-tabulation

table(response$job2, response$response2)

                  1:no response 2:low 3:high
1:student                    25    37      8
2:office worker              10    62     53
3:homemaker                   5    41     59

[Step 4] Test of homogeneity

CrossTable(response$job2, response$response2, chisq = T)

Chi^2 =  58.2081     d.f. =  4     p =  6.900771e-12 
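
The statistic above can be reproduced directly from the printed cross-tabulation, without the CSV file. The sketch below enters the nine cell counts as a matrix and passes it to base R's chisq.test(). The English row and column labels are illustrative only.

```r
# Cell counts copied from the cross-tabulation (rows: job type, columns: response level)
counts <- matrix(c(25, 37,  8,
                   10, 62, 53,
                    5, 41, 59),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(job = c("1:student", "2:office worker", "3:homemaker"),
                                 response = c("1:no response", "2:low", "3:high")))

# Pearson chi-square test on the table of counts
res <- chisq.test(counts)

res$expected   # expected counts if response were the same across job types
res$statistic  # chi-square statistic
res$parameter  # degrees of freedom: (3 - 1) * (3 - 1)
res$p.value
```

Comparing `res$expected` with the observed counts shows which cells drive the result; the students' "no response" cell (25 observed) contributes the most to the statistic.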

[Step 5] Interpret the test result
At a highly significant level, the response rate can be regarded as differing by job type.

 
