Comprehensive Review Exercises

 

2012 US presidential election campaign contribution dataset

election = read.csv(file.choose(), stringsAsFactors = FALSE) # select election_2012.csv
dim(election) # 1001731      16
str(election) # dim + class

<Dataset description> : contribution records for the 2012 US presidential candidates ('Romney, Mitt' and 'Obama, Barack')
'data.frame': 1001731 obs. of  16 variables:
3. cand_nm : candidate name
4. contbr_nm : contributor name
5. contbr_city : contributor's city
9. contbr_occupation : contributor's occupation
10. contb_receipt_amt : contribution amount
11. contb_receipt_dt : contribution date

 

 

 

chapter 01 : Data types and type conversion (date conversion)
[Problem 1] Check the data types of the variables in the election dataset, then convert a data type.
Time allowed : 5 minutes

1) Check the data types of the cand_nm, contb_receipt_amt, and contb_receipt_dt variables
Hint) use mode()

mode(election$cand_nm) # "character"
mode(election$contb_receipt_amt) # "numeric"
mode(election$contb_receipt_dt) # "character"
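
mode() reports only the coarse storage mode, so it can hide differences that class() and typeof() show. The following is a minimal sketch on a made-up data frame: the values are invented, and the n_gifts column is hypothetical (it is not part of election_2012.csv) and is there only to show an integer column.

```r
# Toy data frame; all values are invented for illustration only.
# 'n_gifts' is a hypothetical column, not one from election_2012.csv.
toy <- data.frame(
  cand_nm = c("Obama, Barack", "Romney, Mitt", "Obama, Barack"),
  contb_receipt_amt = c(250, 100.5, 35),
  n_gifts = c(1L, 2L, 1L),
  stringsAsFactors = FALSE
)

# mode() groups integer and double together as "numeric".
modes <- sapply(toy, mode)
# typeof() shows the underlying storage type.
types <- sapply(toy, typeof)
# class() is what most functions dispatch on.
classes <- sapply(toy, class)

print(rbind(mode = modes, typeof = types, class = classes))
```

For the exercise itself mode() is enough, but class() is usually the more informative check once a column has been converted to a date.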


2) Convert the contribution date variable (contb_receipt_dt) to a date type

date = election$contb_receipt_dt 
date[1:10] # "20-Jun-11" "23-Jun-11" -> US style : day-month-year

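# Sys.Date() only returns today's date and takes no arguments, so it cannot convert these strings:
# Sys.Date(data) # Error in Sys.Date(data) : unused argument (data)


Change the locale : Korean -> English (so that %b matches English month abbreviations such as "Jun")

Sys.getlocale() # "LC_COLLATE=Korean_Korea
Sys.setlocale(locale = 'English_USA') # US English

 

US style : day-month-year -> Korean style : year-month-day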

kdate <- strptime(date, "%d-%b-%y")
kdate[1:10]


Update the date column

election$contb_receipt_dt <- kdate

Sys.setlocale(locale = 'Korean_Korea') # switch back to Korean
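
The locale names 'English_USA' and 'Korean_Korea' are Windows-style names and may not be accepted on other platforms. As a hedged alternative, the sketch below assumes that only the LC_TIME category matters for parsing %b, switches it to the "C" locale for the duration of the conversion, and then restores the previous setting. The three sample strings are invented but follow the same layout as contb_receipt_dt.

```r
# Sample strings in the same day-month-year layout as contb_receipt_dt.
date <- c("20-Jun-11", "23-Jun-11", "05-Jul-11")

# Remember the current time locale so it can be restored afterwards.
old_time_locale <- Sys.getlocale("LC_TIME")

# The "C" locale uses English month abbreviations.
Sys.setlocale("LC_TIME", "C")
kdate <- as.Date(date, format = "%d-%b-%y")
Sys.setlocale("LC_TIME", old_time_locale)

print(kdate)
print(class(kdate))
```

Here as.Date() is used instead of strptime(): strptime() returns a POSIXlt object, while as.Date() returns a plain Date vector, which is often easier to store in a data frame column.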

 

 




chapter 02 : Indexing and renaming columns
[Problem 2] From the election dataset, select only the 6 columns listed in the dataset description and create a new dataset.
Time allowed : 3 minutes

1) Use indexing : Hint) dataset[, c(col_index1, col_index2, ...)]

election_df = election[,c(3:5,9:11)]
dim(election_df)  # 1001731       6

 
2) Rename the columns of election_df : Hint) names(dataset) <- c('col_name1','col_name2', ...)

New column names :'cand_name','contbr_name','city','occupation','receipt_amt','receipt_date'

names(election_df)
names(election_df) <- c('cand_name','contbr_name','city','occupation','receipt_amt','receipt_date')    
names(election_df)
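
Positional indexes depend on the column order of the file, so selecting by name can be a safer variant of the same step. The sketch below uses an invented two-row data frame; cmte_id merely stands in for the columns that are skipped, and the values are not taken from election_2012.csv.

```r
# Toy data frame with invented values; 'cmte_id' stands in for skipped columns.
toy <- data.frame(
  cmte_id = c("C001", "C002"),
  cand_nm = c("Obama, Barack", "Romney, Mitt"),
  contb_receipt_amt = c(250, 100),
  stringsAsFactors = FALSE
)

# Select by position, as in the exercise ...
by_index <- toy[, c(2:3)]
# ... or by name, which keeps working if the columns are reordered.
by_name <- toy[, c("cand_nm", "contb_receipt_amt")]

# Rename all selected columns at once.
names(by_name) <- c("cand_name", "receipt_amt")
print(names(by_name))
print(identical(by_index$cand_nm, by_name$cand_name))
```

Both selections return the same data here; the by-name form simply makes the intent visible in the code.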

 

 




chapter 03 : Creating subsets
[Problem 3] Create a subset for each of the presidential candidates 'Romney, Mitt' and 'Obama, Barack'.
Time allowed : 6 minutes
  
1) For the candidate name (cand_name), check the distinct candidate names and the frequency of each candidate
Hint) unique() : distinct values, table() : frequencies

unique(election_df$cand_name) # 13 candidates, including "Romney, Mitt" and "Obama, Barack"
table(election_df$cand_name)
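
table() lists the counts in alphabetical order of the names, so with many candidates it can help to sort the result. A minimal sketch, using an invented vector in place of election_df$cand_name:

```r
# Invented candidate names standing in for election_df$cand_name.
cand_name <- c("Obama, Barack", "Romney, Mitt", "Obama, Barack",
               "Paul, Ron", "Obama, Barack", "Romney, Mitt")

# Number of distinct candidates.
n_candidates <- length(unique(cand_name))

# Frequency per candidate, largest first.
freq <- sort(table(cand_name), decreasing = TRUE)
print(n_candidates)
print(freq)
```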


2) Create a subset for each of the presidential candidates 'Romney, Mitt' and 'Obama, Barack'
Hint) subset(dataset, subset = condition)

romney = subset(election_df, subset = cand_name == "Romney, Mitt") # 'Romney, Mitt'
obama = subset(election_df, subset = cand_name == "Obama, Barack") # 'Obama, Barack'


Check the dimensions

dim(romney) # 107229      6
dim(obama) # 593746      6


Check the contents

head(romney)
tail(romney)
head(obama) 
tail(obama)
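
One way to sanity-check a subset is to compare its row count with the frequency table for the same candidate. The sketch below does this on an invented four-row data frame (toy_df, obama_toy, and romney_toy are illustrative names, not objects from the exercise) and also shows logical indexing as an equivalent of subset().

```r
# Toy stand-in for election_df; all values are invented.
toy_df <- data.frame(
  cand_name = c("Obama, Barack", "Romney, Mitt", "Obama, Barack", "Paul, Ron"),
  receipt_amt = c(250, 100, 35, 50),
  stringsAsFactors = FALSE
)

# subset() evaluates the condition inside the data frame.
obama_toy <- subset(toy_df, subset = cand_name == "Obama, Barack")
# Equivalent logical indexing with an explicit column reference.
romney_toy <- toy_df[toy_df$cand_name == "Romney, Mitt", ]

# Row counts should match the frequency table for each candidate.
counts <- table(toy_df$cand_name)
print(nrow(obama_toy) == counts[["Obama, Barack"]])
print(nrow(romney_toy) == counts[["Romney, Mitt"]])
```

On the real data the same comparison would be between dim(romney)[1] or dim(obama)[1] and the corresponding entries of table(election_df$cand_name).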
