AI Practical Basics Course
[Basic Project] 01 Data Analysis Using Domestic COVID-19 Patient Data

Project Goals

Derive meaningful information by analyzing the Seoul COVID-19 confirmed-case status data
Learn the data cleaning, feature engineering, and visualization methods needed for exploratory data analysis

 

  • Project Contents
    1. Reading the data: load the COVID-19 data and check the DataFrame structure
      1.1. Loading the data

    2. Data cleaning: delete empty or useless data
      2.1. Dropping empty columns

    3. Data visualization: apply further cleaning or feature engineering to each variable, then use visualization to understand the characteristics of the data
      3.1. Preprocessing the confirmation-date data
      3.2. Monthly confirmed cases
      3.3. Daily confirmed cases in August
      3.4. Confirmed cases by district
      3.5. Confirmed cases by district in August
      3.6. Monthly confirmed cases in Gwanak-gu
      3.7. Plotting confirmed cases in Seoul on a map

Data Sources

https://data.seoul.go.kr/dataList/OA-11677/S/1/datasetView.do

https://www.data.go.kr/tcs/dss/selectFileDataDetailView.do?publicDataPk=15063273

https://2021nipa.elice.io/explore

1. Reading the Data

Install and import the required packages, then read the data with pandas and check what it contains.

 

1-1. Loading the Data

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Read the CSV file into a DataFrame with pd.read_csv.
corona_all = pd.read_csv("./data/서울시 코로나19 확진자 현황.csv")

# Print the first 5 rows.
corona_all.head()

# Print a summary of the DataFrame.
corona_all.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5748 entries, 0 to 5747
Data columns (total 14 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   연번      5748 non-null   int64  
 1   확진일     5748 non-null   object 
 2   환자번호    5748 non-null   int64  
 3   국적      0 non-null      float64
 4   환자정보    0 non-null      float64
 5   지역      5748 non-null   object 
 6   여행력     459 non-null    object 
 7   접촉력     5748 non-null   object 
 8   조치사항    0 non-null      float64
 9   상태      5357 non-null   object 
 10  이동경로    5520 non-null   object 
 11  등록일     5748 non-null   object 
 12  수정일     5748 non-null   object 
 13  노출여부    5748 non-null   object 
dtypes: float64(3), int64(2), object(9)
memory usage: 628.8+ KB
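The info() output shows that some columns hold no values at all. As a rough sketch of how to spot them programmatically, the snippet below computes the share of missing values per column. The `toy` DataFrame and its values are made up for illustration and only borrow a few column names from the real data.

```python
import pandas as pd

# Toy stand-in for corona_all (hypothetical values, not the real data).
toy = pd.DataFrame({
    "연번": [3, 2, 1],
    "확진일": ["10.21.", "10.21.", "1.24."],
    "국적": [None, None, None],        # entirely empty, like the real column
    "여행력": ["미국", None, None],      # mostly empty
})

# Share of missing values in each column (0.0 = complete, 1.0 = empty).
missing_share = toy.isna().mean()

# Names of the columns in which every row is missing.
empty_cols = missing_share[missing_share == 1.0].index.tolist()

print(missing_share)
print(empty_cols)
```

Run against the real corona_all, the same two lines should flag the columns that info() reports as "0 non-null".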

 

 

 

2. Data Cleaning

Once the data has been read and inspected, the next step is data cleaning: handling missing values and outliers.

 

2-1. Dropping Empty Columns

# Use drop to remove the 국적 (nationality), 환자정보 (patient information), and 조치사항 (action taken) columns, which contain no values.
corona_del_col = corona_all.drop(columns=['국적', '환자정보', '조치사항'])

# Print a summary of the cleaned DataFrame.
corona_del_col.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5748 entries, 0 to 5747
Data columns (total 11 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   연번      5748 non-null   int64 
 1   확진일     5748 non-null   object
 2   환자번호    5748 non-null   int64 
 3   지역      5748 non-null   object
 4   여행력     459 non-null    object
 5   접촉력     5748 non-null   object
 6   상태      5357 non-null   object
 7   이동경로    5520 non-null   object
 8   등록일     5748 non-null   object
 9   수정일     5748 non-null   object
 10  노출여부    5748 non-null   object
dtypes: int64(2), object(9)
memory usage: 494.1+ KB
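Listing the empty columns by name works when there are only three of them. As an alternative sketch, pandas can drop every all-empty column without naming it, using dropna along the column axis. The `toy` frame below is a made-up stand-in, not the real data.

```python
import pandas as pd

# Toy stand-in with two entirely empty columns (hypothetical values).
toy = pd.DataFrame({
    "확진일": ["10.21.", "1.24."],
    "국적": [None, None],
    "조치사항": [float("nan"), float("nan")],
    "지역": ["양천구", "강서구"],
})

# axis=1 works on columns; how="all" drops a column only if every value is missing.
cleaned = toy.dropna(axis=1, how="all")

print(list(cleaned.columns))
```

Like drop, dropna returns a new DataFrame and leaves the original untouched, so the result has to be assigned to a variable.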

 

 

 

3. Data Visualization

Using the cleaned corona_del_col data, let's visualize how the values in each column are distributed.

 

3-1. Preprocessing the Confirmation Date

corona_del_col['확진일']

0       10.21.
1       10.21.
2       10.21.
3       10.21.
4       10.21.
         ...  
5743     1.31.
5744     1.30.
5745     1.30.
5746     1.30.
5747     1.24.
Name: 확진일, Length: 5748, dtype: object

 

Let's split the strings stored in 확진일 (confirmation date) into month and day columns, then view them as int64.

# Declare lists to hold the values temporarily before adding them to the DataFrame.
month = []
day = []

for data in corona_del_col['확진일']:
    # Split on '.' to separate the month and day, e.g. '10.21.' -> '10', '21'.
    month.append(data.split('.')[0])
    day.append(data.split('.')[1])
    
# Create the month and day columns in corona_del_col from the temporary lists.
corona_del_col['month'] = month
corona_del_col['day'] = day

# astype returns a new Series and does not modify the DataFrame, so these calls
# only display the values as int64. The stored columns remain strings, which the
# later filters (e.g. == '8') and the order lists rely on.
corona_del_col['month'].astype('int64')
corona_del_col['day'].astype('int64')

0       21
1       21
2       21
3       21
4       21
        ..
5743    31
5744    30
5745    30
5746    30
5747    24
Name: day, Length: 5748, dtype: int64
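The loop can also be written without temporary lists. The sketch below uses the vectorized string accessor and assigns the converted result back, which is what actually stores integers. The column names `month_int` and `day_int` are hypothetical, chosen so the string columns used by the rest of this walkthrough stay as they are.

```python
import pandas as pd

# Toy stand-in for the 확진일 column (hypothetical values).
toy = pd.DataFrame({"확진일": ["10.21.", "8.15.", "1.24."]})

# "10.21." splits into "10", "21" and a trailing empty string;
# expand=True puts the pieces into columns 0, 1 and 2.
parts = toy["확진일"].str.split(".", expand=True)

# Assigning the astype result back is what changes the stored dtype.
toy["month_int"] = parts[0].astype("int64")
toy["day_int"] = parts[1].astype("int64")

print(toy.dtypes)
```

With integer columns the later filters would compare against 8 rather than '8', so mixing the two styles in one notebook needs care.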

 

 

3-2. Monthly Confirmed Cases

Using the month values split out above, let's plot the number of confirmed cases per month as a bar chart.

# Build an order list to fix the order of the x-axis in the chart.
order = [str(i) for i in range(1, 11)]
order  # ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']

# Set the figure size.
plt.figure(figsize=(10,5))

# Plot with seaborn's countplot.
sns.set(style="darkgrid")
ax = sns.countplot(x="month", data=corona_del_col, palette="Set2", order = order)

 

# The plot method of a Series is another way to draw the chart.
corona_del_col['month'].value_counts().plot(kind='bar')

 

# value_counts() counts each distinct value and sorts the counts in descending order.
corona_del_col['month'].value_counts()

8     2416
9     1304
6      460
10     425
3      391
7      281
5      228
4      156
2       80
1        7
Name: month, dtype: int64
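Because value_counts() sorts by count, the Series plot above draws the bars from the busiest month down rather than in calendar order. One possible fix, sketched here on made-up data, is to reindex the counts with the same kind of order list used for countplot. Sorting the index directly would not help, since the months are strings and '10' sorts before '2'.

```python
import pandas as pd

# Toy month column stored as strings, as in corona_del_col (hypothetical values).
months = pd.Series(["8", "8", "10", "9", "1", "8", "10"], name="month")
order = [str(i) for i in range(1, 11)]

counts = months.value_counts()

# Reorder to calendar order; months with no rows get a count of 0.
by_calendar = counts.reindex(order, fill_value=0)

print(by_calendar)
```

Calling by_calendar.plot(kind='bar') would then draw the bars from January to October.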

 

 

3-3. Daily Confirmed Cases in August

As the monthly counts show, August had the most confirmed cases. This time, let's plot the daily counts as a bar chart to see how the number of cases grew during August.

# Build an order list to fix the order of the x-axis in the chart.
order2 = [str(i) for i in range(1, 32)]
order2  # ['1', '2', '3', ..., '30', '31']

# Plot with seaborn's countplot.
plt.figure(figsize=(20,10))
sns.set(style="darkgrid")
ax = sns.countplot(x="day", data=corona_del_col[corona_del_col['month'] == '8'], palette="rocket_r", order = order2)

 

 

[Problem 1] Find the average number of daily confirmed cases in August

print(corona_del_col[corona_del_col['month'] == '8']['day'].count()/31)
#77.93548387096774

print(corona_del_col[corona_del_col['month'] == '8']['day'].value_counts().mean())
#77.93548387096774

# Compute the average number of daily confirmed cases in August and store it in quiz_1 as a float.
quiz_1 = corona_del_col[corona_del_col['month'] == '8']['day'].count()/31
quiz_1 #77.93548387096774
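The two calculations agree for August only because every one of the 31 days has at least one case. In general they measure different things: dividing by the length of the month averages over all calendar days, while value_counts().mean() averages over days that appear in the data. A small sketch with made-up numbers shows the gap.

```python
import pandas as pd

# Hypothetical 30-day month in which only 3 distinct days have any cases.
days = pd.Series(["1", "1", "2", "5", "5", "5"])
days_in_month = 30

# Average over every calendar day: 6 cases / 30 days.
per_calendar_day = days.count() / days_in_month

# Average over days that appear in the data: 6 cases / 3 days.
per_reported_day = days.value_counts().mean()

print(per_calendar_day, per_reported_day)
```

For a month with quiet days, dividing by the number of calendar days is the safer reading of "average daily cases".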

 

 

3-4. Confirmed Cases by District

corona_del_col['지역']

0       양천구
1       강동구
2       강남구
3       관악구
4       관악구
       ... 
5743    성북구
5744    마포구
5745    종로구
5746    중랑구
5747    강서구
Name: 지역, Length: 5748, dtype: object

 

This time, let's plot a bar chart of how many confirmed cases each district has.

import matplotlib.font_manager as fm

font_dirs = ['/usr/share/fonts/truetype/nanum', ]
font_files = fm.findSystemFonts(fontpaths=font_dirs)

for font_file in font_files:
    fm.fontManager.addfont(font_file)
    
plt.figure(figsize=(20,10))
# Set the font options so that Korean text renders correctly.
sns.set(font="NanumBarunGothic", 
        rc={"axes.unicode_minus":False},
        style='darkgrid')
ax = sns.countplot(x="지역", data=corona_del_col, palette="Set2")

 

Handling Outliers in the District Data

The chart above shows two problem values: 종랑구, a misspelling of 중랑구 (Jungnang-gu), and 한국 (Korea), which is not a district. To match the rest of the district data, let's change 종랑구 -> 중랑구 and 한국 -> 기타 (other).

# Use replace to change the values.
# Store the result in a new DataFrame because the outliers have now been handled.
corona_out_region = corona_del_col.replace({'종랑구':'중랑구', '한국':'기타'})

# Plot the cleaned data again.
plt.figure(figsize=(20,10))
sns.set(font="NanumBarunGothic", 
        rc={"axes.unicode_minus":False},
        style='darkgrid')
ax = sns.countplot(x="지역", data=corona_out_region, palette="Set2")
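Calling replace on the whole DataFrame rewrites matching cells in every column, not only in 지역. Whether any other column of the real data holds a cell equal to 한국 is not checked here, so the sketch below uses made-up values to show the difference between the frame-wide call and a column-scoped one.

```python
import pandas as pd

# Toy frame; the 접촉력 values are hypothetical, chosen to show the side effect.
toy = pd.DataFrame({
    "지역": ["종랑구", "한국", "관악구"],
    "접촉력": ["한국 방문", "한국", "확인 중"],
})
mapping = {"종랑구": "중랑구", "한국": "기타"}

# Frame-wide replace: any cell that equals a key is changed, in any column.
whole_frame = toy.replace(mapping)

# Column-scoped replace: only the 지역 column is touched.
scoped = toy.copy()
scoped["지역"] = scoped["지역"].replace(mapping)

print(whole_frame)
print(scoped)
```

Both versions fix the 지역 column the same way; the scoped version simply guarantees that nothing else changes.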

 

 

3-5. Confirmed Cases by District in August

# Indexing with a boolean condition selects the rows that satisfy it.
corona_out_region[corona_out_region['month'] == '8']

# Draw the chart.
plt.figure(figsize=(20,10))
sns.set(font="NanumBarunGothic", 
        rc={"axes.unicode_minus":False},
        style='darkgrid')
ax = sns.countplot(x="지역", data=corona_out_region[corona_out_region['month'] == '8'], palette="Set2")
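Boolean conditions can be combined. As a minimal sketch on made-up rows, the snippet below builds two masks and joins them with `&` to keep only the August cases in one district. The variable names are illustrative.

```python
import pandas as pd

# Toy stand-in for corona_out_region (hypothetical rows).
toy = pd.DataFrame({
    "month": ["8", "8", "6", "8"],
    "지역": ["관악구", "성북구", "관악구", "관악구"],
})

# Each comparison yields a boolean Series aligned with the rows of toy.
is_august = toy["month"] == "8"
is_gwanak = toy["지역"] == "관악구"

# & is the element-wise AND; use | for OR and ~ for NOT.
august_gwanak = toy[is_august & is_gwanak]

print(august_gwanak)
```

When the comparisons are written inline, each one needs its own parentheses, because `&` binds more tightly than `==`.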

 

 

3-6. Monthly Confirmed Cases in Gwanak-gu

# Selecting a single column and filtering it returns the matching rows as a Series.
corona_out_region['month'][corona_out_region['지역'] == '관악구']

3       10
4       10
6       10
7       10
8       10
        ..
5630     3
5661     2
5674     2
5695     2
5711     2
Name: month, Length: 452, dtype: object

# Draw the chart.
plt.figure(figsize=(10,5))
sns.set(style="darkgrid")
ax = sns.countplot(x="month", data=corona_out_region[corona_out_region['지역'] == '관악구'], palette="Set2", order = order)
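To get the numbers behind a chart like this for every district at once, one option is a month-by-district table. The sketch below uses pd.crosstab on made-up rows; it is an alternative view of the same counts, not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for corona_out_region (hypothetical rows).
toy = pd.DataFrame({
    "month": ["2", "2", "3", "10", "10", "10"],
    "지역": ["관악구", "성북구", "관악구", "관악구", "관악구", "성북구"],
})

# Rows are months, columns are districts, cells are case counts.
table = pd.crosstab(toy["month"], toy["지역"])

# Put the string months in calendar order and fill absent months with 0.
order = [str(i) for i in range(1, 11)]
table = table.reindex(order, fill_value=0)

# One column of the table is the monthly series for a single district.
gwanak = table["관악구"]
print(gwanak)
```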

 

 

3-7. Plotting Confirmed Cases in Seoul on a Map

# Import folium, a library for rendering maps.
import folium

# Create the map with folium.Map.
map_osm = folium.Map(location=[37.529622, 126.984307], zoom_start=11)

map_osm

# Load the Seoul administrative district (si/gun/gu) information and store it in CRS.
CRS = pd.read_csv("./data/서울시 행정구역 시군구 정보 (좌표계_ WGS1984).csv")

# Display the DataFrame.
CRS

 

for ๋ฌธ์„ ์‚ฌ์šฉํ•˜์—ฌ ์ง€์—ญ๋งˆ๋‹ค ํ™•์ง„์ž๋ฅผ ์›ํ˜• ๋งˆ์ปค๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ง€๋„์— ์ถœ๋ ฅํ•ด ๋ด…์‹œ๋‹ค.

# corona_out_region์˜ ์ง€์—ญ์—๋Š” 'oo๊ตฌ' ์ด์™ธ๋กœ `ํƒ€์‹œ๋„`, `๊ธฐํƒ€`์— ํ•ด๋‹น๋˜๋Š” ๋ฐ์ดํ„ฐ๊ฐ€ ์กด์žฌ ํ•ฉ๋‹ˆ๋‹ค.
# ์œ„ ๋ฐ์ดํ„ฐ์— ํ•ด๋‹น๋˜๋Š” ์œ„๋„, ๊ฒฝ๋„๋ฅผ ์ฐพ์„ ์ˆ˜ ์—†๊ธฐ์— ์‚ญ์ œํ•˜์—ฌ corona_seoul๋กœ ์ €์žฅํ•ฉ๋‹ˆ๋‹ค.
corona_seoul = corona_out_region.drop(corona_out_region[corona_out_region['์ง€์—ญ'] == 'ํƒ€์‹œ๋„'].index)
corona_seoul = corona_seoul.drop(corona_out_region[corona_out_region['์ง€์—ญ'] == '๊ธฐํƒ€'].index)

# ์„œ์šธ ์ค‘์‹ฌ์ง€ ์ค‘๊ตฌ๋ฅผ ๊ฐ€์šด๋ฐ ์ขŒํ‘œ๋กœ ์žก์•„ ์ง€๋„๋ฅผ ์ถœ๋ ฅํ•ฉ๋‹ˆ๋‹ค.
map_osm = folium.Map(location=[37.557945, 126.99419], zoom_start=11)

# ์ง€์—ญ ์ •๋ณด๋ฅผ set ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ 25๊ฐœ ๊ณ ์œ ์˜ ์ง€์—ญ์„ ๋ฝ‘์•„๋ƒ…๋‹ˆ๋‹ค.
for region in set(corona_seoul['์ง€์—ญ']):

    # ํ•ด๋‹น ์ง€์—ญ์˜ ๋ฐ์ดํ„ฐ ๊ฐœ์ˆ˜๋ฅผ count์— ์ €์žฅํ•ฉ๋‹ˆ๋‹ค.
    count = len(corona_seoul[corona_seoul['์ง€์—ญ'] == region])
    # ํ•ด๋‹น ์ง€์—ญ์˜ ๋ฐ์ดํ„ฐ๋ฅผ CRS์—์„œ ๋ฝ‘์•„๋ƒ…๋‹ˆ๋‹ค.
    CRS_region = CRS[CRS['์‹œ๊ตฐ๊ตฌ๋ช…_ํ•œ๊ธ€'] == region]

    # CircleMarker๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ง€์—ญ๋งˆ๋‹ค ์›ํ˜•๋งˆ์ปค๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.
    marker = folium.CircleMarker([CRS_region['์œ„๋„'], CRS_region['๊ฒฝ๋„']], # ์œ„์น˜
                                  radius=count/10 + 10,                 # ๋ฒ”์œ„
                                  color='#3186cc',            # ์„  ์ƒ‰์ƒ
                                  fill_color='#3186cc',       # ๋ฉด ์ƒ‰์ƒ
                                  popup=' '.join((region, str(count), '๋ช…'))) # ํŒ์—… ์„ค์ •
    
    # ์ƒ์„ฑํ•œ ์›ํ˜•๋งˆ์ปค๋ฅผ ์ง€๋„์— ์ถ”๊ฐ€ํ•ฉ๋‹ˆ๋‹ค.
    marker.add_to(map_osm)

map_osm
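The loop above filters the DataFrame once per district. A possible alternative, sketched here without folium so that it runs on its own, is to count all districts in one pass and merge the counts with the coordinate table. The rows and the approximate coordinates below are illustrative stand-ins for corona_seoul and CRS.

```python
import pandas as pd

# Toy stand-ins for corona_seoul and CRS (hypothetical rows, approximate coordinates).
cases = pd.DataFrame({"지역": ["중구", "중구", "관악구", "관악구", "관악구"]})
coords = pd.DataFrame({
    "시군구명_한글": ["중구", "관악구"],
    "위도": [37.56, 37.47],
    "경도": [126.99, 126.95],
})

# One row per district with its case count.
counts = cases["지역"].value_counts().rename_axis("지역").reset_index(name="count")

# Attach latitude and longitude; the inner join drops districts without coordinates.
markers = counts.merge(coords, left_on="지역", right_on="시군구명_한글", how="inner")

# Same radius rule as in the loop above.
markers["radius"] = markers["count"] / 10 + 10

print(markers[["지역", "count", "위도", "경도", "radius"]])
```

Each row of `markers` then carries everything one CircleMarker call needs: the location from 위도 and 경도, the radius, and the count for the popup text.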

 

 

[Problem 2] Find the district with the most confirmed cases in June.

corona_out_region[corona_out_region['month'] == '6']['지역'].value_counts()
'''
관악구     59
구로구     45
양천구     43
도봉구     43
강서구     33
영등포구    29
타시도     23
은평구     18
금천구     17
서초구     15
중랑구     14
동작구     13
노원구     13
마포구     12
용산구     12
강동구     11
강북구     10
성동구      9
서대문구     8
강남구      7
송파구      7
성북구      4
동대문구     4
중구       3
광진구      3
종로구      3
기타       2
Name: 지역, dtype: int64
'''

top = corona_out_region[corona_out_region['month'] == '6']['지역'].value_counts()
top.index[0]
#관악구

# Find the district with the most confirmed cases in June and store it in quiz_2 as a string.
quiz_2 = top.index[0]
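Taking index[0] relies on value_counts() returning its result sorted in descending order. A slightly more explicit sketch, shown on made-up rows, asks for the label of the largest count directly with idxmax.

```python
import pandas as pd

# Toy stand-in for corona_out_region (hypothetical rows).
toy = pd.DataFrame({
    "month": ["6", "6", "6", "8"],
    "지역": ["관악구", "관악구", "구로구", "성북구"],
})

# Count the June cases per district.
june_counts = toy.loc[toy["month"] == "6", "지역"].value_counts()

# idxmax returns the index label (the district name) with the largest count.
top_region = june_counts.idxmax()

print(top_region)
```

If two districts tie for first place, idxmax returns only one of them, so a tie would need a separate check.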
