Collecting Naver movie reviews
14 box-office hits and 14 box-office flops
1,000 reviews collected per movie
0. Package imports
import requests
from bs4 import BeautifulSoup
import pandas as pd
1. URL setup
url_pre = 'https://movie.naver.com/movie/bi/mi/pointWriteFormList.naver?code=161967&type=after&onlyActualPointYn=Y&isActualPointWriteExecute=false&isMileageSubscriptionAlready=false&isMileageSubscriptionReject=false&page='
id_pre = '_filtered_ment_'
[Finding the URL]
Open the movie's ratings page, open the developer tools with F12, then search for pointWriteFormList.naver to find the page request.
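The URL found this way can also be assembled from a movie code rather than pasted by hand. The following is a minimal sketch, assuming the query parameters in url_pre stay the same for other movies and only code and page change. build_review_url and BASE_URL are hypothetical names, not part of the original post.

```python
from urllib.parse import urlencode

# Endpoint taken from url_pre in this post.
BASE_URL = 'https://movie.naver.com/movie/bi/mi/pointWriteFormList.naver'


def build_review_url(code, page):
    """Return the review-list URL for a movie code and a 1-based page number."""
    # Assumption: every parameter except code and page is fixed.
    params = {
        'code': code,
        'type': 'after',
        'onlyActualPointYn': 'Y',
        'isActualPointWriteExecute': 'false',
        'isMileageSubscriptionAlready': 'false',
        'isMileageSubscriptionReject': 'false',
        'page': page,
    }
    return BASE_URL + '?' + urlencode(params)


print(build_review_url(161967, 1))
```

For code 161967 this produces the same string as url_pre followed by the page number.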
[Moved] Moved to before the for loop
score_list = []  # stores the ratings
data = []  # stores the reviews
* Caution: if these lists are created inside the for loop, they are reset on every page, so only the 10 reviews from the current page are kept.
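The effect of where the list is created can be seen without any scraping. This is a small illustration with made-up page contents; collect_inside and collect_outside are hypothetical helpers used only to contrast the two placements.

```python
def collect_inside(pages):
    """Buggy placement: the list is re-created on every iteration."""
    collected = []
    for page in pages:
        collected = []  # reset here, so earlier pages are lost
        collected.extend(page)
    return collected


def collect_outside(pages):
    """Correct placement: the list is created once, before the loop."""
    collected = []
    for page in pages:
        collected.extend(page)
    return collected


pages = [['a1', 'a2'], ['b1', 'b2'], ['c1', 'c2']]
print(collect_inside(pages))   # ['c1', 'c2']
print(collect_outside(pages))  # ['a1', 'a2', 'b1', 'b2', 'c1', 'c2']
```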
2. Data collection
for page in range(100):
    print('\n', page + 1, "page\n")
    site = url_pre + str(page + 1)  # page number runs from 1 to 100
    print('site =', site)
    res = requests.get(site)
    soup = BeautifulSoup(res.content, 'html.parser')
    scores = soup.find_all('div', 'star_score')
    for score in scores:
        score_list.append(str(score.get_text()).strip())
    id_list = []  # [Moved] must be rebuilt for every new page
    for i in range(10):
        id_list.append(id_pre + str(i))
    for id_num in id_list:
        id_text = soup.find('span', {'id': id_num}).get_text()
        data.append(str(id_text).strip())  # strip() added: removes leading/trailing whitespace

# Check the scores and reviews
for score, line in zip(score_list, data):
    print(score, line)
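One thing to watch in the loop above: soup.find returns None when an id is missing, for example on a page with fewer than 10 reviews, and calling get_text() on None raises AttributeError. The sketch below shows one way to guard against that. parse_page is a hypothetical helper, and sample_html is a made-up fragment that only imitates the class and id names used in this post, not the real page markup.

```python
from bs4 import BeautifulSoup


def parse_page(html, per_page=10):
    """Return (score, review) pairs from one page, stopping at the first missing review."""
    soup = BeautifulSoup(html, 'html.parser')
    scores = [div.get_text().strip() for div in soup.find_all('div', 'star_score')]
    reviews = []
    for i in range(per_page):
        span = soup.find('span', {'id': '_filtered_ment_' + str(i)})
        if span is None:  # fewer reviews than expected on this page
            break
        reviews.append(span.get_text().strip())
    # zip() stops at the shorter list, so the pairs stay aligned.
    return list(zip(scores, reviews))


# Made-up markup for illustration only.
sample_html = '''
<ul>
  <li><div class="star_score"><em>10</em></div><span id="_filtered_ment_0">  Great movie  </span></li>
  <li><div class="star_score"><em>7</em></div><span id="_filtered_ment_1">Not bad</span></li>
</ul>
'''
print(parse_page(sample_html))
```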
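Length check: the score list and the review list must have the same length
len(score_list)  # expected 1000 (100 pages x 10 reviews)
len(data)  # expected 1000 (100 pages x 10 reviews)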
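The length check can also be made explicit so that a mismatch stops the script before the DataFrame is built. This is a minimal sketch; check_same_length is a hypothetical helper and the sample lists are made up.

```python
def check_same_length(score_list, data):
    """Return the shared length, or raise if scores and reviews are out of step."""
    if len(score_list) != len(data):
        raise ValueError(
            'length mismatch: %d scores vs %d reviews' % (len(score_list), len(data))
        )
    return len(data)


print(check_same_length(['10', '7'], ['Great movie', 'Not bad']))  # 2
```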
3. Saving the data for Excel
data_df = pd.DataFrame({'score':score_list, 'review' : data}, columns=['score', 'review'])
data_df.info()
# Replace path_name and movie_title with your own folder and movie name.
data_df.to_csv(r'path_name\movie_title_naver_review.csv', sep=',',
               na_rep='NaN', encoding='utf-8-sig', index=False)
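The encoding='utf-8-sig' argument matters when the CSV is opened in Excel: it writes a UTF-8 byte order mark at the start of the file, which Excel generally uses to recognise the encoding, so Korean review text is not shown as garbled characters. The sketch below uses the standard csv module instead of pandas purely to make the written bytes visible. The file name and rows are made up.

```python
import codecs
import csv
import os
import tempfile

# Made-up rows standing in for (score, review) pairs.
rows = [('10', '재미있어요'), ('7', 'Not bad')]
path = os.path.join(tempfile.mkdtemp(), 'sample_naver_review.csv')

# 'utf-8-sig' prepends the BOM; newline='' lets the csv module control line endings.
with open(path, 'w', encoding='utf-8-sig', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['score', 'review'])
    writer.writerows(rows)

with open(path, 'rb') as f:
    raw = f.read()

print(raw.startswith(codecs.BOM_UTF8))  # True
```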
[error] 'list' object has no attribute 'to_csv'
[Fix] Convert the list to a DataFrame, then call to_csv on the DataFrame.
References
https://cossmos.tistory.com/m/37
https://l0o02.github.io/2018/06/14/python-crawling-pagination-1/
https://haystar.tistory.com/10#toc111
https://kimdingko-world.tistory.com/77