Collecting Naver movie reviews
14 box-office hits and 14 box-office flops
1,000 reviews collected per movie

 

 

 



0. Package imports

import requests
from bs4 import BeautifulSoup
import pandas as pd



1. URL setup

url_pre = 'https://movie.naver.com/movie/bi/mi/pointWriteFormList.naver?code=161967&type=after&onlyActualPointYn=Y&isActualPointWriteExecute=false&isMileageSubscriptionAlready=false&isMileageSubscriptionReject=false&page='
id_pre = '_filtered_ment_'

 

[Finding the URL]

Open the movie's rating page, turn on the developer tools with F12, and search for pointWriteFormList.naver to find the page that serves the reviews.
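
Since the same collection is repeated for 28 movies, the URL found this way can be rebuilt from the movie code and the page number instead of being pasted by hand each time. The following is a minimal sketch: the helper name build_review_url is hypothetical, and the query parameters are simply the ones visible in the URL for movie code 161967.

```python
from urllib.parse import urlencode

BASE_URL = 'https://movie.naver.com/movie/bi/mi/pointWriteFormList.naver'


def build_review_url(movie_code, page):
    """Hypothetical helper: rebuild the review-list URL for one movie and page."""
    # Parameters copied from the URL found in the developer tools.
    params = {
        'code': movie_code,
        'type': 'after',
        'onlyActualPointYn': 'Y',
        'isActualPointWriteExecute': 'false',
        'isMileageSubscriptionAlready': 'false',
        'isMileageSubscriptionReject': 'false',
        'page': page,
    }
    return BASE_URL + '?' + urlencode(params)


url = build_review_url(161967, 1)
print(url)
```

Only the code value changes from movie to movie, so a loop over a list of movie codes could call this helper once per page.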

 


[Moved] These lists were moved to before the for loop.

score_list = [] # stores the ratings
data = [] # stores the reviews

* Caution: if these lists are created inside the for loop, they are reset on every new page, so only the 10 reviews from the current page end up saved.
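
The effect described in the caution can be reproduced without any scraping. This small sketch uses a made-up list of two "pages" to compare a list created inside the loop with one created before it.

```python
pages = [['r1', 'r2'], ['r3', 'r4']]  # made-up stand-in for two scraped pages

# List created inside the loop: it is reset on every page.
for page_reviews in pages:
    inside = []
    inside.extend(page_reviews)

# List created before the loop: it accumulates across pages.
outside = []
for page_reviews in pages:
    outside.extend(page_reviews)

print(inside)   # only the last page survives
print(outside)  # every page is kept
```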



2. Data collection

for page in range(100):
    print('\n', page + 1, "page\n")
    site = url_pre + str(page + 1)  # page number runs from 1 to 100
    print('site =', site)
    res = requests.get(site)

    soup = BeautifulSoup(res.content, 'html.parser')

    # each rating is read from a <div class="star_score">
    scores = soup.find_all('div', 'star_score')

    for score in scores:
        score_list.append(str(score.get_text()).strip())

    id_list = []  # [Moved] must be rebuilt for every new page
    for i in range(10):
        id_list.append(id_pre + str(i))

    for id_num in id_list:
        span = soup.find('span', {'id': id_num})
        if span is None:  # this page has fewer than 10 reviews
            continue
        data.append(str(span.get_text()).strip())  # strip() added: removes leading/trailing whitespace

    # check the scores and reviews collected so far
    for score, line in zip(score_list, data):
        print(score, line)

            
Length check: the score list and the review list must have the same length.

len(score_list) # 1000 when all 100 pages return 10 reviews each
len(data) # 1000
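
The length check matters because pandas refuses to build a DataFrame from lists of different lengths. A minimal sketch of making the check explicit follows; the helper name check_lengths is hypothetical and the sample values are made up.

```python
def check_lengths(score_list, data):
    """Hypothetical helper: raise if scores and reviews are misaligned."""
    if len(score_list) != len(data):
        raise ValueError(
            f'{len(score_list)} scores but {len(data)} reviews')
    return len(data)


# Made-up sample: two scores paired with two reviews.
n = check_lengths(['10', '8'], ['great', 'fine'])
print(n)
```

Calling it right after the collection loop stops the script with a readable message before the save step runs.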



3. Saving the data to a CSV file for Excel

data_df = pd.DataFrame({'score':score_list, 'review' : data}, columns=['score', 'review'])
data_df.info()

data_df.to_csv(r'path\movie_title_naver_review.csv', sep=',',
            na_rep='NaN', encoding='utf-8-sig', index=False)
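
The encoding='utf-8-sig' argument is what lets Excel display Korean review text correctly: it writes a byte order mark (BOM) at the start of the file, which Excel uses to recognise UTF-8. A small sketch using only the standard library shows the difference between the two encodings; the header string is just sample data.

```python
import codecs

header = 'score,review\n'  # sample text, same columns as the DataFrame

plain = header.encode('utf-8')
with_bom = header.encode('utf-8-sig')

# utf-8-sig is plain UTF-8 with the 3-byte BOM in front.
print(with_bom == codecs.BOM_UTF8 + plain)

# Reading back with utf-8-sig strips the BOM again.
print(with_bom.decode('utf-8-sig') == header)
```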

 

[error] 'list' object has no attribute 'to_csv'
[Fix] A plain list has no to_csv method; convert the list to a DataFrame first, then call to_csv.
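
The error and its fix can be reproduced in a few lines. This is a minimal sketch with made-up review strings; calling to_csv without a path returns the CSV text instead of writing a file, which keeps the example self-contained.

```python
import pandas as pd

rows = ['good', 'bad']  # made-up reviews held in a plain list

# rows.to_csv(...) would fail here: a list has no to_csv method.
print(hasattr(rows, 'to_csv'))

# Wrapping the list in a DataFrame makes to_csv available.
df = pd.DataFrame({'review': rows})
csv_text = df.to_csv(index=False)
print(csv_text)
```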

 

 

 

 

 

 

References
https://cossmos.tistory.com/m/37
https://l0o02.github.io/2018/06/14/python-crawling-pagination-1/
https://haystar.tistory.com/10#toc111
https://kimdingko-world.tistory.com/77
