[Web Crawling] Crawling Melon Song Chart Rankings

Data Analysis/Python 2024. 7. 6. 21:40

Installing Google ChromeDriver

https://googlechromelabs.github.io/chrome-for-testing/

Using ChromeDriver

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options

# Path to chromedriver (raw string so the backslash is not read as an escape)
chromedriver_path = r'.\chromedriver.exe'

# Create the Service object
service = Service(chromedriver_path)

# Create the Options object
options = Options()

# Create the WebDriver object
driver = webdriver.Chrome(service=service, options=options)

# Open the URL
url = 'https://www.naver.com/'
driver.get(url)

selenium's webdriver is a library that lets code control what a person would do in a web browser such as Chrome or Internet Explorer: opening a site, clicking buttons, and typing text.

Finding HTML Information

from bs4 import BeautifulSoup

# Get the HTML of the page currently loaded in the driver, then parse it
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')

Using Tag Attributes

# Find tags by tag name
tags_span = soup.select('span') # select every <span> tag
tags_p = soup.select('p')       # select every <p> tag

# Find tags by id
ids_footer = soup.select('#footer') # select the tag whose id is 'footer'

# Find tags by class
class_shortcut_area = soup.select('.shortcut_area') # select tags whose class is 'shortcut_area'

# Find tags by tag name and class together
tags_span_class = soup.select('span.kwd_dsc') # select <span> tags whose class is 'kwd_dsc'
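The selectors above can be tried without opening a browser. The following is a minimal sketch that runs them against a small hand-written HTML string. The markup is an assumption made up for illustration: its id and class values only mimic the names used above and are not taken from the live Naver page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the id and class names only mimic the ones used above.
sample_html = """
<div id="footer"><p>footer text</p></div>
<div class="shortcut_area">
    <span class="kwd_dsc">keyword</span>
    <span>plain span</span>
</div>
"""

sample_soup = BeautifulSoup(sample_html, 'html.parser')

# select() always returns a list, even when one tag or no tag matches
print(len(sample_soup.select('span')))            # 2
print(len(sample_soup.select('#footer')))         # 1
print(len(sample_soup.select('.shortcut_area')))  # 1
print(sample_soup.select('span.kwd_dsc')[0].text) # keyword
print(sample_soup.select('#missing'))             # []
```

Because select() returns a list in every case, indexing with [0] is only safe once you know at least one tag matched.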

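Using Parent Structure

# Sample HTML for a market page (the Korean text is sample data: 바나나 = banana, 파인애플 = pineapple)
html = '''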
<html>
    <head>
    </head>
    <body>
        <h1> 우리동네시장</h1>
            <div class = 'sale'>
                <p id='fruits1' class='fruits'>
                    <span class = 'name'> 바나나 </span>
                    <span class = 'price'> 3000원 </span>
                    <span class = 'inventory'> 500개 </span>
                    <span class = 'store'> 가나다상회 </span>
                    <a href = 'http://bit.ly/forPlaywithData' > 홈페이지 </a>
                </p>
            </div>
            <div class = 'prepare'>
                <p id='fruits2' class='fruits'>
                    <span class ='name'> 파인애플 </span>
                </p>
            </div>
    </body>
</html>
'''

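# Parse the sample HTML so the selectors below run against it
soup = BeautifulSoup(html, 'html.parser')

# Locate tags by their position in the tag structure
tags_name = soup.select('span.name')
print(tags_name)
# [<span class="name"> 바나나 </span>, <span class="name"> 파인애플 </span>]

# '>' selects direct children: span.name directly under the tag with id 'fruits1'
tags_names = soup.select('#fruits1 > span.name')
print(tags_names)
# [<span class="name"> 바나나 </span>]

# A space selects descendants at any depth: span.name anywhere inside div.sale
tags_banana2 = soup.select('div.sale > #fruits1 > span.name')
tags_banana3 = soup.select('div.sale span.name')
print(tags_banana2)
print(tags_banana3)

# [<span class="name"> 바나나 </span>]
# [<span class="name"> 바나나 </span>]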
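The last two selectors return the same result only because the one span.name inside div.sale happens to sit on the direct parent-child path. A small sketch of a case where '>' (direct child) and a space (any descendant) differ; the markup is a made-up variation, not part of the sample above.

```python
from bs4 import BeautifulSoup

# Made-up markup: one span.name sits directly under the div, one is nested in a <p>
nested_html = """
<div class="sale">
    <span class="name">direct</span>
    <p><span class="name">nested</span></p>
</div>
"""
nested_soup = BeautifulSoup(nested_html, 'html.parser')

child_only = nested_soup.select('div.sale > span.name')  # direct children only
descendants = nested_soup.select('div.sale span.name')   # any depth below div.sale

print([t.text for t in child_only])   # ['direct']
print([t.text for t in descendants])  # ['direct', 'nested']
```

The child form is stricter and tends to break when a page adds a wrapper tag, while the descendant form is looser and may match more than intended.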

Extracting Information

Selecting a single tag from a tag group

# Select just one tag from a tag group
tags = soup.select('span.name')
tag_1 = tags[0] # pick one tag by its index
print(tag_1)
# <span class="name"> 바나나 </span>

# Select the tags one at a time with a loop
tags = soup.select('span.name')
for tag in tags:
    print(tag)
# <span class="name"> 바나나 </span>
# <span class="name"> 파인애플 </span>
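Once a single tag is selected, the usual next step is to read its text or one of its attributes. A minimal sketch, assuming a trimmed copy of the sample markup above; the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample markup above
snippet = """
<p id='fruits1' class='fruits'>
    <span class='name'> 바나나 </span>
    <span class='price'> 3000원 </span>
    <a href='http://bit.ly/forPlaywithData'> 홈페이지 </a>
</p>
"""
snippet_soup = BeautifulSoup(snippet, 'html.parser')

name_tag = snippet_soup.select('span.name')[0]
print(repr(name_tag.text))    # ' 바나나 ' (surrounding spaces are kept)
print(name_tag.text.strip())  # 바나나

link_tag = snippet_soup.select('a')[0]
print(link_tag['href'])       # http://bit.ly/forPlaywithData
print(link_tag.get('title'))  # None: .get() avoids a KeyError for a missing attribute
```

Calling .strip() is worth the habit, since text pulled from real pages often carries stray whitespace and newlines.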

Crawling Melon Song Chart Rankings

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup

# Path to chromedriver (raw string so the backslash is not read as an escape)
chromedriver_path = r'.\chromedriver.exe'

# Create the Service object
service = Service(chromedriver_path)

# Create the Options object
options = Options()

# Create the WebDriver object
driver = webdriver.Chrome(service=service, options=options)

# Open the Melon chart page
url = 'http://www.melon.com/chart/index.htm'
driver.get(url)

# Parse the loaded page
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')

# Get the title of the No. 1 song (the first <tr> is the header row, so skip it)
songs = soup.select('tr')[1:]
song = songs[0]
title = song.select('div.ellipsis.rank01 > span > a')[0].text
title
# 'Supernova'

# Get the artist of the No. 1 song
singer = song.select('div.ellipsis.rank02 > a')[0].text
singer
# 'aespa'

Getting the Melon Top 50 Song Rankings - BeautifulSoup

# Print the title and artist of every row in the chart
for song in songs:
    title = song.select('div.ellipsis.rank01 > span > a')[0].text
    singer = song.select('div.ellipsis.rank02 > a')[0].text
    print(title, singer, sep=' | ')
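Note that select(...)[0] raises an IndexError on any row that lacks the expected tag. One possible variation, shown here as a sketch, uses select_one, which returns None when nothing matches, and skips such rows. The markup below is a hypothetical stand-in that only mimics the class names used above; it is not Melon's actual page.

```python
from bs4 import BeautifulSoup

# Hypothetical chart markup: a header row, two song rows, and one odd row
chart_html = """
<table>
  <tr><th>header row</th></tr>
  <tr>
    <td><div class="ellipsis rank01"><span><a>Song A</a></span></div>
        <div class="ellipsis rank02"><a>Artist A</a></div></td>
  </tr>
  <tr><td>row without song info</td></tr>
  <tr>
    <td><div class="ellipsis rank01"><span><a>Song B</a></span></div>
        <div class="ellipsis rank02"><a>Artist B</a></div></td>
  </tr>
</table>
"""
chart_soup = BeautifulSoup(chart_html, 'html.parser')

results = []
for row in chart_soup.select('tr'):
    title_tag = row.select_one('div.ellipsis.rank01 > span > a')
    singer_tag = row.select_one('div.ellipsis.rank02 > a')
    if title_tag is None or singer_tag is None:
        continue  # skip the header row and rows with a different layout
    results.append((title_tag.text.strip(), singer_tag.text.strip()))

for song_title, song_singer in results:
    print(song_title, song_singer, sep=' | ')
# Song A | Artist A
# Song B | Artist B
```

With this check the [1:] slice for the header row is no longer needed, because rows without song information are skipped automatically.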

Getting the Melon Top 50 Song Rankings - Selenium

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# Path to chromedriver (raw string so the backslash is not read as an escape)
chromedriver_path = r'.\chromedriver.exe'

# Create the Service object
service = Service(chromedriver_path)

# Create the Options object
options = Options()

# Create the WebDriver object
driver = webdriver.Chrome(service=service, options=options)

# Open the Melon chart page
url = 'http://www.melon.com/chart/index.htm'
driver.get(url)

# Find elements with CSS selectors (the first <tr> is the header row, so skip it)
songs = driver.find_elements(By.CSS_SELECTOR, 'tr')[1:]
for song in songs:
    title = song.find_element(By.CSS_SELECTOR, 'div.ellipsis.rank01 > span > a').text
    print(title)

# Quit the driver
driver.quit()

Saving the Melon Crawl Results to Excel

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import pandas as pd

# Path to chromedriver (raw string so the backslash is not read as an escape)
chromedriver_path = r'.\chromedriver.exe'

# Create the Service object
service = Service(chromedriver_path)

# Create the Options object
options = Options()

# Create the WebDriver object
driver = webdriver.Chrome(service=service, options=options)

# Open the URL
url = 'http://www.melon.com/chart/index.htm'
driver.get(url)

# Parse the loaded page
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')

song_data = []
rank = 1

# Collect one [service, rank, title, artist] row per song
songs = soup.select('table > tbody > tr')
for song in songs:
    title = song.select('div.rank01 > span > a')[0].text
    singer = song.select('div.rank02 > a')[0].text
    song_data.append(['Melon', rank, title, singer])
    rank = rank + 1

# Build a DataFrame from the collected rows
columns = ['Service', 'Rank', 'Title', 'Artist']
pd_data = pd.DataFrame(song_data, columns=columns)
pd_data.head()

# Uncomment to save; the ./files folder must exist and an Excel engine such as openpyxl must be installed
# pd_data.to_excel('./files/melon.xlsx', index=False)

Setting index=False keeps the DataFrame's index column out of the saved Excel file.
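The effect of index=False is easy to see without writing a file. The sketch below uses to_csv as a stand-in for to_excel, on the assumption that a text preview is enough here: both methods take the same index parameter, and to_csv returns a string when no path is given. The rows are made-up sample data, not real chart results.

```python
import pandas as pd

# Made-up rows in the same shape as song_data above
demo = pd.DataFrame(
    [['Melon', 1, 'Song A', 'Artist A'], ['Melon', 2, 'Song B', 'Artist B']],
    columns=['service', 'rank', 'title', 'singer'],
)

with_index = demo.to_csv()                # first column holds the index 0, 1
without_index = demo.to_csv(index=False)  # index column omitted

print(with_index.splitlines()[0])     # ,service,rank,title,singer
print(without_index.splitlines()[0])  # service,rank,title,singer
```

The leading comma in the first header line is the unnamed index column, which is what index=False leaves out.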


ABOUT ME

letsfuture
