Justin's Learning Log

February 06, 2020

Web Scraping in Practice: the PTT NBA Board

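PTT in practice:

    import requests
    from bs4 import BeautifulSoup
    import time

    today = time.strftime('%m/%d').lstrip('0')
    # %m is the month and %d is the day. strftime zero-pads the month,
    # but the month shown on PTT boards has no leading zero.
    # lstrip removes characters from the left end of the string;
    # its argument is the set of characters to remove.
    print(today)

    def pttNBA(url):
        resp = requests.get(url)
        if resp.status_code != 200:
            print('Error fetching URL: ' + url)
            return

What 200 means: to check whether we actually got the page, we can print resp.status_code, the status code of the response. It tells us whether the server received the request and whether the page is in a normal state. Common status codes: 200 means OK, 404 means the page was not found, and so on; see the list of HTTP status codes.

        soup = BeautifulSoup(resp.text, 'html5lib')  # hand the page content to BeautifulSoup for parsing
        paging = soup.find('div', 'btn-group btn-group-paging').find_all('a')[1]
        # The first step in reading the page elements is to get the link to the previous page.
        # "Previous page" is the second link in the paging group, so we use index [1];
        # indexing the result with ['href'] then gives the link text.
        articles = []
        rents = soup.find_all('div', 'r-ent')
        for rent in rents:
            titl...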
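The date formatting and the status-code check from the script can be tried on their own, without touching the network. The sketch below uses only the standard library; the helper names ptt_date and describe_status are made up for this illustration and are not part of the original script.

```python
import time
from http import HTTPStatus


def ptt_date(t=None):
    """Format a time as 'M/DD': month without a leading zero, e.g. '2/06'.

    Hypothetical helper; the original script does this inline for today's date.
    """
    if t is None:
        t = time.localtime()
    # strftime gives '02/06'; lstrip('0') removes every leading '0' character.
    return time.strftime('%m/%d', t).lstrip('0')


def describe_status(code):
    """Return the status code with its standard phrase (hypothetical helper)."""
    try:
        return f'{code} {HTTPStatus(code).phrase}'
    except ValueError:
        # HTTPStatus raises ValueError for codes it does not know.
        return f'{code} (unrecognized)'


feb_6 = time.strptime('2020-02-06', '%Y-%m-%d')
oct_9 = time.strptime('2020-10-09', '%Y-%m-%d')
print(ptt_date(feb_6))       # 2/06
print(ptt_date(oct_9))       # 10/09 (no leading zero, so nothing is stripped)
print(describe_status(200))  # 200 OK
print(describe_status(404))  # 404 Not Found
```

Note that lstrip('0') only affects the start of the string, so the zero-padded day is left alone.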