
Python 3: Installing BeautifulSoup and Example Code for Scraping Web Pages

This article covers how to install BeautifulSoup for Python 3 and gives example code for scraping web pages with it.

1. Installing BeautifulSoup

Related document: How to install beautifulsoup4 for Python 3 (when the system default is Python 2.7)
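When Python 2.7 and Python 3 are both present, a package can end up installed for the wrong interpreter. A common way to avoid this is to call pip through the interpreter you intend to use, for example `python3 -m pip install beautifulsoup4 requests`. The sketch below is one possible way to confirm, from inside Python, which interpreter is running and whether it can find the two packages; the helper name `is_installed` is hypothetical and not part of either library.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if the running interpreter can locate module_name."""
    return importlib.util.find_spec(module_name) is not None


# Show which interpreter is actually executing this script.
print("Interpreter:", sys.executable)
print("Python version:", sys.version_info.major, sys.version_info.minor)

# beautifulsoup4 is imported under the module name "bs4".
for name in ("bs4", "requests"):
    status = "installed" if is_installed(name) else "missing"
    print(name, status)
```

If `bs4` is reported as missing, the package was probably installed for a different interpreter than the one printed on the first line.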

2. Example code for scraping web pages with BeautifulSoup

import bs4
import requests

response = requests.get("https://en.wikipedia.org/wiki/Mathematics")
# requests.get() never returns None, so check the HTTP status instead
if response.ok:
    html = bs4.BeautifulSoup(response.text, 'html.parser')
    # The page title is the element with id="firstHeading"
    title = html.select("#firstHeading")[0].text
    print(title)
    # Print the text of every paragraph on the page
    paragraphs = html.select("p")
    for para in paragraphs:
        print(para.text)
    # Keep only the first five paragraphs as the introduction
    intro = '\n'.join([para.text for para in paragraphs[0:5]])
    print(intro)
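The same selector logic can be tried without network access. The sketch below runs it against a small inline HTML string; `SAMPLE_HTML` and the function `extract_intro` are hypothetical stand-ins for illustration and do not reflect the real Wikipedia markup. It uses `select_one`, which returns `None` when nothing matches, so a missing heading does not raise an `IndexError`.

```python
import bs4

# Hypothetical stand-in for a downloaded page (not real Wikipedia markup).
SAMPLE_HTML = """
<html><body>
<h1 id="firstHeading">Mathematics</h1>
<p>First paragraph.</p>
<p>Second paragraph.</p>
<p>Third paragraph.</p>
</body></html>
"""


def extract_intro(html_text, max_paragraphs=2):
    """Return (title, intro); intro joins the text of the first few <p> tags."""
    soup = bs4.BeautifulSoup(html_text, "html.parser")
    # select_one returns None instead of raising when the id is absent.
    heading = soup.select_one("#firstHeading")
    title = heading.get_text(strip=True) if heading is not None else None
    paragraphs = soup.select("p")
    intro = "\n".join(
        p.get_text(strip=True) for p in paragraphs[:max_paragraphs]
    )
    return title, intro


title, intro = extract_intro(SAMPLE_HTML)
print(title)
print(intro)
```

Slicing with `paragraphs[:max_paragraphs]` is safe even when the page has fewer paragraphs than requested.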

import requests
from bs4 import BeautifulSoup

page = requests.get('https://web.archive.org/web/20121007172955/https://www.nga.gov/collection/anZ1.htm')
soup = BeautifulSoup(page.text, 'html.parser')
# Remove the navigation links at the bottom of the page
last_links = soup.find(class_='AlphaNav')
last_links.decompose()
# Collect every <a> tag inside the BodyText block
artist_name_list = soup.find(class_='BodyText')
artist_name_list_items = artist_name_list.find_all('a')
# Use .contents to pull out the <a> tag's children
for artist_name in artist_name_list_items:
    names = artist_name.contents[0]
    print(names)
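In the example above, `soup.find()` returns `None` when no element has the requested class, so calling `.decompose()` or `.find_all()` on the result would raise an `AttributeError` if the page layout changes. A minimal offline sketch of the same steps with those cases guarded might look like this; `SAMPLE_HTML` and `artist_names` are hypothetical, and the inline markup is only an assumption about the page's general shape.

```python
from bs4 import BeautifulSoup

# Hypothetical markup loosely modelled on the class names used above.
SAMPLE_HTML = """
<div class="BodyText">
  <a href="/artist/1">Zabaglia, Niccola</a>
  <a href="/artist/2">Zaccone, Fabian</a>
  <div class="AlphaNav"><a href="/a">A</a> <a href="/b">B</a></div>
</div>
"""


def artist_names(html_text):
    """Return the link texts inside .BodyText, ignoring .AlphaNav links."""
    soup = BeautifulSoup(html_text, "html.parser")
    # Drop the navigation block first so its links are not collected.
    nav = soup.find(class_="AlphaNav")
    if nav is not None:
        nav.decompose()
    body = soup.find(class_="BodyText")
    if body is None:
        return []
    # .contents[0] is the first child of each <a> tag; skip empty tags.
    return [str(a.contents[0]) for a in body.find_all("a") if a.contents]


for name in artist_names(SAMPLE_HTML):
    print(name)
```

Returning an empty list when the container is missing lets the caller decide how to handle a changed page instead of failing mid-loop.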