
Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.

These instructions illustrate all major features of Beautiful Soup 4, with examples. I show you what the library is good for, how it works, how to use it, how to make it do what you want, and what to do when it violates your expectations.

This document covers Beautiful Soup version 4.11.0. The examples in this document were written for Python 3.8.

You might be looking for the documentation for Beautiful Soup 3. If so, you should know that Beautiful Soup 3 is no longer being developed and that all support for it was dropped on December 31, 2020. If you want to learn about the differences between Beautiful Soup 3 and Beautiful Soup 4, see Porting code to BS4.

This documentation has been translated into other languages by Beautiful Soup users:

  • 这篇文档当然还有中文版

  • このページは日本語で利用できます(外部リンク)

  • 이 문서는 한국어 번역도 가능합니다

  • This document is also available in Brazilian Portuguese

  • Эта документация доступна на русском языке

Getting help¶

If you have questions about Beautiful Soup, or run into problems, send mail to the discussion group. If your problem involves parsing an HTML document, be sure to mention what the diagnose() function says about that document.

Quick Start¶

Here's an HTML document I'll be using as an example throughout this document. It's part of a story from Alice in Wonderland:

html_doc = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

Running the “three sisters” document through Beautiful Soup gives us a BeautifulSoup object, which represents the document as a nested data structure:

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')

print(soup.prettify())
# <html>
#  <head>
#   <title>
#    The Dormouse's story
#   </title>
#  </head>
#  <body>
#   <p class="title">
#    <b>
#     The Dormouse's story
#    </b>
#   </p>
#   <p class="story">
#    Once upon a time there were three little sisters; and their names were
#    <a class="sister" href="http://example.com/elsie" id="link1">
#     Elsie
#    </a>
#    ,
#    <a class="sister" href="http://example.com/lacie" id="link2">
#     Lacie
#    </a>
#    and
#    <a class="sister" href="http://example.com/tillie" id="link3">
#     Tillie
#    </a>
#    ; and they lived at the bottom of a well.
#   </p>
#   <p class="story">
#    ...
#   </p>
#  </body>
# </html>

Here are some simple ways to navigate that data structure:

soup.title
# <title>The Dormouse's story</title>

soup.title.name
# 'title'

soup.title.string
# "The Dormouse's story"

soup.title.parent.name
# 'head'

soup.p
# <p class="title"><b>The Dormouse's story</b></p>

soup.p['class']
# ['title']

soup.a
# <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>

soup.find_all('a')
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.find(id="link3")
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>

One common task is extracting all the URLs found within a page's <a> tags:

for link in soup.find_all('a'):
    print(link.get('href'))
# http://example.com/elsie
# http://example.com/lacie
# http://example.com/tillie

Another common task is extracting all the text from a page:

print(soup.get_text())
# The Dormouse's story
#
# The Dormouse's story
#
# Once upon a time there were three little sisters; and their names were
# Elsie,
# Lacie and
# Tillie;
# and they lived at the bottom of a well.
#
# ...

Does this look like what you need? If so, read on.

Installing Beautiful Soup¶

If you're using a recent version of Debian or Ubuntu Linux, you can install Beautiful Soup with the system package manager:

$ apt-get install python3-bs4

Beautiful Soup 4 is published through PyPI, so if you can't install it with the system packager, you can install it with easy_install or pip. The package name is beautifulsoup4. Make sure you use the right version of pip or easy_install for your Python version (these may be named pip3 and easy_install3 respectively).

$ easy_install beautifulsoup4

$ pip install beautifulsoup4

(The BeautifulSoup package is not what you want. That's the previous major release, Beautiful Soup 3. Lots of software uses BS3, so it's still available, but if you're writing new code, you should install beautifulsoup4.)

If you don't have easy_install or pip installed, you can download the Beautiful Soup 4 source tarball and install it with setup.py.

$ python setup.py install

If all else fails, the license for Beautiful Soup allows you to package the entire library with your application. You can download the tarball, copy its bs4 directory into your application's codebase, and use Beautiful Soup without installing it at all.

I use Python 3.8 to develop Beautiful Soup, but it should work with other recent versions.
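To confirm which interpreter and which Beautiful Soup release an installation actually picked up, a quick check along these lines may help. It is only a sketch: it assumes the bs4 package is importable and reads its module-level __version__ string; the variable names are illustrative.

```python
import sys

import bs4

# Major.minor version of the running interpreter, for example "3.8".
python_version = f"{sys.version_info.major}.{sys.version_info.minor}"

# bs4 exposes its release number as a module-level string.
bs4_version = bs4.__version__

print(f"Python {python_version}, Beautiful Soup {bs4_version}")
```

If the import fails, the package was probably installed for a different Python interpreter than the one running the script.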

Installing a parser¶

Beautiful Soup supports the HTML parser included in Python's standard library, but it also supports a number of third-party Python parsers. One is the lxml parser. Depending on your setup, you might install lxml with one of these commands:

$ apt-get install python-lxml

$ easy_install lxml

$ pip install lxml

Another alternative is the pure-Python html5lib parser, which parses HTML the way a web browser does. Depending on your setup, you might install html5lib with one of these commands:

$ apt-get install python-html5lib

$ easy_install html5lib

$ pip install html5lib

This table summarizes the advantages and disadvantages of each parser library:

Parser: Python's html.parser
Typical usage: BeautifulSoup(markup, "html.parser")
Advantages:
  • Batteries included
  • Decent speed
  • Lenient (as of Python 3.2)
Disadvantages:
  • Not as fast as lxml, less lenient than html5lib

Parser: lxml's HTML parser
Typical usage: BeautifulSoup(markup, "lxml")
Advantages:
  • Very fast
  • Lenient
Disadvantages:
  • External C dependency

Parser: lxml's XML parser
Typical usage: BeautifulSoup(markup, "lxml-xml") or BeautifulSoup(markup, "xml")
Advantages:
  • Very fast
  • The only currently supported XML parser
Disadvantages:
  • External C dependency

Parser: html5lib
Typical usage: BeautifulSoup(markup, "html5lib")
Advantages:
  • Extremely lenient
  • Parses pages the same way a web browser does
  • Creates valid HTML5
Disadvantages:
  • Very slow
  • External Python dependency

If you can, I recommend you install and use lxml for speed. If you're using a very old version of Python (earlier than 3.2.2), it's essential that you install lxml or html5lib. Python's built-in HTML parser is just not very good in those old versions.

Note that if a document is invalid, different parsers will generate different Beautiful Soup trees for it. See Differences between parsers for details.
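One way to act on the recommendation to prefer lxml, while still running where it is not installed, is to try parsers in order of preference. The sketch below is not part of Beautiful Soup: make_soup and PREFERRED_PARSERS are hypothetical names, and it assumes that requesting a parser that is not installed raises bs4.FeatureNotFound.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Parsers to try, in order of preference. "html.parser" ships with
# Python, so the loop should always end with a usable parser.
PREFERRED_PARSERS = ("lxml", "html5lib", "html.parser")


def make_soup(markup, parsers=PREFERRED_PARSERS):
    """Hypothetical helper: parse markup with the first installed parser."""
    for name in parsers:
        try:
            return BeautifulSoup(markup, name), name
        except FeatureNotFound:
            # The requested parser library is not installed;
            # fall through to the next candidate.
            continue
    raise RuntimeError("none of the requested parsers is installed")


soup, parser_used = make_soup("<p>Hello</p>")
print(parser_used)
print(soup.p.string)
```

Because different parsers can build different trees from invalid markup, code that relies on an exact tree shape is probably better off naming one parser explicitly instead of falling back.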

Making the soup¶

To parse a document, pass it into the BeautifulSoup constructor. You can pass in a string or an open filehandle.
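As a minimal sketch of both input styles, the example below parses the same markup once from a string and once from an open file handle. The temporary file and the name index.html are assumptions made only so the example is self-contained.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Passing the markup as a string.
soup_from_string = BeautifulSoup("<html>a web page</html>", "html.parser")

# Passing an open file handle. The temporary file stands in for a real
# document on disk; "index.html" is only an illustrative name.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "index.html")
    with open(path, "w", encoding="utf-8") as fp:
        fp.write("<html>a web page</html>")
    with open(path, encoding="utf-8") as fp:
        soup_from_file = BeautifulSoup(fp, "html.parser")

print(soup_from_string.get_text())
print(soup_from_file.get_text())
```

Both calls should produce equivalent trees, since the file contains exactly the markup passed as a string.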

First, the document is converted to Unicode, and HTML entities are converted to Unicode characters.
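The entity conversion can be seen with a short sketch like the following, which assumes the built-in html.parser; the markup string is just an illustration.

```python
from bs4 import BeautifulSoup

# The &eacute; entity in the markup becomes the character é in the tree.
markup = "<html><head></head><body>Sacr&eacute; bleu!</body></html>"
soup = BeautifulSoup(markup, "html.parser")

body_text = soup.body.string
print(body_text)
# Sacré bleu!
```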

Beautiful Soup then parses the document using the best available parser. It will use an HTML parser unless you specifically tell it to use an XML parser. (See Parsing XML.)

Kinds of objects¶

Beautiful Soup transforms a complex HTML document into a complex tree of Python objects. But you'll only ever have to deal with about four kinds of objects: Tag, NavigableString, BeautifulSoup, and Comment.

Tag¶

A Tag object corresponds to an XML or HTML tag in the original document.
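A small sketch of how a Tag object might be obtained and inspected follows. The one-line markup is an assumption chosen for brevity, and the Tag class is imported from bs4.element for the isinstance check.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', "html.parser")

# Attribute-style access returns the first <b> tag in the document.
tag = soup.b

print(type(tag))
print(isinstance(tag, Tag))
```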

Tags have a lot of attributes and methods, and I'll cover most of them in Navigating the tree and Searching the tree. For now, the most important features of a tag are its name and attributes.

Name¶

Every tag has a name, accessible as .name.
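For instance, reading the name of a tag could look like this; the markup is an illustrative assumption.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', "html.parser")
tag = soup.b

# .name holds the tag's name as a plain string.
tag_name = tag.name
print(tag_name)
# b
```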

If you change a tag's name, the change will be reflected in any HTML markup generated by Beautiful Soup.
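A minimal sketch of renaming a tag, using an assumed one-line document: after assigning to .name, the regenerated markup uses the new name for both the opening and the closing tag.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', "html.parser")
tag = soup.b

# Renaming the tag changes the markup Beautiful Soup generates for it.
tag.name = "blockquote"

rendered = str(tag)
print(rendered)
# <blockquote class="boldest">Extremely bold</blockquote>
```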

Attributes¶

A tag may have any number of attributes. The tag <b id="boldest"> has an attribute “id” whose value is “boldest”. You can access a tag's attributes by treating the tag like a dictionary.
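Dictionary-style access might be sketched as follows. The markup is an assumption, and the lookup of a "title" attribute is included only to show that get() returns None for an attribute the tag does not have.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<b id="boldest">bold</b>', "html.parser")
tag = soup.b

# Square brackets look up an attribute by name, like a dictionary key.
id_value = tag["id"]
print(id_value)
# boldest

# get() avoids a KeyError when the attribute is absent.
missing_value = tag.get("title")
print(missing_value)
# None
```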

You can access that dictionary directly as .attrs.
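
A short sketch of reading the whole attribute dictionary at once, assuming the same illustrative one-tag document:

```python
from bs4 import BeautifulSoup

tag = BeautifulSoup('<b id="boldest">bold</b>', "html.parser").b

# .attrs holds every attribute of the tag as a dictionary.
all_attrs = tag.attrs
print(all_attrs)
```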

You can add, remove, and modify a tag's attributes. Again, this is done by treating the tag as a dictionary.
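
The sketch below walks through all three operations on an illustrative tag. The attribute name another-attribute is made up for the example.

```python
from bs4 import BeautifulSoup

tag = BeautifulSoup('<b id="boldest">bold</b>', "html.parser").b

# Modify an existing attribute and add a new one.
tag["id"] = "verybold"
tag["another-attribute"] = "1"
after_add = str(tag)

# Remove both attributes again.
del tag["id"]
del tag["another-attribute"]
after_delete = str(tag)

print(after_add)
print(after_delete)
```

After the deletions, tag["id"] raises KeyError, while tag.get("id") returns None.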

Multi-valued attributes¶

HTML 4 defines a few attributes that can have multiple values. HTML 5 removes a couple of them, but defines a few more. The most common multi-valued attribute is class (that is, a tag can have more than one CSS class). Others include rel, rev, accept-charset, headers, and accesskey. Beautiful Soup presents the value(s) of a multi-valued attribute as a list.
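
A minimal sketch of the list behaviour for class, assuming html.parser and two illustrative documents:

```python
from bs4 import BeautifulSoup

# One CSS class still comes back as a one-element list.
one_class = BeautifulSoup('<p class="body"></p>', "html.parser").p["class"]

# Two whitespace-separated classes become two list items.
two_classes = BeautifulSoup('<p class="body strikeout"></p>', "html.parser").p["class"]

print(one_class)
print(two_classes)
```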

If an attribute looks like it has more than one value, but it's not a multi-valued attribute as defined by any version of the HTML standard, Beautiful Soup will leave the attribute alone.
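
For instance, id is not a multi-valued attribute, so a value containing a space stays a single string. A small sketch with an illustrative document:

```python
from bs4 import BeautifulSoup

id_soup = BeautifulSoup('<p id="my id"></p>', "html.parser")

# The space does not split the value, because id is single-valued in HTML.
id_value = id_soup.p["id"]
print(id_value)
```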

When you turn a tag back into a string, multiple attribute values are consolidated.
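
A sketch of the round trip, assuming an illustrative link whose rel attribute is read as a list, reassigned as a list, and then serialized:

```python
from bs4 import BeautifulSoup

rel_soup = BeautifulSoup(
    '<p>Back to the <a rel="index first">homepage</a></p>', "html.parser"
)

# rel is multi-valued, so it is parsed into a list.
parsed_rel = rel_soup.a["rel"]

# Assign a new list; the values are joined with spaces on output.
rel_soup.a["rel"] = ["index", "contents"]
serialized = str(rel_soup.p)

print(parsed_rel)
print(serialized)
```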

You can disable this by passing multi_valued_attributes=None as a keyword argument into the BeautifulSoup constructor.
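
A minimal sketch of turning the list handling off, assuming html.parser (the name no_list_soup is only illustrative):

```python
from bs4 import BeautifulSoup

no_list_soup = BeautifulSoup(
    '<p class="body strikeout"></p>',
    "html.parser",
    multi_valued_attributes=None,  # keep every attribute value as a plain string
)

raw_class = no_list_soup.p["class"]
print(raw_class)
```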

You can use get_attribute_list() to get a value that's always a list, whether or not it's a multi-valued attribute.
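
A short sketch comparing a single-valued and a multi-valued attribute on an illustrative tag:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p id="my id" class="body strikeout"></p>', "html.parser")

# id is single-valued: the whole string is wrapped in a one-element list.
id_list = soup.p.get_attribute_list("id")

# class is multi-valued: the list is returned as-is.
class_list = soup.p.get_attribute_list("class")

print(id_list)
print(class_list)
```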

If you parse a document as XML, there are no multi-valued attributes.

Again, you can configure this using the multi_valued_attributes argument.
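
The sketch below contrasts the XML default with a configured parse. It assumes the third-party lxml package is installed, since the "xml" feature depends on it, and falls back to None when it is not. The mapping name class_is_multi is illustrative.

```python
from bs4 import BeautifulSoup, FeatureNotFound

markup = '<p class="body strikeout"></p>'

# Illustrative mapping: treat class as multi-valued on every tag ("*").
class_is_multi = {"*": {"class"}}

try:
    # Default XML parse: class stays a single string.
    default_value = BeautifulSoup(markup, "xml").p["class"]
    # Configured XML parse: class is split into a list.
    configured_value = BeautifulSoup(
        markup, "xml", multi_valued_attributes=class_is_multi
    ).p["class"]
except FeatureNotFound:
    # lxml is not available, so no XML parser could be found.
    default_value = configured_value = None

print(default_value)
print(configured_value)
```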

You probably won't need to do this, but if you do, use the defaults as a guide. They implement the rules described in the HTML specification.
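
One way to inspect those defaults is through the builder registry. This is a sketch that assumes at least one HTML tree builder is registered, which is always true for the built-in html.parser:

```python
from bs4.builder import builder_registry

# Look up the best available HTML tree builder class.
html_builder = builder_registry.lookup("html")

# Mapping of tag name (or "*" for any tag) to its multi-valued attributes.
defaults = html_builder.DEFAULT_CDATA_LIST_ATTRIBUTES

universal = defaults["*"]
anchor_specific = defaults["a"]
print(sorted(universal))
print(sorted(anchor_specific))
```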

NavigableString¶

A string corresponds to a bit of text within a tag. Beautiful Soup uses the NavigableString class to contain these bits of text.
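
A minimal sketch showing what .string returns for a tag with a single text child (the markup is illustrative):

```python
from bs4 import BeautifulSoup, NavigableString

soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', "html.parser")
tag = soup.b

# The text inside the tag is a NavigableString, not a plain str.
text = tag.string
print(text)
print(type(text))
```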

A NavigableString is just like a Python Unicode string, except that it also supports some of the features described in Navigating the tree and Searching the tree. You can convert a NavigableString to a Unicode string with str().
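
A short sketch of the conversion, using the same illustrative markup:

```python
from bs4 import BeautifulSoup

tag = BeautifulSoup('<b class="boldest">Extremely bold</b>', "html.parser").b

# str() produces an ordinary Python string with the same characters.
unicode_string = str(tag.string)
print(unicode_string)
print(type(unicode_string))
```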

You can't edit a string in place, but you can replace one string with another, using replace_with().
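
A minimal sketch of swapping a tag's text, assuming the illustrative <b> tag used earlier:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', "html.parser")
tag = soup.b

# Replace the existing text node with a new string.
tag.string.replace_with("No longer bold")
result = str(tag)
print(result)
```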

NavigableString supports most of the features described in Navigating the tree and Searching the tree, but not all of them. In particular, since a string can't contain anything (the way a tag may contain a string or another tag), strings don't support the .contents or .string attributes, or the find() method.

If you want to use a NavigableString outside of Beautiful Soup, you should call str() on it to turn it into a normal Python Unicode string. If you don't, your string will carry around a reference to the entire Beautiful Soup parse tree, even when you're done using Beautiful Soup. This is a big waste of memory.
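
The sketch below illustrates the difference: a NavigableString still points back into the tree through .parent, while the converted strings are detached. The markup and variable names are illustrative.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<p><a href="/a">Elsie</a> <a href="/b">Lacie</a></p>', "html.parser"
)

# A NavigableString keeps a link to its parent tag, and through it the tree.
first = soup.a.string
still_attached = first.parent is soup.a

# Converting with str() yields plain strings that are safe to keep around.
names = [str(a.string) for a in soup.find_all("a")]
print(still_attached)
print(names)
```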

BeautifulSoup¶

The BeautifulSoup object represents the parsed document as a whole. For most purposes, you can treat it as a Tag object. This means it supports most of the methods described in Navigating the tree and Searching the tree.

You can also pass a BeautifulSoup object into one of the methods defined in Modifying the tree, just as you would a Tag. This lets you do things like combine two parsed documents.
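
As a sketch of combining two documents, the example below replaces a placeholder string in one parsed document with a second parsed document. Both documents and the placeholder text are made up, and html.parser is assumed.

```python
from bs4 import BeautifulSoup

doc = BeautifulSoup("<div><p>Body</p>INSERT FOOTER HERE</div>", "html.parser")
footer = BeautifulSoup("<footer>Here's the footer</footer>", "html.parser")

# Find the placeholder text node and swap in the whole footer document.
doc.find(string="INSERT FOOTER HERE").replace_with(footer)

combined = str(doc)
print(combined)
```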

Since the BeautifulSoup object doesn't correspond to an actual HTML or XML tag, it has no name and no attributes. But sometimes it's useful to look at its .name, so it's been given the special .name "[document]".
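
A quick sketch of checking that special name on an illustrative document:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Some text</p>", "html.parser")

# The top-level object reports a placeholder name and has no attributes.
document_name = soup.name
document_attrs = soup.attrs
print(document_name)
print(document_attrs)
```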

soup.title
# <title>The Dormouse's story</title>

soup.title.name
# u'title'

soup.title.string
# u'The Dormouse's story'

soup.title.parent.name
# u'head'

soup.p
# <p class="title"><b>The Dormouse's story</b></p>

soup.p['class']
# u'title'

soup.a
# <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>

soup.find_all('a')
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.find(id="link3")
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
5
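As a small illustration of the special name, the sketch below parses a throwaway one-tag document and reads .name from both the top-level object and an ordinary tag. The markup is made up for this example, and the built-in html.parser is assumed as the parser.

```python
from bs4 import BeautifulSoup

# Hypothetical one-tag document, used only for illustration.
soup = BeautifulSoup("<b>bold</b>", "html.parser")

document_name = soup.name   # special name given to the BeautifulSoup object
tag_name = soup.b.name      # an ordinary tag reports its own tag name

print(document_name)   # [document]
print(tag_name)        # b
```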

Comments and other special strings¶

Tag, NavigableString, and BeautifulSoup cover almost everything you'll see in an HTML or XML file, but there are a few leftover bits. The main one you'll probably encounter is the comment.

The Comment object is just a special type of NavigableString.
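A minimal sketch of how a comment shows up in the parse tree, assuming a one-tag document whose only content is a comment (the markup and variable names are illustrative) and the built-in html.parser. It checks the class of the string and that it is still a NavigableString.

```python
from bs4 import BeautifulSoup, Comment, NavigableString

# Illustrative markup: a <b> tag whose only child is a comment.
markup = "<b><!--Hey, buddy. Want to buy a used parser?--></b>"
soup = BeautifulSoup(markup, "html.parser")

comment = soup.b.string   # the tag's single child, returned as .string

print(type(comment).__name__)                 # Comment
print(isinstance(comment, NavigableString))   # True
print(comment)                                # Hey, buddy. Want to buy a used parser?
```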

But when it appears as part of an HTML document, a Comment is displayed with special formatting.
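To make the "special formatting" concrete, the sketch below compares the bare comment text with the tag rendered back to markup. It assumes the same illustrative one-tag document and html.parser; the prettify() output is printed without a claimed result because its exact whitespace may vary between versions.

```python
from bs4 import BeautifulSoup

# Illustrative markup: a <b> tag whose only child is a comment.
markup = "<b><!--Hey, buddy. Want to buy a used parser?--></b>"
soup = BeautifulSoup(markup, "html.parser")

comment_text = str(soup.b.string)   # bare comment text, no delimiters
rendered = str(soup.b)              # tag rendered back to markup
pretty = soup.b.prettify()          # indented rendering of the same tag

print(comment_text)   # Hey, buddy. Want to buy a used parser?
print(rendered)       # <b><!--Hey, buddy. Want to buy a used parser?--></b>
print(pretty)
```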

Beautiful Soup also defines classes called Stylesheet, Script, and TemplateString, for embedded CSS stylesheets (any strings found inside a <style> tag), embedded Javascript (any strings found in a <script> tag), and HTML templates (any strings inside a <template> tag). These classes work exactly the same way as NavigableString; their only purpose is to make it easier to pick out the main body of the page, by ignoring strings that represent something else. (These classes are new in Beautiful Soup 4.9.0, and the html5lib parser doesn't use them.)
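The following sketch shows how these classes might be observed in practice. It assumes Beautiful Soup 4.9.0 or later, the built-in html.parser, and a small made-up document; the classes are imported from bs4.element.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Script, Stylesheet, TemplateString

# Made-up document with one string of each kind.
markup = (
    "<style>p { color: red; }</style>"
    "<script>var x = 1;</script>"
    "<template>Hello</template>"
    "<p>Body text</p>"
)
soup = BeautifulSoup(markup, "html.parser")

style_string = soup.style.string
script_string = soup.script.string
template_string = soup.template.string
body_string = soup.p.string

# Each special class is still a NavigableString underneath.
for s in (style_string, script_string, template_string, body_string):
    print(type(s).__name__, isinstance(s, NavigableString))
```

Checking the class of a string in this way is one option for skipping CSS or Javascript when collecting the visible text of a page.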

Beautiful Soup defines classes for anything else that might show up in an XML document: CData, ProcessingInstruction, Declaration, and Doctype. Like Comment, these classes are subclasses of NavigableString that add something extra to the string. For example, a comment can be replaced with a CDATA block.
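A short sketch of the comment-to-CDATA replacement described above, assuming the same illustrative one-tag document and html.parser. The text "A CDATA block" is an arbitrary placeholder.

```python
from bs4 import BeautifulSoup, CData

# Illustrative markup: a <b> tag whose only child is a comment.
markup = "<b><!--Hey, buddy. Want to buy a used parser?--></b>"
soup = BeautifulSoup(markup, "html.parser")

comment = soup.b.string
comment.replace_with(CData("A CDATA block"))   # swap the comment for a CDATA node

rendered = str(soup.b)
print(rendered)   # <b><![CDATA[A CDATA block]]></b>
```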

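Navigating the tree¶

Here's the "Three sisters" HTML document again:

html_doc = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')

I'll use this as an example to show you how to move from one part of a document to another.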

Going down¶

Tags may contain strings and other tags. These elements are the tag's children. Beautiful Soup provides a lot of different attributes for navigating and iterating over a tag's children.

Note that Beautiful Soup strings don't support any of these attributes, because a string can't have children.

Navigating using tag names¶

The simplest way to navigate the parse tree is to say the name of the tag you want. If you want the <head> tag, just say soup.head.

You can use this trick again and again to zoom in on a certain part of the parse tree. For example, soup.body.b gets the first <b> tag beneath the <body> tag.
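The tag-name shortcuts can be sketched as follows, using a shortened stand-in for the "Three sisters" document (the markup here is trimmed for illustration) and the built-in html.parser.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the "Three sisters" document.
html = (
    "<html><head><title>The Dormouse's story</title></head>"
    "<body><p class=\"title\"><b>The Dormouse's story</b></p></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

head_tag = soup.head     # first <head> tag in the document
title_tag = soup.title   # first <title> tag anywhere below soup
bold_tag = soup.body.b   # first <b> tag beneath the <body> tag

print(head_tag)    # <head><title>The Dormouse's story</title></head>
print(title_tag)   # <title>The Dormouse's story</title>
print(bold_tag)    # <b>The Dormouse's story</b>
```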

Using a tag name as an attribute will give you only the first tag by that name.

If you need to get all the <a> tags, or anything more complicated than the first tag with a certain name, you'll need to use one of the methods described in Searching the tree, such as find_all().
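A brief sketch of the difference between attribute access and find_all(), assuming a trimmed paragraph containing the three links from the example document and the built-in html.parser.

```python
from bs4 import BeautifulSoup

# Trimmed paragraph holding the three example links.
html = (
    '<p><a href="http://example.com/elsie" id="link1">Elsie</a>, '
    '<a href="http://example.com/lacie" id="link2">Lacie</a> and '
    '<a href="http://example.com/tillie" id="link3">Tillie</a></p>'
)
soup = BeautifulSoup(html, "html.parser")

first_link = soup.a              # only the first <a> tag
all_links = soup.find_all("a")   # every <a> tag, in document order

print(first_link["id"])                     # link1
print([link["id"] for link in all_links])   # ['link1', 'link2', 'link3']
```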

.contents and .children¶

A tag's children are available in a list called .contents.
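As an illustration, the sketch below reads .contents at two levels of a small document. The markup is a trimmed version of the example document, and html.parser is assumed.

```python
from bs4 import BeautifulSoup

html = "<html><head><title>The Dormouse's story</title></head><body></body></html>"
soup = BeautifulSoup(html, "html.parser")

head_tag = soup.head
head_children = head_tag.contents     # list of the <head> tag's direct children
title_tag = head_children[0]
title_children = title_tag.contents   # one-item list holding the title text

print(head_children)    # [<title>The Dormouse's story</title>]
print(title_children)   # ["The Dormouse's story"]
```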

The BeautifulSoup object itself has children. In this case, the <html> tag is the child of the BeautifulSoup object.
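A minimal check of the top-level children, assuming a tiny complete document (made up for this example) with no whitespace around the <html> tag, parsed with html.parser.

```python
from bs4 import BeautifulSoup

# Tiny complete document with nothing outside the <html> tag.
html = "<html><head></head><body><p>text</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

top_level = soup.contents   # direct children of the BeautifulSoup object

print(len(top_level))       # 1
print(top_level[0].name)    # html
```

If the markup had leading whitespace or a doctype before the <html> tag, those would appear as additional top-level children.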

A string does not have .contents, because it can't contain anything.
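The sketch below shows what happens when .contents is requested on a string. It assumes a one-tag document and html.parser, and catches the AttributeError rather than letting it propagate.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<title>The Dormouse's story</title>", "html.parser")
text = soup.title.contents[0]   # a NavigableString, not a Tag

try:
    text.contents               # strings have no children to list
    has_contents = True
except AttributeError:
    has_contents = False

print(has_contents)   # False
```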

Instead of getting them as a list, you can iterate over a tag's children using the .children generator.
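Iterating with .children can be sketched like this, assuming a made-up paragraph that mixes text and a nested tag so that both kinds of child appear.

```python
from bs4 import BeautifulSoup

# Made-up paragraph with two strings and one nested tag.
soup = BeautifulSoup("<p>One <b>two</b> three</p>", "html.parser")

# Walk the direct children in document order; nested text is not visited separately.
children = [str(child) for child in soup.p.children]

print(children)   # ['One ', '<b>two</b>', ' three']
```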

If you want to modify a tag's children, use the methods described in Modifying the tree. Don't modify the .contents list directly: that can lead to problems that are subtle and difficult to spot.

.descendants¶

The .contents and .children attributes only consider a tag's direct children. For instance, the <head> tag has a single direct child, the <title> tag.

But the <title> tag itself has a child: the string "The Dormouse's story". There's a sense in which that string is also a child of the <head> tag. The .descendants attribute lets you iterate over all of a tag's children, recursively: its direct children, the children of its direct children, and so on.

The <head> tag has only one child, but it has two descendants: the <title> tag and the <title> tag's child. The BeautifulSoup object only has one direct child (the <html> tag), but it has a whole lot of descendants.

.string¶

If a tag has only one child, and that child is a NavigableString, the child is made available as .string.

<p>If a tag's only child is another tag, and that tag has a <pre><span>soup</span><span>.</span><span>title</span> <span># <title>The Dormouse's story</title></span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>name</span> <span># u'title'</span> 
<span>soup</span><span>.</span><span>title</span><span>.</span><span>string</span> <span># u'The Dormouse's story'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>parent</span><span>.</span><span>name</span> <span># u'head'</span> <span>soup</span><span>.</span><span>p</span> <span># <p class="title"><b>The Dormouse's story</b></p></span> <span>soup</span><span>.</span><span>p</span><span>[</span><span>'class'</span><span>]</span> <span># u'title'</span> <span>soup</span><span>.</span><span>a</span> <span># <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a></span> <span>soup</span><span>.</span><span>find_all</span><span>(</span><span>'a'</span><span>)</span> <span># [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]</span> <span>soup</span><span>.</span><span>find</span><span>(</span><span>id</span><span>=</span><span>"link3"</span><span>)</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a></span> </pre>52, thì thẻ cha mẹ được coi là có cùng một <pre><span>soup</span><span>.</span><span>title</span> <span># <title>The Dormouse's story</title></span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>name</span> <span># u'title'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>string</span> <span># u'The Dormouse's story'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>parent</span><span>.</span><span>name</span> <span># u'head'</span> <span>soup</span><span>.</span><span>p</span> <span># <p class="title"><b>The Dormouse's story</b></p></span> <span>soup</span><span>.</span><span>p</span><span>[</span><span>'class'</span><span>]</span> <span># u'title'</span> <span>soup</span><span>.</span><span>a</span> <span># <a 
class="sister" href="http://example.com/elsie" id="link1">Elsie</a></span> <span>soup</span><span>.</span><span>find_all</span><span>(</span><span>'a'</span><span>)</span> <span># [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]</span> <span>soup</span><span>.</span><span>find</span><span>(</span><span>id</span><span>=</span><span>"link3"</span><span>)</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a></span> </pre>52 như con của nó</p><p><p><pre><span>print</span><span>(</span><span>soup</span><span>.</span><span>get_text</span><span>())</span> <span># The Dormouse's story</span> <span>#</span> <span># The Dormouse's story</span> <span>#</span> <span># Once upon a time there were three little sisters; and their names were</span> <span># Elsie,</span> <span># Lacie and</span> <span># Tillie;</span> <span># and they lived at the bottom of a well.</span> <span>#</span> <span># ...</span> </pre>3</p><p>Nếu một thẻ chứa nhiều hơn một thứ, thì không rõ <pre><span>soup</span><span>.</span><span>title</span> <span># <title>The Dormouse's story</title></span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>name</span> <span># u'title'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>string</span> <span># u'The Dormouse's story'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>parent</span><span>.</span><span>name</span> <span># u'head'</span> <span>soup</span><span>.</span><span>p</span> <span># <p class="title"><b>The Dormouse's story</b></p></span> <span>soup</span><span>.</span><span>p</span><span>[</span><span>'class'</span><span>]</span> <span># u'title'</span> <span>soup</span><span>.</span><span>a</span> <span># <a class="sister" 
href="http://example.com/elsie" id="link1">Elsie</a></span> <span>soup</span><span>.</span><span>find_all</span><span>(</span><span>'a'</span><span>)</span> <span># [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]</span> <span>soup</span><span>.</span><span>find</span><span>(</span><span>id</span><span>=</span><span>"link3"</span><span>)</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a></span> </pre>52 nên đề cập đến cái gì, vì vậy, <pre><span>soup</span><span>.</span><span>title</span> <span># <title>The Dormouse's story</title></span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>name</span> <span># u'title'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>string</span> <span># u'The Dormouse's story'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>parent</span><span>.</span><span>name</span> <span># u'head'</span> <span>soup</span><span>.</span><span>p</span> <span># <p class="title"><b>The Dormouse's story</b></p></span> <span>soup</span><span>.</span><span>p</span><span>[</span><span>'class'</span><span>]</span> <span># u'title'</span> <span>soup</span><span>.</span><span>a</span> <span># <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a></span> <span>soup</span><span>.</span><span>find_all</span><span>(</span><span>'a'</span><span>)</span> <span># [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]</span> <span>soup</span><span>.</span><span>find</span><span>(</span><span>id</span><span>=</span><span>"link3"</span><span>)</span> <span># 
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a></span> </pre>52 được định nghĩa là <pre><span>for</span> <span>link</span> <span>in</span> <span>soup</span><span>.</span><span>find_all</span><span>(</span><span>'a'</span><span>):</span> <span>print</span><span>(</span><span>link</span><span>.</span><span>get</span><span>(</span><span>'href'</span><span>))</span> <span># http://example.com/elsie</span> <span># http://example.com/lacie</span> <span># http://example.com/tillie</span> </pre>02</p><p><p><pre><span>print</span><span>(</span><span>soup</span><span>.</span><span>get_text</span><span>())</span> <span># The Dormouse's story</span> <span>#</span> <span># The Dormouse's story</span> <span>#</span> <span># Once upon a time there were three little sisters; and their names were</span> <span># Elsie,</span> <span># Lacie and</span> <span># Tillie;</span> <span># and they lived at the bottom of a well.</span> <span>#</span> <span># ...</span> </pre>4</p><p><h3 id="for-link-in-soup-find-all-a-print-link-get-href-http-example-com-elsie-http-example-com-lacie-http-example-com-tillie-03-va-for-link-in-soup-find-all-a-print-link-get-href-http-example-com-elsie-http-example-com-lacie-http-example-com-tillie-04">for link in soup.find_all('a'): print(link.get('href')) # http://example.com/elsie # http://example.com/lacie # http://example.com/tillie 03 và for link in soup.find_all('a'): print(link.get('href')) # http://example.com/elsie # http://example.com/lacie # http://example.com/tillie 04¶</h3><p>Nếu có nhiều thứ bên trong một thẻ, bạn vẫn có thể chỉ xem các chuỗi. 
Use the .strings generator:</p><p><p><pre>for string in soup.strings:
    print(repr(string))
# "The Dormouse's story"
# '\n'
# ...
</pre></p><p>These strings tend to have a lot of extra whitespace, which you can remove by using the .stripped_strings generator instead:</p><p><p><pre>for string in soup.stripped_strings:
    print(repr(string))
# "The Dormouse's story"
# "The Dormouse's story"
# 'Once upon a time there were three little sisters; and their names were'
# 'Elsie'
# ','
# 'Lacie'
# 'and'
# 'Tillie'
# ';\nand they lived at the bottom of a well.'
# '...'
</pre></p><p>Here, strings consisting entirely of whitespace are ignored, and whitespace at the beginning and end of strings is removed.</p><p><h2>Going up</h2><p>Continuing the “family tree” analogy, every tag and every string has a parent: the tag that contains it.</p><p><h3 id="for-link-in-soup-find-all-a-print-link-get-href-http-example-com-elsie-http-example-com-lacie-http-example-com-tillie-07">.parent</h3><p>You can access an element’s parent with the .parent attribute. In the example “three sisters” document, the &lt;head&gt; tag is the parent of the &lt;title&gt; tag:</p><p><p><pre>title_tag = soup.title
title_tag
# &lt;title&gt;The Dormouse's story&lt;/title&gt;
title_tag.parent
# &lt;head&gt;&lt;title&gt;The Dormouse's story&lt;/title&gt;&lt;/head&gt;
</pre></p><p>The title string itself has a parent: the &lt;title&gt; tag that contains it:</p><p><p><pre>title_tag.string.parent
# &lt;title&gt;The Dormouse's story&lt;/title&gt;
</pre></p><p>The parent of a top-level tag like &lt;html&gt; is the BeautifulSoup object itself:</p><p><p><pre>html_tag = soup.html
type(html_tag.parent)
# &lt;class 'bs4.BeautifulSoup'&gt;
</pre></p><p>And the .parent of a BeautifulSoup object is defined as None:</p><p><p><pre>print(soup.parent)
# None
</pre></p><p><h3 id="for-link-in-soup-find-all-a-print-link-get-href-http-example-com-elsie-http-example-com-lacie-http-example-com-tillie-12">.parents</h3><p>You can iterate over all of an element’s parents with .parents. 
This example uses .parents to travel from an &lt;a&gt; tag buried deep within the document, to the very top of the document:</p><p><p><pre>link = soup.a
link
# &lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;
for parent in link.parents:
    print(parent.name)
# p
# body
# html
# [document]
</pre></p><p><h2 id="di-ngang">Going sideways</h2><p>Consider a simple document like this:</p><p><p><pre>sibling_soup = BeautifulSoup("&lt;a&gt;&lt;b&gt;text1&lt;/b&gt;&lt;c&gt;text2&lt;/c&gt;&lt;/a&gt;", 'html.parser')
print(sibling_soup.prettify())
# &lt;a&gt;
#  &lt;b&gt;
#   text1
#  &lt;/b&gt;
#  &lt;c&gt;
#   text2
#  &lt;/c&gt;
# &lt;/a&gt;
</pre></p><p>The &lt;b&gt; tag and the &lt;c&gt; tag are at the same level: they’re both direct children of the same tag. We call them siblings. When a document is pretty-printed, siblings show up at the same indentation level. 
You can also use this relationship in the code you write.</p><p><h3 id="for-link-in-soup-find-all-a-print-link-get-href-http-example-com-elsie-http-example-com-lacie-http-example-com-tillie-15-va-for-link-in-soup-find-all-a-print-link-get-href-http-example-com-elsie-http-example-com-lacie-http-example-com-tillie-16">.next_sibling and .previous_sibling</h3><p>You can use .next_sibling and .previous_sibling to navigate between page elements that are on the same level of the parse tree:</p><p><p><pre># sibling_soup is the parsed &lt;a&gt;&lt;b&gt;text1&lt;/b&gt;&lt;c&gt;text2&lt;/c&gt;&lt;/a&gt; document
sibling_soup.b.next_sibling
# &lt;c&gt;text2&lt;/c&gt;
sibling_soup.c.previous_sibling
# &lt;b&gt;text1&lt;/b&gt;
</pre></p><p>The &lt;b&gt; tag has a .next_sibling, but no .previous_sibling, because there’s nothing before the &lt;b&gt; tag on the same level of the tree. 
For the same reason, the &lt;c&gt; tag has a .previous_sibling but no .next_sibling:</p><p><p><pre>print(sibling_soup.b.previous_sibling)
# None
print(sibling_soup.c.next_sibling)
# None
</pre></p><p>The strings “text1” and “text2” are not siblings, because they don’t have the same parent:</p><p><p><pre>sibling_soup.b.string
# 'text1'
print(sibling_soup.b.string.next_sibling)
# None
</pre></p><p>In real documents, the .next_sibling or .previous_sibling of a tag will usually be a string containing whitespace. Going back to the “three sisters” document:</p><p><p><pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># 
<p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> </pre>56</p><p>You might think that the <pre><span>for</span> <span>link</span> <span>in</span> <span>soup</span><span>.</span><span>find_all</span><span>(</span><span>'a'</span><span>):</span> <span>print</span><span>(</span><span>link</span><span>.</span><span>get</span><span>(</span><span>'href'</span><span>))</span> <span># http://example.com/elsie</span> <span># http://example.com/lacie</span> <span># http://example.com/tillie</span> </pre>15 of the first <a> tag would be the second <a> tag. But actually, it’s a string: the comma and newline that separate the first <a> tag from the second:</p><p><p><pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> 
<span># </p></span> <span># </body></span> <span># </html></span> </pre>57</p><p>The second <a> tag is actually the <pre><span>for</span> <span>link</span> <span>in</span> <span>soup</span><span>.</span><span>find_all</span><span>(</span><span>'a'</span><span>):</span> <span>print</span><span>(</span><span>link</span><span>.</span><span>get</span><span>(</span><span>'href'</span><span>))</span> <span># http://example.com/elsie</span> <span># http://example.com/lacie</span> <span># http://example.com/tillie</span> </pre>15 of the comma:</p><p><p><pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> </pre>58</p><p><h3 
id="for-link-in-soup-find-all-a-print-link-get-href-http-example-com-elsie-http-example-com-lacie-http-example-com-tillie-27-va-for-link-in-soup-find-all-a-print-link-get-href-http-example-com-elsie-http-example-com-lacie-http-example-com-tillie-28">for link in soup.find_all('a'): print(link.get('href')) # http://example.com/elsie # http://example.com/lacie # http://example.com/tillie 27 và for link in soup.find_all('a'): print(link.get('href')) # http://example.com/elsie # http://example.com/lacie # http://example.com/tillie 28¶</h3><p>Bạn có thể lặp lại các anh chị em của thẻ với <pre><span>for</span> <span>link</span> <span>in</span> <span>soup</span><span>.</span><span>find_all</span><span>(</span><span>'a'</span><span>):</span> <span>print</span><span>(</span><span>link</span><span>.</span><span>get</span><span>(</span><span>'href'</span><span>))</span> <span># http://example.com/elsie</span> <span># http://example.com/lacie</span> <span># http://example.com/tillie</span> </pre>27 hoặc <pre><span>for</span> <span>link</span> <span>in</span> <span>soup</span><span>.</span><span>find_all</span><span>(</span><span>'a'</span><span>):</span> <span>print</span><span>(</span><span>link</span><span>.</span><span>get</span><span>(</span><span>'href'</span><span>))</span> <span># http://example.com/elsie</span> <span># http://example.com/lacie</span> <span># http://example.com/tillie</span> </pre>28</p><p><p><pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># 
The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> </pre>59</p><p><h2 id="quay-di-quay-lai">Quay đi quay lại¶</h2><p>Hãy xem phần đầu của tài liệu “ba chị em”</p><p><p><pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> 
<span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> </pre>60</p><p>An HTML parser takes this string of characters and turns it into a series of events: “open an <html> tag”, “open a <head> tag”, “open a <title> tag”, “add a string”, “close the <title> tag”, “open a <p> tag”, and so on. Beautiful Soup offers tools for reconstructing the initial parse of the document.</p><p><h3 id="for-link-in-soup-find-all-a-print-link-get-href-http-example-com-elsie-http-example-com-lacie-http-example-com-tillie-31-va-for-link-in-soup-find-all-a-print-link-get-href-http-example-com-elsie-http-example-com-lacie-http-example-com-tillie-32">for link in soup.find_all('a'): print(link.get('href')) # http://example.com/elsie # http://example.com/lacie # http://example.com/tillie 31 và for link in soup.find_all('a'): print(link.get('href')) # http://example.com/elsie # http://example.com/lacie # http://example.com/tillie 32¶</h3><p>Thuộc tính <pre><span>for</span> <span>link</span> <span>in</span> <span>soup</span><span>.</span><span>find_all</span><span>(</span><span>'a'</span><span>):</span> <span>print</span><span>(</span><span>link</span><span>.</span><span>get</span><span>(</span><span>'href'</span><span>))</span> <span># http://example.com/elsie</span> <span># http://example.com/lacie</span> <span># http://example.com/tillie</span> </pre>31 của một chuỗi hoặc thẻ trỏ đến bất kỳ thứ gì được phân tích cú pháp ngay sau đó. 
It might be the same as <code>.next_sibling</code>, but it’s usually drastically different.</p>
<p>Here’s the final &lt;a&gt; tag in the “three sisters” document. Its <code>.next_sibling</code> is a string: the conclusion of the sentence that was interrupted by the start of the &lt;a&gt; tag:</p>
<pre>last_a_tag = soup.find("a", id="link3")
last_a_tag
# the third &lt;a&gt; tag (Tillie, id="link3")

last_a_tag.next_sibling
# ';\nand they lived at the bottom of a well.'
</pre>
<p>But the <code>.next_element</code> of that &lt;a&gt; tag, the thing that was parsed immediately after the &lt;a&gt; tag, is not the rest of that sentence: it’s the word “Tillie”:</p>
<pre>last_a_tag.next_element
# 'Tillie'
</pre>
<p>That’s because in the original markup, the word “Tillie” appeared before that semicolon. The parser encountered an &lt;a&gt; tag, then the word “Tillie”, then the closing &lt;/a&gt; tag, then the semicolon and rest of the sentence. The semicolon is on the same level as the &lt;a&gt; tag, but the word “Tillie” was encountered first.</p>
<p>The <code>.previous_element</code> attribute is the exact opposite of <code>.next_element</code>. 
It points to whatever element was parsed immediately before this one:</p>
<pre>last_a_tag.previous_element
# ' and\n'

last_a_tag.previous_element.next_element
# the same &lt;a&gt; tag (Tillie, id="link3")
</pre>
<h3 id="for-link-in-soup-find-all-a-print-link-get-href-http-example-com-elsie-http-example-com-lacie-http-example-com-tillie-39-va-for-link-in-soup-find-all-a-print-link-get-href-http-example-com-elsie-http-example-com-lacie-http-example-com-tillie-40"><code>.next_elements</code> and <code>.previous_elements</code></h3>
<p>You should get the idea by now. You can use these iterators to move forward or backward in the document as it was parsed:</p>
<pre>for element in last_a_tag.next_elements:
    print(repr(element))
# 'Tillie'
# ';\nand they lived at the bottom of a well.'
# ...and so on, through the final &lt;p class="story"&gt; tag to the end of the document
</pre>
<h1>Searching the tree</h1>
<p>Beautiful Soup defines a lot of methods for searching the parse tree, but they’re all very similar. I’m going to spend a lot of time explaining the two most popular methods: 
<code>find()</code> and <code>find_all()</code>. 
The other methods take almost exactly the same arguments, so I’ll just cover them briefly.</p>
<p>Once again, I’ll be using the “three sisters” document as an example:</p>
<pre>html_doc = """&lt;html&gt;&lt;head&gt;&lt;title&gt;The Dormouse's story&lt;/title&gt;&lt;/head&gt;
&lt;body&gt;
&lt;p class="title"&gt;&lt;b&gt;The Dormouse's story&lt;/b&gt;&lt;/p&gt;

&lt;p class="story"&gt;Once upon a time there were three little sisters; and their names were
&lt;a href="http://example.com/elsie" class="sister" id="link1"&gt;Elsie&lt;/a&gt;,
&lt;a href="http://example.com/lacie" class="sister" id="link2"&gt;Lacie&lt;/a&gt; and
&lt;a href="http://example.com/tillie" class="sister" id="link3"&gt;Tillie&lt;/a&gt;;
and they lived at the bottom of a well.&lt;/p&gt;

&lt;p class="story"&gt;...&lt;/p&gt;
"""

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')
</pre>
<p>By passing in a filter to an argument like <code>find_all()</code>, you can zoom in on the parts of the document you’re interested in.</p>
<h2 id="cac-loai-bo-loc">Kinds of filters</h2>
<p>Before talking in detail about <code>find_all()</code> and similar methods, I want to show examples of the different filters you can pass into these methods. These filters show up again and again, throughout the search API. 
You can use them to filter based on a tag’s name, on its attributes, on the text of a string, or on some combination of these.</p>
<h3 id="mot-chuoi">A string</h3>
<p>The simplest filter is a string. Pass a string to a search method and Beautiful Soup will perform a match against that exact string. This code finds all the &lt;b&gt; tags in the document:</p>
<pre>soup.find_all('b')
# [&lt;b&gt;The Dormouse's story&lt;/b&gt;]
</pre>
<p>If you pass in a byte string, Beautiful Soup will assume the string is encoded as UTF-8. You can avoid this by passing in a Unicode string instead.</p>
<h3 id="bieu-thuc-chinh-quy">A regular expression</h3>
<p>If you pass in a regular expression object, Beautiful Soup will filter against that regular expression using its <code>search()</code> method. This code finds all the tags whose names start with the letter “b”; in this case, the &lt;body&gt; tag and the &lt;b&gt; tag:</p>
<pre>import re

for tag in soup.find_all(re.compile("^b")):
    print(tag.name)
# body
# b
</pre>
<p>This code finds all the tags whose names contain the letter “t”:</p>
<pre>for tag in soup.find_all(re.compile("t")):
    print(tag.name)
# html
# title
</pre>
<h3 id="mot-danh-sach">A list</h3>
<p>If you pass in a list, Beautiful Soup will allow a string match against any item in that list. 
This code finds all the &lt;a&gt; tags and all the &lt;b&gt; tags:</p>
<pre>soup.find_all(["a", "b"])
# A list of four tags: the &lt;b&gt; tag, followed by the three &lt;a&gt; tags
</pre>
<h3 id="for-link-in-soup-find-all-a-print-link-get-href-http-example-com-elsie-http-example-com-lacie-http-example-com-tillie-46"><code>True</code></h3>
<p>The value <code>True</code> matches everything it can. This code finds all the tags in the document, but none of the text strings:</p>
<pre>for tag in soup.find_all(True):
    print(tag.name)
# html
# head
# title
# body
# p
# b
# p
# a
# a
# a
# p
</pre>
<h3 id="a-function">A function</h3>
<p>If none of the other matches work for you, define a function that takes an element as its only argument. 
The function should return <code>True</code> if the argument matches, and <code>False</code> otherwise.</p>
<p>Here’s a function that returns <code>True</code> if a tag defines the “class” attribute but doesn’t define the “id” attribute:</p>
<pre>def has_class_but_no_id(tag):
    return tag.has_attr('class') and not tag.has_attr('id')
</pre>
<p>Pass this function into <code>find_all()</code> and you’ll pick up all the &lt;p&gt; tags:</p>
<pre>soup.find_all(has_class_but_no_id)
# [&lt;p class="title"&gt;&lt;b&gt;The Dormouse's story&lt;/b&gt;&lt;/p&gt;,
#  &lt;p class="story"&gt;Once upon a time there were…bottom of a well.&lt;/p&gt;,
#  &lt;p class="story"&gt;...&lt;/p&gt;]
</pre>
<p>This function only picks up the &lt;p&gt; tags. It doesn’t pick up the &lt;a&gt; tags, because those tags define both “class” and “id”. It doesn’t pick up tags like &lt;html&gt; and &lt;title&gt;, because those tags don’t define “class”.</p>
<p>If you pass in a function to filter on a specific attribute like <code>href</code>, the argument passed into the function will be the attribute value, not the whole tag. 
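</p><p>A minimal sketch of the difference just described, assuming <code>bs4</code> is installed and the built-in <code>html.parser</code> is used. The markup and the helper names <code>record_tag</code> and <code>record_href</code> are hypothetical, introduced only for illustration; they record what each kind of filter function receives:</p>

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical stand-in for the example document.
html = '<p class="story"><a href="http://example.com/elsie" id="link1">Elsie</a></p>'
soup = BeautifulSoup(html, "html.parser")

seen_by_tag_filter = []
seen_by_href_filter = []

def record_tag(tag):
    # Passed positionally, so it is called with each tag object.
    seen_by_tag_filter.append(type(tag).__name__)
    return True

def record_href(value):
    # Passed as href=..., so it is called with the attribute value
    # (which may be None when a tag has no href attribute).
    seen_by_href_filter.append(value)
    return value is not None

tags = soup.find_all(record_tag)
links = soup.find_all(href=record_href)

print(seen_by_tag_filter)
print(seen_by_href_filter)
```

<p>In this sketch the positional filter should only ever see tag objects, while the <code>href</code> filter sees attribute values. Because a tag without the attribute may be reported as <code>None</code>, attribute filters usually guard against that case before doing any string work.</p><p>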
Here’s a function that finds all <code>a</code> tags whose <code>href</code> attribute does not match a regular expression:</p>
<pre>import re

def not_lacie(href):
    return href and not re.compile("lacie").search(href)

soup.find_all(href=not_lacie)
# [&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;,
#  &lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;Tillie&lt;/a&gt;]
</pre>
<p>The function can be as complicated as you need it to be. Here’s a function that returns <code>True</code> if a tag is surrounded by string objects:</p>
<pre>from bs4 import NavigableString

def surrounded_by_strings(tag):
    return (isinstance(tag.next_element, NavigableString)
            and isinstance(tag.previous_element, NavigableString))

for tag in soup.find_all(surrounded_by_strings):
    print(tag.name)
# body
# p
# a
# a
# a
# p
</pre>
<p>Now we’re ready to look at the search methods in detail.</p>
<h2 id="for-link-in-soup-find-all-a-print-link-get-href-http-example-com-elsie-http-example-com-lacie-http-example-com-tillie-42">find_all()</h2>
<p>Method signature: <code>find_all(name, attrs, recursive, string, limit, **kwargs)</code></p>
<p>The <code>find_all()</code> method looks through a tag’s descendants and retrieves all descendants that match your filters. 
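</p><p>To make “a tag’s descendants” concrete, here is a small sketch, assuming <code>bs4</code> is installed and using the built-in <code>html.parser</code>. The markup is a shortened, hypothetical stand-in for the example document; the point is that the result depends on which tag the method is called on:</p>

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical stand-in for the example document.
html = """<html><head><title>The Dormouse's story</title></head>
<body><p class="title"><b>The Dormouse's story</b></p></body></html>"""
soup = BeautifulSoup(html, "html.parser")

# Called on the whole document: every tag is a descendant.
whole_document = soup.find_all("title")

# Called on <body>: <title> lives under <head>, so it is not a descendant.
inside_body = soup.body.find_all("title")

# Called on <head>: <title> is a direct child.
inside_head = soup.head.find_all("title")

# Descendants include grandchildren: <b> sits inside <p> inside <body>.
nested = soup.body.find_all("b")

print(len(whole_document), len(inside_body), len(inside_head), len(nested))
```

<p>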
I gave several examples in Kinds of filters, but here are a few more:</p>
<pre>soup.find_all("title")
# [&lt;title&gt;The Dormouse's story&lt;/title&gt;]

soup.find_all("p", "title")
# [&lt;p class="title"&gt;&lt;b&gt;The Dormouse's story&lt;/b&gt;&lt;/p&gt;]

soup.find_all("a")
# [&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;,
#  &lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;,
#  &lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;Tillie&lt;/a&gt;]

soup.find_all(id="link2")
# [&lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;]

import re
soup.find(string=re.compile("sisters"))
# 'Once upon a time there were three little sisters; and their names were\n'
</pre>
<p>Some of these should look familiar, but others are new. 
What does it mean to pass in a value for <code>string</code>, or <code>id</code>? Why does <code>find_all("p", "title")</code> find a &lt;p&gt; tag with the CSS class “title”? 
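</p><p>On the last question, a quick check suggests that a string given as the second positional argument is treated as a CSS class filter. This is a sketch under assumptions: <code>bs4</code> is installed, the built-in <code>html.parser</code> is used, and the markup is a shortened, hypothetical stand-in for the example document:</p>

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical stand-in for the example document.
html = '<p class="title"><b>The Dormouse\'s story</b></p><p class="story">...</p>'
soup = BeautifulSoup(html, "html.parser")

# Three spellings that are expected to select the same tag.
positional = soup.find_all("p", "title")
keyword = soup.find_all("p", class_="title")
as_dict = soup.find_all("p", attrs={"class": "title"})

print(positional == keyword == as_dict)
```

<p>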
Let’s look at the arguments to <code>find_all()</code>.</p>
<h3 id="the-for-link-in-soup-find-all-a-print-link-get-href-http-example-com-elsie-http-example-com-lacie-http-example-com-tillie-62-argument">The name argument</h3>
<p>Pass in a value for <code>name</code> and you’ll tell Beautiful Soup to only consider tags with certain names. Text strings will be ignored, as will tags whose names don’t match.</p>
<p>This is the simplest usage:</p>
<pre>soup.find_all("title")
# [&lt;title&gt;The Dormouse's story&lt;/title&gt;]
</pre>
<p>Recall from Kinds of filters that the value to <code>name</code> can be a string, a regular expression, a list, a function, or the value <code>True</code>.</p>
<h3 id="the-keyword-arguments">The keyword arguments</h3>
<p>Any argument that’s not recognized will be turned into a filter on one of a tag’s attributes. 
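</p><p>One way to see this rule at work is to compare a keyword argument with the equivalent attribute dictionary. A minimal sketch, assuming <code>bs4</code> is installed and the built-in <code>html.parser</code> is used; the markup is a shortened, hypothetical stand-in for the example document:</p>

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical stand-in for the example document.
html = ('<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>'
        '<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>')
soup = BeautifulSoup(html, "html.parser")

# "id" is not one of find_all()'s named parameters, so it becomes
# a filter on each tag's id attribute.
by_keyword = soup.find_all(id="link2")

# The same filter written explicitly as an attribute dictionary.
by_attrs = soup.find_all(attrs={"id": "link2"})

print([tag.get_text() for tag in by_keyword])
```

<p>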
If you pass in a value for an argument called <code>id</code>, Beautiful Soup will filter against each tag’s ‘id’ attribute:</p>
<pre>soup.find_all(id='link2')
# [&lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;]
</pre>
<p>If you pass in a value for <code>href</code>, Beautiful Soup will filter against each tag’s ‘href’ attribute:</p>
<pre>import re

soup.find_all(href=re.compile("elsie"))
# [&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;]
</pre>
<p>You can filter an attribute based on a string, a regular expression, a list, a function, or the value <code>True</code>.</p>
<p>This code finds all 
tags whose <code>id</code> attribute has a value, regardless of what the value is:</p>
<pre>soup.find_all(id=True)
# [&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;,
#  &lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;,
#  &lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;Tillie&lt;/a&gt;]
</pre>
<p>You can filter multiple attributes at once by passing in more than one keyword 
argument:</p>
<pre>import re

soup.find_all(href=re.compile("elsie"), id='link1')
# [&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;]
</pre>
<p>Some attributes, like the data-* attributes in HTML 5, have names that can’t be used as the names of keyword arguments:</p>
<pre>data_soup = BeautifulSoup('&lt;div data-foo="value"&gt;foo!&lt;/div&gt;', 'html.parser')
data_soup.find_all(data-foo="value")
# SyntaxError (the exact message depends on the Python version)
</pre>
<p>You can use these attributes in searches by putting them into a dictionary and passing the dictionary into <code>find_all()</code> as the <code>attrs</code> argument:</p>
<pre>data_soup.find_all(attrs={"data-foo": "value"})
# [&lt;div data-foo="value"&gt;foo!&lt;/div&gt;]
</pre>
<p>You can’t use a keyword argument to search for the HTML ‘name’ attribute, because Beautiful Soup uses the <code>name</code> argument to contain the name of the tag itself. 
Instead, you can give a value to ‘name’ in the <code>attrs</code> argument:</p>
<pre>name_soup = BeautifulSoup('&lt;input name="email"/&gt;', 'html.parser')
name_soup.find_all(name="email")
# []
name_soup.find_all(attrs={"name": "email"})
# [&lt;input name="email"/&gt;]
</pre>
<h3 id="searching-by-css-class">Searching by CSS class</h3>
<p>It’s very useful to search for a tag 
that has a certain CSS class, but the name of the CSS attribute, “class”, is a reserved word in Python. Using <code>class</code> as a keyword argument will give you a syntax error. As of Beautiful Soup 4.1.2, you can search by CSS class using the keyword argument <code>class_</code>:</p>
<pre>soup.find_all("a", class_="sister")
# [&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;,
#  &lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;,
#  &lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;Tillie&lt;/a&gt;]
</pre>
<p>As with any keyword argument, you can pass <code>class_</code> a string, a regular expression, a function, or <code>True</code>:</p>
<pre>import re

soup.find_all(class_=re.compile("itl"))
# [&lt;p class="title"&gt;&lt;b&gt;The Dormouse's story&lt;/b&gt;&lt;/p&gt;]

def has_six_characters(css_class):
    return css_class is not None and len(css_class) == 6

soup.find_all(class_=has_six_characters)
# [&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;,
#  &lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;,
#  &lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;Tillie&lt;/a&gt;]
</pre>
<p>Remember that a single tag can have multiple values for its “class” attribute. When you search for a tag that matches a certain CSS class, you’re matching against any of its CSS classes:</p>
<pre>css_soup = BeautifulSoup('&lt;p class="body strikeout"&gt;&lt;/p&gt;', 'html.parser')
css_soup.find_all("p", class_="strikeout")
# [&lt;p class="body strikeout"&gt;&lt;/p&gt;]

css_soup.find_all("p", class_="body")
# [&lt;p class="body strikeout"&gt;&lt;/p&gt;]
</pre>
<p>You can also search for the exact string value of the attribute 
<code>class</code> attribute:</p>
<pre>css_soup.find_all("p", class_="body strikeout")
# [&lt;p class="body strikeout"&gt;&lt;/p&gt;]
</pre>
<p>But searching for variants of the string value won’t work:</p>
<pre>css_soup.find_all("p", class_="strikeout body")
# []
</pre>
<p>If you want to search for tags that match two or more CSS classes, you should use a CSS selector:</p>
<pre>css_soup.select("p.strikeout.body")
# [&lt;p class="body strikeout"&gt;&lt;/p&gt;]
</pre>
<p>In older versions of Beautiful Soup, which don’t have the <code>class_</code> shortcut, you can use the <code>attrs</code> trick mentioned above. Create a dictionary whose value for “class” is the string (or regular expression, or whatever) you want to search for:</p>
<pre>soup.find_all("a", attrs={"class": "sister"})
# [&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;,
#  &lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;,
#  &lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;Tillie&lt;/a&gt;]
</pre>
<h3 id="doi-so-for-link-in-soup-find-all-a-print-link-get-href-http-example-com-elsie-http-example-com-lacie-http-example-com-tillie-58">The <code>string</code> argument¶</h3>
<p>With <code>string</code> you can search for strings instead of tags. As with <code>name</code> and the keyword arguments, you can pass in a string, a regular expression, a list, a function, or the value <code>True</code>. 
Here are some examples:</p>
<pre>soup.find_all(string="Elsie")
# ['Elsie']

soup.find_all(string=["Tillie", "Elsie", "Lacie"])
# ['Elsie', 'Lacie', 'Tillie']

soup.find_all(string=re.compile("Dormouse"))
# ["The Dormouse's story", "The Dormouse's story"]

def is_the_only_string_within_a_tag(s):
    """Return True if this string is the only child of its parent tag."""
    return (s == s.parent.string)

soup.find_all(string=is_the_only_string_within_a_tag)
# ["The Dormouse's story", "The Dormouse's story", 'Elsie', 'Lacie', 'Tillie', '...']
</pre>
<p>Although <code>string</code> is for finding strings, you can combine it with arguments that find tags: Beautiful Soup will find all tags whose <code>.string</code> matches your value for <code>string</code>. 
This code finds the &lt;a&gt; tags whose <code>.string</code> is “Elsie”:</p>
<pre>soup.find_all("a", string="Elsie")
# [&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;]
</pre>
<p>The <code>string</code> argument is new in Beautiful Soup 4.4.0. 
In earlier versions it was called <code>text</code>:</p>
<pre>soup.find_all("a", text="Elsie")
# [&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;]
</pre>
<h3 id="the-for-link-in-soup-find-all-a-print-link-get-href-http-example-com-elsie-http-example-com-lacie-http-example-com-tillie-88-argument">The <code>limit</code> argument¶</h3>
<p><code>find_all()</code> returns all the tags and strings that match your filters. This can take a while if the document is large. If you don’t need all the results, you can pass in a number for <code>limit</code>. This works just like the LIMIT keyword in SQL. 
It tells Beautiful Soup to stop gathering results after it’s found a certain number.</p>
<p>There are three links in the “three sisters” document, but this code only finds the first two:</p>
<pre>soup.find_all("a", limit=2)
# [&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;Elsie&lt;/a&gt;,
#  &lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;Lacie&lt;/a&gt;]
</pre>
<h3 id="the-for-link-in-soup-find-all-a-print-link-get-href-http-example-com-elsie-http-example-com-lacie-http-example-com-tillie-91-argument">The <code>recursive</code> argument¶</h3>
<p>If you call <code>mytag.find_all()</code>, Beautiful Soup will examine all the descendants of <code>mytag</code>: its children, its children’s children, and so on. If you only want Beautiful Soup to consider direct children, you can pass in <code>recursive=False</code>. 
See the difference here:</p>
<pre>soup.html.find_all("title")
# [&lt;title&gt;The Dormouse's story&lt;/title&gt;]

soup.html.find_all("title", recursive=False)
# []
</pre>
<p>Here’s that part of the document:</p>
<pre>&lt;html&gt;
 &lt;head&gt;
  &lt;title&gt;
   The Dormouse's story
  &lt;/title&gt;
 &lt;/head&gt;
...
</pre>
<p>The &lt;title&gt; tag is beneath the &lt;html&gt; tag, but it’s not directly beneath the &lt;html&gt; tag: the &lt;head&gt; tag is in the way. 
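</p>
<p>To see why, it can help to list what actually sits directly beneath the &lt;html&gt; tag. The following is a minimal, self-contained sketch rather than one of the original examples: it assumes a cut-down copy of the “three sisters” document and Python’s built-in <code>html.parser</code>, and the variable names are only illustrative.</p>

```python
from bs4 import BeautifulSoup

# A cut-down version of the "three sisters" document (an assumption made
# for brevity; only the nesting of <html>, <head> and <title> matters here).
html_doc = (
    "<html><head><title>The Dormouse's story</title></head>"
    "<body><p class=\"title\"><b>The Dormouse's story</b></p></body></html>"
)
soup = BeautifulSoup(html_doc, "html.parser")

# Passing True as the name filter matches every tag, so with
# recursive=False this collects only the direct children of <html>.
direct_children = [tag.name for tag in soup.html.find_all(True, recursive=False)]

# The default search walks every descendant of <html>;
# recursive=False stops at the first level below it.
all_titles = soup.html.find_all("title")
direct_titles = soup.html.find_all("title", recursive=False)

print(direct_children)                      # ['head', 'body']
print(len(all_titles), len(direct_titles))  # 1 0
```

<p>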
Beautiful Soup finds the &lt;title&gt; tag when it’s allowed to look at all descendants of the &lt;html&gt; tag, but when <code>recursive=False</code> restricts it to the &lt;html&gt; tag’s immediate children, it finds nothing.</p>
<p>Beautiful Soup offers a lot of tree-searching methods (covered below), and they mostly take the same arguments as <code>find_all()</code>: <code>name</code>, <code>attrs</code>, <code>string</code>, <code>limit</code>, and the keyword arguments. 
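</p>
<p>As a rough illustration of those shared arguments, the sketch below passes the same kinds of filters to <code>find()</code>, <code>find_next_siblings()</code> and <code>find_parents()</code>. The trimmed document and the variable names are assumptions made for this example, and it uses Python’s built-in <code>html.parser</code>.</p>

```python
from bs4 import BeautifulSoup

# A trimmed copy of the "three sisters" paragraph (assumed for illustration).
html_doc = """<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>"""
soup = BeautifulSoup(html_doc, "html.parser")

# find() takes a name and keyword arguments, just like find_all().
first_link = soup.find("a", class_="sister")

# find_next_siblings() accepts name, keyword arguments and limit too.
next_links = first_link.find_next_siblings("a", class_="sister", limit=1)

# find_parents() accepts name and attrs as well.
parents = first_link.find_parents("p", attrs={"class": "story"})

sibling_ids = [a["id"] for a in next_links]
parent_names = [p.name for p in parents]

print(first_link["id"])  # link1
print(sibling_ids)       # ['link2']
print(parent_names)      # ['p']
```

<p>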
But the <code>recursive</code> argument is different: <code>find_all()</code> and <code>find()</code> are the only methods that support it. Passing <code>recursive=False</code> into a method like <code>find_parents()</code> wouldn’t be very useful.</p><h2 id="calling-a-tag-is-like-calling-for-link-in-soup-find-all-a-print-link-get-href-http-example-com-elsie-http-example-com-lacie-http-example-com-tillie-42">Calling a tag is like calling find_all()¶</h2><p>Because <code>find_all()</code> is the most popular method in the Beautiful Soup search API, you can use a shortcut for it. If you treat the <code>BeautifulSoup</code> object or a <code>Tag</code> object as though it were a function, then it’s the same as calling <code>find_all()</code> on that object. 
These two lines of code are equivalent:</p><pre>soup.find_all("a")
soup("a")
</pre><p>These two lines are also equivalent:</p><pre>soup.title.find_all(string=True)
soup.title(string=True)
</pre><h2 id="soup-title-lt-title-gt-the-dormouse-s-story-lt-title-gt-soup-title-name-u-title-soup-title-string-u-the-dormouse-s-story-soup-title-parent-name-u-head-soup-p-lt-p-class-title-gt-lt-b-gt-the-dormouse-s-story-lt-b-gt-lt-p-gt-soup-p-class-u-title-soup-a-lt-a-class-sister-href-http-example-com-elsie-id-link1-gt-elsie-lt-a-gt-soup-find-all-a-lt-a-class-sister-href-http-example-com-elsie-id-link1-gt-elsie-lt-a-gt-lt-a-class-sister-href-http-example-com-lacie-id-link2-gt-lacie-lt-a-gt-lt-a-class-sister-href-http-example-com-tillie-id-link3-gt-tillie-lt-a-gt-soup-find-id-link3-lt-a-class-sister-href-http-example-com-tillie-id-link3-gt-tillie-lt-a-gt-53">find()¶</h2><p>Method signature: find(<span>name</span>, <span>attrs</span>, <span>recursive</span>, <span>string</span>, <span>**kwargs</span>)</p><p>The <code>find_all()</code> method scans the entire document looking for results, but sometimes you only want to find one result. If you know a document only has one &lt;body&gt; tag, it’s a waste of time to scan the entire document looking for more. 
Rather than passing in <code>limit=1</code> every time you call <code>find_all()</code>, you can use the <code>find()</code> method. These two lines of code are nearly equivalent:</p><pre>soup.find_all('title', limit=1)
# [&lt;title&gt;The Dormouse's story&lt;/title&gt;]

soup.find('title')
# &lt;title&gt;The Dormouse's story&lt;/title&gt;
</pre><p>The only difference is that <code>find_all()</code> returns a list containing the single result, and <code>find()</code> just returns the result.</p><p>If <code>find_all()</code> can’t find anything, it returns an empty list. If <code>find()</code> can’t find anything, it returns <code>None</code>:</p><pre>print(soup.find("nosuchtag"))
# None
</pre><p>Remember the <code>soup.head.title</code> trick from Navigating using tag names? That trick works by repeatedly calling <code>find()</code>:</p><pre>soup.head.title
# &lt;title&gt;The Dormouse's story&lt;/title&gt;

soup.find("head").find("title")
# &lt;title&gt;The Dormouse's story&lt;/title&gt;
</pre><h2 id="print-soup-get-text-the-dormouse-s-story-the-dormouse-s-story-once-upon-a-time-there-were-three-little-sisters-and-their-names-were-elsie-lacie-and-tillie-and-they-lived-at-the-bottom-of-a-well-05-and-print-soup-get-text-the-dormouse-s-story-the-dormouse-s-story-once-upon-a-time-there-were-three-little-sisters-and-their-names-were-elsie-lacie-and-tillie-and-they-lived-at-the-bottom-of-a-well-24">find_parents() and find_parent()¶</h2><p>Method signature: find_parents(<span>name</span>, <span>attrs</span>, <span>string</span>, <span>limit</span>, <span>**kwargs</span>)</p><p>Method signature: find_parent(<span>name</span>, <span>attrs</span>, <span>string</span>, <span>**kwargs</span>)</p><p>I spent a lot of time above covering <code>find_all()</code> and <code>find()</code>. The Beautiful Soup API defines ten other methods for searching the tree, but don’t be afraid. Five of these methods are basically the same as <code>find_all()</code>, and the other five are basically the same as <code>find()</code>. The only differences are in which parts of the tree they search.</p><p>First let’s consider <code>find_parents()</code> and <code>find_parent()</code>. 
Hãy nhớ rằng <pre><span>for</span> <span>link</span> <span>in</span> <span>soup</span><span>.</span><span>find_all</span><span>(</span><span>'a'</span><span>):</span> <span>print</span><span>(</span><span>link</span><span>.</span><span>get</span><span>(</span><span>'href'</span><span>))</span> <span># http://example.com/elsie</span> <span># http://example.com/lacie</span> <span># http://example.com/tillie</span> </pre>42 và <pre><span>soup</span><span>.</span><span>title</span> <span># <title>The Dormouse's story</title></span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>name</span> <span># u'title'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>string</span> <span># u'The Dormouse's story'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>parent</span><span>.</span><span>name</span> <span># u'head'</span> <span>soup</span><span>.</span><span>p</span> <span># <p class="title"><b>The Dormouse's story</b></p></span> <span>soup</span><span>.</span><span>p</span><span>[</span><span>'class'</span><span>]</span> <span># u'title'</span> <span>soup</span><span>.</span><span>a</span> <span># <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a></span> <span>soup</span><span>.</span><span>find_all</span><span>(</span><span>'a'</span><span>)</span> <span># [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]</span> <span>soup</span><span>.</span><span>find</span><span>(</span><span>id</span><span>=</span><span>"link3"</span><span>)</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a></span> </pre>53 đi xuống cây, nhìn vào hậu duệ của thẻ. Các phương pháp này làm ngược lại. they work their way up the tree, looking at a tag’s (or a string’s) parents. 
Thử xem nào, bắt đầu từ sợi dây chôn sâu trong tài liệu “ba cô con gái”</p><p><p><pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> </pre>02</p><p>One of the three <a> tags is the direct parent of the string in question, so our search finds it. One of the three <p> tags is an indirect parent of the string, and our search finds that as well. 
There's a <p> tag with the CSS class "title" somewhere in the document, but it's not one of this string's parents, so we can't find it with <span>find_parents()</span>.</p><p>You may have made the connection between <span>find_parent()</span> and <span>find_parents()</span>, and the <span>.parent</span> and <span>.parents</span> attributes mentioned earlier. The connection is very strong. 
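As a rough illustration of that connection (a sketch only, not the library's actual implementation), filtering the .parents generator by hand should give the same result as find_parents(). The small HTML snippet and the variable names below are invented for this example.

```python
from bs4 import BeautifulSoup

# A tiny document made up for this sketch (not the "three sisters" document).
html = '<div id="outer"><p class="story"><a id="link">Lacie</a></p></div>'
soup = BeautifulSoup(html, "html.parser")

a_string = soup.find(string="Lacie")

# find_parents() walks upward from the string and keeps the matching tags.
found = a_string.find_parents("p")

# The same upward walk, done by hand over the .parents generator.
manual = [parent for parent in a_string.parents if parent.name == "p"]

parent_names = [parent.name for parent in a_string.parents]

print(found == manual)   # True
print(parent_names)      # ['a', 'p', 'div', '[document]']
```

The last entry is the BeautifulSoup object itself, whose name is the special string '[document]'.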
These search methods actually use <span>.parents</span> to iterate over all the parents, and check each one against the provided filter to see if it matches.</p><p><h2 id="print-soup-get-text-the-dormouse-s-story-the-dormouse-s-story-once-upon-a-time-there-were-three-little-sisters-and-their-names-were-elsie-lacie-and-tillie-and-they-lived-at-the-bottom-of-a-well-37-va-print-soup-get-text-the-dormouse-s-story-the-dormouse-s-story-once-upon-a-time-there-were-three-little-sisters-and-their-names-were-elsie-lacie-and-tillie-and-they-lived-at-the-bottom-of-a-well-38">find_next_siblings() and find_next_sibling()¶</h2><p>Method signature: find_next_siblings(<span>name</span>, <span>attrs</span>, <span>string</span>, <span>limit</span>, <span>**kwargs</span>)</p><p>Method signature: find_next_sibling(<span>name</span>, <span>attrs</span>, <span>string</span>, <span>**kwargs</span>)</p><p>These methods use <span>.next_siblings</span> to iterate over the rest of an element's siblings in the tree. 
The <span>find_next_siblings()</span> method returns all the siblings that match, and <span>find_next_sibling()</span> only returns the first one. 
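To make the role of .next_siblings concrete, here is a minimal sketch that reproduces find_next_siblings() by filtering the generator manually. The HTML fragment is invented for the example, and the manual loop only approximates what the library does internally.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical fragment: three links separated by plain text.
html = ('<p><a id="link1">Elsie</a>, <a id="link2">Lacie</a> and '
        '<a id="link3">Tillie</a></p>')
soup = BeautifulSoup(html, "html.parser")
first_link = soup.a

# .next_siblings yields strings as well as tags, so keep only <a> tags.
manual = [sibling for sibling in first_link.next_siblings
          if isinstance(sibling, Tag) and sibling.name == "a"]

found = first_link.find_next_siblings("a")
found_ids = [tag["id"] for tag in found]
first_id = first_link.find_next_sibling("a")["id"]

print(found_ids)         # ['link2', 'link3']
print(found == manual)   # True
print(first_id)          # link2
```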
</p><p><p><pre>first_link = soup.a
first_link
# <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>

first_link.find_next_siblings("a")
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

first_story_paragraph = soup.find("p", "story")
first_story_paragraph.find_next_sibling("p")
# <p class="story">...</p></pre></p><p><h2 id="print-soup-get-text-the-dormouse-s-story-the-dormouse-s-story-once-upon-a-time-there-were-three-little-sisters-and-their-names-were-elsie-lacie-and-tillie-and-they-lived-at-the-bottom-of-a-well-41-va-print-soup-get-text-the-dormouse-s-story-the-dormouse-s-story-once-upon-a-time-there-were-three-little-sisters-and-their-names-were-elsie-lacie-and-tillie-and-they-lived-at-the-bottom-of-a-well-42">find_previous_siblings() and find_previous_sibling()¶</h2><p>Method signature: find_previous_siblings(<span>name</span>, <span>attrs</span>, <span>string</span>, <span>limit</span>, <span>**kwargs</span>)</p><p>Method signature: find_previous_sibling(<span>name</span>, <span>attrs</span>, <span>string</span>, <span>**kwargs</span>)</p><p>These methods use <span>.previous_siblings</span> to iterate over an element's siblings that precede it in the tree. The <span>find_previous_siblings()</span> method returns all the siblings that match, and <span>find_previous_sibling()</span> only returns the first one. 
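One detail worth seeing in code is the order of the results: because the search walks backwards from the starting element, the nearest sibling comes first. The sketch below uses an invented HTML fragment, and also shows the limit argument from the method signature.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: three links separated by plain text.
html = ('<p><a id="link1">Elsie</a>, <a id="link2">Lacie</a> and '
        '<a id="link3">Tillie</a></p>')
soup = BeautifulSoup(html, "html.parser")
last_link = soup.find("a", id="link3")

# Results come back nearest-first, i.e. in reverse document order.
all_ids = [tag["id"] for tag in last_link.find_previous_siblings("a")]

# limit=1 stops after the first match but still returns a list.
limited = last_link.find_previous_siblings("a", limit=1)

# find_previous_sibling() returns that first match directly (or None).
nearest = last_link.find_previous_sibling("a")

print(all_ids)                 # ['link2', 'link1']
print(limited[0] is nearest)   # True
```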
</p><p><p><pre>last_link = soup.find("a", id="link3")
last_link
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>

last_link.find_previous_siblings("a")
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

first_story_paragraph = soup.find("p", "story")
first_story_paragraph.find_previous_sibling("p")
# <p class="title"><b>The Dormouse's story</b></p></pre></p><p><h2 id="print-soup-get-text-the-dormouse-s-story-the-dormouse-s-story-once-upon-a-time-there-were-three-little-sisters-and-their-names-were-elsie-lacie-and-tillie-and-they-lived-at-the-bottom-of-a-well-45-va-print-soup-get-text-the-dormouse-s-story-the-dormouse-s-story-once-upon-a-time-there-were-three-little-sisters-and-their-names-were-elsie-lacie-and-tillie-and-they-lived-at-the-bottom-of-a-well-46">find_all_next() and find_next()¶</h2><p>Method signature: find_all_next(<span>name</span>, <span>attrs</span>, <span>string</span>, <span>limit</span>, <span>**kwargs</span>)</p><p>Method signature: find_next(<span>name</span>, <span>attrs</span>, <span>string</span>, <span>**kwargs</span>)</p><p>These methods use <span>.next_elements</span> to iterate over whatever tags and strings come after it in the document. The <span>find_all_next()</span> method returns all matches, and <span>find_next()</span> only returns the first match. 
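The difference between walking .next_elements and walking .next_siblings can be sketched with a small, made-up fragment: a link with no siblings at all still has plenty of elements after it in the document. The HTML and variable names here are assumptions for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the link is the only child of the first paragraph.
html = ('<div><p id="first"><a id="link">Elsie</a></p>'
        '<p id="second">...</p></div>')
soup = BeautifulSoup(html, "html.parser")
link = soup.a

# Sibling search: the link has no siblings, so nothing is found.
sibling_paragraphs = link.find_next_siblings("p")

# Document-order search: the second paragraph appears later, so it matches.
later_paragraph_ids = [tag["id"] for tag in link.find_all_next("p")]

# Strings inside the starting tag count as "after" it too.
later_strings = link.find_all_next(string=True)

print(sibling_paragraphs)    # []
print(later_paragraph_ids)   # ['second']
print(later_strings)         # ['Elsie', '...']
```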
</p><p><p><pre>first_link = soup.a
first_link
# <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>

first_link.find_all_next(string=True)
# A list of every string that comes after the start of the tag,
# beginning with 'Elsie', ',\n', 'Lacie', ' and\n', 'Tillie', and so on.

first_link.find_next("p")
# <p class="story">...</p></pre></p><p>In the first example, the string “Elsie” showed up, even though it was contained within the <a> tag we started from. In the second example, the last <p> tag in the document showed up, even though it’s not in the same part of the tree as the <a> tag we started from. 
For these methods, all that matters is that an element matches the filter and shows up later in the document than the starting element.</p><p><h2 id="print-soup-get-text-the-dormouse-s-story-the-dormouse-s-story-once-upon-a-time-there-were-three-little-sisters-and-their-names-were-elsie-lacie-and-tillie-and-they-lived-at-the-bottom-of-a-well-49-va-print-soup-get-text-the-dormouse-s-story-the-dormouse-s-story-once-upon-a-time-there-were-three-little-sisters-and-their-names-were-elsie-lacie-and-tillie-and-they-lived-at-the-bottom-of-a-well-50">find_all_previous() and find_previous()¶</h2><p>Method signature: find_all_previous(<span>name</span>, <span>attrs</span>, <span>string</span>, <span>limit</span>, <span>**kwargs</span>)</p><p>Method signature: find_previous(<span>name</span>, <span>attrs</span>, <span>string</span>, <span>**kwargs</span>)</p><p>These methods use <span>.previous_elements</span> to iterate over the tags and strings that came before it in the document. 
The <span>find_all_previous()</span> method returns all matches, and <span>find_previous()</span> only returns the first match. 
</p><p><p><pre>first_link = soup.a
first_link
# <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>

first_link.find_all_previous("p")
# [<p class="story">Once upon a time there were three little sisters; ...</p>,
#  <p class="title"><b>The Dormouse's story</b></p>]

first_link.find_previous("title")
# <title>The Dormouse's story</title></pre></p><p>The call to <span>find_all_previous("p")</span> found the first paragraph in the document (the one with class="title"), but it also finds the second paragraph, the <p> tag that contains the <a> tag we started with. This shouldn't be too surprising: we're looking at all the tags that show up earlier in the document than the one we started with. A <p> tag that contains an <a> tag must have shown up before the <a> tag it contains.</p><p><h2 id="bo-chon-css">CSS selectors¶</h2><p><span>BeautifulSoup</span> has a <span>.select()</span> method which uses the SoupSieve package to run a CSS selector against a parsed document and return all the matching elements. <span>Tag</span> has a similar method which runs a CSS selector against the contents of a single tag.</p><p>(The SoupSieve integration was added in Beautiful Soup 4.7.0. 
Earlier versions also have the <span>.select()</span> method, but only the most commonly used CSS selectors are supported. If you installed Beautiful Soup through <span>pip</span>, SoupSieve was installed at the same time, so you don't have to do anything extra.)</p><p>The SoupSieve documentation lists all the currently supported CSS selectors, but here are some of the basics.</p><p>You can find tags:</p><p><p><pre>soup.select("title")
# [<title>The Dormouse's story</title>]

soup.select("p:nth-of-type(3)")
# [<p class="story">...</p>]</pre></p><p>Find tags beneath other tags:</p><p><p><pre>soup.select("body a")
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.select("html head title")
# [<title>The Dormouse's story</title>]</pre></p><p>Find tags directly beneath other tags:</p><p><p><pre>soup.select("head > title")
# [<title>The Dormouse's story</title>]

soup.select("p > a:nth-of-type(2)")
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

soup.select("p > #link1")
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

soup.select("body > a")
# []</pre></p><p>Find the siblings of tags:</p><p><p><pre>soup.select("#link1 ~ .sister")
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.select("#link1 + .sister")
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]</pre></p><p>Find tags by CSS class:</p><p><p><pre>soup.select(".sister")
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.select("[class~=sister]")
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]</pre></p><p>Find tags by ID:</p><p><p><pre>soup.select("#link1")
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

soup.select("a#link2")
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]</pre></p><p>Find tags that match any selector from a list of selectors:</p><p><p><pre>soup.select("#link1,#link2")
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]</pre></p><p>Test for the existence of an attribute:</p><p><p><pre>soup.select('a[href]')
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]</pre></p><p>Find tags by attribute value:</p><p><p><pre>soup.select('a[href="http://example.com/elsie"]')
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

soup.select('a[href$="tillie"]')
# [<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.select('a[href*=".com/el"]')
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]</pre></p><p>There's also a method called <span>select_one()</span>, which finds only the first tag that matches a selector:</p><p><p><pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> 
<span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> </pre>16</p><p>Nếu bạn đã phân tích cú pháp XML xác định không gian tên, thì bạn có thể sử dụng chúng trong bộ chọn CSS</p><p><p><pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> 
<span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> </pre>17</p><p>Khi xử lý bộ chọn CSS sử dụng không gian tên, Beautiful Soup luôn cố gắng sử dụng các tiền tố không gian tên có ý nghĩa dựa trên những gì nó thấy trong khi phân tích cú pháp tài liệu. Bạn luôn có thể cung cấp từ điển viết tắt của riêng bạn</p><p><p><pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> </pre>18</p><p>Tất cả nội dung bộ 
chọn CSS này là một tiện ích cho những người đã biết cú pháp bộ chọn CSS. Bạn có thể làm tất cả những điều này với API Beautiful Soup. Và nếu bộ chọn CSS là tất cả những gì bạn cần, bạn nên phân tích cú pháp tài liệu bằng lxml. nó nhanh hơn rất nhiều. Nhưng điều này cho phép bạn kết hợp các bộ chọn CSS với API Beautiful Soup</p><p>Sửa đổi cây¶<p>Điểm mạnh chính của Beautiful Soup là tìm kiếm cây phân tích cú pháp, nhưng bạn cũng có thể sửa đổi cây và viết các thay đổi của mình dưới dạng tài liệu HTML hoặc XML mới</p><p><h2 id="thay-doi-ten-the-va-thuoc-tinh">Thay đổi tên thẻ và thuộc tính¶</h2><p>Tôi đã đề cập đến điều này trước đó, trong Thuộc tính, nhưng nó lặp đi lặp lại. Bạn có thể đổi tên thẻ, thay đổi giá trị của thuộc tính, thêm thuộc tính mới và xóa thuộc tính</p><p><p><pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> 
<span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> </pre>19</p><p><h2 id="sua-doi-soup-title-lt-title-gt-the-dormouse-s-story-lt-title-gt-soup-title-name-u-title-soup-title-string-u-the-dormouse-s-story-soup-title-parent-name-u-head-soup-p-lt-p-class-title-gt-lt-b-gt-the-dormouse-s-story-lt-b-gt-lt-p-gt-soup-p-class-u-title-soup-a-lt-a-class-sister-href-http-example-com-elsie-id-link1-gt-elsie-lt-a-gt-soup-find-all-a-lt-a-class-sister-href-http-example-com-elsie-id-link1-gt-elsie-lt-a-gt-lt-a-class-sister-href-http-example-com-lacie-id-link2-gt-lacie-lt-a-gt-lt-a-class-sister-href-http-example-com-tillie-id-link3-gt-tillie-lt-a-gt-soup-find-id-link3-lt-a-class-sister-href-http-example-com-tillie-id-link3-gt-tillie-lt-a-gt-52">Sửa đổi soup.title # <title>The Dormouse's story</title> soup.title.name # u'title' soup.title.string # u'The Dormouse's story' soup.title.parent.name # u'head' soup.p # <p class="title"><b>The Dormouse's story</b></p> soup.p['class'] # u'title' soup.a # <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a> soup.find_all('a') # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>] soup.find(id="link3") # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a> 52¶</h2><p>If you set a tag’s <pre><span>soup</span><span>.</span><span>title</span> <span># <title>The Dormouse's story</title></span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>name</span> <span># u'title'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>string</span> <span># u'The Dormouse's story'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>parent</span><span>.</span><span>name</span> <span># u'head'</span> 
<span>soup</span><span>.</span><span>p</span> <span># <p class="title"><b>The Dormouse's story</b></p></span> <span>soup</span><span>.</span><span>p</span><span>[</span><span>'class'</span><span>]</span> <span># u'title'</span> <span>soup</span><span>.</span><span>a</span> <span># <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a></span> <span>soup</span><span>.</span><span>find_all</span><span>(</span><span>'a'</span><span>)</span> <span># [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]</span> <span>soup</span><span>.</span><span>find</span><span>(</span><span>id</span><span>=</span><span>"link3"</span><span>)</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a></span> </pre>52 attribute to a new string, the tag’s contents are replaced with that string</p><p><p><pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> 
<span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> </pre>20</p><p>Hãy cẩn thận. nếu thẻ chứa các thẻ khác, chúng và tất cả nội dung của chúng sẽ bị hủy</p><p><h2 id="print-soup-get-text-the-dormouse-s-story-the-dormouse-s-story-once-upon-a-time-there-were-three-little-sisters-and-their-names-were-elsie-lacie-and-tillie-and-they-lived-at-the-bottom-of-a-well-62">print(soup.get_text()) # The Dormouse's story # # The Dormouse's story # # Once upon a time there were three little sisters; and their names were # Elsie, # Lacie and # Tillie; # and they lived at the bottom of a well. # # ... 62¶</h2><p>Bạn có thể thêm vào nội dung của thẻ bằng <pre><span>print</span><span>(</span><span>soup</span><span>.</span><span>get_text</span><span>())</span> <span># The Dormouse's story</span> <span>#</span> <span># The Dormouse's story</span> <span>#</span> <span># Once upon a time there were three little sisters; and their names were</span> <span># Elsie,</span> <span># Lacie and</span> <span># Tillie;</span> <span># and they lived at the bottom of a well.</span> <span>#</span> <span># ...</span> </pre>63. 
It works just like calling <code>.append()</code> on a Python list:</p><p><p><pre>soup = BeautifulSoup("<a>Foo</a>", 'html.parser')
soup.a.append("Bar")

soup
# <a>FooBar</a>
soup.a.contents
# ['Foo', 'Bar']
</pre></p><p><h2 
id="print-soup-get-text-the-dormouse-s-story-the-dormouse-s-story-once-upon-a-time-there-were-three-little-sisters-and-their-names-were-elsie-lacie-and-tillie-and-they-lived-at-the-bottom-of-a-well-65">print(soup.get_text()) # The Dormouse's story # # The Dormouse's story # # Once upon a time there were three little sisters; and their names were # Elsie, # Lacie and # Tillie; # and they lived at the bottom of a well. # # ... 65¶</h2><p>Starting in Beautiful Soup 4. 7. 0, <pre><span>soup</span><span>.</span><span>title</span> <span># <title>The Dormouse's story</title></span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>name</span> <span># u'title'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>string</span> <span># u'The Dormouse's story'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>parent</span><span>.</span><span>name</span> <span># u'head'</span> <span>soup</span><span>.</span><span>p</span> <span># <p class="title"><b>The Dormouse's story</b></p></span> <span>soup</span><span>.</span><span>p</span><span>[</span><span>'class'</span><span>]</span> <span># u'title'</span> <span>soup</span><span>.</span><span>a</span> <span># <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a></span> <span>soup</span><span>.</span><span>find_all</span><span>(</span><span>'a'</span><span>)</span> <span># [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]</span> <span>soup</span><span>.</span><span>find</span><span>(</span><span>id</span><span>=</span><span>"link3"</span><span>)</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a></span> </pre>26 cũng hỗ trợ một phương thức gọi là 
<pre><span>print</span><span>(</span><span>soup</span><span>.</span><span>get_text</span><span>())</span> <span># The Dormouse's story</span> <span>#</span> <span># The Dormouse's story</span> <span>#</span> <span># Once upon a time there were three little sisters; and their names were</span> <span># Elsie,</span> <span># Lacie and</span> <span># Tillie;</span> <span># and they lived at the bottom of a well.</span> <span>#</span> <span># ...</span> </pre>67, phương thức này thêm mọi phần tử của danh sách vào một <pre><span>soup</span><span>.</span><span>title</span> <span># <title>The Dormouse's story</title></span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>name</span> <span># u'title'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>string</span> <span># u'The Dormouse's story'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>parent</span><span>.</span><span>name</span> <span># u'head'</span> <span>soup</span><span>.</span><span>p</span> <span># <p class="title"><b>The Dormouse's story</b></p></span> <span>soup</span><span>.</span><span>p</span><span>[</span><span>'class'</span><span>]</span> <span># u'title'</span> <span>soup</span><span>.</span><span>a</span> <span># <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a></span> <span>soup</span><span>.</span><span>find_all</span><span>(</span><span>'a'</span><span>)</span> <span># [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]</span> <span>soup</span><span>.</span><span>find</span><span>(</span><span>id</span><span>=</span><span>"link3"</span><span>)</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a></span> </pre>26, theo thứ tự</p><p><p><pre><span>from</span> <span>bs4</span> 
<span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> </pre>22</p><p><h2 id="print-soup-get-text-the-dormouse-s-story-the-dormouse-s-story-once-upon-a-time-there-were-three-little-sisters-and-their-names-were-elsie-lacie-and-tillie-and-they-lived-at-the-bottom-of-a-well-69-va-print-soup-get-text-the-dormouse-s-story-the-dormouse-s-story-once-upon-a-time-there-were-three-little-sisters-and-their-names-were-elsie-lacie-and-tillie-and-they-lived-at-the-bottom-of-a-well-70">print(soup.get_text()) # The Dormouse's story # # The Dormouse's story # # Once upon a time there were three little sisters; and their names were # Elsie, # Lacie and # Tillie; # and they lived at the bottom of a well. # # ... 
69 và print(soup.get_text()) # The Dormouse's story # # The Dormouse's story # # Once upon a time there were three little sisters; and their names were # Elsie, # Lacie and # Tillie; # and they lived at the bottom of a well. # # ... 70¶</h2><p>Nếu bạn cần thêm một chuỗi vào tài liệu, không vấn đề gì – bạn có thể chuyển một chuỗi Python vào <pre><span>print</span><span>(</span><span>soup</span><span>.</span><span>get_text</span><span>())</span> <span># The Dormouse's story</span> <span>#</span> <span># The Dormouse's story</span> <span>#</span> <span># Once upon a time there were three little sisters; and their names were</span> <span># Elsie,</span> <span># Lacie and</span> <span># Tillie;</span> <span># and they lived at the bottom of a well.</span> <span>#</span> <span># ...</span> </pre>62 hoặc bạn có thể gọi hàm tạo <pre><span>soup</span><span>.</span><span>title</span> <span># <title>The Dormouse's story</title></span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>name</span> <span># u'title'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>string</span> <span># u'The Dormouse's story'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>parent</span><span>.</span><span>name</span> <span># u'head'</span> <span>soup</span><span>.</span><span>p</span> <span># <p class="title"><b>The Dormouse's story</b></p></span> <span>soup</span><span>.</span><span>p</span><span>[</span><span>'class'</span><span>]</span> <span># u'title'</span> <span>soup</span><span>.</span><span>a</span> <span># <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a></span> <span>soup</span><span>.</span><span>find_all</span><span>(</span><span>'a'</span><span>)</span> <span># [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,</span> <span># <a class="sister" href="http://example.com/tillie" 
id="link3">Tillie</a>]</span> <span>soup</span><span>.</span><span>find</span><span>(</span><span>id</span><span>=</span><span>"link3"</span><span>)</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a></span> </pre>27</p><p><p><pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> </pre>23</p><p>Nếu bạn muốn tạo một bình luận hoặc một số lớp con khác của <pre><span>soup</span><span>.</span><span>title</span> <span># <title>The Dormouse's story</title></span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>name</span> <span># u'title'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>string</span> <span># u'The Dormouse's 
story'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>parent</span><span>.</span><span>name</span> <span># u'head'</span> <span>soup</span><span>.</span><span>p</span> <span># <p class="title"><b>The Dormouse's story</b></p></span> <span>soup</span><span>.</span><span>p</span><span>[</span><span>'class'</span><span>]</span> <span># u'title'</span> <span>soup</span><span>.</span><span>a</span> <span># <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a></span> <span>soup</span><span>.</span><span>find_all</span><span>(</span><span>'a'</span><span>)</span> <span># [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]</span> <span>soup</span><span>.</span><span>find</span><span>(</span><span>id</span><span>=</span><span>"link3"</span><span>)</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a></span> </pre>27, chỉ cần gọi hàm tạo</p><p><p><pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> 
</pre></p>
<p>(This is a new feature in Beautiful Soup 4.4.0.)</p>
<p>What if you need to create a whole new tag? The best solution is to call the factory method <code>BeautifulSoup.new_tag()</code>.</p>
<p>Only the first argument, the tag name, is required.</p>
<p><h2 id="print-soup-get-text-the-dormouse-s-story-the-dormouse-s-story-once-upon-a-time-there-were-three-little-sisters-and-their-names-were-elsie-lacie-and-tillie-and-they-lived-at-the-bottom-of-a-well-75">insert()¶</h2>
<p><code>Tag.insert()</code> is just like <code>Tag.append()</code>, except that the new element does not necessarily go at the end of its parent's <code>.contents</code>. It is inserted at whatever numeric position you specify.
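A minimal sketch of that positional behavior follows. The sample markup and the variable names are illustrative assumptions, not part of the original text. It inserts a string at index 1 of a tag's <code>.contents</code>, so the string lands between the existing text and the &lt;i&gt; tag.

```python
from bs4 import BeautifulSoup

# Illustrative markup (an assumption chosen for this sketch).
markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, "html.parser")
tag = soup.a

# Index 1 sits between the leading string and the <i> tag.
tag.insert(1, "but did not endorse ")

result = str(tag)
contents = list(tag.contents)
print(result)
# <a href="http://example.com/">I linked to but did not endorse <i>example.com</i></a>
print(len(contents))
# 3
```
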
It works just like <code>.insert()</code> on a Python list.</p>
<p><h2 id="print-soup-get-text-the-dormouse-s-story-the-dormouse-s-story-once-upon-a-time-there-were-three-little-sisters-and-their-names-were-elsie-lacie-and-tillie-and-they-lived-at-the-bottom-of-a-well-80-va-print-soup-get-text-the-dormouse-s-story-the-dormouse-s-story-once-upon-a-time-there-were-three-little-sisters-and-their-names-were-elsie-lacie-and-tillie-and-they-lived-at-the-bottom-of-a-well-81">insert_before() and insert_after()¶</h2>
<p>The <code>insert_before()</code> method inserts tags or strings immediately before something else in the parse tree.</p>
<p>The <code>insert_after()</code> method inserts tags or strings immediately after something else in the parse tree.</p>
<p><h2 id="print-soup-get-text-the-dormouse-s-story-the-dormouse-s-story-once-upon-a-time-there-were-three-little-sisters-and-their-names-were-elsie-lacie-and-tillie-and-they-lived-at-the-bottom-of-a-well-84">clear()¶</h2>
<p><code>Tag.clear()</code> removes the contents of a tag.</p>
<p><h2 id="print-soup-get-text-the-dormouse-s-story-the-dormouse-s-story-once-upon-a-time-there-were-three-little-sisters-and-their-names-were-elsie-lacie-and-tillie-and-they-lived-at-the-bottom-of-a-well-86">extract()¶</h2>
<p><code>PageElement.extract()</code> removes a tag or string from the tree. It returns the tag or string that was extracted.</p>
<p>Once an element has been extracted, you effectively have two parse trees: one rooted at the <code>BeautifulSoup</code> object you used to parse the document, and one rooted at the tag that was extracted.
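The two-tree situation can be sketched as follows. The markup and names here are assumptions made for illustration. The sketch extracts an &lt;i&gt; tag and shows that it no longer has a parent, while the rest of the document stays attached to the <code>BeautifulSoup</code> object.

```python
from bs4 import BeautifulSoup

# Illustrative markup (an assumption chosen for this sketch).
markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, "html.parser")
a_tag = soup.a

# extract() detaches the <i> tag from the tree and returns it.
i_tag = soup.i.extract()

remaining = str(a_tag)   # tree still rooted at the BeautifulSoup object
extracted = str(i_tag)   # second tree, rooted at the extracted tag

print(remaining)
# <a href="http://example.com/">I linked to </a>
print(extracted)
# <i>example.com</i>
print(i_tag.parent)
# None
```
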
You can go on to call <code>extract</code> on a child of the element you extracted.</p>
<p><h2 id="print-soup-get-text-the-dormouse-s-story-the-dormouse-s-story-once-upon-a-time-there-were-three-little-sisters-and-their-names-were-elsie-lacie-and-tillie-and-they-lived-at-the-bottom-of-a-well-90">decompose()¶</h2>
<p><code>Tag.decompose()</code> removes a tag from the tree, then completely destroys it and its contents.</p>
<p>The behavior of a decomposed <code>Tag</code> or <code>NavigableString</code> is not defined, and you should not use it for anything. If you are not sure whether something has been decomposed, you can check its <code>.decomposed</code> property (new in Beautiful Soup 4.9.0).</p>
<p><h2 id="print-soup-get-text-the-dormouse-s-story-the-dormouse-s-story-once-upon-a-time-there-were-three-little-sisters-and-their-names-were-elsie-lacie-and-tillie-and-they-lived-at-the-bottom-of-a-well-95">replace_with()¶</h2>
<p><code>PageElement.replace_with()</code> removes a tag or string from the tree and replaces it with one or more tags or strings of your choice.</p>
<p><code>replace_with()</code> returns the tag or string that was replaced, so that you can examine it or add it back to another part of the tree.</p>
<p>The ability to pass multiple arguments into <code>replace_with()</code> is new in Beautiful Soup 4.10.0.</p>
<p><h2 id="print-soup-get-text-the-dormouse-s-story-the-dormouse-s-story-once-upon-a-time-there-were-three-little-sisters-and-their-names-were-elsie-lacie-and-tillie-and-they-lived-at-the-bottom-of-a-well-98">wrap()¶</h2>
<p><code>PageElement.wrap()</code> wraps an element in the tag you specify.
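As a small sketch of wrapping, the snippet below places a paragraph's string inside a newly created &lt;b&gt; tag. The sample markup and the variable name <code>wrapper</code> are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Illustrative markup (an assumption chosen for this sketch).
soup = BeautifulSoup("<p>I wish I was bold.</p>", "html.parser")

# Wrap the paragraph's string in a new <b> tag built with new_tag().
wrapper = soup.p.string.wrap(soup.new_tag("b"))

print(wrapper)
# <b>I wish I was bold.</b>
print(soup.p)
# <p><b>I wish I was bold.</b></p>
```
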
It returns the new wrapper.</p>
<p>This method is new in Beautiful Soup 4.0.5.</p></p>
<p><h2 id="from-bs4-import-beautifulsoup-soup-beautifulsoup-html-doc-html-parser-print-soup-prettify-lt-html-gt-lt-head-gt-lt-title-gt-the-dormouse-s-story-lt-title-gt-lt-head-gt-lt-body-gt-lt-p-class-title-gt-lt-b-gt-the-dormouse-s-story-lt-b-gt-lt-p-gt-lt-p-class-story-gt-once-upon-a-time-there-were-three-little-sisters-and-their-names-were-lt-a-class-sister-href-http-example-com-elsie-id-link1-gt-elsie-lt-a-gt-lt-a-class-sister-href-http-example-com-lacie-id-link2-gt-lacie-lt-a-gt-and-lt-a-class-sister-href-http-example-com-tillie-id-link3-gt-tillie-lt-a-gt-and-they-lived-at-the-bottom-of-a-well-lt-p-gt-lt-p-class-story-gt-lt-p-gt-lt-body-gt-lt-html-gt-500">unwrap()¶</h2>
<p><code>Tag.unwrap()</code> is the opposite of <code>wrap()</code>. It replaces a tag with whatever is inside that tag.
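A minimal sketch of unwrapping, using sample markup and variable names that are assumptions made for illustration: the &lt;i&gt; tag is removed while its text stays in place inside the enclosing &lt;a&gt; tag.

```python
from bs4 import BeautifulSoup

# Illustrative markup (an assumption chosen for this sketch).
markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, "html.parser")
a_tag = soup.a

# unwrap() replaces the <i> tag with its own contents.
removed = a_tag.i.unwrap()

unwrapped = str(a_tag)
print(unwrapped)
# <a href="http://example.com/">I linked to example.com</a>
```
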
It is good for stripping out markup.</p>
<p>Like <code>replace_with()</code>, <code>unwrap()</code> returns the tag that was replaced.</p>
<h2 id="from-bs4-import-beautifulsoup-soup-beautifulsoup-html-doc-html-parser-print-soup-prettify-lt-html-gt-lt-head-gt-lt-title-gt-the-dormouse-s-story-lt-title-gt-lt-head-gt-lt-body-gt-lt-p-class-title-gt-lt-b-gt-the-dormouse-s-story-lt-b-gt-lt-p-gt-lt-p-class-story-gt-once-upon-a-time-there-were-three-little-sisters-and-their-names-were-lt-a-class-sister-href-http-example-com-elsie-id-link1-gt-elsie-lt-a-gt-lt-a-class-sister-href-http-example-com-lacie-id-link2-gt-lacie-lt-a-gt-and-lt-a-class-sister-href-http-example-com-tillie-id-link3-gt-tillie-lt-a-gt-and-they-lived-at-the-bottom-of-a-well-lt-p-gt-lt-p-class-story-gt-lt-p-gt-lt-body-gt-lt-html-gt-505">smooth()</h2>
<p>After calling a series of methods that modify the parse tree, you may end up with two or more <code>NavigableString</code> objects next to each other. Beautiful Soup has no problem with this, but because it cannot happen in a freshly parsed document, you might not expect the behavior that results.</p>
<p>You can call <code>Tag.smooth()</code> to clean up the parse tree by consolidating adjacent strings.</p>
<p>This method is new in Beautiful Soup 4.8.0.</p>
<p>Output</p>
<h2 id="in-dep">Pretty-printing</h2>
<p>The <code>prettify()</code> method turns a Beautiful Soup parse tree into a nicely formatted Unicode string, with a separate line for each tag and each string.</p>
<p>You can call <code>prettify()</code> on the top-level <code>BeautifulSoup</code> object, or on any of its <code>Tag</code> objects.</p>
<p>Since it adds whitespace (in the form of newlines), <code>prettify()</code> changes the meaning of an HTML document and should not be used to reformat one. The goal of <code>prettify()</code> is to help you visually understand the structure of the documents you work with.</p>
<h2 id="ban-in-khong-dep">Non-pretty printing</h2>
<p>If you just want a string, with no fancy formatting, you can call <code>str()</code> on a <code>BeautifulSoup</code> object, or on a <code>Tag</code> within it.</p>
<p>The <code>str()</code> function returns a string encoded in UTF-8. 
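The sketch below compares the plain and pretty-printed forms of a small document, assuming the <code>bs4</code> package is installed. The <code>markup</code> string and variable names are made up for illustration. Under Python 3, <code>str()</code> gives an ordinary text string, while <code>encode()</code> produces bytes, using UTF-8 when no encoding is passed.

```python
from bs4 import BeautifulSoup

# Illustrative markup containing one non-ASCII character.
markup = "<p>A <b>café</b> menu</p>"
soup = BeautifulSoup(markup, "html.parser")

plain = str(soup)          # text, with no whitespace added
as_bytes = soup.encode()   # bytes; UTF-8 is the default encoding
as_text = soup.decode()    # text again
pretty = soup.prettify()   # text with one line per tag and per string

print(plain)
print(type(as_bytes).__name__, type(as_text).__name__)
print(pretty)
```

Because <code>prettify()</code> inserts newlines, its result differs from the other two text forms even though all of them describe the same tree.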
See Encodings for other options.</p>
<p>You can also call <code>encode()</code> to get a bytestring, and <code>decode()</code> to get Unicode.</p>
<h2 id="output-formatters">Output formatters</h2>
<p>If you give Beautiful Soup a document that contains HTML entities like “&lquot;”, they will be converted to Unicode characters.</p>
<p>If you then convert the document to a bytestring, the Unicode characters will be encoded as UTF-8. You will not get the HTML entities back.</p>
<p>By default, the only characters that are escaped upon output are bare ampersands and angle brackets. These get turned into “&amp;”, “&lt;”, and “&gt;”, so that Beautiful Soup doesn’t inadvertently generate invalid HTML or XML.</p>
<p>You can change this behavior by providing a value for the argument <pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's 
story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> </pre>520 thành <pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" 
href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> </pre>508, <pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> </pre>518 hoặc <pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> 
<span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> </pre>519. 
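</p><p>As a minimal sketch of the sentence above, the same <code>formatter</code> keyword can be passed to all three output methods. The markup string and variable names here are hypothetical and chosen only for illustration; they are not part of the original examples.</p>

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for this illustration.
markup = "<p>caf&eacute; &amp; bar</p>"
tag = BeautifulSoup(markup, "html.parser").p

# Default ("minimal") formatter: only the bare ampersand is escaped.
minimal_text = tag.decode()

# The same keyword works for decode(), encode(), and prettify().
html_text = tag.decode(formatter="html")      # str
html_bytes = tag.encode(formatter="html")     # bytes, UTF-8 by default
html_pretty = tag.prettify(formatter="html")  # indented str

print(minimal_text)
print(html_text)
print(html_bytes)
print(html_pretty)
```

<p>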
Beautiful Soup recognizes five possible values for <code>formatter</code>.</p><p>The default is <code>formatter="minimal"</code>. Strings will only be processed enough to ensure that Beautiful Soup generates valid HTML/XML:</p>
<pre>french = "&lt;p&gt;Il a dit &amp;lt;&amp;lt;Sacr&amp;eacute; bleu!&amp;gt;&amp;gt;&lt;/p&gt;"
soup = BeautifulSoup(french, 'html.parser')
print(soup.prettify(formatter="minimal"))
# &lt;p&gt;
#  Il a dit &amp;lt;&amp;lt;Sacré bleu!&amp;gt;&amp;gt;
# &lt;/p&gt;</pre>
<p>If you pass in <code>formatter="html"</code>, Beautiful Soup will convert Unicode characters to HTML entities whenever possible:</p>
<pre>print(soup.prettify(formatter="html"))
# &lt;p&gt;
#  Il a dit &amp;lt;&amp;lt;Sacr&amp;eacute; bleu!&amp;gt;&amp;gt;
# &lt;/p&gt;</pre>
<p>If you pass in <code>formatter="html5"</code>, it's similar to <code>formatter="html"</code>, but Beautiful Soup will omit the closing slash in HTML void tags like “br”:</p>
<pre>br = BeautifulSoup("&lt;br&gt;", 'html.parser').br

print(br.encode(formatter="html"))
# b'&lt;br/&gt;'

print(br.encode(formatter="html5"))
# b'&lt;br&gt;'</pre>
<p>In addition, any attributes whose values are the empty string will become HTML-style boolean attributes:</p>
<pre>option = BeautifulSoup('&lt;option selected=""&gt;&lt;/option&gt;', 'html.parser').option

print(option.encode(formatter="html"))
# b'&lt;option selected=""&gt;&lt;/option&gt;'

print(option.encode(formatter="html5"))
# b'&lt;option selected&gt;&lt;/option&gt;'</pre>
<p>(This behavior is new as of Beautiful Soup 4.10.0.)</p><p>If you pass in <code>formatter=None</code>, Beautiful Soup will not modify strings at all on output. This is the fastest option, but it may lead to Beautiful Soup generating invalid HTML/XML, as in these examples:</p>
<pre>print(soup.prettify(formatter=None))
# &lt;p&gt;
#  Il a dit &lt;&lt;Sacré bleu!&gt;&gt;
# &lt;/p&gt;

link_soup = BeautifulSoup('&lt;a href="http://example.com/?foo=val1&amp;bar=val2"&gt;A link&lt;/a&gt;', 'html.parser')
print(link_soup.a.encode(formatter=None))
# b'&lt;a href="http://example.com/?foo=val1&amp;bar=val2"&gt;A link&lt;/a&gt;'</pre>
<p>If you need more sophisticated control over your output, you can use Beautiful Soup's <code>Formatter</code> class. 
Here's a formatter that converts strings to uppercase, whether they occur in a text node or in an attribute value:</p>
<pre>from bs4 import BeautifulSoup
from bs4.formatter import HTMLFormatter

soup = BeautifulSoup("&lt;p&gt;Il a dit &amp;lt;&amp;lt;Sacr&amp;eacute; bleu!&amp;gt;&amp;gt;&lt;/p&gt;", 'html.parser')
link_soup = BeautifulSoup('&lt;a href="http://example.com/?foo=val1&amp;bar=val2"&gt;A link&lt;/a&gt;', 'html.parser')

# Called on every text node and attribute value in place of entity substitution.
def uppercase(text):
    return text.upper()

formatter = HTMLFormatter(uppercase)

print(soup.prettify(formatter=formatter))
# &lt;p&gt;
#  IL A DIT &lt;&lt;SACRÉ BLEU!&gt;&gt;
# &lt;/p&gt;

print(link_soup.a.prettify(formatter=formatter))
# &lt;a href="HTTP://EXAMPLE.COM/?FOO=VAL1&amp;BAR=VAL2"&gt;
#  A LINK
# &lt;/a&gt;</pre>
<p>Here's a formatter that increases the indentation width when pretty-printing:</p>
<pre># Nested content is indented by 8 spaces per level instead of the default 1.
formatter = HTMLFormatter(indent=8)
print(link_soup.a.prettify(formatter=formatter))</pre>
<p>Subclassing <code>HTMLFormatter</code> or <code>XMLFormatter</code> will give you even more control over the output. 
For example, Beautiful Soup sorts the attributes in every tag by default:</p>
<pre>attr_soup = BeautifulSoup(b'&lt;p z="1" m="2" a="3"&gt;&lt;/p&gt;', 'html.parser')
print(attr_soup.p.encode())
# b'&lt;p a="3" m="2" z="1"&gt;&lt;/p&gt;'</pre>
<p>To turn this off, you can override the <code>Formatter.attributes()</code> method in a subclass. This method controls which attributes are output and in what order. This implementation also filters out the attribute called “m” whenever it appears:</p>
<pre>from bs4.formatter import HTMLFormatter

class UnsortedAttributes(HTMLFormatter):
    def attributes(self, tag):
        # Yield attributes in document order, skipping "m".
        for k, v in tag.attrs.items():
            if k == 'm':
                continue
            yield k, v

print(attr_soup.p.encode(formatter=UnsortedAttributes()))
# b'&lt;p z="1" a="3"&gt;&lt;/p&gt;'</pre>
<p>One last caveat: if you create a <code>CData</code> object, the text inside that object is always presented exactly as it appears, with no formatting. 
Beautiful Soup will call your entity substitution function, just in case you’ve written a custom function that counts all the strings in the document or something, but it will ignore the return value</p><p><pre>
from bs4.element import CData

soup = BeautifulSoup("<a></a>", 'html.parser')
soup.a.string = CData("one < three")
print(soup.a.prettify(formatter="html"))
# <a>
#  <![CDATA[one < three]]>
# </a>
</pre></p><p><h2 id="from-bs4-import-beautifulsoup-soup-beautifulsoup-html-doc-html-parser-print-soup-prettify-lt-html-gt-lt-head-gt-lt-title-gt-the-dormouse-s-story-lt-title-gt-lt-head-gt-lt-body-gt-lt-p-class-title-gt-lt-b-gt-the-dormouse-s-story-lt-b-gt-lt-p-gt-lt-p-class-story-gt-once-upon-a-time-there-were-three-little-sisters-and-their-names-were-lt-a-class-sister-href-http-example-com-elsie-id-link1-gt-elsie-lt-a-gt-lt-a-class-sister-href-http-example-com-lacie-id-link2-gt-lacie-lt-a-gt-and-lt-a-class-sister-href-http-example-com-tillie-id-link3-gt-tillie-lt-a-gt-and-they-lived-at-the-bottom-of-a-well-lt-p-gt-lt-p-class-story-gt-lt-p-gt-lt-body-gt-lt-html-gt-535">get_text()¶</h2><p>If you only want the human-readable text inside a document or tag, you can use the get_text() method. 
The get_text() method returns all the text in a document or beneath a tag, as a single Unicode string</p><p><pre>
markup = '<a href="http://example.com/">\nI linked to <i>example.com</i>\n</a>'
soup = BeautifulSoup(markup, 'html.parser')

soup.get_text()
# '\nI linked to example.com\n'
soup.i.get_text()
# 'example.com'
</pre></p><p>You can specify a string to be used to join the bits of text together</p><p><pre>
soup.get_text("|")
# '\nI linked to |example.com|\n'
</pre></p><p>You can tell Beautiful Soup to strip whitespace from the beginning and end of each bit of text</p><p><pre>
soup.get_text("|", strip=True)
# 'I linked to|example.com'
</pre></p><p>But at that point you might want to use the <span>.stripped_strings</span> generator instead, and process the text yourself</p><p><pre>
[text for text in soup.stripped_strings]
# ['I linked to', 'example.com']
</pre></p><p>As of Beautiful Soup version 4.9.0, when lxml or html.parser are in use, the contents of <script>, <style>, and <template> tags are 
generally not considered to be ‘text’, since those tags are not part of the human-visible content of the page.</p><p>As of Beautiful Soup version 4.10.0, you can call get_text(), .strings, or .stripped_strings on a NavigableString object. It will either return the object itself, or nothing, so the only reason to do this is when you’re iterating over a mixed list</p><p>Specifying the parser to use¶<p>If you just need to parse some HTML, you can dump the markup into the BeautifulSoup constructor, and it’ll probably be fine. Beautiful Soup will pick a parser for you and parse the data. 
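As a rough sketch of that simplest case, the snippet below hands a hypothetical markup string to the constructor without naming a parser, then checks which one was chosen. Reading the name from soup.builder.NAME is an assumption about the tree builder object that may not hold in every version, and the warning filter is only there because Beautiful Soup warns when it has to guess.

```python
import warnings

from bs4 import BeautifulSoup

markup = "<p>Hello, <b>world</b></p>"  # hypothetical sample markup

with warnings.catch_warnings():
    # With no parser named, Beautiful Soup warns that it guessed one.
    warnings.simplefilter("ignore")
    soup = BeautifulSoup(markup)

# Assumption: the tree builder exposes the chosen parser's name as NAME.
picked = soup.builder.NAME
text = soup.get_text()

print(picked)
print(text)
```

The parser name printed here depends on what is installed on the machine, which is why the same script can behave differently elsewhere.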
But there are a few additional arguments you can pass in to the constructor to change which parser is used</p><p>The first argument to the BeautifulSoup constructor is a string or an open filehandle: the markup you want parsed. The second argument is how you’d like the markup parsed</p><p>If you don’t specify anything, you’ll get the best HTML parser that’s installed. Beautiful Soup ranks lxml’s parser as being the best, then html5lib’s, then Python’s built-in parser. You can override this by specifying one of the following</p><ul><li><p>What type of markup you want to parse. 
Currently supported are “html”, “xml”, and “html5”</p></li><li><p>The name of the parser library you want to use. Currently supported options are “lxml”, “html5lib”, and “html.parser” (Python’s built-in HTML parser)</p></li></ul><p>The section Installing a parser contrasts the supported parsers</p><p>If you don’t have an appropriate parser installed, Beautiful Soup will ignore your request and pick a different parser. Right now, the only supported XML parser is lxml. If you don’t have lxml installed, asking for an XML parser won’t give you one, and asking for “lxml” won’t work either</p><p><h2 id="differences-between-parsers">Differences between parsers¶</h2><p>Beautiful Soup presents the same interface to a number of different parsers, but each parser is different. Different parsers will create different parse trees from the same document. The biggest differences are between the HTML parsers and the XML parsers. Here’s a short document, parsed as HTML using the parser that comes with Python</p><p><pre>
print(BeautifulSoup("<a><b /></a>", "html.parser"))
# <a><b></b></a>
</pre></p><p>Since a standalone <b/> tag is not valid HTML, html.parser turns it into a <b></b> tag pair.</p><p>Here’s the same document parsed as XML (running this requires that you have lxml installed). Note that the standalone <b/> tag is left alone, and that the document is given an XML declaration instead of being put into an <html> tag:</p><p><pre>
print(BeautifulSoup("<a><b /></a>", "xml"))
# <?xml version="1.0" encoding="utf-8"?>
# <a><b/></a>
</pre></p><p>There are also differences between HTML parsers. If you give Beautiful Soup a perfectly-formed HTML document, these differences won’t matter. One parser will be faster than another, but they’ll all give you a data structure that looks exactly like the original HTML document</p><p>But if the document is not perfectly-formed, different parsers will give different results. Here’s a short, invalid document parsed using lxml’s HTML parser. Note that the <a> tag gets wrapped in <body> and <html> tags, and the dangling </p> tag is simply ignored:</p><p><pre>
print(BeautifulSoup("<a></p>", "lxml"))
# <html><body><a></a></body></html>
</pre></p><p>Here’s the same document parsed using html5lib</p><p><pre>
print(BeautifulSoup("<a></p>", "html5lib"))
# <html><head></head><body><a><p></p></a></body></html>
</pre></p><p>Instead of ignoring the dangling </p> tag, html5lib pairs it with an opening <p> tag. 
html5lib also adds an empty <head> tag; lxml didn’t bother.</p><p>Here’s the same document parsed with Python’s built-in HTML parser</p><p><pre>
print(BeautifulSoup("<a></p>", "html.parser"))
# <a></a>
</pre></p><p>Like lxml, this parser ignores the closing </p> tag. Unlike html5lib or lxml, this parser makes no attempt to create a well-formed HTML document by adding <html> or <body> tags.</p><p>Since the document “<a></p>” is invalid, none of these techniques is the ‘correct’ way to handle it. 
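To see these differences on your own machine, a minimal sketch along the following lines may help. It feeds the invalid document to each parser name mentioned in this section and records the resulting tree as a string. The results dictionary is a name introduced only for this illustration, and the sketch assumes that asking for a parser library that is not installed raises bs4.FeatureNotFound.

```python
from bs4 import BeautifulSoup, FeatureNotFound

invalid_markup = "<a></p>"

# Map each parser name to the tree it builds, or None if it is unavailable.
results = {}
for parser in ("html.parser", "lxml", "html5lib"):
    try:
        results[parser] = str(BeautifulSoup(invalid_markup, parser))
    except FeatureNotFound:
        # lxml and html5lib are optional third-party packages.
        results[parser] = None

for parser, tree in results.items():
    print(parser, "->", tree if tree is not None else "(not installed)")
```

Only html.parser ships with Python, so it is the one entry that should always produce a tree.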
The html5lib parser uses techniques that are part of the HTML5 standard, so it has the best claim on being the ‘correct’ way, but all three techniques are legitimate.</p><p>Differences between parsers can affect your script. If you’re planning on distributing your script to other people, or running it on multiple machines, you should specify a parser in the BeautifulSoup constructor. That will reduce the chances that your users parse a document differently from the way you parse it</p><p>Encodings¶<p>Any HTML or XML document is written in a specific encoding like ASCII or UTF-8. 
But when you load that document into Beautiful Soup, you’ll discover it’s been converted to Unicode</p><p><pre>
markup = b"<h1>Sacr\xc3\xa9 bleu!</h1>"
soup = BeautifulSoup(markup, 'html.parser')
print(soup.h1)
# <h1>Sacré bleu!</h1>
print(soup.h1.string)
# Sacré bleu!
</pre></p><p>It’s not magic. (That sure would be nice.) Beautiful Soup uses a sub-library called Unicode, Dammit to detect a document’s encoding and convert it to Unicode. 
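Unicode, Dammit can also be called directly, which makes the detection step easier to inspect. The following is a minimal sketch on hypothetical sample bytes: the first call lets the library guess, so its result may depend on which detection libraries are installed, while the second passes a list of encodings to try first.

```python
from bs4 import UnicodeDammit

data = "Sacré bleu!".encode("utf-8")  # hypothetical sample bytes

# Let Unicode, Dammit guess the encoding on its own.
guessed = UnicodeDammit(data)
print(guessed.original_encoding, repr(guessed.unicode_markup))

# Supply encodings to try first; the first one that decodes cleanly is used.
hinted = UnicodeDammit(data, ["utf-8"])
print(hinted.original_encoding, repr(hinted.unicode_markup))
```

Supplying a hint is the more predictable option when you already know, or strongly suspect, how the bytes were encoded.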
The autodetected encoding is available as the <code>.original_encoding</code> attribute of the <code>BeautifulSoup</code> object.</p><p>Unicode, Dammit guesses correctly most of the time, but sometimes it makes mistakes. Sometimes it guesses correctly, but only after a byte-by-byte search of the document that takes a very long time. If you happen to know a document’s encoding ahead of time, you can avoid mistakes and delays by passing it to the <code>BeautifulSoup</code> constructor as <code>from_encoding</code>.</p><p>Here’s a document written in ISO-8859-8. 
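</p><p>A minimal sketch of such a document, assuming four hypothetical sample bytes that spell a Hebrew word in ISO-8859-8. Decoding the same bytes with Python’s built-in codecs shows why a wrong guess is plausible: ISO-8859-8 and ISO-8859-7 both accept them, but they yield different letters. The last lines hand the known encoding to the constructor as <code>from_encoding</code>.</p>

```python
from bs4 import BeautifulSoup

# Hypothetical sample bytes; in ISO-8859-8 they are four Hebrew letters.
raw = b"\xed\xe5\xec\xf9"
markup = b"<h1>" + raw + b"</h1>"

# The same bytes are valid in both encodings, so nothing in the bytes
# themselves says which reading is the right one.
as_hebrew = raw.decode("iso-8859-8")
as_greek = raw.decode("iso-8859-7")
print(as_hebrew, as_greek)

# Telling Beautiful Soup the encoding up front skips the guessing.
soup = BeautifulSoup(markup, "html.parser", from_encoding="iso-8859-8")
print(soup.h1.string)
```

<p>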
The document is so short that Unicode, Dammit can’t get a lock on it, and misidentifies it as ISO-8859-7.</p><p>We can fix this by passing in the correct <code>from_encoding</code>.</p><p>If you don’t know what the correct encoding is, but you know that Unicode, Dammit is guessing wrong, you can pass the wrong guesses in as <code>exclude_encodings</code>.</p><p>Windows-1255 isn’t 100% correct, but that encoding is a compatible superset of ISO-8859-8, so it’s close enough. 
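</p><p>The “close enough” claim can be checked with Python’s built-in codecs. This sketch assumes a hypothetical four-byte sample and uses <code>cp1255</code>, Python’s codec name for Windows-1255. It does not call Beautiful Soup at all.</p>

```python
# Hypothetical sample bytes: four Hebrew letters in ISO-8859-8.
raw = b"\xed\xe5\xec\xf9"

from_iso = raw.decode("iso-8859-8")
from_windows = raw.decode("cp1255")  # Python's codec name for Windows-1255

# For these Hebrew letters both codecs produce exactly the same text.
same_text = from_iso == from_windows
print(same_text)

# The two differ in the 0x80-0x9F range: Windows-1255 puts printable
# characters there (0x80 is the euro sign), while Python's ISO-8859-8
# codec maps that range to control characters.
euro = b"\x80".decode("cp1255")
control = b"\x80".decode("iso-8859-8")
print(repr(euro), repr(control))
```

<p>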
(<code>exclude_encodings</code> is a new feature in Beautiful Soup 4.4.0.)</p><p>In rare cases (usually when a UTF-8 document contains text written in a completely different encoding), the only way to get Unicode may be to replace some characters with the special Unicode character “REPLACEMENT CHARACTER” (U+FFFD, �). 
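</p><p>To see what such a replacement looks like, here is a sketch that uses Python’s own UTF-8 decoder on a hypothetical byte string with one stray Latin-1 byte. It only illustrates the effect and makes no claim about how Unicode, Dammit is implemented.</p>

```python
# Hypothetical sample: mostly UTF-8, with one stray Latin-1 byte (0xE9).
mixed = "Sacr".encode("utf-8") + b"\xe9" + " bleu!".encode("utf-8")

# Strict UTF-8 decoding fails on the stray byte.
try:
    mixed.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# errors="replace" substitutes U+FFFD for the byte that cannot be decoded,
# so the result is Unicode but no longer an exact copy of the original.
lossy = mixed.decode("utf-8", errors="replace")
print(strict_ok, lossy)
```

<p>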
If Unicode, Dammit needs to do this, it will set the <code>.contains_replacement_characters</code> attribute to <code>True</code> on the <code>UnicodeDammit</code> or <code>BeautifulSoup</code> object. This lets you know that the Unicode representation is not an exact representation of the original: some data was lost. If a document contains �, but <code>.contains_replacement_characters</code> is <code>False</code>, you’ll know that the � was there originally (as it is in this paragraph) and doesn’t stand in for missing data.</p><h2 id="output-encoding">Output encoding</h2><p>When you write out a document from Beautiful Soup, you get a UTF-8 document, even if the document wasn’t in UTF-8 to begin with. Here’s a document written in the Latin-1 encoding.</p><p>Note that the <meta> tag has been rewritten to reflect the fact that the document is now in UTF-8.</p><p>If you don’t want UTF-8, you can pass an encoding into <code>prettify()</code>.</p><p>You can also call encode() on the <code>BeautifulSoup</code> object, or any element in the soup, just as if it were a Python string.</p><p>Any characters that can’t be represented in your chosen encoding will be converted into numeric XML entity references. Here’s a document that includes the Unicode character SNOWMAN.</p><p>The SNOWMAN character can be part of a UTF-8 document (it looks like ☃), but there’s no representation for that character in ISO-Latin-1 or ASCII, so it’s converted into “&#9731;” for those encodings.</p><h2 id="unicode-dammit">Unicode, Dammit</h2><p>You can use Unicode, Dammit without using Beautiful Soup. 
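</p><p>A minimal sketch of standalone use, assuming a hypothetical Latin-1 byte string and a short list of candidate encodings to try before any guessing. The sample data and variable names are illustrative only.</p>

```python
from bs4 import UnicodeDammit

# Hypothetical sample: "Sacré bleu!" encoded as Latin-1 (0xE9 is "é").
data = b"Sacr\xe9 bleu!"

# The second argument lists encodings to try first, in order.
dammit = UnicodeDammit(data, ["latin-1", "iso-8859-1"])

# unicode_markup holds the decoded text; original_encoding names the
# encoding that was used to decode it.
print(dammit.unicode_markup)
print(dammit.original_encoding)
```

<p>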
It’s useful whenever you have data in an unknown encoding and you just want it to become Unicode.</p><p>Unicode, Dammit’s guesses will get a lot more accurate if you install one of these Python libraries: 
<code>charset-normalizer</code>, <code>chardet</code>, or <code>cchardet</code>. The more data you give Unicode, Dammit, the more accurately it will guess. If you have your own suspicions as to what the encoding might be, you can pass them in as a list.</p><p>Unicode, Dammit has two special features that Beautiful Soup doesn’t use.</p><h3 id="bao-gia-thong-minh">Smart quotes</h3><p>You can use Unicode, Dammit to convert Microsoft smart quotes to HTML or XML entities.</p><p>You can also convert Microsoft smart quotes to ASCII quotes.</p><p>Hopefully you’ll find this feature useful, but Beautiful Soup doesn’t use it. Beautiful Soup prefers the default behavior, which is to convert Microsoft smart quotes to Unicode characters along with everything else.</p><h3 id="ma-hoa-khong-nhat-quan">Inconsistent encodings</h3><p>Sometimes a document is mostly in UTF-8, but contains Windows-1252 characters such as (again) Microsoft smart quotes. This can happen when a website includes data from multiple sources. You can use <code>UnicodeDammit.detwingle()</code> to turn such a document into 
pure UTF-8. Here’s a simple example: a document that contains snowman characters and quotation marks.</p><p>This document is a mess. The snowmen are in UTF-8 and the quotes are in Windows-1252. 
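</p><p>The mixed-encoding situation described here can be reproduced with the standard library alone. This is a sketch under assumptions: the quoted sentence and the variable names are illustrative, not taken from the text above.</p>

```python
# Three snowmen encoded as UTF-8, followed by a sentence whose curly
# quotes are encoded as Windows-1252 (the sentence is illustrative).
snowmen = "\N{SNOWMAN}" * 3
quote = "\N{LEFT DOUBLE QUOTATION MARK}I like snowmen!\N{RIGHT DOUBLE QUOTATION MARK}"
doc = snowmen.encode("utf8") + quote.encode("windows_1252")

# Strict UTF-8 decoding fails on the Windows-1252 quote bytes.
try:
    doc.decode("utf8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Windows-1252 decoding succeeds, but the snowmen become mojibake.
as_windows_1252 = doc.decode("windows-1252")

print(utf8_failed)
print(as_windows_1252)
```

<p>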
You can display the snowmen or the quotes, but not both.</p><p>Decoding the document as UTF-8 raises a <code>UnicodeDecodeError</code>, and decoding it as Windows-1252 gives you gibberish. Fortunately, <code>UnicodeDammit.detwingle()</code> will convert the string to pure UTF-8, allowing you to decode it to Unicode and display the snowmen and quote marks simultaneously.</p><p><code>UnicodeDammit.detwingle()</code> only knows how to handle Windows-1252 embedded in UTF-8 (or vice versa, I suppose), but this is the most common case.</p><p>Note that you must know to call <code>UnicodeDammit.detwingle()</code> on your data before passing it into <code>BeautifulSoup</code> or the <code>UnicodeDammit</code> constructor. Beautiful Soup assumes that a document has a single encoding, whatever it might be. 
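</p><p>A minimal sketch of that ordering, assuming the <code>bs4</code> package is installed. The markup is illustrative: it mixes UTF-8 snowmen with Windows-1252 curly quotes, is repaired with <code>UnicodeDammit.detwingle()</code>, and only then goes to the parser.</p>

```python
from bs4 import BeautifulSoup, UnicodeDammit

# Illustrative mixed-encoding markup: UTF-8 snowmen plus
# Windows-1252 curly quotes inside one paragraph.
snowmen = ("\N{SNOWMAN}" * 3).encode("utf8")
quote = (
    "\N{LEFT DOUBLE QUOTATION MARK}I like snowmen!"
    "\N{RIGHT DOUBLE QUOTATION MARK}"
).encode("windows-1252")
markup = b"<p>" + snowmen + quote + b"</p>"

# Repair the bytes first, so the parser only ever sees one encoding.
clean = UnicodeDammit.detwingle(markup)

# The repaired bytes are pure UTF-8 and decode without errors.
soup = BeautifulSoup(clean.decode("utf8"), "html.parser")
print(soup.p.string)
```

<p>Decoding before parsing is a choice made for this sketch: passing a string rather than bytes keeps the result independent of encoding detection.</p>

<p>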
If you pass it a document that contains both UTF-8 and Windows-1252, it’s likely to think the whole document is Windows-1252, and the document will come out looking like <code>â˜ƒâ˜ƒâ˜ƒ“I like snowmen!”</code>.</p><p><code>UnicodeDammit.detwingle()</code> is new in Beautiful Soup 4.1.0.</p><h2>Line numbers</h2><p>The <code>html.parser</code> and <code>html5lib</code> parsers can keep track of where in the original document each Tag was found. 
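</p><p>A minimal sketch of reading those positions with <code>html.parser</code>, assuming the <code>bs4</code> package is installed. The two-line markup is illustrative; the <code>sourceline</code> and <code>sourcepos</code> attributes are read from each Tag.</p>

```python
from bs4 import BeautifulSoup

# Illustrative two-line document; the second <p> is indented four spaces.
markup = "<p>Paragraph 1</p>\n    <p>Paragraph 2</p>"
soup = BeautifulSoup(markup, "html.parser")

# Collect (line number, position within the line, text) for each <p> tag.
positions = [
    (tag.sourceline, tag.sourcepos, str(tag.string))
    for tag in soup.find_all("p")
]
for position in positions:
    print(position)
```

<p>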
You can access this information as <code>Tag.sourceline</code> (line number) and <code>Tag.sourcepos</code> (position of the start tag within a line):</p><p><p><pre>from bs4 import BeautifulSoup

markup = "&lt;p\n&gt;Paragraph 1&lt;/p&gt;\n    &lt;p&gt;Paragraph 2&lt;/p&gt;"
soup = BeautifulSoup(markup, 'html.parser')
for tag in soup.find_all('p'):
    print(repr((tag.sourceline, tag.sourcepos, tag.string)))
# (1, 0, 'Paragraph 1')
# (3, 4, 'Paragraph 2')
</pre></p><p>Note that the two parsers mean slightly different things by <code>sourceline</code> and <code>sourcepos</code>. For <code>html.parser</code>, these numbers represent the position of the initial less-than sign. 
For <code>html5lib</code>, these numbers represent the position of the final greater-than sign:</p><p><p><pre>from bs4 import BeautifulSoup

markup = "&lt;p\n&gt;Paragraph 1&lt;/p&gt;\n    &lt;p&gt;Paragraph 2&lt;/p&gt;"
soup = BeautifulSoup(markup, 'html5lib')
for tag in soup.find_all('p'):
    print(repr((tag.sourceline, tag.sourcepos, tag.string)))
# (2, 0, 'Paragraph 1')
# (3, 6, 'Paragraph 2')
</pre></p><p>You can shut off this feature by passing <code>store_line_numbers=False</code> into the <code>BeautifulSoup</code> constructor:</p><p><p><pre>markup = "&lt;p\n&gt;Paragraph 1&lt;/p&gt;\n    &lt;p&gt;Paragraph 2&lt;/p&gt;"
soup = BeautifulSoup(markup, 'html.parser', store_line_numbers=False)
print(soup.p.sourceline)
# None
</pre></p><p>This feature is new in 4.8.1, and the parsers based on lxml don’t support it.</p><p>Comparing objects for equality<p>Beautiful Soup says that two <code>NavigableString</code> or <code>Tag</code> objects are equal when they represent the same HTML or XML markup. 
In this example, the two &lt;b&gt; tags are treated as equal, even though they live in different parts of the object tree, because they both look like “&lt;b&gt;pizza&lt;/b&gt;”:</p><p><p><pre>from bs4 import BeautifulSoup

markup = "&lt;p&gt;I want &lt;b&gt;pizza&lt;/b&gt; and more &lt;b&gt;pizza&lt;/b&gt;!&lt;/p&gt;"
soup = BeautifulSoup(markup, 'html.parser')
first_b, second_b = soup.find_all('b')
print(first_b == second_b)
# True

print(first_b.previous_element == second_b.previous_element)
# False
</pre></p><p>If you want to see whether two variables refer to exactly the same object, use <code>is</code>:</p><p><p><pre>print(first_b is second_b)
# False
</pre></p><p>Copying Beautiful Soup objects<p>You can use <code>copy.copy()</code> to create a copy of any <code>Tag</code> or <code>NavigableString</code>:</p><p><p><pre>import copy
p_copy = copy.copy(soup.p)
print(p_copy)
# &lt;p&gt;I want &lt;b&gt;pizza&lt;/b&gt; and more &lt;b&gt;pizza&lt;/b&gt;!&lt;/p&gt;
</pre></p><p>The copy is considered equal to the original, since it represents the same markup as the original, but it’s not the same object:</p><p><p><pre>print(soup.p == p_copy)
# True

print(soup.p is p_copy)
# False
</pre></p><p>The only real difference is that the copy is completely detached from the original Beautiful Soup object tree, just as if <code>extract()</code> had been called on it:</p><p><p><pre>print(p_copy.parent)
# None
</pre></p><p>This is because two different <code>Tag</code> objects can’t occupy the same space at the same time.</p><p>Advanced parser customization<p>Beautiful Soup offers a number of ways to customize how the parser treats incoming HTML and XML. 
This section covers the most commonly used customization techniques.</p><p><h2 id="parsing-only-part-of-a-document">Parsing only part of a document</h2><p>Let’s say you want to use Beautiful Soup to look at a document’s &lt;a&gt; tags. It’s a waste of time and memory to parse the entire document and then go over it again looking for &lt;a&gt; tags. It would be much faster to ignore everything that wasn’t an &lt;a&gt; tag in the first place. The <code>SoupStrainer</code> class allows you to choose which parts of an incoming document are parsed. 
You just create a <code>SoupStrainer</code> and pass it in to the <code>BeautifulSoup</code> constructor as the <code>parse_only</code> argument.</p><p>(Note that this feature won’t work if you’re using the html5lib parser. If you use html5lib, the whole document will be parsed, no matter what. This is because html5lib constantly rearranges the parse tree as it works, and if some part of the document didn’t actually make it into the parse tree, it’ll crash. To avoid confusion, in the examples below I’ll be forcing Beautiful Soup to use Python’s built-in parser.)</p><p><h3 id="from-bs4-import-beautifulsoup-soup-beautifulsoup-html-doc-html-parser-print-soup-prettify-lt-html-gt-lt-head-gt-lt-title-gt-the-dormouse-s-story-lt-title-gt-lt-head-gt-lt-body-gt-lt-p-class-title-gt-lt-b-gt-the-dormouse-s-story-lt-b-gt-lt-p-gt-lt-p-class-story-gt-once-upon-a-time-there-were-three-little-sisters-and-their-names-were-lt-a-class-sister-href-http-example-com-elsie-id-link1-gt-elsie-lt-a-gt-lt-a-class-sister-href-http-example-com-lacie-id-link2-gt-lacie-lt-a-gt-and-lt-a-class-sister-href-http-example-com-tillie-id-link3-gt-tillie-lt-a-gt-and-they-lived-at-the-bottom-of-a-well-lt-p-gt-lt-p-class-story-gt-lt-p-gt-lt-body-gt-lt-html-gt-581">SoupStrainer</h3><p>The <code>SoupStrainer</code> class takes the same arguments as a typical method from Searching the tree: <span>name</span>, <span>attrs</span>, <span>string</span>, and <span>**kwargs</span>. 
Here are three <span>SoupStrainer</span> objects. 
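As a minimal sketch of what such objects can look like, the following builds three strainers and parses a document with two of them through the parse_only argument. The variable names, the helper is_short_string, and the shortened html_doc are illustrative assumptions, not necessarily the original example.

```python
from bs4 import BeautifulSoup, SoupStrainer

# Shortened version of the "three sisters" document (assumption: trimmed for brevity).
html_doc = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
</body></html>"""

# A strainer that matches tags by name.
only_a_tags = SoupStrainer("a")

# A strainer that matches tags by attribute value.
only_tags_with_id_link2 = SoupStrainer(id="link2")


def is_short_string(string):
    """Hypothetical filter: accept strings shorter than 10 characters."""
    return string is not None and len(string) < 10


# A strainer that matches strings with a function (constructed only, not exercised below).
only_short_strings = SoupStrainer(string=is_short_string)

# Parse only the <a> tags; the rest of the document never enters the tree.
soup_a = BeautifulSoup(html_doc, "html.parser", parse_only=only_a_tags)
link_ids = [a["id"] for a in soup_a.find_all("a")]

# Parse only the tag whose id is "link2".
soup_link2 = BeautifulSoup(html_doc, "html.parser", parse_only=only_tags_with_id_link2)
link2_text = soup_link2.get_text()

print(link_ids)
print(link2_text)
```

Everything that the strainer rejects is skipped while parsing, so the resulting tree holds only the matching tags and their contents.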
</p><p>I’m going to bring back the “three sisters” document one more time, and we’ll see what the document looks like when it’s parsed with these three <span>SoupStrainer</span> objects.</p><p>You can also pass a <span>SoupStrainer</span> into any of the methods covered in Searching the tree. 
This probably isn’t terribly useful, but I thought I’d mention it.</p><p><h2 id="customizing-multi-valued-attributes">Customizing multi-valued attributes</h2><p>In an HTML document, an attribute like <span>class</span> is given a list of values, and an attribute like <span>id</span> is given a single value, because the HTML specification treats those attributes differently.</p><p>You can turn this off by passing in <span>multi_valued_attributes=None</span>. Then all attributes will be given a single value.</p><p>You can customize this behavior quite a bit by passing in a dictionary for 
<span>multi_valued_attributes</span>. 
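The three configurations can be compared side by side. This is a sketch under assumptions: the markup string is made up for illustration, and the custom dictionary follows the layout of Beautiful Soup's default configuration (keyed by tag name, with "*" meaning any tag, mapped to a list of attribute names).

```python
from bs4 import BeautifulSoup

# Illustrative markup (assumption): both attributes contain a space.
markup = '<a class="cls1 cls2" id="id1 id2"></a>'

# Default: class is split into a list, id stays a single string.
default_soup = BeautifulSoup(markup, "html.parser")
default_class = default_soup.a["class"]
default_id = default_soup.a["id"]

# multi_valued_attributes=None: every attribute is a single string.
single_soup = BeautifulSoup(markup, "html.parser", multi_valued_attributes=None)
single_class = single_soup.a["class"]

# Custom dictionary (assumption for illustration): treat id, and only id,
# as multi-valued on every tag.
custom_soup = BeautifulSoup(
    markup, "html.parser", multi_valued_attributes={"*": ["id"]}
)
custom_id = custom_soup.a["id"]
custom_class = custom_soup.a["class"]

print(default_class, default_id)
print(single_class)
print(custom_id, custom_class)
```

Passing a dictionary replaces the default configuration rather than extending it, which is why class is no longer split in the last case.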
If you need this, look at <span>HTMLTreeBuilder.DEFAULT_CDATA_LIST_ATTRIBUTES</span> to see the configuration Beautiful Soup uses by default, which is based on the HTML specification.</p><p>(This is a new feature in Beautiful Soup 4.8.0.
)</p><p><h2 id="handling-duplicate-attributes">Handling duplicate attributes</h2><p>When using the <span>html.parser</span> parser, you can use the <span>on_duplicate_attribute</span> constructor argument to customize what Beautiful Soup does when it encounters a tag that defines the same attribute more than once.</p><p>The default behavior is to use the last value found for the tag.</p><p>With <span>on_duplicate_attribute='ignore'</span>, you can tell Beautiful Soup to use the first value found and ignore the rest.</p><p>(lxml and html5lib always do it this way; their behavior can’t be configured from within Beautiful Soup.)</p><p>If you need more, you can pass in a function that’s called on each duplicate value.</p><p>(This is a new feature in Beautiful Soup 4.9.1.)</p><p><h2 id="khoi-tao-cac-lop-con-tuy-chinh">Instantiating custom subclasses</h2><p>When a parser tells Beautiful Soup about a tag or a string, Beautiful Soup will instantiate a <span>Tag</span> or <span>NavigableString</span> object to contain that information. 
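A short sketch can make both the default classes and the subclass mechanism of this section concrete. The class names MyTag and MyString and the markup are hypothetical, chosen only for illustration; element_classes is the constructor argument that maps a default class to its replacement.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag


class MyTag(Tag):
    """Hypothetical Tag subclass, defined only for this sketch."""


class MyString(NavigableString):
    """Hypothetical NavigableString subclass, defined only for this sketch."""


# Illustrative markup (assumption).
markup = "<div>some text</div>"

# Default behavior: tags become Tag objects, strings become NavigableString objects.
default_soup = BeautifulSoup(markup, "html.parser")
default_types = (type(default_soup.div), type(default_soup.div.string))

# Custom behavior: ask Beautiful Soup to instantiate the subclasses instead.
custom_soup = BeautifulSoup(
    markup,
    "html.parser",
    element_classes={Tag: MyTag, NavigableString: MyString},
)
custom_types = (type(custom_soup.div), type(custom_soup.div.string))

print(default_types)
print(custom_types)
```

Because the subclasses inherit from Tag and NavigableString, the rest of the API (searching, navigating, modifying) keeps working on the custom objects.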
Instead of that default behavior, you can tell Beautiful Soup to instantiate subclasses of <pre><span>Tag</span></pre> or <pre><span>NavigableString</span></pre>, subclasses that you define with custom behavior.</p>
<p>This can be useful when incorporating Beautiful Soup into a test framework.</p>
<p>(This is a new feature in Beautiful Soup 4.8.1.)</p>
<p>Troubleshooting<p><h2 id="from-bs4-import-beautifulsoup-soup-beautifulsoup-html-doc-html-parser-print-soup-prettify-lt-html-gt-lt-head-gt-lt-title-gt-the-dormouse-s-story-lt-title-gt-lt-head-gt-lt-body-gt-lt-p-class-title-gt-lt-b-gt-the-dormouse-s-story-lt-b-gt-lt-p-gt-lt-p-class-story-gt-once-upon-a-time-there-were-three-little-sisters-and-their-names-were-lt-a-class-sister-href-http-example-com-elsie-id-link1-gt-elsie-lt-a-gt-lt-a-class-sister-href-http-example-com-lacie-id-link2-gt-lacie-lt-a-gt-and-lt-a-class-sister-href-http-example-com-tillie-id-link3-gt-tillie-lt-a-gt-and-they-lived-at-the-bottom-of-a-well-lt-p-gt-lt-p-class-story-gt-lt-p-gt-lt-body-gt-lt-html-gt-602">diagnose()</h2>
<p>If you’re having trouble understanding what Beautiful Soup does to a document, pass the document into the <pre><span>diagnose()</span></pre> function. (New in Beautiful Soup 4.2.0.) Beautiful Soup will print out a report showing you how different parsers handle the document, and tell you if you’re missing a parser that Beautiful Soup could be using.</p>
<p>Just looking at the output of <pre><span>diagnose()</span></pre> may show you how to solve the problem. 
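</p>
<p>A rough sketch of how the report might be produced and captured, assuming diagnose can be imported from the bs4.diagnose module and that it writes its report to standard output. The markup string and the diagnose_report helper are hypothetical and exist only for this illustration.</p>

```python
import contextlib
import io

from bs4.diagnose import diagnose

# Hypothetical markup with a mismatched closing tag, used only for illustration.
markup = "<p>An unclosed paragraph <a href='http://example.com/'>link</p>"


def diagnose_report(data):
    """Run diagnose() on the markup and return the printed report as a string."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        diagnose(data)
    return buffer.getvalue()


report = diagnose_report(markup)
print(report)
```

<p>Capturing the report in a string is one way to get text that can be pasted into a help request; calling the function directly and copying the console output would work just as well.</p>
<p>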
Even if not, you can paste the output of <pre><span>diagnose()</span></pre> when asking for help.</p>
<p><h2 id="loi-khi-phan-tich-tai-lieu">Errors when parsing a document</h2>
<p>There are two different kinds of parse errors. 
There are crashes, where you feed a document to Beautiful Soup and it raises an exception, usually an <pre><span>HTMLParser.HTMLParseError</span></pre>. And there is unexpected behavior, where a Beautiful Soup parse tree looks very different from the document used to create it.</p>
<p>Almost none of these problems turn out to be problems with Beautiful Soup. This is not because Beautiful Soup is an amazingly well-written piece of software. It’s because Beautiful Soup doesn’t include any parsing code. Instead, it relies on external parsers. 
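</p>
<p>Because the parsing is delegated, the same markup can produce different trees depending on which parser is used. The sketch below, offered as an illustration rather than a recommended workflow, feeds one invalid fragment to each parser that happens to be installed. The parse_with_each helper and the sample markup are hypothetical; parsers that are missing are skipped by catching FeatureNotFound.</p>

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Deliberately invalid markup: a closing </p> with no opening <p>.
markup = "<a></p>"


def parse_with_each(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser name: serialized tree}, skipping parsers not installed."""
    results = {}
    for name in parsers:
        try:
            results[name] = str(BeautifulSoup(markup, name))
        except FeatureNotFound:
            # This parser library is not available in the current environment.
            continue
    return results


results = parse_with_each(markup)
for name, tree in results.items():
    print(f"{name:12} {tree}")
```

<p>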
If one parser isn’t working on a certain document, the best solution is to try a different parser. See Installing a parser for details and a parser comparison.</p>
<p>The most common parse errors are <pre><span>HTMLParser.HTMLParseError: malformed start tag</span></pre> and <pre><span>HTMLParser.HTMLParseError: bad end tag</span></pre>. Both are generated by Python’s built-in HTML parser library, and the solution is to <span>install lxml or html5lib.</span></p>
<p>The most common type of unexpected behavior is that you can’t find a tag that you know is in the document. 
You saw it going in, but <pre><span>find_all()</span></pre> returns <pre><span>[]</span></pre> or <pre><span>find()</span></pre> returns <pre><span>None</span></pre>. This is another common problem with Python’s built-in HTML parser, which sometimes skips tags it doesn’t understand. Again, the best solution is to <span>install lxml or html5lib. 
</span></p><p><h2 id="su-co-phien-ban-khong-khop">Version mismatch problems</h2><ul>
<li><p><pre><span>SyntaxError: Invalid syntax</span></pre> (on the line that defines <pre><span>ROOT_TAG_NAME</span></pre>) - Caused by running an old Python 2 version of Beautiful Soup under Python 3, without converting the code.</p></li>
<li><p><pre><span>ImportError: No module named HTMLParser</span></pre> - Caused by running an old Python 2 version of Beautiful Soup under Python 3.</p></li>
<li><p><pre><span>ImportError: No module named html.parser</span></pre> - Caused by running the Python 3 version of Beautiful Soup under Python 2.</p></li>
<li><p><pre><span>ImportError: No module named BeautifulSoup</span></pre> - Caused by running Beautiful Soup 3 code on a system that doesn’t have BS3 installed, or by writing Beautiful Soup 4 code without knowing that the package name has changed to <pre><span>bs4</span></pre>.</p></li>
<li><p><pre><span>ImportError: No module named bs4</span></pre> - Caused by running Beautiful Soup 4 code on a system that doesn’t have BS4 installed.</p></li></ul></p>
<p><h2 id="parsing-xml">Parsing XML</h2>
<p>By default, Beautiful Soup parses documents as HTML. 
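</p>
<p>As a hedged illustration of the difference between the two modes, the sketch below parses one mixed-case fragment as HTML and then as XML. Tag-name case is one visible difference: the HTML parsers lowercase names, while XML parsing keeps them. The markup is hypothetical, and the XML branch assumes lxml is installed; if it is not, the sketch records None instead of failing.</p>

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical mixed-case markup that is also well-formed XML.
markup = '<Root><Item ID="1"/></Root>'

# Parsed as HTML with the built-in parser: tag names come back lowercased.
html_soup = BeautifulSoup(markup, "html.parser")
html_names = [tag.name for tag in html_soup.find_all(True)]

# Parsed as XML, which needs lxml; fall back to None if it is missing.
try:
    xml_soup = BeautifulSoup(markup, "xml")
    xml_names = [tag.name for tag in xml_soup.find_all(True)]
except FeatureNotFound:
    xml_names = None

print(html_names)
print(xml_names)
```

<p>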
To parse a document as XML, pass in “xml” as the second argument to the <pre><span>BeautifulSoup</span></pre> constructor.</p>
<p>You’ll need to <span>have lxml installed</span>.</p>
<p><h2 id="other-parser-problems">Other parser problems</h2><ul><li><p>If your script works on one computer but not another, or in one virtual environment but not another, or outside the virtual environment but not inside, it’s probably because the two environments have different parser libraries available. For example, you may have developed the script on a computer that has lxml installed, and then tried to run it on a computer that only has html5lib installed. 
See Differences between parsers for why this matters, and fix the problem by mentioning a specific parser library in the <code>BeautifulSoup</code> constructor.</p></li><li><p>Because HTML tags and attributes are case-insensitive, all three HTML parsers convert tag and attribute names to lowercase. That is, the markup <code>&lt;TAG&gt;&lt;/TAG&gt;</code> is converted to <code>&lt;tag&gt;&lt;/tag&gt;</code>. 
If you want to preserve mixed-case or uppercase tags and attributes, you’ll need to <span>parse the document as XML.</span></p></li></ul></p><p><h2 id="miscellaneous">Miscellaneous</h2><ul><li><p><code>UnicodeEncodeError: 'charmap' codec can't encode character '\xfoo' in position bar</code> (or just about any other <code>UnicodeEncodeError</code>) - This problem shows up in two main situations. First, when you try to print a Unicode character that your console doesn’t know how to display. (See this page on the Python wiki for help.) Second, when you’re writing to a file and you pass in a Unicode character that’s not supported by your default encoding. 
In this case, the simplest solution is to explicitly encode the Unicode string into UTF-8 with <code>u.encode("utf8")</code>.</p></li><li><p><code>KeyError: [attr]</code> - Caused by accessing <code>tag['attr']</code> when the tag in question doesn’t define the <code>attr</code> attribute. 
The most common errors are <code>KeyError: 'href'</code> and <code>KeyError: 'class'</code>. Use <code>tag.get('attr')</code> if you’re not sure <code>attr</code> is defined, just as you would with a Python dictionary.</p></li><li><p><code>AttributeError: 'ResultSet' object has no attribute 'foo'</code> - This usually happens because you expected <code>find_all()</code> to return a single tag or string. 
But <code>find_all()</code> returns a <em>list</em> of tags and strings, a <code>ResultSet</code> object. You need to iterate over the list and look at the <code>.foo</code> of each one. 
Or, if you really only want one result, you need to use <code>find()</code> instead of <code>find_all()</code>.</p></li><li><p><code>AttributeError: 'NoneType' object has no attribute 'foo'</code> - This usually happens because you called <code>find()</code> and then tried to access the <code>.foo</code> attribute of the result. But in your case, <code>find()</code> didn’t find anything, so it returned <code>None</code>, instead of returning a tag or a string. You need to figure out why your <code>find()</code> call isn’t returning anything.</p></li><li><p><code>AttributeError: 'NavigableString' object has no attribute 'foo'</code> - This usually happens because you’re treating a string as though it were a tag. You may be iterating over a list, expecting that it contains nothing but tags, when it actually contains both tags and strings.</p></li></ul></p><p><h2 id="improving-performance">Improving Performance</h2><p>Beautiful Soup will never be as fast as the parsers it sits on top of. 
If response time is critical, if you’re paying for computer time by the hour, or if there’s any other reason why computer time is more valuable than programmer time, you should forget about Beautiful Soup and work directly atop lxml.</p>
<p>That said, there are things you can do to speed up Beautiful Soup. If you’re not using lxml as the underlying parser, my advice is to <span>start</span>. Beautiful Soup parses documents significantly faster using lxml than using html.parser or html5lib.</p>
<p>You can speed up encoding detection significantly by installing the cchardet library.</p>
<p>Parsing only part of a document won’t save you much time parsing the document, but it can save a lot of memory, and it’ll make searching the document much faster.</p>
<p>Translating this documentation<p>New translations of the Beautiful Soup documentation are greatly appreciated. Translations should be licensed under the MIT license, just like Beautiful Soup and its English documentation are.</p>
<p>There are two ways of getting your translation into the main code base and onto the Beautiful Soup website:</p>
<ol><li><p>Create a branch of the Beautiful Soup repository, add your translation, and propose a merge with the main branch, the same as you would do with a proposed change to the source code.</p></li>
<li><p>Send a message to the Beautiful Soup discussion group with a link to your translation, or attach your translation to the message.</p></li></ol>
<p>Use the Chinese or Brazilian Portuguese translations as your model. 
In particular, please translate the source file <code>doc/source/index.rst</code>, rather than the HTML version of the documentation. This makes it possible to publish the documentation in a variety of formats, not just HTML.</p>
<p>Beautiful Soup 3<p>Beautiful Soup 3 is the previous release series, and is no longer being actively developed. 
It is currently packaged with all major Linux distributions:</p>
<p>$ apt-get install python-beautifulsoup</p>
<p>It’s also published through PyPI as <code>BeautifulSoup</code>:</p>
<p>$ easy_install BeautifulSoup</p>
<p>$ pip install BeautifulSoup</p>
<p>You can also download a tarball of Beautiful Soup 3.2.0.</p>
<p>If you ran <code>pip install beautifulsoup</code> or <code>pip install BeautifulSoup</code>, but your code doesn’t work, you installed Beautiful Soup 3 by mistake. You need to run <code>pip install beautifulsoup4</code>.</p>
<p>The documentation for Beautiful Soup 3 is archived online.</p>
<p><h2 id="chuyen-ma-sang-bs4">Porting code to BS4</h2><p>Most code written against Beautiful Soup 3 will work against Beautiful Soup 4 with one simple change. All you should have to do is change the package name from <code>BeautifulSoup</code> to <code>bs4</code>. 
So this:</p><p><p><pre><span>from</span> <span>BeautifulSoup</span> <span>import</span> <span>BeautifulSoup</span> </pre></p>
<p>becomes this:</p><p><p><pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> </pre></p>
<ul><li><p>If you get the <code>ImportError</code> “No module named BeautifulSoup”, your problem is that you’re trying to run Beautiful Soup 3 code, but you only have Beautiful Soup 4 installed.</p></li>
<li><p>If you get the <code>ImportError</code> “No module named bs4”, your problem is that you’re trying to run Beautiful Soup 4 code, but you only have Beautiful Soup 3 installed.</p></li></ul>
<p>Although BS4 is mostly backward-compatible with BS3, most of BS3’s methods have been deprecated and given new names for PEP 8 compliance. There are numerous other renames and changes, and a few of them break backward compatibility.</p>
<p>Here’s what you’ll need to know to convert your BS3 code and habits to BS4:</p>
<p><h3 id="you-need-a-parser">You need a parser</h3><p>Beautiful Soup 3 used Python’s <code>SGMLParser</code>, a module that was deprecated and removed in Python 3.0. 
Beautiful Soup 4 uses <code>html.parser</code> by default, but you can plug in lxml or html5lib and use that instead. 
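</p>
<p>As a rough illustration of plugging in a different parser, the sketch below picks the first optional parser that happens to be installed and falls back to <code>html.parser</code>. The helper name <code>pick_parser</code> and the preference order in <code>PREFERRED_PARSERS</code> are hypothetical choices made for this example; they are not part of Beautiful Soup. Only the parser names themselves (<code>"lxml"</code>, <code>"html5lib"</code>, <code>"html.parser"</code>) come from the library.</p>

```python
import importlib.util

# Parser names accepted as the second argument to BeautifulSoup(),
# in the order this sketch prefers them (an assumption, not a rule).
PREFERRED_PARSERS = ("lxml", "html5lib")


def pick_parser(preferred=PREFERRED_PARSERS):
    """Return the first preferred parser that is installed, else "html.parser"."""
    for name in preferred:
        if importlib.util.find_spec(name) is not None:
            return name
    # html.parser ships with Python, so it is always available.
    return "html.parser"


if __name__ == "__main__":
    parser = pick_parser()
    print("using parser:", parser)
    # Only build a soup if Beautiful Soup 4 itself is installed.
    if importlib.util.find_spec("bs4") is not None:
        from bs4 import BeautifulSoup

        soup = BeautifulSoup("<p>Hello<b>world", parser)
        # Different parsers may repair this broken markup differently.
        print(soup)
```

<p>Naming the parser explicitly, rather than leaving the choice to whatever is installed, keeps the parse tree predictable when the same code runs on another machine.</p>
<p>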
See Installing a parser for a comparison.</p>
<p>Since <code>html.parser</code> is not the same parser as <code>SGMLParser</code>, you may find that Beautiful Soup 4 gives you a different parse tree than Beautiful Soup 3 for the same markup. If you swap out <code>html.parser</code> for lxml or html5lib, you may find that the parse tree changes yet again. If this happens, you’ll need to update your scraping code to deal with the new tree.</p>
<p><h3 id="ten-phuong-thuc">Method names</h3><ul><li><p><code>renderContents</code> -> <code>encode_contents</code></p></li>
<li><p><code>replaceWith</code> -> <code>replace_with</code></p></li>
<li><p><code>replaceWithChildren</code> -> <code>unwrap</code></p></li>
<li><p><code>findAll</code> -> <code>find_all</code></p></li>
<li><p><code>findAllNext</code> -> <pre><span>from</span> <span>bs4</span> 
<span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> </pre>667</p></li><li><p><pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> 
<span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> </pre>668 -> <pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># 
</body></span> <span># </html></span> </pre>669</p></li><li><p><pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> </pre>670 -> <pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The 
Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> </pre>671</p></li><li><p><pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a 
well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> </pre>672 -> <pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> </pre>673</p></li><li><p><pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> 
<span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> </pre>674 -> <pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> 
<span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> </pre>675</p></li><li><p><pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> </pre>676 -> <pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># 
<head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> </pre>677</p></li><li><p><pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># 
</a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> </pre>678 -> <pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> </pre>679</p></li><li><p><pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> 
<span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> </pre>680 -> <pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> 
<span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> </pre>681</p></li><li><p><pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> </pre>682 -> <pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> 
<span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> </pre>683</p></li><li><p><pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names 
were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> </pre>684 -> <pre><span>from</span> <span>bs4</span> <span>import</span> <span>BeautifulSoup</span> <span>soup</span> <span>=</span> <span>BeautifulSoup</span><span>(</span><span>html_doc</span><span>,</span> <span>'html.parser'</span><span>)</span> <span>print</span><span>(</span><span>soup</span><span>.</span><span>prettify</span><span>())</span> <span># <html></span> <span># <head></span> <span># <title></span> <span># The Dormouse's story</span> <span># </title></span> <span># </head></span> <span># <body></span> <span># <p class="title"></span> <span># <b></span> <span># The Dormouse's story</span> <span># </b></span> <span># </p></span> <span># <p class="story"></span> <span># Once upon a time there were three little sisters; and their names were</span> <span># <a class="sister" href="http://example.com/elsie" id="link1"></span> <span># Elsie</span> <span># </a></span> <span># ,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2"></span> <span># Lacie</span> <span># </a></span> <span># and</span> <span># <a class="sister" href="http://example.com/tillie" id="link3"></span> <span># Tillie</span> <span># </a></span> <span># ; and they lived at the bottom of a well.</span> <span># </p></span> <span># <p class="story"></span> <span># ...</span> <span># </p></span> <span># </body></span> <span># </html></span> 
</pre>685</p></li>
<li><p><code>getText</code> -> <code>get_text</code></p></li>
<li><p><code>nextSibling</code> -> <code>next_sibling</code></p></li>
<li><p><code>previousSibling</code> -> <code>previous_sibling</code></p></li></ul>
<p>Some arguments to the Beautiful Soup constructor were renamed for the same reasons:</p>
<ul><li><p><code>BeautifulSoup(parseOnlyThese=...)</code> -> <code>BeautifulSoup(parse_only=...)</code></p></li>
<li><p><code>BeautifulSoup(fromEncoding=...)</code> -> <code>BeautifulSoup(from_encoding=...)</code></p></li></ul>
<p>I renamed one method for compatibility with Python 3:</p>
<ul><li><p><code>Tag.has_key()</code> -> <code>Tag.has_attr()</code></p></li></ul>
<p>I renamed one attribute to use more accurate terminology:</p>
<ul><li><p><code>Tag.isSelfClosing</code> -> <code>Tag.is_empty_element</code></p></li></ul>
<p>I renamed three attributes to avoid using words that have special meaning to Python. Unlike the other changes, these are not backwards compatible.
If you used these attributes in BS3, your code will break on BS4 until you change them</p><ul><li><p><pre><span>soup</span><span>.</span><span>title</span> <span># <title>The Dormouse's story</title></span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>name</span> <span># u'title'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>string</span> <span># u'The Dormouse's story'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>parent</span><span>.</span><span>name</span> <span># u'head'</span> <span>soup</span><span>.</span><span>p</span> <span># <p class="title"><b>The Dormouse's story</b></p></span> <span>soup</span><span>.</span><span>p</span><span>[</span><span>'class'</span><span>]</span> <span># u'title'</span> <span>soup</span><span>.</span><span>a</span> <span># <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a></span> <span>soup</span><span>.</span><span>find_all</span><span>(</span><span>'a'</span><span>)</span> <span># [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]</span> <span>soup</span><span>.</span><span>find</span><span>(</span><span>id</span><span>=</span><span>"link3"</span><span>)</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a></span> </pre>300 -> <pre><span>soup</span><span>.</span><span>title</span> <span># <title>The Dormouse's story</title></span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>name</span> <span># u'title'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>string</span> <span># u'The Dormouse's story'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>parent</span><span>.</span><span>name</span> <span># u'head'</span> 
<span>soup</span><span>.</span><span>p</span> <span># <p class="title"><b>The Dormouse's story</b></p></span> <span>soup</span><span>.</span><span>p</span><span>[</span><span>'class'</span><span>]</span> <span># u'title'</span> <span>soup</span><span>.</span><span>a</span> <span># <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a></span> <span>soup</span><span>.</span><span>find_all</span><span>(</span><span>'a'</span><span>)</span> <span># [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]</span> <span>soup</span><span>.</span><span>find</span><span>(</span><span>id</span><span>=</span><span>"link3"</span><span>)</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a></span> </pre>301</p></li><li><p><pre><span>soup</span><span>.</span><span>title</span> <span># <title>The Dormouse's story</title></span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>name</span> <span># u'title'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>string</span> <span># u'The Dormouse's story'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>parent</span><span>.</span><span>name</span> <span># u'head'</span> <span>soup</span><span>.</span><span>p</span> <span># <p class="title"><b>The Dormouse's story</b></p></span> <span>soup</span><span>.</span><span>p</span><span>[</span><span>'class'</span><span>]</span> <span># u'title'</span> <span>soup</span><span>.</span><span>a</span> <span># <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a></span> <span>soup</span><span>.</span><span>find_all</span><span>(</span><span>'a'</span><span>)</span> <span># [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,</span> <span># <a 
class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]</span> <span>soup</span><span>.</span><span>find</span><span>(</span><span>id</span><span>=</span><span>"link3"</span><span>)</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a></span> </pre>302 -> <pre><span>soup</span><span>.</span><span>title</span> <span># <title>The Dormouse's story</title></span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>name</span> <span># u'title'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>string</span> <span># u'The Dormouse's story'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>parent</span><span>.</span><span>name</span> <span># u'head'</span> <span>soup</span><span>.</span><span>p</span> <span># <p class="title"><b>The Dormouse's story</b></p></span> <span>soup</span><span>.</span><span>p</span><span>[</span><span>'class'</span><span>]</span> <span># u'title'</span> <span>soup</span><span>.</span><span>a</span> <span># <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a></span> <span>soup</span><span>.</span><span>find_all</span><span>(</span><span>'a'</span><span>)</span> <span># [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]</span> <span>soup</span><span>.</span><span>find</span><span>(</span><span>id</span><span>=</span><span>"link3"</span><span>)</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a></span> </pre>303</p></li><li><p><pre><span>soup</span><span>.</span><span>title</span> <span># <title>The Dormouse's story</title></span> 
<span>soup</span><span>.</span><span>title</span><span>.</span><span>name</span> <span># u'title'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>string</span> <span># u'The Dormouse's story'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>parent</span><span>.</span><span>name</span> <span># u'head'</span> <span>soup</span><span>.</span><span>p</span> <span># <p class="title"><b>The Dormouse's story</b></p></span> <span>soup</span><span>.</span><span>p</span><span>[</span><span>'class'</span><span>]</span> <span># u'title'</span> <span>soup</span><span>.</span><span>a</span> <span># <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a></span> <span>soup</span><span>.</span><span>find_all</span><span>(</span><span>'a'</span><span>)</span> <span># [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]</span> <span>soup</span><span>.</span><span>find</span><span>(</span><span>id</span><span>=</span><span>"link3"</span><span>)</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a></span> </pre>304 -> <pre><span>soup</span><span>.</span><span>title</span> <span># <title>The Dormouse's story</title></span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>name</span> <span># u'title'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>string</span> <span># u'The Dormouse's story'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>parent</span><span>.</span><span>name</span> <span># u'head'</span> <span>soup</span><span>.</span><span>p</span> <span># <p class="title"><b>The Dormouse's story</b></p></span> <span>soup</span><span>.</span><span>p</span><span>[</span><span>'class'</span><span>]</span> <span># 
u'title'</span> <span>soup</span><span>.</span><span>a</span> <span># <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a></span> <span>soup</span><span>.</span><span>find_all</span><span>(</span><span>'a'</span><span>)</span> <span># [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]</span> <span>soup</span><span>.</span><span>find</span><span>(</span><span>id</span><span>=</span><span>"link3"</span><span>)</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a></span> </pre>305</p></li></ul><p>Các phương thức này còn sót lại từ API Beautiful Soup 2. Chúng không còn được dùng nữa từ năm 2006 và hoàn toàn không nên sử dụng</p><ul><li><p><pre><span>soup</span><span>.</span><span>title</span> <span># <title>The Dormouse's story</title></span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>name</span> <span># u'title'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>string</span> <span># u'The Dormouse's story'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>parent</span><span>.</span><span>name</span> <span># u'head'</span> <span>soup</span><span>.</span><span>p</span> <span># <p class="title"><b>The Dormouse's story</b></p></span> <span>soup</span><span>.</span><span>p</span><span>[</span><span>'class'</span><span>]</span> <span># u'title'</span> <span>soup</span><span>.</span><span>a</span> <span># <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a></span> <span>soup</span><span>.</span><span>find_all</span><span>(</span><span>'a'</span><span>)</span> <span># [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,</span> 
<span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]</span> <span>soup</span><span>.</span><span>find</span><span>(</span><span>id</span><span>=</span><span>"link3"</span><span>)</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a></span> </pre>306</p></li><li><p><pre><span>soup</span><span>.</span><span>title</span> <span># <title>The Dormouse's story</title></span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>name</span> <span># u'title'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>string</span> <span># u'The Dormouse's story'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>parent</span><span>.</span><span>name</span> <span># u'head'</span> <span>soup</span><span>.</span><span>p</span> <span># <p class="title"><b>The Dormouse's story</b></p></span> <span>soup</span><span>.</span><span>p</span><span>[</span><span>'class'</span><span>]</span> <span># u'title'</span> <span>soup</span><span>.</span><span>a</span> <span># <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a></span> <span>soup</span><span>.</span><span>find_all</span><span>(</span><span>'a'</span><span>)</span> <span># [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]</span> <span>soup</span><span>.</span><span>find</span><span>(</span><span>id</span><span>=</span><span>"link3"</span><span>)</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a></span> </pre>307</p></li><li><p><pre><span>soup</span><span>.</span><span>title</span> <span># <title>The Dormouse's story</title></span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>name</span> <span># u'title'</span> 
<span>soup</span><span>.</span><span>title</span><span>.</span><span>string</span> <span># u'The Dormouse's story'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>parent</span><span>.</span><span>name</span> <span># u'head'</span> <span>soup</span><span>.</span><span>p</span> <span># <p class="title"><b>The Dormouse's story</b></p></span> <span>soup</span><span>.</span><span>p</span><span>[</span><span>'class'</span><span>]</span> <span># u'title'</span> <span>soup</span><span>.</span><span>a</span> <span># <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a></span> <span>soup</span><span>.</span><span>find_all</span><span>(</span><span>'a'</span><span>)</span> <span># [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]</span> <span>soup</span><span>.</span><span>find</span><span>(</span><span>id</span><span>=</span><span>"link3"</span><span>)</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a></span> </pre>308</p></li><li><p><pre><span>soup</span><span>.</span><span>title</span> <span># <title>The Dormouse's story</title></span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>name</span> <span># u'title'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>string</span> <span># u'The Dormouse's story'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>parent</span><span>.</span><span>name</span> <span># u'head'</span> <span>soup</span><span>.</span><span>p</span> <span># <p class="title"><b>The Dormouse's story</b></p></span> <span>soup</span><span>.</span><span>p</span><span>[</span><span>'class'</span><span>]</span> <span># u'title'</span> <span>soup</span><span>.</span><span>a</span> <span># <a class="sister" 
href="http://example.com/elsie" id="link1">Elsie</a></span> <span>soup</span><span>.</span><span>find_all</span><span>(</span><span>'a'</span><span>)</span> <span># [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]</span> <span>soup</span><span>.</span><span>find</span><span>(</span><span>id</span><span>=</span><span>"link3"</span><span>)</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a></span> </pre>307</p></li><li><p><pre><span>soup</span><span>.</span><span>title</span> <span># <title>The Dormouse's story</title></span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>name</span> <span># u'title'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>string</span> <span># u'The Dormouse's story'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>parent</span><span>.</span><span>name</span> <span># u'head'</span> <span>soup</span><span>.</span><span>p</span> <span># <p class="title"><b>The Dormouse's story</b></p></span> <span>soup</span><span>.</span><span>p</span><span>[</span><span>'class'</span><span>]</span> <span># u'title'</span> <span>soup</span><span>.</span><span>a</span> <span># <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a></span> <span>soup</span><span>.</span><span>find_all</span><span>(</span><span>'a'</span><span>)</span> <span># [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]</span> <span>soup</span><span>.</span><span>find</span><span>(</span><span>id</span><span>=</span><span>"link3"</span><span>)</span> <span># <a 
class="sister" href="http://example.com/tillie" id="link3">Tillie</a></span> </pre>310</p></li><li><p><pre><span>soup</span><span>.</span><span>title</span> <span># <title>The Dormouse's story</title></span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>name</span> <span># u'title'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>string</span> <span># u'The Dormouse's story'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>parent</span><span>.</span><span>name</span> <span># u'head'</span> <span>soup</span><span>.</span><span>p</span> <span># <p class="title"><b>The Dormouse's story</b></p></span> <span>soup</span><span>.</span><span>p</span><span>[</span><span>'class'</span><span>]</span> <span># u'title'</span> <span>soup</span><span>.</span><span>a</span> <span># <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a></span> <span>soup</span><span>.</span><span>find_all</span><span>(</span><span>'a'</span><span>)</span> <span># [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]</span> <span>soup</span><span>.</span><span>find</span><span>(</span><span>id</span><span>=</span><span>"link3"</span><span>)</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a></span> </pre>311</p></li><li><p><pre><span>soup</span><span>.</span><span>title</span> <span># <title>The Dormouse's story</title></span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>name</span> <span># u'title'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>string</span> <span># u'The Dormouse's story'</span> <span>soup</span><span>.</span><span>title</span><span>.</span><span>parent</span><span>.</span><span>name</span> <span># 
u'head'</span> <span>soup</span><span>.</span><span>p</span> <span># <p class="title"><b>The Dormouse's story</b></p></span> <span>soup</span><span>.</span><span>p</span><span>[</span><span>'class'</span><span>]</span> <span># u'title'</span> <span>soup</span><span>.</span><span>a</span> <span># <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a></span> <span>soup</span><span>.</span><span>find_all</span><span>(</span><span>'a'</span><span>)</span> <span># [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,</span> <span># <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]</span> <span>soup</span><span>.</span><span>find</span><span>(</span><span>id</span><span>=</span><span>"link3"</span><span>)</span> <span># <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a></span> </pre>312</p></li></ul></p><p><h3 id="may-phat-dien">Generators¶</h3><p>I gave the generators PEP 8-compliant names and turned them into properties:</p><ul><li><p><code>childGenerator()</code> -> <code>children</code></p></li><li><p><code>nextGenerator()</code> -> <code>next_elements</code></p></li><li><p><code>nextSiblingGenerator()</code> -> <code>next_siblings</code></p></li><li><p><code>previousGenerator()</code> -> <code>previous_elements</code></p></li><li><p><code>previousSiblingGenerator()</code> -> <code>previous_siblings</code></p></li><li><p><code>recursiveChildGenerator()</code> -> <code>descendants</code></p></li><li><p><code>parentGenerator()</code> -> <code>parents</code></p></li></ul><p>So instead of this</p><p><p><pre>for parent in tag.parentGenerator():
    ...
</pre></p><p>You can write this</p><p><p><pre>for parent in tag.parents:
    ...
</pre></p><p>(But the old code will still work.)</p><p>Some of the generators used to yield <code>None</code> after they were done, and then stop. That was a bug. Now the generators just stop.</p><p>There are two new generators, <code>.strings</code> and <code>.stripped_strings</code>. <code>.strings</code> yields NavigableString objects, and <code>.stripped_strings</code> yields Python strings that have had whitespace stripped. 
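</p><p>As a minimal sketch of the generator properties described above, the snippet below walks up the tree with <code>.parents</code> and compares <code>.strings</code> with <code>.stripped_strings</code>. The markup and the variable names are made up for illustration, and the built-in <code>html.parser</code> is assumed as the parser.</p>

```python
from bs4 import BeautifulSoup

# Illustrative markup (not from the original document).
markup = "<div><p> Hello, <b>world</b> </p></div>"
soup = BeautifulSoup(markup, "html.parser")

# .parents is the property form of the old parentGenerator():
# it yields each ancestor, ending with the BeautifulSoup object itself.
parent_names = [parent.name for parent in soup.b.parents]
print(parent_names)   # ['p', 'div', '[document]']

# .strings yields every string in the subtree, whitespace included.
raw_strings = [str(s) for s in soup.p.strings]
print(raw_strings)    # [' Hello, ', 'world', ' ']

# .stripped_strings strips each string and skips whitespace-only ones.
clean_strings = list(soup.p.stripped_strings)
print(clean_strings)  # ['Hello,', 'world']
```

<p>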
</p><p><h3>XML¶</h3><p>There is no longer a <code>BeautifulStoneSoup</code> class for parsing XML. 
To parse XML, pass "xml" as the second argument to the <code>BeautifulSoup</code> constructor. 
For the same reason, the <code>BeautifulSoup</code> constructor no longer recognizes the <code>isHTML</code> argument.</p><p>Beautiful Soup's handling of empty-element XML tags has been improved. Previously, when you parsed XML, you had to say explicitly which tags were considered empty-element tags. 
The <code>selfClosingTags</code> argument to the constructor is no longer recognized. Instead, Beautiful Soup considers any empty tag to be an empty-element tag. If you add a child to an empty-element tag, it stops being an empty-element tag.</p><p><h3 id="thuc-the">Entities¶</h3><p>An incoming HTML or XML entity is always converted into the corresponding Unicode character. Beautiful Soup 3 had a number of overlapping ways of dealing with entities, which have been removed. 
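</p><p>A small sketch of the entity handling described above, assuming the built-in <code>html.parser</code>; the sample markup and variable names are invented for illustration. Incoming entities appear in the tree as Unicode characters, and on output only the characters that must be escaped are written back as entities.</p>

```python
from bs4 import BeautifulSoup

# Illustrative markup containing named entities (not from the original document).
markup = "<p>&ldquo;Dammit!&rdquo; he said &amp; left.</p>"
soup = BeautifulSoup(markup, "html.parser")

# Inside the tree, the entities are ordinary Unicode characters.
text = soup.p.get_text()
print(text)      # “Dammit!” he said & left.

# When the tag is rendered, the default formatter re-escapes only &, < and >.
rendered = str(soup.p)
print(rendered)  # <p>“Dammit!” he said &amp; left.</p>
```

<p>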
The <span>BeautifulSoup</span> constructor no longer recognizes the <span>smartQuotesTo</span> or <span>convertEntities</span> arguments. (Unicode, Dammit still has <span>smart_quotes_to</span>, but its default is now to turn smart quotes into Unicode.) The constants <span>HTML_ENTITIES</span>, <span>XML_ENTITIES</span>, and <span>XHTML_ENTITIES</span> have been removed, since they configure a feature (transforming some but not all entities into Unicode characters) that no longer exists.</p><p>If you want to turn Unicode characters back into HTML entities on output, rather than turning them into UTF-8 characters, you need to use an <span>output formatter</span>.</p><p><h3 id="miscellaneous">Miscellaneous¶</h3><p><span>Tag.string</span> now operates recursively. If tag A contains a single tag B and nothing else, then A.string is the same as B.string. (Previously, it was None.)</p><p>Multi-valued attributes like <span>class</span> have lists of strings as their values, not strings. This may affect the way you search by CSS class.</p><p><span>Tag</span> objects now implement the <span>__hash__</span> method, such that two <span>Tag</span> objects are considered equal if they generate the same markup. This may change your script's behavior if you put <span>Tag</span> objects into a dictionary or set.</p><p>If you pass one of the <span>find*</span> methods both <span>string</span> and a tag-specific argument like <span>name</span>, Beautiful Soup will search for tags that match your tag-specific criteria and whose <span>Tag.string</span> matches your value for <span>string</span>. It will not find the strings themselves. Previously, Beautiful Soup ignored the tag-specific arguments and looked for strings.</p><p>The <span>BeautifulSoup</span> constructor no longer recognizes the <span>markupMassage</span> argument. 
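</p><p>The Tag.string, multi-valued attribute, and find* changes described in this section can be sketched as follows. This is only an illustration, assuming the built-in "html.parser" backend. The sample markup and the variable names are hypothetical.</p>

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html = '<div><p class="story intro"><b>Alice</b></p><p>Bob</p></div>'
soup = BeautifulSoup(html, "html.parser")
first_p = soup.find("p")

# Tag.string is recursive: <p> contains only <b>, so p.string is b.string.
recursive_string = first_p.string

# Multi-valued attributes such as class are lists of strings, not one string.
classes = first_p["class"]

# name + string: returns tags whose .string matches, not the strings themselves.
tag_matches = soup.find_all("p", string="Alice")

# string alone: returns the matching strings.
string_matches = soup.find_all(string="Alice")

print(recursive_string, classes, tag_matches, string_matches)
```

<p>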
It's now the parser's responsibility to handle markup correctly.</p><p>The rarely-used alternate parser classes like <span>ICantBelieveItsBeautifulSoup</span> and <span>BeautifulSOAP</span> have been removed. It's now the parser's decision how to handle ambiguous markup.</p><div></div> <div></div> <h3 id="lam-cach-nao-de-lay-phan-tu-html-trong-php">How do I get an HTML element in PHP?</h3> <div>Using the PHP DOMDocument class, create a DOMDocument object. Call its predefined loadHTML() method with the markup as the parameter. <span>Using the DOM method getElementById(), we then get the HTML element</span>. </div> <h3 id="lam-cach-nao-de-su-dung-file-get-html-trong-php">How do I use file_get_html in PHP?</h3> <div>You want file_get_html because file_get_contents will load the response body into a string but file_get_html will load it into simple-html-dom. 
<span>$dom = file_get_html($url);</span> <span>$dom = str_get_html(file_get_contents($url));</span></div></p></td></tr></table> </div><!-- end question-post-body --> </div><!-- end question-post-body-wrap --> </div><!-- end question --> </div> </div><!-- end question-main-bar --> </div><!-- end col-lg-9 --> <div class="col-right"> <div class="sidebar"> <div class="card card-item"> <div class="card-body"> <h3 class="fs-14 text-uppercase pb-3">MỚI CẬP NHẬP</h3> <div class="divider"><span></span></div> <div class="sidebar-questions pt-3"> <div class="media media-card media--card media--card-2"> <div class="media-body"> <h5><a href="https://toanthua.com/bai-tap-viet-lai-cau-khong-lam-thay-doi-nghia-nam-2024">Bài tập viết lại câu không làm thay đổi nghĩa năm 2024</a></h5> <small class="meta"> <span class="pr-1">2 giờs trước</span> <span class="pr-1">. 
bởi</span> <a href="https://toanthua.com/author/ThoughtfulIntercession" class="author">ThoughtfulIntercession</a> </small> </div> </div><!-- end media --> <div class="media media-card media--card media--card-2"> <div class="media-body"> <h5><a href="https://toanthua.com/toan-lop-4-trang-78-luyen-tap-bai-3-nam-2024">Toán lớp 4 trang 78 luyện tập bài 3 năm 2024</a></h5> <small class="meta"> <span class="pr-1">2 giờs trước</span> <span class="pr-1">. bởi</span> <a href="https://toanthua.com/author/UnconsciousGlucose" class="author">UnconsciousGlucose</a> </small> </div> </div><!-- end media --> <div class="media media-card media--card media--card-2"> <div class="media-body"> <h5><a href="https://toanthua.com/bai-tap-chuong-ii-dai-so-lop-10-nam-2024">Bài tập chương ii đại số lớp 10 năm 2024</a></h5> <small class="meta"> <span class="pr-1">3 giờs trước</span> <span class="pr-1">. bởi</span> <a href="https://toanthua.com/author/ContagiousMantra" class="author">ContagiousMantra</a> </small> </div> </div><!-- end media --> <div class="media media-card media--card media--card-2"> <div class="media-body"> <h5><a href="https://toanthua.com/soan-toan-7-cong-tru-so-huu-ti-nam-2024">Soạn toán 7 cộng trừ số hữu tỉ năm 2024</a></h5> <small class="meta"> <span class="pr-1">3 giờs trước</span> <span class="pr-1">. bởi</span> <a href="https://toanthua.com/author/IntangibleSiding" class="author">IntangibleSiding</a> </small> </div> </div><!-- end media --> <div class="media media-card media--card media--card-2"> <div class="media-body"> <h5><a href="https://toanthua.com/bs-ngoc-lan-bv-tu-du-kham-ngay-nao-nam-2024">Bs ngọc lan bv từ dũ khám ngày nào năm 2024</a></h5> <small class="meta"> <span class="pr-1">3 giờs trước</span> <span class="pr-1">. 
bởi</span> <a href="https://toanthua.com/author/CivicVulnerability" class="author">CivicVulnerability</a> </small> </div> </div><!-- end media --> </div><!-- end sidebar-questions --> </div> </div><!-- end card --> <div class="card card-item"> <div class="card-body"> <h3 class="fs-14 text-uppercase pb-3">Xem Nhiều</h3> <div class="divider"><span></span></div> <div class="sidebar-questions pt-3"> <div class="media media-card media--card media--card-2"> <div class="media-body"> <h5><a href="https://toanthua.com/hoa-ra-anh-van-o-day-truyen-tom-tat-nam-2024">Hóa ra anh vẫn ở đây truyện tóm tắt năm 2024</a></h5> <small class="meta"> <span class="pr-1">2 ngàys trước</span> <span class="pr-1">. bởi</span> <a href="https://toanthua.com/author/Long-lostFeedback" class="author">Long-lostFeedback</a> </small> </div> </div><!-- end media --> <div class="media media-card media--card media--card-2"> <div class="media-body"> <h5><a href="https://toanthua.com/bai-thu-hoach-ve-hoc-tap-chuyen-de-nam-2023-nam-2024">Bài thu hoạch về học tập chuyên đề năm 2023 năm 2024</a></h5> <small class="meta"> <span class="pr-1">9 giờs trước</span> <span class="pr-1">. bởi</span> <a href="https://toanthua.com/author/ElectricalRobber" class="author">ElectricalRobber</a> </small> </div> </div><!-- end media --> <div class="media media-card media--card media--card-2"> <div class="media-body"> <h5><a href="https://toanthua.com/bai-thuyet-trinh-ve-tap-doan-xang-dau-viet-nam-nam-2024">Bài thuyết trình về tập đoàn xăng dầu việt nam năm 2024</a></h5> <small class="meta"> <span class="pr-1">13 phúts trước</span> <span class="pr-1">. 
bởi</span> <a href="https://toanthua.com/author/RowdyViolation" class="author">RowdyViolation</a> </small> </div> </div><!-- end media --> <div class="media media-card media--card media--card-2"> <div class="media-body"> <h5><a href="https://toanthua.com/bai-tap-cac-thi-trong-tieng-anh-va-cach-dung-nam-2024">Bài tập các thì trong tiếng anh và cách dùng năm 2024</a></h5> <small class="meta"> <span class="pr-1">5 ngàys trước</span> <span class="pr-1">. bởi</span> <a href="https://toanthua.com/author/SonicMotto" class="author">SonicMotto</a> </small> </div> </div><!-- end media --> <div class="media media-card media--card media--card-2"> <div class="media-body"> <h5><a href="https://toanthua.com/top-trung-tam-anh-ngu-gia-re-tphcm-nam-2024">Top trung tâm anh ngữ giá rẻ tphcm năm 2024</a></h5> <small class="meta"> <span class="pr-1">2 ngàys trước</span> <span class="pr-1">. bởi</span> <a href="https://toanthua.com/author/SexyPotassium" class="author">SexyPotassium</a> </small> </div> </div><!-- end media --> <div class="media media-card media--card media--card-2"> <div class="media-body"> <h5><a href="https://toanthua.com/trung-binh-moi-lan-quan-he-ton-bao-nhieu-calo-nam-2024">Trung bình mỗi lần quan hệ tốn bao nhiêu calo năm 2024</a></h5> <small class="meta"> <span class="pr-1">1 ngàys trước</span> <span class="pr-1">. bởi</span> <a href="https://toanthua.com/author/ProdigiousTights" class="author">ProdigiousTights</a> </small> </div> </div><!-- end media --> <div class="media media-card media--card media--card-2"> <div class="media-body"> <h5><a href="https://toanthua.com/cac-sach-tinh-toan-he-thong-xu-ly-nuoc-thai-nam-2024">Các sách tính toán hệ thống xử lý nước thải năm 2024</a></h5> <small class="meta"> <span class="pr-1">4 ngàys trước</span> <span class="pr-1">. 
bởi</span> <a href="https://toanthua.com/author/SuppleProtein" class="author">SuppleProtein</a> </small> </div> </div><!-- end media --> <div class="media media-card media--card media--card-2"> <div class="media-body"> <h5><a href="https://toanthua.com/bai-tap-co-ban-thi-hien-tai-hoan-thanh-violet-nam-2024">Bài tập cơ bản thì hiện tại hoàn thành violet năm 2024</a></h5> <small class="meta"> <span class="pr-1">6 ngàys trước</span> <span class="pr-1">. bởi</span> <a href="https://toanthua.com/author/VeteranSemifinal" class="author">VeteranSemifinal</a> </small> </div> </div><!-- end media --> <div class="media media-card media--card media--card-2"> <div class="media-body"> <h5><a href="https://toanthua.com/hoi-cho-cuoi-tuan-nha-van-hoa-thanh-nien-nam-2024">Hội chợ cuối tuần nhà văn hóa thanh niên năm 2024</a></h5> <small class="meta"> <span class="pr-1">5 ngàys trước</span> <span class="pr-1">. bởi</span> <a href="https://toanthua.com/author/UnemployedShaving" class="author">UnemployedShaving</a> </small> </div> </div><!-- end media --> <div class="media media-card media--card media--card-2"> <div class="media-body"> <h5><a href="https://toanthua.com/treo-guong-bat-quai-loi-nhu-the-nao-nam-2024">Treo gương bát quái lồi như thế nào năm 2024</a></h5> <small class="meta"> <span class="pr-1">3 ngàys trước</span> <span class="pr-1">. 
bởi</span> <a href="https://toanthua.com/author/AnguishedTuesday" class="author">AnguishedTuesday</a> </small> </div> </div><!-- end media --> </div><!-- end sidebar-questions --> </div> </div><!-- end card --> </div><!-- end sidebar --> </div><!-- end col-lg-3 --> </div><!-- end row --> </div><!-- end container --> </section><!-- end question-area --> <!-- ================================ END QUESTION AREA ================================= --> <script>var questionId ='html-tim'</script> <script>var postTime ='2023-01-07T01:59:34.552Z'</script> <script>var siteDomain ='toanthua.com'</script> <script type="text/javascript" src="https://toanthua.com/dist/js/pages/comment.js"></script> <!-- ================================ END FOOTER AREA ================================= --> <section class="footer-area pt-80px bg-dark position-relative"> <span class="vertical-bar-shape vertical-bar-shape-1"></span> <span class="vertical-bar-shape vertical-bar-shape-2"></span> <span class="vertical-bar-shape vertical-bar-shape-3"></span> <span class="vertical-bar-shape vertical-bar-shape-4"></span> <div class="container"> <div class="row"> <div class="col-lg-3 responsive-column-half"> <div class="footer-item"> <h3 class="fs-18 fw-bold pb-2 text-white">Chúng tôi</h3> <ul class="generic-list-item generic-list-item-hover-underline pt-3 generic-list-item-white"> <li><a href="/about.html">Giới thiệu</a></li> <li><a href="/contact.html">Liên hệ</a></li> <li><a href="/contact.html">Tuyển dụng</a></li> <li><a href="/contact.html">Quảng cáo</a></li> </ul> </div><!-- end footer-item --> </div><!-- end col-lg-3 --> <div class="col-lg-3 responsive-column-half"> <div class="footer-item"> <h3 class="fs-18 fw-bold pb-2 text-white">Điều khoản</h3> <ul class="generic-list-item generic-list-item-hover-underline pt-3 generic-list-item-white"> <li><a href="/privacy-statement.html">Điều khoản hoạt động</a></li> <li><a href="/terms-and-conditions.html">Điều kiện tham gia</a></li> <li><a 
href="/privacy-statement.html">Quy định cookie</a></li> </ul> </div><!-- end footer-item --> </div><!-- end col-lg-3 --> <div class="col-lg-3 responsive-column-half"> <div class="footer-item"> <h3 class="fs-18 fw-bold pb-2 text-white">Trợ giúp</h3> <ul class="generic-list-item generic-list-item-hover-underline pt-3 generic-list-item-white"> <li><a href="/contact.html">Hướng dẫn</a></li> <li><a href="/contact.html">Loại bỏ câu hỏi</a></li> <li><a href="/contact.html">Liên hệ</a></li> </ul> </div><!-- end footer-item --> </div><!-- end col-lg-3 --> <div class="col-lg-3 responsive-column-half"> <div class="footer-item"> <h3 class="fs-18 fw-bold pb-2 text-white">Mạng xã hội</h3> <ul class="generic-list-item generic-list-item-hover-underline pt-3 generic-list-item-white"> <li><a href="#"><i class="fab fa-facebook-f mr-1"></i> Facebook</a></li> <li><a href="#"><i class="fab fa-twitter mr-1"></i> Twitter</a></li> <li><a href="#"><i class="fab fa-linkedin mr-1"></i> LinkedIn</a></li> <li><a href="#"><i class="fab fa-instagram mr-1"></i> Instagram</a></li> </ul> </div><!-- end footer-item --> </div><!-- end col-lg-3 --> </div><!-- end row --> </div><!-- end container --> <hr class="border-top-gray my-3"> <div class="container"> <div class="row align-items-center pb-4 copyright-wrap"> <div class="col-6"> <img src ="/dist/images/dmca_protected_sml.png"/> </div> <!-- end col-lg-6 --><div class="col-6"> <div class="copyright-desc text-right fs-14"> <div>Bản quyền © 2024 <a href="https://toanthua.com"></a> Inc.</div> </div> </div><!-- end col-lg-6 --> </div><!-- end row --> </div><!-- end container --> </section><!-- end footer-area --> <!-- ================================ END FOOTER AREA ================================= --> <!-- template js files --> <!-- start back to top --> <div id="back-to-top" data-toggle="tooltip" data-placement="top" title="Lên đầu trang"> <img alt="" src="/dist/images/svg/arrow-up_20.svg"> </div> <!-- end back to top --> <script 
src="https://toanthua.com/dist/js/bootstrap.bundle.min.js"></script> <script src="https://toanthua.com/dist/js/sweetalert2.js"></script> <script src="https://toanthua.com/dist/js/moment.js"></script> <script src="https://toanthua.com/dist/js/main.js?v=1"></script> <!-- Google Tag Manager (noscript) --> <noscript><iframe src="https://www.googletagmanager.com/ns.html?id=" height="0" width="0" style="display:none;visibility:hidden"></iframe></noscript> <!-- End Google Tag Manager (noscript) --> </body> </html>