How to get JSON from a script tag in Python

5


I want to extract reviewCount from a script tag using Beautiful Soup. I have tried other approaches without success.

<script type="application/json" data-initial-state="review-filter">
{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}],"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}
</script>


Martineau

116k, 25 gold badges, 160 silver badges, 285 bronze badges

Asked Apr 14, 2020 at 21:28

1

This will work, though I'm sure there is a more elegant approach:

import json
from bs4 import BeautifulSoup

html = '''
<script type="application/json" data-initial-state="review-filter">
{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}],"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}
</script>
'''

soup = BeautifulSoup(html, 'html.parser')
res = soup.find('script')
json_object = json.loads(res.contents[0])

for language in json_object['languages']:
    print('{}: {}'.format(language['displayName'], language['reviewCount']))

Output:

Toutes les langues: 573
français: 567
English: 6
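On a real page there are usually several script tags, so `soup.find('script')` may return the wrong one. The following is a minimal sketch, assuming the target tag keeps the `type` and `data-initial-state` attributes shown in the question. The helper name `review_counts` and the extra unrelated script tag in the sample page are illustrative assumptions, not part of the original answer.

```python
import json
from bs4 import BeautifulSoup


def review_counts(html):
    """Return {displayName: reviewCount} from the review-filter script tag.

    Hypothetical helper: returns an empty dict when the tag is missing or empty.
    """
    soup = BeautifulSoup(html, "html.parser")
    # Match on attributes so unrelated <script> tags are skipped.
    tag = soup.find(
        "script",
        attrs={"type": "application/json", "data-initial-state": "review-filter"},
    )
    if tag is None or not tag.string:
        return {}
    data = json.loads(tag.string)
    return {lang["displayName"]: lang["reviewCount"] for lang in data.get("languages", [])}


# Sample page (assumed for illustration) with an unrelated script tag first.
page = '''
<script type="text/javascript">var unrelated = 1;</script>
<script type="application/json" data-initial-state="review-filter">
{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"en","displayName":"English","reviewCount":"6"}]}
</script>
'''

result = review_counts(page)
print(result)
```

Returning an empty dict instead of raising is a design choice for this sketch; a scraper that must notice layout changes may prefer to raise an error when the tag is not found.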

Answered Apr 14, 2020 at 21:39

James Powis

624, 4 silver badges, 16 bronze badges

3

Import json, load the script tag's text with json.loads, and then iterate over the result to get every reviewCount.

import json
from bs4 import BeautifulSoup
html='''<script type="application/json" data-initial-state="review-filter">
{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}],"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}
</script>'''

soup = BeautifulSoup(html, "html.parser")
# Select the script tag by its data attribute, then parse its text as JSON
script_text = soup.select_one('script[data-initial-state="review-filter"]').text
jsondata = json.loads(script_text)
for item in jsondata['languages']:
    print(item['reviewCount'])

Output:

573
567
6
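Note that the reviewCount values are JSON strings, not numbers. If the counts are needed for arithmetic, they can be converted after parsing. This is a small sketch that works on the JSON text from the question directly (so it needs only the standard library); the variable names `counts` and `per_language_total` are illustrative.

```python
import json

# JSON payload copied from the script tag in the question.
raw = (
    '{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},'
    '{"isoCode":"fr","displayName":"français","reviewCount":"567"},'
    '{"isoCode":"en","displayName":"English","reviewCount":"6"}],'
    '"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}'
)

data = json.loads(raw)

# Key the counts by isoCode and convert each string to an int.
counts = {entry["isoCode"]: int(entry["reviewCount"]) for entry in data["languages"]}

# Sum the individual languages, leaving out the "all" aggregate entry.
per_language_total = sum(n for code, n in counts.items() if code != "all")

print(counts)
print(per_language_total == counts["all"])
```

For this sample the per-language counts (567 and 6) add up to the "all" entry (573), which is a quick sanity check that the right block was parsed.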

Answered Apr 14, 2020 at 21:52

KunduK

30.4k, 4 gold badges, 13 silver badges, 37 bronze badges

import re

html = '''<script type="application/json" data-initial-state="review-filter">
{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}],"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}
</script>'''


match = [item.group(1) for item in re.finditer('reviewCount":"(.+?)"', html)]

print(match)

Output:

['573', '567', '6']
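The pattern above matches every `reviewCount` anywhere in the page source, which may pick up values from other scripts. As a hedged variation, the regex can be used only to isolate the body of the one script tag, leaving the parsing to json.loads. The pattern below assumes the tag carries `data-initial-state="review-filter"` exactly as in the question and that the JSON itself does not contain the text `</script>`.

```python
import json
import re

html = '''<script type="application/json" data-initial-state="review-filter">
{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}],"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}
</script>'''

# Capture everything between the opening review-filter tag and its closing tag.
pattern = re.compile(
    r'<script[^>]*data-initial-state="review-filter"[^>]*>(.*?)</script>',
    re.DOTALL,
)

counts = []
found = pattern.search(html)
if found:
    data = json.loads(found.group(1))
    counts = [lang["reviewCount"] for lang in data["languages"]]

print(counts)  # ['573', '567', '6']
```

Regular expressions are fragile against markup changes, so an HTML parser is generally the safer choice when one is available.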

Answered Apr 14, 2020 at 23:34


αԋɱҽԃ αмєяιcαη

11.1k, 2 gold badges, 15 silver badges, 43 bronze badges

The full script tag is:

Output:

</script>, <script type="text/javascript"> (function(widgetFactory) { widgetFactory.mergeConfig('gallery', { account_url : 'SocialKeked', favs_account_url : null, sort : 'viral', section : 'hot', window : 'day', tag : null, isHotImage : '1', hash : 'cLQ8lVI', baseURL : decodeURIComponent('%2Fgallery'), page : 0, isPro : false, searchQuery : '', advSearch : null, isRandom : false, safe_tags : true, hasAccess : false, inGallery : false, hashes : null, image : {"id":339385563,"hash":"cLQ8lVI","account_id":"117321139","account_url":"SocialKeked","title":"Noice!","score":795,"starting_score":0,"virality":18864.156525,"size":37779342,"views":"170307","is_hot":true,"is_album":false,"album_cover":null,"album_cover_width":0,"album_cover_height":0,"mimetype":"image\/gif","ext":".gif","width":728,"height":408,"animated":true,"looping":true,"ups":737,"downs":27,"points":710,"reddit":null,"description":"","bandwidth":"5.85 TB","timestamp":"2019-12-19 12:46:19","hot_datetime":"2019-12-19 16:32:01","gallery_datetime":"2019-12-19 12:45:40","in_gallery":true,"section":"","tags":["0","0"],"subtype":null,"spam":"0","pending":"0","comment_count":115,"nsfw":false,"topic":"No Topic","topic_id":29,"meme_name":null,"meme_top":null,"meme_bottom":null,"prefer_video":true,"video_source":"https:\/\/img-9gag-fun.9cache.com\/photo\/ad5OR4N_460sv.mp4","video_host":"img-9gag-fun.9cache.com","num_images":1,"platform":null,"readonly":false,"ad_type":0,"ad_url":"","weight":-1,"favorite_count":173,"processing":{"status":"completed"},"galleryTags":[{"id":"197940815","hash":"cLQ8lVI","account_id":"117321139","tag_id":"547","display":"football","ups":"0","downs":"0","score":"0","timestamp":"2019-12-19 
12:46:19","blocked":"0","tag":"football","subscribers":"16415","images":"11097","background_hash":"dMdNvgJ","thumbnail_hash":null,"spam":"0","nsfw":"0","is_promoted":"0","animated":"0","thumbnail_animated":null,"metadata":{"tag_id":"547","title":null,"description":"touchdoooowwwwnnn!","logo_hash":null,"logo_destination_url":null,"is_promoted":"0","accent":"a88680"},"image":{"animated":"0"},"thumbnail":{"animated":null}},{"id":"197940811","hash":"cLQ8lVI","account_id":"117321139","tag_id":"1024","display":"awesome","ups":"0","downs":"0","score":"0","timestamp":"2019-12-19 12:46:19","blocked":"0","tag":"awesome","subscribers":"981004","images":"756530","background_hash":"4kmYoey","thumbnail_hash":null,"spam":"0","nsfw":"0","is_promoted":"0","animated":"0","thumbnail_animated":null,"metadata":{"tag_id":"1024","title":null,"description":"neat and amazing","logo_hash":null,"logo_destination_url":null,"is_promoted":"0","accent":"8472BD"},"image":{"animated":"0"},"thumbnail":{"animated":null}}],"favorited":false,"adConfig":{"safeFlags":["in_gallery","sixth_mod_safe","gallery"],"highRiskFlags":[],"unsafeFlags":[],"wallUnsafeFlags":[],"showsAds":true},"vote":null}, group : null, comment_sort : 'best', comment_id : '', captionsEnabled : true, onTheFlyThreshold : 10485760, galleryTitle : 'Imgur: The magic of the Internet', votedFavedRecently: false, tagSectionIsPromoted: false, lastModLog: null, }); widgetFactory.mergeConfig('groups', { groups: { } });