How to scrape Baidu image search results with a Python crawler

2021-05-24 02:00:01

Contents

1. Python source code for scraping Baidu images
2. Output of the Baidu image scraper
3. Note: about Python crawlers

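Scraping Baidu images is a common practice project for people who have just learned the basics of Python crawlers. On the one hand, it is simple to do; on the other, Baidu is fairly tolerant of crawlers and does not readily ban IP addresses.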

So how do you scrape Baidu images with Python?

Python source code for scraping Baidu images

import os
import re
import shutil

import requests

# Browser-like User-Agent and request timeout are added here as a precaution
# (an assumption: the site may not serve the normal page to a default client).
HEADERS = {'User-Agent': 'Mozilla/5.0'}
TIMEOUT = 10  # seconds

picList = []  # collected image URLs
file = os.path.join(os.path.expanduser('~'), 'Desktop')  # base save path: the current user's Desktop


# Collect image URLs from the paginated search results
def getUrl(url):
    t, s = 0, 0  # t: result offset (the pn parameter), s: number of URLs found
    while t < 1000:
        _url = url + str(t)
        # Fetch the page source
        result = requests.get(_url, headers=HEADERS, timeout=TIMEOUT).text
        # Match the URL of every image on the page
        picUrl = re.findall('"objURL":"(.*?)",', result)
        if len(picUrl) == 0:
            break
        s += len(picUrl)
        # Store the image URLs
        picList.extend(picUrl)
        t += 60
    return s


# Download the images
def download(keyword, num):
    num = min(num, len(picList))
    print('Images will be saved to: {}'.format(file))
    for i in range(num):
        picUrl = picList[i]  # URL of the image
        picName = os.path.join(file, keyword + str(i + 1) + '.jpg')
        print('Downloading image {}, URL: {}'.format(i + 1, picUrl))
        # Fetch before opening the file, so a failed request leaves no empty file
        try:
            content = requests.get(picUrl, headers=HEADERS, timeout=TIMEOUT).content
        except requests.RequestException as e:
            print('Skipped image {}: {}'.format(i + 1, e))
            continue
        # Save the image
        with open(picName, 'wb') as tempFile:
            tempFile.write(content)


if __name__ == '__main__':
    key = input('Enter search keyword: ')
    url = 'http://image.baidu.com/search/flip?tn=baiduimage&ie=utf-8&word=' + key + '&pn='
    # Collect the image URLs
    getUrl(url)
    # Images are saved to a folder on the Desktop named after the keyword
    file = os.path.join(file, key)
    # Caution: an existing folder with the same name is deleted first
    if os.path.exists(file):
        shutil.rmtree(file, ignore_errors=True)
    os.makedirs(file)
    download(key, int(input('Number of images to download: ')))
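
The two core steps of the script above, building the URL of each result page and pulling image addresses out of the page source, can be tried without any network access. The sketch below is illustrative only: `build_page_url` and `extract_obj_urls` are hypothetical helper names, and `SAMPLE_HTML` is a made-up fragment that merely imitates the `"objURL":"...",` pattern the script's regular expression looks for. The real page layout may differ or change over time.

```python
import re
from urllib.parse import quote

# Same base URL as in the script above.
BASE = 'http://image.baidu.com/search/flip?tn=baiduimage&ie=utf-8&word='


def build_page_url(keyword, offset):
    """Return the search URL for one result page; offset is the pn value."""
    # Percent-encode the keyword so non-ASCII search terms are safe in a URL.
    return BASE + quote(keyword) + '&pn=' + str(offset)


def extract_obj_urls(html):
    """Return every image address that follows an "objURL" key."""
    return re.findall(r'"objURL":"(.*?)",', html)


# Made-up fragment (an assumption) that only imitates the expected pattern.
SAMPLE_HTML = (
    '{"objURL":"http://example.com/a.jpg","width":800},'
    '{"objURL":"http://example.com/b.jpeg","width":640},'
)

if __name__ == '__main__':
    # The script steps through offsets 0, 60, 120, ... one page at a time.
    for offset in range(0, 180, 60):
        print(build_page_url('黄鹤楼', offset))
    print(extract_obj_urls(SAMPLE_HTML))
```

Because the pattern is non-greedy, each match stops at the first `",` after the address, so consecutive entries on one line are captured separately.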

Output of the Baidu image scraper

Enter search keyword: 黄鹤楼
Number of images to download: 10
Images will be saved to: C:\Users\Administrator\Desktop\黄鹤楼
Downloading image 1, URL: http://image109.360doc.com/DownloadImg/2017/12/2614/120146561_24_20171226025343324.jpg
Downloading image 2, URL: http://images.quanjing.com/chineseview092/high/yt-p0101725.jpg
Downloading image 3, URL: http://img.yxad.cn/images/20180817/571d39d3dd614a89b70129ab4d3c8571.jpeg
Downloading image 4, URL: http://boot-img.xuexi.cn/image/1005/process/5b53e5bee2d14397b5b5180cd27722a9.jpg
Downloading image 5, URL: http://pics2.baidu.com/feed/472309f7905298228609cb7a923545cd0b46d453.jpeg?token=3735b5b6fca388d9aa56241874733699&s=1D326F951CD037E94A78492D0300C066
Downloading image 6, URL: http://dpic.tiankong.com/8v/0p/QJ9114862369.jpg
Downloading image 7, URL: http://pic.soutu123.cn/element_origin_min_pic/16/10/04/1157f327a3c305d.jpg%21/fw/700/quality/90/unsharp/true/compress/true
Downloading image 8, URL: http://5b0988e595225.cdn.sohucs.com/images/20180208/e7f835a4b8d94e35a94d127b97f62163.jpeg
Downloading image 9, URL: http://photos.tuchong.com/47256/f/797849.jpg
Downloading image 10, URL: http://img.pconline.com.cn/images/upload/upc/tx/photoblog/1309/20/c98/25943950_1379668272404.jpg

Process finished with exit code 0
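
The addresses in the run above end in `.jpg` or `.jpeg`, and some carry a query string or extra path segments, while the script saves every file with a `.jpg` name. One possible refinement, shown here only as a sketch, is to guess the extension from the path part of each URL. `guess_extension` and `KNOWN_EXTENSIONS` are hypothetical names introduced for illustration and are not part of the original script.

```python
import os
from urllib.parse import urlparse

# Suffixes treated as image extensions (an illustrative choice, not exhaustive).
KNOWN_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.gif', '.bmp', '.webp'}


def guess_extension(pic_url, default='.jpg'):
    """Guess a file extension from the path part of an image URL."""
    # urlparse(...).path drops the query string (everything after '?').
    path = urlparse(pic_url).path
    ext = os.path.splitext(path)[1].lower()
    # Fall back to the default when the suffix is missing or unfamiliar.
    return ext if ext in KNOWN_EXTENSIONS else default


if __name__ == '__main__':
    # Three addresses taken from the run shown above.
    samples = [
        'http://photos.tuchong.com/47256/f/797849.jpg',
        'http://pics2.baidu.com/feed/472309f7905298228609cb7a923545cd0b46d453.jpeg'
        '?token=3735b5b6fca388d9aa56241874733699&s=1D326F951CD037E94A78492D0300C066',
        'http://pic.soutu123.cn/element_origin_min_pic/16/10/04/1157f327a3c305d.jpg'
        '%21/fw/700/quality/90/unsharp/true/compress/true',
    ]
    for u in samples:
        print(guess_extension(u))
```

A URL suffix is only a hint. The server's `Content-Type` response header is usually a more reliable indicator of the actual image format when it is present.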

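With that, we have scraped Baidu images using a Python crawler. Copying the code is not enough, though: once your test run succeeds, keep writing and practicing to get a real feel for how powerful Python crawlers are.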

Note: about Python crawlers

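At a later stage, a Python crawler can be used to analyze and aggregate scraped content into new pages, lifting a site's quality well above that of a plain scraper site. Unless you are experienced, proceed with caution until you can reliably produce readable, highly relevant, high-quality aggregated pages, because aggregation sites are a prime target of Baidu's crackdowns.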

Reference: https://blog.csdn.net/xxxxing/article/details/104929347

Article URL: https://xzo.com.cn/develop/python/980.html