requests

requests

GET请求:

带参数的url请求:      

1
2
3
import requests
data ={'name':'cheng','age':20} #参数
response = requests.get('https://httpbin.org/get', params=data)

这样requests会给我们自动build这个网址,查看response.url 这个属性。会得到我们请求的代码

https://httpbin.org/get?name=cheng&age=20'

解析JSON:

requests还提供了一个解析json的方法,用get方法请求 https://httpbin.org/get,它的返回结果是一个json的字符串,所以我们可以直接调用response.json方法得到。

复制代码;)

1
2
3
4
5
6
7
# coding=utf-8
import json
import requests

response = requests.get('https://httpbin.org/get')
print(response.json()) #调用json方法
print(json.loads(response.text))

用json.loads把json数据转化字典方法,同样打印输出,发现和response.json()是一样的,其实response.json()就是通过一个json.loads()方法

获取二进制数据

Requests 会自动解码来自服务器的内容。大多数 unicode 字符集都能被无缝地解码。请求发出后,Requests 会基于 HTTP 头部对响应的编码作出有根据的推测。当你访问 r.text 之时,Requests 会使用其推测的文本编码。你可以找出 Requests 使用了什么编码,并且能够使用r.encoding 属性来改变它。

response.content()方法可以使你也能以字节的方式访问请求响应体,对于一些图片和视频 音频内容,需要用到content

https://ssl.gstatic.com/ui/v1/icons/mail/rfr/logo_gmail_lockup_default_1x.png 是一张gmail的图片 通过储存content属性就可以获得图片

1
2
3
4
5
6
7
# coding=utf-8
import json
import requests
response = requests.get('https://ssl.gstatic.com/ui/v1/icons/mail/rfr/logo_gmail_lockup_default_1x.png')
with open('gmail.png','wb') as f: #注意这边是二进制的写入所以是wb
f.write(response.content)
f.close()

img

成功写入了图片

添加headers

一些网站会检测请求方是不是机器,如果是机器,就不能成功访问,所以要添加headers伪装成浏览器

1
2
3
4
5
6
# coding=utf-8
import requests

headers = {
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.92 Safari/537.36'}
response = requests.get('https://www.zhihu.com/', headers=headers)

添加上headers后的requests就可以伪装成浏览器了

post请求

1
2
3
4
5
6
# coding=utf-8
import requests

data = {'name':'cheng', 'age':20} #
response = requests.post('https://httpbin.org/post', data=data)
print(response.text)

发送了post请求,返回结果:
img

可以看到服务器接受了我们post的data,返回了一个json格式的数据.

response常用属性

1
2
3
4
5
6
7
8
9
# coding=utf-8
import requests

response = requests.get('http://www.cnki.net/old/')
print(type(response.status_code), response.status_code)
print(type(response.headers), response.status_code)
print(type(response.cookies), response.cookies)
print(type(response.url), response.url)
print(type(response.history), response.history)

输出结果:

img

可以看到服务器返回的状态码是int类型的数据,headers是一个字典类型,cookies ,请求的网址是一个字符串类型,history是浏览的历史

文件上传:

1
2
3
4
5
6
# coding=utf-8
import requests

file = {'file' :open('img.jpg', 'rb')}
response = requests.post('https://httpbin.org/')
print(response.text)

返回的结果中有file这个字典键值,对应的是我们上传的文件

cookies:

response.cookies是一个字典的形式,我们可以通过for循环把他们print出来

1
2
3
4
5
6
7
8
# coding=utf-8
import requests

response = requests.get('https://www.baidu.com/')
print(response)
print(response.cookies)
for key,value in response.cookies.items():
print(key + '=' + value)

运行结果:

img

会话维持——session

模拟登陆

普通的get方式:

1
2
3
4
5
6
# coding=utf-8
import requests

requests.get('https://httpbin.org/cookies/set/number/123456')
response = requests.get('https://httpbin.org/cookies')
print(response.text)

这里,我们第一次请求网站,设置cookies,当第二次请求是,返回结果是:

img

可以看到第二次请求的cookies是空,原因是我们发起了两次请求,这两个请求是完全独立的过程,他们两个是没有相关性的,可以把他们想象成用两个浏览器分别访问,相当于模拟了一个会话。

用session请求:

1
2
3
4
5
6
7
# coding=utf-8
import requests

s = requests.session()
s.get('https://httpbin.org/cookies/set/number/123456')
response = s.get('https://httpbin.org/cookies')
print(response.text)

运行结果:

img

可以看到第二次请求的返回值就是第一次设置的值,可以把他们看作一个浏览器先后发出了请求,会话对象让你能够跨请求保持某些参数

证书验证:

有些网站访问时会出现证书错误的情况:

有两种方式可以解决:

一种是修改requests中的varify参数,使他为false:

1
2
3
4
5
6
# coding=utf-8
import requests
import urllib3
urllib3.disable_warnings()

response = requests.get('http://www.12306.cn', verify=False)

这样访问时会自动忽略网站的证书。但是requests还是用生成warning,提醒你证书是不安全的我们导入urllib3模块,用disable_warnings()方法

第二种是直接指定一个证书:

1
2
3
4
# coding=utf-8
import requests

response = requests.get('http://www.12306.cn', cert('path/server'))

代理设置

1
2
3
4
5
6
7
import requests

proxies = {
'https': 'https://127.0.0.1:13386',
'http':'https://127.0.0.1:12345'
}
response = requests.get('https://www.google.com/?hl=zh_cn', proxies=proxies)

直接添加一个proxies的字典就行

当代理有密码时,只要在修改values值,添加上用户名和密码

1
2
3
4
5
6
import requests

proxies = {
'https': 'https://user:password@127.0.0.1:13386',
}
response = requests.get('https://www.google.com/',proxies=proxies)

如果是shadowsocks可以 pip install requests[socks] 然后将proxies修改成:

1
2
3
4
5
6
import requests

proxies = {
'https': 'socks5:330330://127.0.0.1:13386',
}
response = requests.get('https://www.baidu.com/', proxies=proxies)

超时设置

1
2
3
4
import requests

response = requests.get('https://www.baidu.com/', timeout=0.2)
print(response.status_code)

限制了响应时间,如果大于0.2秒,会抛出异常

认证设置

1
2
3
4
5
6
import requests
from requests.auth import HTTPBasicAuth

response = requests.get(url, auth=HTTPBasicAuth('usr','password'))
response = requests.get(url, auth={'user':'12345'})
print(response.status_code)

这样的两种auth属性都行

异常处理

  • exception requests.RequestException(*args, **kwargs)[源代码]¶
    There was an ambiguous exception that occurred while handling your request.

  • exception requests.ConnectionError(*args, **kwargs)[源代码]
    A Connection error occurred.

  • exception requests.HTTPError(*args, **kwargs)[源代码]
    An HTTP error occurred.

  • exception requests.URLRequired(*args, **kwargs)[源代码]
    A valid URL is required to make a request.

  • exception requests.TooManyRedirects(*args, **kwargs)[源代码]
    Too many redirects.

  • exception requests.ConnectTimeout(*args, **kwargs)[源代码]
    The request timed out while trying to connect to the remote server.
    Requests that produced this error are safe to retry.

  • exception requests.ReadTimeout(*args, **kwargs)[源代码]
    The server did not send any data in the allotted amount of time.

  • exception requests.Timeout(*args, **kwargs)[源代码]
    The request timed out.
    Catching this error will catch both ConnectTimeout and ReadTimeout errors.