Recently I needed to capture packets from a page that is only reachable over a VPN, and found that with the VPN connected directly, neither Fiddler nor Charles could capture anything. Searching online turned up a few write-ups: a GitHub post, "Solutions for Fiddler and Charles failing to capture packets under a direct VPN connection" (I tried it, but it didn't seem to work), and "Solving the no-network problem in the Charles capture setup under VPN access on Windows", which inspired the fix below: pointing Charles's Proxy -> External Proxy settings at the VPN's local proxy port.
Allowing proxying through another port
1. Find the VPN client's proxy port. I'm using vmess; the values are under 选项 -> 参数设置 (Options -> Parameter Settings). What you need are the port and the protocol; in my case, port 10808 and the SOCKS protocol (a quick sanity check for this is sketched after these steps).
2. Configure Charles: under Proxy -> External Proxy Settings, first enable the external proxy option, then fill in the port and protocol you just read off the vmess client (for me, a SOCKS proxy at 127.0.0.1:10808).
3. Setup complete; start capturing.
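To double-check that the SOCKS inbound actually works, independently of Charles, here is a minimal sanity check with requests (my own sketch; httpbin.org is just a convenient echo service, and the host/port are from my local setup):

import requests

# Route a test request through the local vmess SOCKS inbound.
# 127.0.0.1:10808 matches my setup; adjust to your client's port.
# SOCKS support needs PySocks: pip install requests[socks]
proxies = {
    "http": "socks5h://127.0.0.1:10808",
    "https": "socks5h://127.0.0.1:10808",
}
print(requests.get("https://httpbin.org/ip", proxies=proxies, timeout=10).json())

If this prints the VPN exit IP rather than your own, the port and protocol are right.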
And that's a wrap!
Appendix:
Using a proxy with requests
import requests

cookies = {
    'PB3_SESSION': '"2|1:0|10:1650810241|11:PB3_SESSION|40:djJleDo1Mi4xNDAuMjAxLjIxMTo1OTQ4NjM0Mg==|f661892137fd704b91fa09d8c58fd641a15ab9e83f94c69981dbeed7980fc9e4"',
    'V2EX_LANG': 'zhcn',
}

headers = {
    'authority': 'cn.v2ex.com',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'accept-language': 'zh-CN,zh;q=0.9,en;q=0.8',
    'cache-control': 'no-cache',
    'pragma': 'no-cache',
    'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="100", "Google Chrome";v="100"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-site': 'none',
    'sec-fetch-user': '?1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36',
}

# Point both schemes at the local vmess SOCKS inbound
http_proxy = "socks5h://127.0.0.1:10808"
https_proxy = "socks5h://127.0.0.1:10808"
proxies = {
    "https": https_proxy,
    "http": http_proxy,
}

response = requests.get('https://cn.v2ex.com/about', cookies=cookies, headers=headers, proxies=proxies)
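A note on the scheme: socks5h (as opposed to socks5) resolves hostnames on the proxy side rather than locally, which matters when local DNS for the target site is unreliable. SOCKS support in requests also requires the PySocks extra (pip install requests[socks]).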
Note: I first ran this in Sublime Text and kept getting an encoding error on response.text, even though I had confirmed the page's Content-Type header and meta charset were fine. It then occurred to me that the console itself might be unable to display the encoding, so I ran the script in PyCharm instead, and it worked!
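If you run into the same thing, a quick way to take the console out of the equation (my own sketch, not part of the original troubleshooting) is to force UTF-8 on stdout, or to bypass the console and write the body to a file:

import sys

# Reconfigure stdout so a GBK/cp936 console doesn't choke on UTF-8 (Python 3.7+)
sys.stdout.reconfigure(encoding='utf-8', errors='replace')
print(response.text)

# Or inspect the response in a file instead of the console
with open('page.html', 'w', encoding='utf-8') as f:
    f.write(response.text)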
Using a SOCKS proxy with aiohttp
From: https://pypi.org/project/aiohttp-socks/ and https://www.cnblogs.com/john-xiong/p/13812567.html
- pip install aiohttp_socks
import asyncio

import aiohttp
from aiohttp_socks import ProxyConnector

async def getDataByChromeDriver(url):
    # Build the connector per session: ClientSession closes its connector
    # when the session closes, so a shared module-level one would be dead
    # after the first task finishes.
    connector = ProxyConnector.from_url('socks5://127.0.0.1:10808')
    async with aiohttp.ClientSession(connector=connector) as session:
        async with session.get(url) as response:
            return await response.text()

if __name__ == '__main__':
    # title_list is a {title: url} mapping defined elsewhere in the original script
    loop = asyncio.get_event_loop()
    loop.run_until_complete(asyncio.gather(
        *(getDataByChromeDriver(index) for title, index in title_list.items())
    ))
- Run it and you're done.
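One design note: aiohttp's own proxy= argument only speaks HTTP proxies, which is why the SOCKS case goes through aiohttp-socks's ProxyConnector; the connector replaces the session's default TCPConnector, so everything sent through that ClientSession is routed via the proxy.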
Using a proxy with requests-html
Reference: "For Python scraping, the requests_html module alone is enough! (supports JS rendering & async requests)"
from typing import Union

from requests_html import AsyncHTMLSession

# Same SOCKS setup as in the requests example above
http_proxy = "socks5h://127.0.0.1:10808"
https_proxy = "socks5h://127.0.0.1:10808"
proxies = {
    "https": https_proxy,
    "http": http_proxy,
}

session = AsyncHTMLSession()

async def getDataByChromeDriver(index: Union[int, str]):
    # headers is the same dict as in the requests example above
    response = await session.get('https://www.qkl123.com/sector/{}'.format(index), headers=headers, proxies=proxies)
    return response
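Since requests-html is a wrapper around requests, the proxies dict passes straight through to the underlying requests.Session, so the same socks5h URLs work unchanged.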
requests-html async
from requests_html import AsyncHTMLSession

asession = AsyncHTMLSession()

async def get_pyclock():
    r = await asession.get('http://httpbin.org/get')
    # arender() executes the page's JavaScript in headless Chromium
    await r.html.arender()
    return r

results = asession.run(get_pyclock, get_pyclock, get_pyclock)
print(results)
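Heads-up: the first call to arender() makes pyppeteer download a Chromium build (on the order of 100 MB), and in the VPN scenario that download itself may need the proxy to go through.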
See also: https://cloud.tencent.com/developer/article/1575104
The problem that asession.run can't take arguments
Modify requests_html.AsyncHTMLSession so that it supports a url argument.
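The post ends here without the patch itself; a minimal workaround that avoids modifying the library (my own sketch: fetch and the urls list are illustrative names) is to bind the argument with functools.partial, since run() simply invokes each callable with no arguments:

from functools import partial

from requests_html import AsyncHTMLSession

asession = AsyncHTMLSession()

async def fetch(url):
    r = await asession.get(url)
    return r

# asession.run() calls each callable with no arguments, so bind the
# url up front instead of patching AsyncHTMLSession itself.
urls = ['http://httpbin.org/get', 'https://cn.v2ex.com/about']
results = asession.run(*[partial(fetch, url) for url in urls])
print(results)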