Recently I needed to capture traffic from a page that is only reachable over a VPN, and found that with the VPN connected directly, Fiddler and Charles could not capture anything. Searching around turned up a couple of leads: a GitHub post on Fiddler/Charles failing to capture under a direct VPN connection (tried it, didn't seem to work), and an article on fixing the "no network" problem when setting up Charles capture under a VPN on Windows. The latter pointed me in the right direction: under Charles's Proxy -> External Proxy Settings you can hand traffic off to the VPN client's own proxy port.
1. Find the VPN client's proxy port
I'm using vmess; the port can be checked under Options -> Parameter Settings. The two things you need are the port and the protocol: in my case, 10808 and SOCKS.
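If you want to sanity-check that the client is really listening on that port before touching Charles, here is a minimal sketch (my addition, assuming the 127.0.0.1:10808 value above):

```python
import socket

def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Should print True while the VPN client is running and listening on 10808.
print(port_is_open("127.0.0.1", 10808))
```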

2. Configure Charles
Under Proxy -> External Proxy Settings, first enable the external proxy option, then fill in the port and protocol you just looked up in vmess (for SOCKS, that means the SOCKS proxy fields: 127.0.0.1:10808 in my case).

3. Setup complete, start capturing
And that's a wrap!
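To confirm the whole chain works (script -> Charles -> vmess SOCKS proxy), one quick check, sketched below under the assumption that Charles is listening on its default HTTP proxy port 8888, is to route a request through Charles and watch it appear in the session list:

```python
import requests

# Point requests at Charles's local HTTP proxy (default 8888; adjust if yours differs).
charles = {
    "http": "http://127.0.0.1:8888",
    "https": "http://127.0.0.1:8888",
}

# verify=False only for this quick test, since Charles re-signs HTTPS traffic
# with its own certificate; trust the Charles root cert for real work.
r = requests.get("https://cn.v2ex.com/about", proxies=charles, verify=False)
print(r.status_code)  # the same request should show up inside Charles
```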
Appendix:
Using a proxy with requests
Note: SOCKS proxy support in requests needs the PySocks extra (pip install requests[socks]).

```python
import requests

# Cookies and headers copied from the browser devtools for cn.v2ex.com.
cookies = {
    'PB3_SESSION': '"2|1:0|10:1650810241|11:PB3_SESSION|40:djJleDo1Mi4xNDAuMjAxLjIxMTo1OTQ4NjM0Mg==|f661892137fd704b91fa09d8c58fd641a15ab9e83f94c69981dbeed7980fc9e4"',
    'V2EX_LANG': 'zhcn',
}

headers = {
    'authority': 'cn.v2ex.com',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'accept-language': 'zh-CN,zh;q=0.9,en;q=0.8',
    'cache-control': 'no-cache',
    'pragma': 'no-cache',
    'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="100", "Google Chrome";v="100"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-site': 'none',
    'sec-fetch-user': '?1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36',
}

# socks5h (note the "h") resolves DNS through the proxy as well.
http_proxy = "socks5h://127.0.0.1:10808"
https_proxy = "socks5h://127.0.0.1:10808"
proxies = {
    "https": https_proxy,
    "http": http_proxy,
}

response = requests.get('https://cn.v2ex.com/about', cookies=cookies, headers=headers, proxies=proxies)
```
Note: I first ran this in Sublime and kept getting an encoding error on response.text, even though I had confirmed via the page's Content-Type and meta charset that the encoding was fine. It then occurred to me that the console itself might be unable to display the encoding, so I ran it in PyCharm instead, and it worked!
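If you hit the same thing, here is a workaround sketch that keeps you in the original editor (my addition; it assumes the error is a UnicodeEncodeError raised while printing to a console whose encoding can't represent the page's characters, not a failure to decode the response):

```python
import sys

text = response.text  # already decoded by requests

# Replace anything the console can't display instead of crashing on it.
enc = sys.stdout.encoding or "utf-8"
print(text.encode(enc, errors="replace").decode(enc))

# Or bypass the console entirely and inspect the file instead.
with open("page.html", "w", encoding="utf-8") as f:
    f.write(text)
```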
Using a SOCKS proxy with aiohttp
From: https://pypi.org/project/aiohttp-socks/ and https://www.cnblogs.com/john-xiong/p/13812567.html
- Install the connector: pip install aiohttp_socks
- Hook a ProxyConnector into the aiohttp ClientSession:
```python
import asyncio
import aiohttp
from aiohttp_socks import ProxyConnector

async def getDataByChromeDriver(url):
    # Build the connector inside the coroutine so each session owns its own.
    connector = ProxyConnector.from_url('socks5://127.0.0.1:10808')
    async with aiohttp.ClientSession(connector=connector) as session:
        async with session.get(url) as response:
            return await response.text()

if __name__ == '__main__':
    # Placeholder mapping of titles to URLs; the original post used its own
    # project-specific title_list here.
    title_list = {'example': 'https://cn.v2ex.com/about'}
    loop = asyncio.get_event_loop()
    loop.run_until_complete(asyncio.gather(
        *[getDataByChromeDriver(index) for title, index in title_list.items()]))
```
 
- Run it; the requests now go out through the SOCKS proxy.
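One more detail worth knowing (my addition, based on the aiohttp-socks README): the connector can also resolve DNS through the proxy, like requests' socks5h scheme, via the rdns flag:

```python
from aiohttp_socks import ProxyConnector, ProxyType

# rdns=True asks the SOCKS5 proxy to resolve hostnames remotely
# (the aiohttp equivalent of "socks5h://" in requests).
connector = ProxyConnector(
    proxy_type=ProxyType.SOCKS5,
    host='127.0.0.1',
    port=10808,
    rdns=True,
)
```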
Using a proxy with requests-html
See: "One requests_html module is all a Python crawler needs! (supports JS rendering & async requests)"
```python
from typing import Union
from requests_html import AsyncHTMLSession

# Minimal UA header; reuse the fuller browser headers from the requests example if needed.
headers = {'user-agent': 'Mozilla/5.0'}

http_proxy = "socks5h://127.0.0.1:10808"
https_proxy = "socks5h://127.0.0.1:10808"
proxies = {
    "https": https_proxy,
    "http": http_proxy,
}

session = AsyncHTMLSession()

async def getDataByChromeDriver(index: Union[int, str]):
    response = await session.get('https://www.qkl123.com/sector/{}'.format(index),
                                 headers=headers, proxies=proxies)
    return response
```
Async usage of requests-html
```python
from requests_html import AsyncHTMLSession

asession = AsyncHTMLSession()

# Note: run() calls each function with no arguments, hence no parameters here.
async def get_pyclock():
    r = await asession.get('http://httpbin.org/get')
    await r.html.arender()  # render JavaScript (downloads Chromium on first use)
    return r

# run() schedules the callables concurrently and collects their results.
results = asession.run(get_pyclock, get_pyclock, get_pyclock)
print(results)
```
See also: https://cloud.tencent.com/developer/article/1575104
The problem of asession.run not accepting arguments
My fix was to modify requests_html.AsyncHTMLSession so that it supports a url parameter.
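The post doesn't reproduce that modification, so here is an alternative sketch that avoids patching the library at all: since AsyncHTMLSession.run simply calls each callable with no arguments, you can bind the URL ahead of time with functools.partial (fetch below is my own hypothetical helper, not from the original post):

```python
from functools import partial
from requests_html import AsyncHTMLSession

asession = AsyncHTMLSession()

async def fetch(url):
    r = await asession.get(url)
    return r

# run() invokes each callable with no arguments, so pre-bind the url;
# a plain lambda (lambda: fetch(url)) works the same way.
urls = ['http://httpbin.org/get', 'http://httpbin.org/ip']
results = asession.run(*[partial(fetch, url) for url in urls])
print([r.url for r in results])
```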