.. ProxyPool documentation master file, created by
   sphinx-quickstart on Wed Jul 8 16:13:42 2020.
   You can adapt this file completely to your liking, but it should at least
   contain the root toctree directive.

::

    ProxyPool

A proxy IP pool for Python web crawlers.

Clone the repository:

.. code-block:: console

    $ git clone git@github.com:jhao104/proxy_pool.git

Install the dependencies:

.. code-block:: console

    $ pip install -r requirements.txt

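Adjust the configuration:

.. code-block:: python

    # Address and port the API service listens on
    HOST = "0.0.0.0"
    PORT = 5010

    # Database connection URL
    DB_CONN = 'redis://@127.0.0.1:8888'

    # Proxy fetcher methods to enable
    PROXY_FETCHER = [
        "freeProxy01",
        "freeProxy02",
        # ....
    ]
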
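The ``DB_CONN`` value is a connection URL. As a minimal sketch of how such a string breaks down, the standard library's ``urllib.parse`` can split it into scheme, host, and port. The helper name ``describe_db_conn`` is hypothetical and not part of ProxyPool; the project's own parsing may differ.

```python
from urllib.parse import urlparse


def describe_db_conn(db_conn):
    """Split a connection URL such as DB_CONN into its parts.

    Hypothetical helper for illustration only.
    """
    parts = urlparse(db_conn)
    return {
        "scheme": parts.scheme,      # URL scheme, e.g. "redis"
        "host": parts.hostname,      # e.g. "127.0.0.1"
        "port": parts.port,          # e.g. 8888
        "password": parts.password,  # None when the URL carries no password
    }


if __name__ == "__main__":
    print(describe_db_conn("redis://@127.0.0.1:8888"))
```

Printing the parsed parts before starting the services is a quick way to spot a mistyped host or port.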
Start the scheduler and the API server:

.. code-block:: console

    $ python proxyPool.py schedule
    $ python proxyPool.py server

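After starting the server, it can be useful to confirm that the API port accepts connections before pointing a crawler at it. The following is a rough sketch using only the standard library; ``wait_for_port`` is a hypothetical helper, and the port in the usage line is an assumption that should match your ``PORT`` setting.

```python
import socket
import time


def wait_for_port(host, port, timeout=10.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection succeeds before `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Assumed port; replace 5010 with your configured PORT.
    print(wait_for_port("127.0.0.1", 5010, timeout=5.0))
```

This only checks that something is listening on the port; it does not verify that the pool already contains proxies.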
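Usage

==========  ======  ==============================  ============================================================
API         Method  Description                     Parameters
==========  ======  ==============================  ============================================================
/           GET     API overview                    None
/get        GET     Return one proxy                Optional: ``?type=https`` returns only HTTPS-capable proxies
/pop        GET     Return and remove one proxy     Optional: ``?type=https`` returns only HTTPS-capable proxies
/all        GET     Return all proxies              Optional: ``?type=https`` returns only HTTPS-capable proxies
/count      GET     Return the number of proxies    None
/delete     GET     Delete the given proxy          ``?proxy=host:port``
==========  ======  ==============================  ============================================================
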
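To make the query parameters in the table concrete, here is a small sketch that builds request URLs for these endpoints. ``BASE_URL`` and ``build_url`` are hypothetical names introduced for illustration, and the address is an assumption that should match your own ``HOST`` and ``PORT``.

```python
from urllib.parse import urlencode

# Assumed address of the API server; adjust to your configuration.
BASE_URL = "http://127.0.0.1:5010"


def build_url(endpoint, **params):
    """Build a request URL for one of the API endpoints."""
    url = BASE_URL + endpoint
    # Skip parameters that were passed as None.
    query = {k: v for k, v in params.items() if v is not None}
    if query:
        url += "?" + urlencode(query)
    return url


if __name__ == "__main__":
    print(build_url("/get", type="https"))
    print(build_url("/count"))
    print(build_url("/delete", proxy="1.2.3.4:8080"))
```

``urlencode`` percent-encodes the colon in ``host:port``, which keeps the query string valid for proxies passed to ``/delete``.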
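Example of using the pool from a crawler:

.. code-block:: python

    import requests


    def get_proxy():
        # Ask the pool for one proxy that supports HTTPS.
        return requests.get("http://127.0.0.1:5010/get?type=https").json()


    def delete_proxy(proxy):
        # Remove a proxy from the pool.
        requests.get("http://127.0.0.1:5010/delete/?proxy={}".format(proxy))


    def getHtml():
        # ....
        retry_count = 5
        proxy = get_proxy().get("proxy")
        # The proxy is assumed to be a plain HTTP proxy, so both entries use http://.
        proxies = {"http": "http://{}".format(proxy), "https": "http://{}".format(proxy)}
        while retry_count > 0:
            try:
                # Fetch the page through the proxy.
                html = requests.get('https://www.example.com', proxies=proxies)
                return html
            except Exception:
                retry_count -= 1
        # Every retry failed: delete the proxy from the pool.
        delete_proxy(proxy)
        return None
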
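The example above reuses one proxy for every retry. A possible variation, sketched here rather than taken from ProxyPool, asks for a fresh proxy after each failure. ``fetch_with_rotation`` and its callable parameters are hypothetical; passing the pool operations in as functions lets the logic run without a live pool.

```python
def fetch_with_rotation(url, get_proxy, delete_proxy, fetch, max_proxies=3):
    """Try `url` through up to `max_proxies` different proxies.

    get_proxy()         -> "host:port" string, or None when the pool is empty
    delete_proxy(proxy) -> removes a failed proxy from the pool
    fetch(url, proxy)   -> returns the response, raises on failure
    """
    for _ in range(max_proxies):
        proxy = get_proxy()
        if proxy is None:
            break  # pool is empty, nothing left to try
        try:
            return fetch(url, proxy)
        except Exception:
            # This proxy failed: drop it and move on to the next one.
            delete_proxy(proxy)
    return None


if __name__ == "__main__":
    # In-memory stand-ins so the sketch runs without a live pool.
    pool = ["10.0.0.1:80", "10.0.0.2:80"]

    def fake_fetch(url, proxy):
        if proxy == "10.0.0.1:80":
            raise ConnectionError("proxy down")
        return "fetched {} via {}".format(url, proxy)

    print(fetch_with_rotation(
        "https://www.example.com",
        get_proxy=lambda: pool[0] if pool else None,
        delete_proxy=pool.remove,
        fetch=fake_fetch,
    ))
```

In real use, the three callables would wrap the ``/get`` and ``/delete`` endpoints and a ``requests.get`` call with the ``proxies`` argument.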
.. toctree::
   :maxdepth: 2

   user/index
   dev/index
   changelog