This is a collection of semi-official download handlers for Scrapy. See the Scrapy download handler documentation for more information.
They should work, and some of them may later be promoted to official status, but for now they are provided as-is, with no support or stability promises. The documentation, including limitations and unsupported features, is also provided as-is and may be incomplete.
As this code intentionally uses private Scrapy APIs, it specifies a tight dependency on Scrapy. This version of the package only supports Scrapy 2.15.x.
The baseline for these handlers is the default Scrapy handler, HTTP11DownloadHandler, which uses Twisted and supports HTTP/1.1. Feature parity with it is an explicit goal, but it is not always achievable, and not all possible features are implemented in all handlers (this may change in the future). Certain popular features not supported by HTTP11DownloadHandler, such as HTTP/2, and features unique to some handlers, may or may not be implemented. See the sections for individual handlers for details.
The following table summarizes the most important differences:
| Handler | HTTP/2 | HTTP/3 | Proxies | TLS logging | Impersonation | TLS version limits |
|---|---|---|---|---|---|---|
| (HTTP11DownloadHandler) | Not possible | Not possible | Yes | Yes | Not possible | No |
| AiohttpDownloadHandler | Not possible | Not possible | Yes | Partial | Not possible | No |
| CurlCffiDownloadHandler | Yes | Yes (not tested) | Yes | Not possible | No | Not possible |
| HttpxDownloadHandler | Yes | Not possible | Yes | Yes | Not possible | No |
| NiquestsDownloadHandler | Yes | No | Yes | Yes | Not possible | Not possible |
| PyreqwestDownloadHandler | Yes | Not possible | Not possible | Not possible | Not possible | No |
The following basic features are supported by all handlers unless noted otherwise in their sections:
- Native asyncio integration without requiring a Twisted reactor
- HTTP/1.1 for the `http` and `https` schemes
- Unified download handler exceptions
- Proxies, including HTTP and HTTPS proxies for HTTP and HTTPS destinations
- Proxy authentication via `HttpProxyMiddleware`
- IPv6 destinations
- `DOWNLOAD_MAXSIZE`, `DOWNLOAD_WARNSIZE` and the respective request meta keys
- `DOWNLOAD_TIMEOUT` and the respective request meta key
- `DOWNLOAD_FAIL_ON_DATALOSS` and the `"dataloss"` flag
- Setting the `download_latency` request meta key
- `DOWNLOAD_BIND_ADDRESS`
- `DOWNLOAD_VERIFY_CERTIFICATES`
- The `headers_received` and `bytes_received` signals
- Not reading the proxy configuration from the environment variables
- Not handling cookies, redirects, compression and other things handled by Scrapy itself
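A minimal sketch of how the per-request knobs above are used (the proxy URL is hypothetical; the keys are standard Scrapy request meta keys):

```python
# Per-request overrides are plain Scrapy request meta keys; the proxy
# is picked up via HttpProxyMiddleware.
meta = {
    "proxy": "http://user:pass@proxy.example:8080",  # hypothetical proxy URL
    "download_timeout": 30,              # per-request DOWNLOAD_TIMEOUT
    "download_maxsize": 10_000_000,      # per-request DOWNLOAD_MAXSIZE
    "download_warnsize": 5_000_000,      # per-request DOWNLOAD_WARNSIZE
    "download_fail_on_dataloss": False,  # per-request DOWNLOAD_FAIL_ON_DATALOSS
}
# yield scrapy.Request("https://example.com", meta=meta)
print(meta["download_timeout"])
```

After a successful download, the handler fills in the `download_latency` meta key on the same request.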
## AiohttpDownloadHandler

This handler supports HTTP/1.1 and uses the aiohttp library.
Install it with:
```shell
pip install scrapy-download-handlers-incubator[aiohttp]
```

Enable it with:

```python
DOWNLOAD_HANDLERS = {
    "http": "scrapy_download_handlers_incubator.AiohttpDownloadHandler",
    "https": "scrapy_download_handlers_incubator.AiohttpDownloadHandler",
}
```

| Feature | Support |
|---|---|
| Proxies | Yes (HTTPS proxies for HTTPS destinations are not supported on Python < 3.11) |
| HTTP/2 | No (not supported by the library) |
| TLS verbose logging | Partial (skipped for small responses) |
| `response.ip_address` | Partial (skipped for small responses) |
| `response.certificate` | Partial (DER bytes; skipped for small responses) |
| Per-request `bindaddress` | No (not supported by the library) |
| Proxy certificate verification | Follows `DOWNLOAD_VERIFY_CERTIFICATES` |
Notable features supported by the library but not implemented:
- DNS resolving settings
- Custom DNS resolvers
## CurlCffiDownloadHandler

This handler supports HTTP/1.1, HTTP/2 and (untested) HTTP/3 and uses the curl_cffi library.
Install it with:
```shell
pip install scrapy-download-handlers-incubator[curl-cffi]
```

Enable it with:

```python
DOWNLOAD_HANDLERS = {
    "http": "scrapy_download_handlers_incubator.CurlCffiDownloadHandler",
    "https": "scrapy_download_handlers_incubator.CurlCffiDownloadHandler",
}
```

| Feature | Support |
|---|---|
| Proxies | Yes |
| HTTP/2 | Yes |
| HTTP/3 | Yes (but not tested) |
| TLS verbose logging | No (not supported by the library) |
| `response.ip_address` | Yes |
| `response.certificate` | No (not supported by the library) |
| Per-request `bindaddress` | No (not supported by the library) |
| Proxy certificate verification | Follows `DOWNLOAD_VERIFY_CERTIFICATES` |
Notable features supported by the library but not implemented:
- Impersonation
- Advanced libcurl tunables
Settings:

- `CURL_CFFI_HTTP_VERSION` (str, default: `"v1"`, corresponding to "Enforce HTTP/1.1"): The HTTP version to use. The value is passed directly to the library, so the possible values are set by `curl_cffi.requests.utils.normalize_http_version()`, and the meanings of the underlying constants can be seen in the libcurl docs (`CURLOPT_HTTP_VERSION`). Set this to `"v2tls"` or `"v2"` to enable HTTP/2 for HTTPS requests or for all requests respectively. Set this to `"v3"` to enable HTTP/3.
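For example, a sketch of a `settings.py` fragment that enables this handler with HTTP/2 for HTTPS requests:

```python
# settings.py (sketch)
DOWNLOAD_HANDLERS = {
    "http": "scrapy_download_handlers_incubator.CurlCffiDownloadHandler",
    "https": "scrapy_download_handlers_incubator.CurlCffiDownloadHandler",
}
CURL_CFFI_HTTP_VERSION = "v2tls"  # "v2" for all requests, "v3" for HTTP/3
```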
## HttpxDownloadHandler

This is an updated copy of the official `scrapy.core.downloader.handlers._httpx.HttpxDownloadHandler` handler. It supports HTTP/1.1 and HTTP/2 and uses the httpx library.
Install it with:
```shell
pip install scrapy-download-handlers-incubator[httpx]
```

Enable it with:

```python
DOWNLOAD_HANDLERS = {
    "http": "scrapy_download_handlers_incubator.HttpxDownloadHandler",
    "https": "scrapy_download_handlers_incubator.HttpxDownloadHandler",
}
```

| Feature | Support |
|---|---|
| Proxies | Yes (separate connection pool per proxy) |
| HTTP/2 | Yes |
| HTTP/3 | No (not supported by the library) |
| TLS verbose logging | Yes |
| `response.ip_address` | Yes |
| `response.certificate` | Yes (DER bytes) |
| Per-request `bindaddress` | No (not supported by the library) |
| Proxy certificate verification | Follows `DOWNLOAD_VERIFY_CERTIFICATES` |
Notable features supported by the library but not implemented:
- SOCKS5 proxies
- Alternative transports
- Limiting the number of per-proxy connection pools to save resources
Settings:

- `HTTPX_HTTP2_ENABLED` (bool, default: `False`): Whether to enable HTTP/2.
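Putting it together, a sketch of a `settings.py` fragment enabling this handler with HTTP/2 (`NIQUESTS_HTTP2_ENABLED` and `PYREQWEST_HTTP2_ENABLED` follow the same boolean pattern for their handlers):

```python
# settings.py (sketch)
DOWNLOAD_HANDLERS = {
    "http": "scrapy_download_handlers_incubator.HttpxDownloadHandler",
    "https": "scrapy_download_handlers_incubator.HttpxDownloadHandler",
}
HTTPX_HTTP2_ENABLED = True  # HTTP/2 is disabled by default
```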
## NiquestsDownloadHandler

This handler supports HTTP/1.1 and HTTP/2 and uses the niquests library.
Install it with:
```shell
pip install scrapy-download-handlers-incubator[niquests]
```

Enable it with:

```python
DOWNLOAD_HANDLERS = {
    "http": "scrapy_download_handlers_incubator.NiquestsDownloadHandler",
    "https": "scrapy_download_handlers_incubator.NiquestsDownloadHandler",
}
```

| Feature | Support |
|---|---|
| Proxies | Yes |
| HTTP/2 | Yes |
| HTTP/3 | No (not implemented) |
| TLS verbose logging | Yes |
| `response.ip_address` | Yes |
| `response.certificate` | Yes (DER bytes) |
| Per-request `bindaddress` | No (not supported by the library) |
| Proxy certificate verification | Follows `DOWNLOAD_VERIFY_CERTIFICATES` |
Notable features supported by the library but not implemented:
- Custom DNS resolvers
- SOCKS5 proxies
- HTTP/2 tunables
Settings:

- `NIQUESTS_HTTP2_ENABLED` (bool, default: `False`): Whether to enable HTTP/2.
## PyreqwestDownloadHandler

This handler supports HTTP/1.1 and HTTP/2 and uses the pyreqwest library.
Install it with:
```shell
pip install scrapy-download-handlers-incubator[pyreqwest]
```

Enable it with:

```python
DOWNLOAD_HANDLERS = {
    "http": "scrapy_download_handlers_incubator.PyreqwestDownloadHandler",
    "https": "scrapy_download_handlers_incubator.PyreqwestDownloadHandler",
}
```

| Feature | Support |
|---|---|
| Proxies | No (not supported by the library) |
| HTTP/2 | Yes |
| HTTP/3 | No (not supported by the library) |
| TLS verbose logging | No (not supported by the library) |
| `response.ip_address` | No (not supported by the library) |
| `response.certificate` | No (not supported by the library) |
| Per-request `bindaddress` | No (not supported by the library) |
Notable features supported by the library but not implemented:
- HTTP/2 tunables
Settings:

- `PYREQWEST_HTTP2_ENABLED` (bool, default: `False`): Whether to enable HTTP/2.