Feature/impersonate 6.0 #163

Merged: 19 commits, Dec 31, 2023
1 change: 0 additions & 1 deletion .github/workflows/build.yaml
@@ -50,4 +50,3 @@ jobs:
- uses: pypa/[email protected]
with:
password: ${{ secrets.PYPI_TOKEN }}

1 change: 0 additions & 1 deletion .github/workflows/test.yaml
@@ -3,7 +3,6 @@ on:
push:
branches:
- main
- master
- bugfix/*
- feature/*
jobs:
8 changes: 4 additions & 4 deletions Makefile
@@ -1,7 +1,7 @@
.ONESHELL:
SHELL := bash
VERSION := 0.5.4
CURL_VERSION := curl-7.84.0
VERSION := 0.6.0b6
CURL_VERSION := curl-8.1.1

.preprocessed: curl_cffi/include/curl/curl.h curl_cffi/cacert.pem .so_downloaded
touch .preprocessed
@@ -15,7 +15,7 @@ $(CURL_VERSION):
tar -xf $(CURL_VERSION).tar.xz

curl-impersonate-$(VERSION)/chrome/patches: $(CURL_VERSION)
curl -L "https://github.com/lwthiker/curl-impersonate/archive/refs/tags/v$(VERSION).tar.gz" \
curl -L "https://github.com/yifeikong/curl-impersonate/archive/refs/tags/v$(VERSION).tar.gz" \
-o "curl-impersonate-$(VERSION).tar.gz"
tar -xf curl-impersonate-$(VERSION).tar.gz

@@ -32,7 +32,7 @@ curl_cffi/cacert.pem:
curl https://curl.se/ca/cacert.pem -o curl_cffi/cacert.pem

.so_downloaded:
python preprocess/download_so.py
python preprocess/download_so.py $(VERSION)
touch .so_downloaded

preprocess: .preprocessed
46 changes: 44 additions & 2 deletions README-zh.md
@@ -14,6 +14,16 @@ TLS or JA3 fingerprints. If you are blocked by a website for no obvious reason, you can
- Pre-compiled, so you don't have to build it again on your own machine.
- Supports `asyncio`, and each request can use a different proxy.
- Supports http 2.0, which requests does not.
- Supports websocket.

|library|requests|aiohttp|httpx|pycurl|curl_cffi|
|---|---|---|---|---|---|
|http2|❌|❌|✅|✅|✅|
|sync|✅|❌|✅|✅|✅|
|async|❌|✅|✅|❌|✅|
|websocket|❌|✅|❌|❌|✅|
|fingerprints|❌|❌|❌|❌|✅|
|speed|🐇|🐇🐇|🐇|🐇🐇|🐇🐇|

## Install

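@@ -23,8 +33,14 @@ TLS or JA3 fingerprints. If you are blocked by a website for no obvious reason, you can
On other, less common platforms, you may need to compile and install `curl-impersonate` first and set environment variables such as `LD_LIBRARY_PATH`.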

To install beta releases:

pip install curl_cffi --pre

## Usage

Impersonate recent browser versions where possible; do not copy `chrome110` from the examples below as-is.

### requests-like

```python
@@ -59,19 +75,25 @@ print(r.json())
# {'cookies': {'foo': 'bar'}}
```

Supported impersonate versions, as supported by [curl-impersonate](https://github.com/lwthiker/curl-impersonate):
Supported impersonate versions, as supported by my [fork](https://github.com/yifeikong/curl-impersonate) of [curl-impersonate](https://github.com/lwthiker/curl-impersonate):

However, only Chrome-like browsers are supported. Firefox support is tracked in #55

- chrome99
- chrome100
- chrome101
- chrome104
- chrome107
- chrome110
- chrome116
- chrome119
- chrome120
- chrome99_android
- edge99
- edge101
- safari15_3
- safari15_5
- safari17_2_ios

### asyncio

@@ -102,6 +124,22 @@ async with AsyncSession() as s:
results = await asyncio.gather(*tasks)
```

### WebSockets

```python
from curl_cffi.requests import Session, WebSocket

def on_message(ws: WebSocket, message):
    print(message)

with Session() as s:
    ws = s.ws_connect(
        "wss://api.gemini.com/v1/marketdata/BTCUSD",
        on_message=on_message,
    )
    ws.run_forever()
```

### curl-like

Alternatively, you can use the low-level curl-like API:
@@ -125,7 +163,10 @@ print(body.decode())

See the [English documentation](https://curl-cffi.readthedocs.io) for more details.

If you are using scrapy, check out this middleware: [tieyongjie/scrapy-fingerprint](https://github.com/tieyongjie/scrapy-fingerprint)
If you are using scrapy, check out these middlewares:

- [tieyongjie/scrapy-fingerprint](https://github.com/tieyongjie/scrapy-fingerprint)
- [jxlil/scrapy-impersonate](https://github.com/jxlil/scrapy-impersonate)

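For questions and suggestions, please open an issue first (Chinese or English are both fine). You can also join the WeChat group for discussion: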

@@ -136,6 +177,7 @@ print(body.decode())
- This project is forked from [multippt/python_curl_cffi](https://github.com/multippt/python_curl_cffi), released under the MIT license.
- The Headers/Cookies code comes from [httpx](https://github.com/encode/httpx/blob/master/httpx/_models.py), released under the BSD license.
- Asyncio support is inspired by Tornado's curl http client.
- The WebSocket API design comes from [websocket_client](https://github.com/websocket-client/websocket-client).

## Sponsor

34 changes: 32 additions & 2 deletions README.md
@@ -17,12 +17,14 @@ website for no obvious reason, you can give this package a try.
- Pre-compiled, so you don't have to compile on your machine.
- Supports `asyncio` with proxy rotation on each request.
- Supports http 2.0, which requests does not.
- Supports websocket.

|library|requests|aiohttp|httpx|pycurl|curl_cffi|
|---|---|---|---|---|---|
|http2|❌|❌|✅|✅|✅|
|sync|✅|❌|✅|✅|✅|
|async|❌|✅|✅|❌|✅|
|websocket|❌|✅|❌|❌|✅|
|fingerprints|❌|❌|❌|❌|✅|
|speed|🐇|🐇🐇|🐇|🐇🐇|🐇🐇|

@@ -40,6 +42,8 @@ To install beta releases:

## Usage

Use the latest impersonate versions; do NOT copy `chrome110` from the examples below without changing it.

Review comment (Contributor): I think this section could be improved

### requests-like

```python
@@ -74,19 +78,25 @@ print(r.json())
# {'cookies': {'foo': 'bar'}}
```

Supported impersonate versions, as supported by [curl-impersonate](https://github.com/lwthiker/curl-impersonate):
Supported impersonate versions, as supported by my [fork](https://github.com/yifeikong/curl-impersonate) of [curl-impersonate](https://github.com/lwthiker/curl-impersonate):

However, only Chrome-like browsers are supported. Firefox support is tracked in #55

- chrome99
- chrome100
- chrome101
- chrome104
- chrome107
- chrome110
- chrome116
- chrome119
- chrome120
- chrome99_android
- edge99
- edge101
- safari15_3
- safari15_5
- safari17_2_ios
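Since the advice above is to use the newest target rather than copying `chrome110`, one way to avoid hard-coding a version is to pick it from the list programmatically. The sketch below is a hypothetical helper (`latest_target` and `SUPPORTED` are not part of curl_cffi); it assumes target names follow the `<browser><major>[_<minor>...]` pattern shown in the list and skips platform-suffixed targets such as `chrome99_android` and `safari17_2_ios`.

```python
import re
from typing import List

# The impersonate targets listed above, copied as plain strings.
SUPPORTED: List[str] = [
    "chrome99", "chrome100", "chrome101", "chrome104", "chrome107",
    "chrome110", "chrome116", "chrome119", "chrome120",
    "chrome99_android", "edge99", "edge101",
    "safari15_3", "safari15_5", "safari17_2_ios",
]


def latest_target(browser: str, targets: List[str] = SUPPORTED) -> str:
    """Return the newest target for `browser` (hypothetical helper).

    Only names made of the browser prefix plus digit groups match, so
    suffixed targets like `_android` or `_ios` are ignored. Versions are
    compared as integer tuples, so chrome120 ranks above chrome99.
    """
    pattern = re.compile(rf"^{re.escape(browser)}(\d+(?:_\d+)*)$")
    best, best_version = None, ()
    for target in targets:
        match = pattern.match(target)
        if not match:
            continue
        version = tuple(int(part) for part in match.group(1).split("_"))
        if version > best_version:
            best, best_version = target, version
    if best is None:
        raise ValueError(f"no target for {browser!r}")
    return best


print(latest_target("chrome"))  # chrome120
```

The result can then be passed as `impersonate=`, e.g. `requests.get(url, impersonate=latest_target("chrome"))`. Comparing integer tuples matters here: a plain string sort would rank `chrome99` above `chrome120`.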

### asyncio

@@ -117,6 +127,22 @@ async with AsyncSession() as s:
results = await asyncio.gather(*tasks)
```
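The `asyncio.gather(*tasks)` pattern above is also where per-request proxy rotation fits: each task can be created with its own proxy. The sketch below uses a hypothetical `fake_fetch` coroutine and placeholder proxy URLs so it runs without network access; with curl_cffi the call inside would presumably be something like `await s.get(url, proxies={"https": proxy})` on an `AsyncSession`.

```python
import asyncio
import itertools
from typing import List

# Placeholder proxy URLs (hypothetical); substitute real ones.
PROXIES = ["http://proxy-a.example:8080", "http://proxy-b.example:8080"]


async def fake_fetch(url: str, proxy: str) -> str:
    # Stand-in for a real request; it only yields to the event loop once.
    await asyncio.sleep(0)
    return f"{url} via {proxy}"


async def fetch_all(urls: List[str]) -> List[str]:
    rotation = itertools.cycle(PROXIES)
    # Round-robin: each request is paired with the next proxy in turn.
    tasks = [fake_fetch(url, next(rotation)) for url in urls]
    # gather returns results in the same order as the tasks were passed.
    return await asyncio.gather(*tasks)


urls = ["https://a.example", "https://b.example", "https://c.example"]
results = asyncio.run(fetch_all(urls))
print(results)
```

Choosing the proxy when the task is created, rather than inside a shared session, keeps each request's proxy explicit and independent of completion order.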

### WebSockets

```python
from curl_cffi.requests import Session, WebSocket

def on_message(ws: WebSocket, message):
    print(message)

with Session() as s:
    ws = s.ws_connect(
        "wss://api.gemini.com/v1/marketdata/BTCUSD",
        on_message=on_message,
    )
    ws.run_forever()
```
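The `on_message(ws, message)` callback style follows websocket_client's `WebSocketApp`, where `run_forever()` blocks and invokes the callback once per received message. The sketch below imitates that control flow with a hypothetical `FakeWebSocket` that replays placeholder messages instead of reading a socket. It illustrates the callback contract only, not curl_cffi's implementation, and needs no network access.

```python
from typing import Callable, Iterable, List


class FakeWebSocket:
    """Hypothetical stand-in that replays messages instead of reading a socket."""

    def __init__(
        self,
        messages: Iterable[str],
        on_message: Callable[["FakeWebSocket", str], None],
    ) -> None:
        self._messages = list(messages)
        self.on_message = on_message

    def run_forever(self) -> None:
        # A real client blocks on the connection; this loop simply stops
        # once the replayed messages are exhausted.
        for message in self._messages:
            self.on_message(self, message)


received: List[str] = []


def on_message(ws: FakeWebSocket, message: str) -> None:
    # Same signature as the README callback: the socket first, then the payload.
    received.append(message)


FakeWebSocket(["msg-1", "msg-2"], on_message).run_forever()
print(received)
```

Because the callback receives the socket object as its first argument, a handler can act on the connection itself, not just on the payload.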

### curl-like

Alternatively, you can use the low-level curl-like API:
@@ -140,13 +166,17 @@ print(body.decode())

See the [docs](https://curl-cffi.readthedocs.io) for more details.

If you are using scrapy, check out this middleware: [tieyongjie/scrapy-fingerprint](https://github.com/tieyongjie/scrapy-fingerprint)
If you are using scrapy, check out these middlewares:

- [tieyongjie/scrapy-fingerprint](https://github.com/tieyongjie/scrapy-fingerprint)
- [jxlil/scrapy-impersonate](https://github.com/jxlil/scrapy-impersonate)

## Acknowledgement

- Originally forked from [multippt/python_curl_cffi](https://github.com/multippt/python_curl_cffi), which is under the MIT license.
- Headers/Cookies files are copied from [httpx](https://github.com/encode/httpx/blob/master/httpx/_models.py), which is under the BSD license.
- Asyncio support is inspired by Tornado's curl http client.
- The WebSocket API is inspired by [websocket_client](https://github.com/websocket-client/websocket-client).

## Sponsor

12 changes: 12 additions & 0 deletions bump_version.sh
@@ -0,0 +1,12 @@
#!/bin/bash

VERSION=$1

# Makefile
gsed "s/^VERSION := .*/VERSION := ${VERSION}/g" -i Makefile

# curl_cffi/__version__.py
gsed "s/^__version__ = .*/__version__ = \"${VERSION}\"/g" -i curl_cffi/__version__.py

# pyproject.toml
gsed "s/^version = .*/version = \"${VERSION}\"/g" -i pyproject.toml
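`bump_version.sh` relies on `gsed` (GNU sed, typically installed on macOS via Homebrew). As a portable alternative, the same three line-anchored substitutions can be done in Python. This is a sketch, not a script from the repository: `RULES`, `bump_text` and `bump_files` are hypothetical names, and the file paths are taken from the shell script above.

```python
import re
from pathlib import Path

# (file, line pattern, replacement template), mirroring the three gsed calls.
RULES = [
    ("Makefile", r"^VERSION := .*", "VERSION := {v}"),
    ("curl_cffi/__version__.py", r"^__version__ = .*", '__version__ = "{v}"'),
    ("pyproject.toml", r"^version = .*", 'version = "{v}"'),
]


def bump_text(text: str, pattern: str, template: str, version: str) -> str:
    """Replace every line matching `pattern` with the filled-in template."""
    replacement = template.format(v=version)
    # re.M anchors ^ at each line start, like sed's line-by-line matching;
    # a function replacement avoids backslash-escape surprises in re.sub.
    return re.sub(pattern, lambda _m: replacement, text, flags=re.M)


def bump_files(version: str, root: Path = Path(".")) -> None:
    """Apply every rule in place to the files under `root`."""
    for name, pattern, template in RULES:
        path = root / name
        path.write_text(bump_text(path.read_text(), pattern, template, version))


sample = '# __version__ = metadata.version("curl_cffi")\n__version__ = "0.5.10"\n'
print(bump_text(sample, RULES[1][1], RULES[1][2], "0.6.0b6"))
```

Running `bump_files("0.6.0b6")` from the repository root would update all three files. Like the shell version, `^version = .*` rewrites every matching line in `pyproject.toml`, not only the package's own version line, while commented lines such as `# __version__ = ...` are left alone because they do not start with the anchored text.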
2 changes: 1 addition & 1 deletion curl_cffi/__version__.py
@@ -7,5 +7,5 @@
# __description__ = metadata.metadata("curl_cffi")["Summary"]
# __version__ = metadata.version("curl_cffi")
__description__ = "libcurl ffi bindings for Python, with impersonation support"
__version__ = "0.5.10"
__version__ = "0.6.0b6"
__curl_version__ = Curl().version().decode()
46 changes: 27 additions & 19 deletions curl_cffi/const.py
@@ -106,7 +106,7 @@ class CurlOpt(IntEnum):
SSL_CTX_DATA = 10000 + 109
FTP_CREATE_MISSING_DIRS = 0 + 110
PROXYAUTH = 0 + 111
FTP_RESPONSE_TIMEOUT = 0 + 112
SERVER_RESPONSE_TIMEOUT = 0 + 112
IPRESOLVE = 0 + 113
MAXFILESIZE = 0 + 114
INFILESIZE_LARGE = 30000 + 115
@@ -303,14 +303,21 @@ class CurlOpt(IntEnum):
MIME_OPTIONS = 0 + 315
SSH_HOSTKEYFUNCTION = 20000 + 316
SSH_HOSTKEYDATA = 10000 + 317
HTTPBASEHEADER = 10000 + 318
SSL_SIG_HASH_ALGS = 10000 + 319
SSL_ENABLE_ALPS = 0 + 320
SSL_CERT_COMPRESSION = 10000 + 321
SSL_ENABLE_TICKET = 0 + 322
HTTP2_PSEUDO_HEADERS_ORDER = 10000 + 323
HTTP2_NO_SERVER_PUSH = 0 + 324
SSL_PERMUTE_EXTENSIONS = 0 + 325
PROTOCOLS_STR = 10000 + 318
REDIR_PROTOCOLS_STR = 10000 + 319
WS_OPTIONS = 0 + 320
CA_CACHE_TIMEOUT = 0 + 321
QUICK_EXIT = 0 + 322
HTTPBASEHEADER = 10000 + 323
SSL_SIG_HASH_ALGS = 10000 + 324
SSL_ENABLE_ALPS = 0 + 325
SSL_CERT_COMPRESSION = 10000 + 326
SSL_ENABLE_TICKET = 0 + 327
HTTP2_PSEUDO_HEADERS_ORDER = 10000 + 328
HTTP2_SETTINGS = 10000 + 329
SSL_PERMUTE_EXTENSIONS = 0 + 330
HTTP2_WINDOW_UPDATE = 0 + 331
ECH = 10000 + 332
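The renumbering above follows from how libcurl encodes option IDs: each `CURLOPT_*` value is a type base (`CURLOPTTYPE_LONG` = 0, `OBJECTPOINT` = 10000, `FUNCTIONPOINT` = 20000, `OFF_T` = 30000, `BLOB` = 40000, with string and slist options reusing the 10000 base) plus a sequential index. Because the newer libcurl (the Makefile moves to curl-8.1.1) uses indices 318 to 322 for `PROTOCOLS_STR` through `QUICK_EXIT`, the fork's impersonation options move to 323 and up, and the Python constants presumably have to match the compiled library exactly. The decoder below is an illustrative sketch; `OptType` and `split_option` are hypothetical names, not curl_cffi APIs.

```python
from enum import IntEnum
from typing import Tuple


class OptType(IntEnum):
    """Type bases from curl.h (STRINGPOINT, SLISTPOINT and CBPOINT equal OBJECTPOINT)."""
    LONG = 0
    OBJECTPOINT = 10000
    FUNCTIONPOINT = 20000
    OFF_T = 30000
    BLOB = 40000


def split_option(value: int) -> Tuple[OptType, int]:
    """Split a CURLOPT value into (type base, index); indices stay below 10000."""
    base, index = divmod(value, 10000)
    return OptType(base * 10000), index


# HTTPBASEHEADER before and after this change: same type, shifted index.
print(split_option(10000 + 318))
print(split_option(10000 + 323))
```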

if locals().get("WRITEDATA"):
FILE = locals().get("WRITEDATA")
@@ -328,22 +335,16 @@ class CurlInfo(IntEnum):
NAMELOOKUP_TIME = 0x300000 + 4
CONNECT_TIME = 0x300000 + 5
PRETRANSFER_TIME = 0x300000 + 6
SIZE_UPLOAD = 0x300000 + 7
SIZE_UPLOAD_T = 0x600000 + 7
SIZE_DOWNLOAD = 0x300000 + 8
SIZE_DOWNLOAD_T = 0x600000 + 8
SPEED_DOWNLOAD = 0x300000 + 9
SPEED_DOWNLOAD_T = 0x600000 + 9
SPEED_UPLOAD = 0x300000 + 10
SPEED_UPLOAD_T = 0x600000 + 10
HEADER_SIZE = 0x200000 + 11
REQUEST_SIZE = 0x200000 + 12
SSL_VERIFYRESULT = 0x200000 + 13
FILETIME = 0x200000 + 14
FILETIME_T = 0x600000 + 14
CONTENT_LENGTH_DOWNLOAD = 0x300000 + 15
CONTENT_LENGTH_DOWNLOAD_T = 0x600000 + 15
CONTENT_LENGTH_UPLOAD = 0x300000 + 16
CONTENT_LENGTH_UPLOAD_T = 0x600000 + 16
STARTTRANSFER_TIME = 0x300000 + 17
CONTENT_TYPE = 0x100000 + 18
@@ -357,7 +358,6 @@ class CurlInfo(IntEnum):
NUM_CONNECTS = 0x200000 + 26
SSL_ENGINES = 0x400000 + 27
COOKIELIST = 0x400000 + 28
LASTSOCKET = 0x200000 + 29
FTP_ENTRY_PATH = 0x100000 + 30
REDIRECT_URL = 0x100000 + 31
PRIMARY_IP = 0x100000 + 32
@@ -371,12 +371,10 @@ class CurlInfo(IntEnum):
PRIMARY_PORT = 0x200000 + 40
LOCAL_IP = 0x100000 + 41
LOCAL_PORT = 0x200000 + 42
TLS_SESSION = 0x400000 + 43
ACTIVESOCKET = 0x500000 + 44
TLS_SSL_PTR = 0x400000 + 45
HTTP_VERSION = 0x200000 + 46
PROXY_SSL_VERIFYRESULT = 0x200000 + 47
PROTOCOL = 0x200000 + 48
SCHEME = 0x100000 + 49
TOTAL_TIME_T = 0x600000 + 50
NAMELOOKUP_TIME_T = 0x600000 + 51
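`CURLINFO_*` values use a different scheme: the high nibble selects the return type (`0x100000` string, `0x200000` long, `0x300000` double, `0x400000` slist or pointer, `0x500000` socket, `0x600000` `curl_off_t`) and the low 20 bits hold the index. That is why pairs such as `SIZE_DOWNLOAD` and `SIZE_DOWNLOAD_T` share index 8: the `_T` variant returns a `curl_off_t` instead of a double, and libcurl's documentation marks the double size and speed variants as deprecated. The helper below is a hypothetical illustration that decodes a value with the `CURLINFO_TYPEMASK` and `CURLINFO_MASK` constants from curl.h.

```python
from typing import Tuple

# Type tags and masks as defined in curl.h (SLIST and PTR share 0x400000).
INFO_TYPES = {
    0x100000: "string",
    0x200000: "long",
    0x300000: "double",
    0x400000: "slist/ptr",
    0x500000: "socket",
    0x600000: "off_t",
}
CURLINFO_TYPEMASK = 0xF00000
CURLINFO_MASK = 0x0FFFFF


def describe_info(value: int) -> Tuple[str, int]:
    """Return (return type, index) for a CURLINFO value."""
    return INFO_TYPES[value & CURLINFO_TYPEMASK], value & CURLINFO_MASK


for name, value in [("SIZE_DOWNLOAD", 0x300000 + 8), ("SIZE_DOWNLOAD_T", 0x600000 + 8)]:
    print(name, describe_info(value))
```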
@@ -492,7 +490,7 @@ class CurlECode(IntEnum):
TFTP_UNKNOWNID = 72
REMOTE_FILE_EXISTS = 73
TFTP_NOSUCHUSER = 74
CONV_FAILED = 75
OBSOLETE75 = 75
OBSOLETE76 = 76
SSL_CACERT_BADFILE = 77
REMOTE_FILE_NOT_FOUND = 78
@@ -517,6 +515,7 @@ class CurlECode(IntEnum):
PROXY = 97
SSL_CLIENTCERT = 98
UNRECOVERABLE_POLL = 99
ECH_REQUIRED = 100


class CurlHttpVersion(IntEnum):
@@ -527,3 +526,12 @@ class CurlHttpVersion(IntEnum):
V2TLS = 4 # use version 2 for HTTPS, version 1.1 for HTTP */
V2_PRIOR_KNOWLEDGE = 5 # please use HTTP 2 without HTTP/1.1 Upgrade */
V3 = 30 # Makes use of explicit HTTP/3 without fallback.


class CurlWsFlag(IntEnum):
TEXT = (1<<0)
BINARY = (1<<1)
CONT = (1<<2)
CLOSE = (1<<3)
PING = (1<<4)
OFFSET = (1<<5)
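`CurlWsFlag` members are single bits, so a frame's flags can combine several of them, for example `TEXT | CONT` for a text frame that, as we understand libcurl's websocket API, is not the final fragment of its message. Because `CurlWsFlag(5)` raises `ValueError` for such a combination, a small decoder is handy when inspecting frames. `flag_names` is a hypothetical helper; the enum is copied from the diff above.

```python
from enum import IntEnum
from typing import List


class CurlWsFlag(IntEnum):
    # Copied from curl_cffi/const.py in this PR.
    TEXT = 1 << 0
    BINARY = 1 << 1
    CONT = 1 << 2
    CLOSE = 1 << 3
    PING = 1 << 4
    OFFSET = 1 << 5


def flag_names(flags: int) -> List[str]:
    """List the names of all bits set in `flags`, in definition order."""
    return [flag.name for flag in CurlWsFlag if flags & flag]


print(flag_names(CurlWsFlag.TEXT | CurlWsFlag.CONT))  # ['TEXT', 'CONT']
```

An `enum.IntFlag` would accept composite values directly and may be worth considering; the sketch keeps the `IntEnum` used in the PR and decodes the bits by hand.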