diff --git a/Dockerfile b/Dockerfile
index ae933b8..ef2e597 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -1,16 +1,16 @@
 FROM python:3.12.4-slim

-LABEL name="XHS-Downloader" version="2.1 Beta" authors="JoeanAmier"
+LABEL name="XHS-Downloader" version="2.1" authors="JoeanAmier"

 COPY locale /locale
 COPY source /source
 COPY static /static
 COPY LICENSE /LICENSE
 COPY main.py /main.py
-COPY README.md /README.md
-COPY README_EN.md /README_EN.md
 COPY requirements.txt /requirements.txt

 RUN pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -r requirements.txt

+EXPOSE 8000
+
 CMD ["python", "main.py"]
diff --git a/README.md b/README.md
index b5c7c1e..87d778e 100644
--- a/README.md
+++ b/README.md
@@ -43,11 +43,11 @@
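⭐ The XHS-Downloader development plan and progress can be found in Projects
🎥 Click the image to watch the demo video
- +https://www.xiaohongshu.com/explore/work ID
⭐ Windows Terminal (the default terminal on Windows 11) is recommended for running the program to get the best display!
If you only need to download watermark-free work files, Run the Program is recommended; for other needs, Run from Source is recommended!
+If you only need to download watermark-free work files, Run the Program or Run with Docker is recommended; for other needs, Run from Source is recommended!
It is recommended to set the cookie parameter yourself; if it is not set, the program may not work properly!
Users on Windows 10 or later can go to Releases to download the program archive, extract it, open the program folder, and double-click main.exe to use it.
If you use the program this way, the default download path is .\_internal\Download and the configuration file path is .\_internal\settings.json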
Build the image from the Dockerfile file
Pull the image with the docker pull joeanamier/xhs-downloader command
docker run -it joeanamier/xhs-downloader:2.1
docker run -it joeanamier/xhs-downloader:2.1 python main.py server
Running the project with Docker does not support command-line invocation mode, and the read clipboard and monitor clipboard functions are unavailable; pasting content works normally. Please report any other abnormal behavior!
Python 3.12 interpreter
🔥 Xiaohongshu Artwork Collection Tool: Collect information on Xiaohongshu artworks; Extract the download address of Xiaohongshu artworks; Download the Xiaohongshu watermark-free artwork files!
-❤️ The author only releases XHS-Downloader on GitHub, without collaborating with any individuals or websites. Additionally, there are no charging plans for the tool!
-⭐ Due to the author's limited energy, I was unable to update the English document in a timely manner, and the content may have become outdated. Suggest referring to Chinese documentation. If you want to contribute to translation, we warmly welcome you
-🔥 Xiaohongshu Link Extraction/Content Collection Tool:Extract account-published, favorited, and liked content links; extract search result content links and user links; collect Xiaohongshu content information; extract Xiaohongshu content download addresses; download Xiaohongshu watermark-free content files!
+⭐ Due to the author's limited time, the English documentation is not always updated promptly and may be outdated. Parts of it are machine-translated and may be inaccurate, so please refer to the Chinese documentation where possible. Translation contributions are warmly welcome.
+🎥 Click on the image to watch the demo video
- +⭐ The development plan and progress of XHS-Downloader can be found at Projects
+🎥 Click the images to watch the demo video
+ +https://www.xiaohongshu.com/explore/artwork's ID
https://www.xiaohongshu.com/discovery/item/artwork's ID
https://xhslink.com/share code
https://www.xiaohongshu.com/explore/WorksID
https://www.xiaohongshu.com/discovery/item/WorksID
https://xhslink.com/ShareCode
The program supports entering multiple artwork links in a single input box, separated by spaces.
+Supports entering multiple content links at once, separated by spaces; the program will automatically extract valid links without additional processing!
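The automatic link extraction described above could be approximated as follows. This is only a sketch: `LINK_PATTERNS` and `extract_links` are hypothetical names, and the regular expressions are assumptions derived from the three link formats listed earlier, not the program's actual matching rules.

```python
import re

# Hypothetical patterns for the three documented link formats;
# the program's real rules may be stricter or looser.
LINK_PATTERNS = (
    re.compile(r"https?://www\.xiaohongshu\.com/explore/[^\s?#/]+"),
    re.compile(r"https?://www\.xiaohongshu\.com/discovery/item/[^\s?#/]+"),
    re.compile(r"https?://xhslink\.com/[^\s?#]+"),
)


def extract_links(text: str) -> list[str]:
    """Return every supported link found in space-separated input, in order."""
    links = []
    for token in text.split():
        for pattern in LINK_PATTERNS:
            match = pattern.search(token)
            if match:
                links.append(match.group())
                break  # one link per token is enough
    return links
```

Tokens that match none of the patterns, such as an unrelated GitHub URL, are simply dropped, which mirrors the "no additional processing" behaviour the text describes.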
⭐ Windows Terminal (Default terminal in Windows 11) is recommended to run the program for optimal display performance!
-If you only need to download watermark-free artwork files, Program Running is recommended; If you have other needs, Source Code Running is recommended!
-Users with Windows 10 or above can go to Releases download the program zip file, unzip it, open the program folder, and double-click main.exe to run the program
If you use the program this way, the default download path for files is: .\_internal\Download; configuration file path: .\_internal\settings.json
⭐ It is recommended to use the Windows Terminal (default terminal for Windows 11) to run the program for the best display effect!
+If you only need to download watermark-free content files, it is recommended to choose Program Run; if you have other needs, it is recommended to choose Source Code Run!
+It is recommended to set the cookie parameter manually; if this parameter is not set, the program functions may not work properly!
Windows 10 and above users can go to Releases to download the program package, unzip it, open the program folder, and double-click main.exe to use it.
If you use the program in this way, the default download path for files is: .\_internal\Download; the configuration file path is: .\_internal\settings.json
3.12
Run pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -r requirements.txt to install the required modules for the program
Run main.py to use the program
If your browser has installed Tampermonkey extension, add User script, and you can experience the project's functionalities without downloading the program!
-Tip: You can use the XHS-Downloader user script to extract artwork links in batches from web pages. Combine it with the XHS-Downloader program to achieve batch downloading of watermark-free artwork files!
-Dockerfile
docker pull joeanamier/xhs-downloader
docker run -it joeanamier/xhs-downloader:2.1
docker run -it joeanamier/xhs-downloader:2.1 python main.py server
When running the project via Docker, the command line call mode is not supported. The clipboard reading and clipboard monitoring functions are unavailable, but pasting content works fine. Please provide feedback if other features are not functioning properly!
+3.12
Run pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -r requirements.txt to install the required modules
Run main.py to use the program
The project supports command line mode. If you want to download specific images from a text and image work, you can use this mode to set the image sequence number you want to download!
+You can use the command line to read cookies from the browser and write to the configuration file! Note that you need to close the browser to read the data!
+Command example: python .\main.py --browser_cookie Chrome --update_settings
The bool type parameters support setting with true, false, 1, 0, yes, no, on or off (case insensitive).
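The accepted spellings for bool parameters can be illustrated with a small parser. This is a sketch, not the project's code: `parse_bool`, `TRUE_VALUES` and `FALSE_VALUES` are hypothetical names, and raising `ValueError` on unknown input is an assumption about how invalid values might be handled.

```python
TRUE_VALUES = {"true", "1", "yes", "on"}
FALSE_VALUES = {"false", "0", "no", "off"}


def parse_bool(value: str) -> bool:
    """Map a command-line string to a bool, ignoring case."""
    normalized = value.strip().lower()
    if normalized in TRUE_VALUES:
        return True
    if normalized in FALSE_VALUES:
        return False
    # Assumption: anything outside the documented spellings is rejected.
    raise ValueError(f"not a recognised boolean value: {value!r}")
```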
Start: Run the command: python .\main.py server
Stop: Press Ctrl + C to stop the server
Request endpoint: /xhs/
Request method: POST
Request format: JSON
Request parameters:
Parameter | Type | Description | Default
---|---|---|---
url | str | Xiaohongshu content link, auto-extraction, does not support multiple links | None
download | bool | Whether to download the content file; set to true will take more time | false
index | list[int] | Download specific image files by index, only effective for text and image works; not effective when the download parameter is set to false | null
skip | bool | Whether to skip content with download records; set to true will not return content data with download records | false
Code example:
+import requests
+
+
+def api_demo():
+    server = "http://127.0.0.1:8000/xhs/"
+    data = {
+        "url": "https://www.xiaohongshu.com/explore/123456789",
+        "download": True,
+        "index": [
+            3,
+            6,
+            9,
+        ],
+    }
+    response = requests.post(server, json=data)
+    print(response.json())
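For callers who prefer not to depend on `requests`, the same call can be sketched with the standard library alone. `build_payload` and `post_payload` are hypothetical helper names, the 30-second timeout is an arbitrary assumption, and dropping `index` when `download` is false is a choice based on the parameter table's note that it has no effect in that case.

```python
import json
import urllib.request

SERVER = "http://127.0.0.1:8000/xhs/"  # address taken from the example above


def build_payload(url, download=False, index=None, skip=False):
    """Assemble the JSON body from the documented request parameters."""
    payload = {"url": url, "download": download, "skip": skip}
    # index is documented as having no effect unless download is true
    if download and index:
        payload["index"] = list(index)
    return payload


def post_payload(payload, server=SERVER, timeout=30):
    """POST the payload as JSON and decode the JSON reply."""
    request = urllib.request.Request(
        server,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server started via `python .\main.py server`, `post_payload(build_payload(link, download=True, index=[3, 6, 9]))` would send the same request as the example above.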
If your browser has the Tampermonkey browser extension installed, you can add the user script to experience the project features without needing to download or install anything!
+ +After successfully installing the script, open the Xiaohongshu page, check the script instructions, and follow the prompts to operate.
Note: Using the XHS-Downloader user script to batch extract content links, in combination with the XHS-Downloader program, can achieve batch downloading of watermark-free content files!
If there are other requirements, you can call or modify the program refer to the comments in main.py
If you have other needs, you can perform code calls or modifications based on the comments in main.py!
-# Example links
-error_link = "https://github.com/JoeanAmier/XHS_Downloader"
-demo_link = "https://www.xiaohongshu.com/explore/xxxxxxxxxx"
-multiple_links = f"{demo_link} {demo_link} {demo_link}"
-# Instance object
-work_path = "D:\\"  # Artwork data/file save root path, default value: project root path
-folder_name = "Download"  # Artwork file storage folder name (automatically created), default value: Download
-user_agent = ""  # Request Header: User-Agent
-cookie = ""  # Xiaohongshu web version Cookie, no need to log in
-proxy = None  # Network proxy
-timeout = 5  # Request data timeout limit, unit: seconds, default value: 10
-chunk = 1024 * 1024 * 10  # When downloading files, the size of each data block obtained from the server each time, unit: bytes
-max_retry = 2  # Maximum number of retries when requesting data fails, unit: seconds, default value: 5
-record_data = False  # Whether to record artwork data to a file
-image_format = "WEBP"  # Graphic artwork file download format, supports: PNG, WEBP
-folder_mode = False  # Whether to store each artwork's file in a separate folder
-async with XHS() as xhs:
-    pass  # Use default parameters
-async with XHS(work_path=work_path,
-               folder_name=folder_name,
-               user_agent=user_agent,
-               cookie=cookie,
-               proxy=proxy,
-               timeout=timeout,
-               chunk=chunk,
-               max_retry=max_retry,
-               record_data=record_data,
-               image_format=image_format,
-               folder_mode=folder_mode,
-               ) as xhs:  # Use custom parameters
-    download = True  # Whether to download artwork files, default value: False
-    # Return detailed information about the artwork, including download addresses
-    print(await xhs.extract(error_link, download))  # Return an empty dictionary when data retrieval fails
-    print(await xhs.extract(demo_link, download))
-    print(await xhs.extract(multiple_links, download))  # Support input of multiple artwork links
+async def example():
+    """Set parameters in code; suitable for secondary development"""
+    # Example links
+    error_link = "https://github.com/JoeanAmier/XHS_Downloader"
+    demo_link = "https://www.xiaohongshu.com/explore/xxxxxxxxxx"
+    multiple_links = f"{demo_link} {demo_link} {demo_link}"
+    # Instance object
+    work_path = "D:\\"  # Root path for saving work data/files, default value: project root path
+    folder_name = "Download"  # Name of the folder for work files (created automatically), default value: Download
+    name_format = "作品标题 作品描述"
+    sec_ch_ua = ""  # Request header Sec-Ch-Ua
+    sec_ch_ua_platform = ""  # Request header Sec-Ch-Ua-Platform
+    user_agent = ""  # User-Agent
+    cookie = ""  # Xiaohongshu web version cookie; login not required; required parameter; login state affects data collection
+    proxy = None  # Network proxy
+    timeout = 5  # Request timeout limit, in seconds, default value: 10
+    chunk = 1024 * 1024 * 10  # Size of each data chunk fetched from the server when downloading files, in bytes
+    max_retry = 2  # Maximum number of retries when a request fails, default value: 5
+    record_data = False  # Whether to save work data to a file
+    image_format = "WEBP"  # Download format for text and image work files, supports: PNG, WEBP
+    folder_mode = False  # Whether to store each work's files in a separate folder
+    async with XHS() as xhs:
+        pass  # Use default parameters
+    async with XHS(work_path=work_path,
+                   folder_name=folder_name,
+                   name_format=name_format,
+                   sec_ch_ua=sec_ch_ua,
+                   sec_ch_ua_platform=sec_ch_ua_platform,
+                   user_agent=user_agent,
+                   cookie=cookie,
+                   proxy=proxy,
+                   timeout=timeout,
+                   chunk=chunk,
+                   max_retry=max_retry,
+                   record_data=record_data,
+                   image_format=image_format,
+                   folder_mode=folder_mode,
+                   ) as xhs:  # Use custom parameters
+        download = True  # Whether to download work files, default value: False
+        # Return detailed work information, including download addresses
+        # Returns an empty dictionary when data retrieval fails
+        print(await xhs.extract(error_link, download, ))
+        print(await xhs.extract(demo_link, download, ))
+        # Multiple work links are supported
+        print(await xhs.extract(multiple_links, download, ))
settings.json in the project's root directory, generated automatically on the first run, and allows customization of certain runtime parameters
If your computer doesn't have a suitable program to edit JSON files, it is recommended to use JSON Online Tool to edit the content of the configuration file
+The settings.json file in the root directory of the project is automatically generated on the first run and allows customization of some runtime parameters.
If invalid parameter values are set, the program will use the default values!
-Parameters | Type | Meaning | Default Value
+Parameter | Type | Description | Default Value
---|---|---|---
-work_path | str | Artwork data/file save root path | Project root path
+work_path | str | Root path for saving content data/files | Project root path
-folder_name | str | Artwork file storage folder name | Download
+folder_name | str | Name of the folder for storing content files | Download
+name_format | str | Format for content file names. Separate fields with spaces. Supported fields: collects, comments, shares, likes, tags, ID, title, description, type, publish_time, last_update_time, author_nickname, author_id | publish_time author_nickname title
+sec_ch_ua | str | Browser request header Sec-Ch-Ua | Built-in Chrome Sec-Ch-Ua
+sec_ch_ua_platform | str | Browser request header Sec-Ch-Ua-Platform | Built-in Chrome Sec-Ch-Ua-Platform
-user_agent | str | Request Header: User-Agent | Default UA
+user_agent | str | Browser User Agent | Built-in Chrome User Agent
-cookie | str | Xiaohongshu web version Cookie, no need to log in, modification recommended | Default Cookie
+cookie | str | Xiaohongshu web version cookie, login not required | None
-proxy | str | Set program proxy | null
+proxy | str|dict | Set program proxy | null
-timeout | int | Request data timeout limit, unit: seconds | 10
+timeout | int | Request data timeout limit, in seconds | 10
-chunk | int | Size of each data block obtained from the server when downloading files, unit: bytes | 1048576 (1 MB)
+chunk | int | Size of data chunk to fetch from the server each time when downloading files, in bytes | 1048576 (1 MB)
-max_retry | int | Maximum number of retries when requesting data fails, unit: seconds | 5
+max_retry | int | Maximum number of retries when requesting data fails | 5
-record_data | bool | Whether to record artwork data to TXT file | false
+record_data | bool | Whether to save content data to a file, saved in SQLite format | false
-image_format | str | Graphic and text artwork file download format, support: PNG, WEBP | PNG
+image_format | str | Download format for text and image content files, supported formats: PNG, WEBP | PNG
+image_download | bool | Switch for downloading text and image content files | true
+video_download | bool | Switch for downloading video content files | true
+live_download | bool | Switch for downloading animated image files | false
-folder_mode | bool | Whether to store each artwork's file in a separate folder; folder names are consistent with file names | false
+folder_mode | bool | Whether to store each content's files in a separate folder; the folder name matches the file name | false
-language | str | Set programming language, currently support: zh_CN, en_GB | zh_CN
+language | str | Set program language. Currently supported: zh_CN, en_GB | zh_CN
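To make the name_format row concrete, the sketch below turns a space-separated field list into a file name. `build_file_name` is a hypothetical helper; joining the parts with an underscore, passing the work data as a plain dict keyed by field name, and skipping empty fields are all assumptions, since the diff does not show how the program assembles names.

```python
def build_file_name(name_format: str, work: dict, separator: str = "_") -> str:
    """Join the requested fields of one work into a file name.

    name_format lists field names separated by spaces, as in settings.json.
    The separator and the dict layout are illustrative assumptions.
    """
    fields = name_format.split()
    # Skip fields that are missing or empty so the name has no blank segments
    parts = [str(work[field]) for field in fields if work.get(field)]
    return separator.join(parts)
```

With the default format `publish_time author_nickname title`, a work published by a given author would get a name built from those three values in that order.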
Additional Notes: The parameters sec_ch_ua, sec_ch_ua_platform, and user_agent examples are provided for reference, and need to be set manually only if the program fails to fetch data!
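The rule that invalid parameter values fall back to their defaults can be sketched like this. The defaults are copied from a subset of the table; `apply_defaults`, `load_settings`, and the type-based validity check are hypothetical and only illustrate the idea, not the program's actual validation.

```python
import json
from pathlib import Path

# Defaults copied from the parameter table; only a subset is shown.
DEFAULTS = {
    "folder_name": "Download",
    "timeout": 10,
    "chunk": 1048576,
    "max_retry": 5,
    "record_data": False,
    "image_format": "PNG",
    "folder_mode": False,
    "language": "zh_CN",
}
# Parameters whose documented values form a closed set
ALLOWED = {"image_format": {"PNG", "WEBP"}, "language": {"zh_CN", "en_GB"}}


def apply_defaults(raw: dict) -> dict:
    """Replace missing or invalid values with the documented defaults."""
    settings = {}
    for key, default in DEFAULTS.items():
        value = raw.get(key, default)
        # Exact type comparison keeps bool and int apart (True is an int)
        valid = type(value) is type(default)
        if valid and key in ALLOWED:
            valid = value in ALLOWED[key]
        settings[key] = value if valid else default
    return settings


def load_settings(path: Path) -> dict:
    """Read settings.json; an unreadable or malformed file yields all defaults."""
    try:
        raw = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        raw = {}
    return apply_defaults(raw if isinstance(raw, dict) else {})
```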
Press F12 to open developer tools
Select Console
Enter document.cookie then press Enter to confirm
Open https://www.xiaohongshu.com/explore
Press F12 to open the developer tools
Select the Network tab
Check Preserve log
In the Filter input box, enter cookie-name:web_session
Select the Fetch/XHR filter
In the Network tab, select any data packet (if no packets appear, repeat step 7)
XHS-Downloader will store the IDs of downloaded content in a database. When downloading the same content again, XHS-Downloader will automatically skip the file download (even if the content file does not exist). If you want to re-download the content file, please delete the corresponding content ID from the database and then use XHS-Downloader to download the content file again!
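The download-record behaviour described above (skip recorded IDs, delete an ID to force a re-download) can be sketched with `sqlite3`. `DownloadRecord`, `download_once`, and the `record` table with its single `id` column are hypothetical; the real database schema is not shown in this diff.

```python
import sqlite3


class DownloadRecord:
    """Hypothetical ID store; the table and column names are illustrative."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS record (id TEXT PRIMARY KEY)")

    def seen(self, work_id: str) -> bool:
        row = self.db.execute(
            "SELECT 1 FROM record WHERE id = ?", (work_id,)
        ).fetchone()
        return row is not None

    def add(self, work_id: str) -> None:
        self.db.execute("INSERT OR IGNORE INTO record (id) VALUES (?)", (work_id,))
        self.db.commit()

    def delete(self, work_id: str) -> None:
        self.db.execute("DELETE FROM record WHERE id = ?", (work_id,))
        self.db.commit()


def download_once(record: DownloadRecord, work_id: str, download) -> bool:
    """Call download(work_id) unless the ID is already recorded."""
    if record.seen(work_id):
        return False  # skipped, even if the file itself no longer exists
    download(work_id)
    record.add(work_id)
    return True
```

Calling `record.delete(work_id)` before `download_once` corresponds to removing the ID from the database so that the file is fetched again.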
If XHS-Downloader is helpful, please consider giving it a Star ⭐, thank you for your support!
+If XHS-Downloader has been helpful to you, please consider giving it a Star ⭐. Thank you for your support!
Alipay | +WeChat | +Alipay |
---|---|---|---|
If you wish, consider funding additional support for the XHS-Downloader!
-If you are willing, you may consider making a donation to provide additional support for XHS-Downloader!
+✨ Other Open Source Projects by the Author:
-If you contact me via email, I may not be able to check and respond promptly. I will do my best to reply to your email within seven days. If there are urgent matters or you need a faster response, please contact me through other means. Thank you for your understanding! -
-If you're interested in DouYin / TikTok, you can check out my other open-source project TikTokDownloader