ptt-image-crawler

Overview

ptt-image-crawler is a web crawler that downloads images from PTT, a bulletin board system in Taiwan. You can specify the board, the page range, the output path and directory name, and the number of threads used for crawling.

Installation

Clone this repository:

git clone https://github.com/JudeTe/ptt-image-crawler.git

Install the required packages:

python -m pip install -r requirements.txt

Usage

crawler.py [-h] [--board nba] [-i 50 100] [--path C://] [--dir nba] [--thread 10]

optional arguments:

-h, --help show the help message and exit
-b nba, --board nba specify the board to download from (default: 'beauty')
-i 50 100 specify the start and end pages to download in the given board (default: 0 to 1)
-p C://, --path C:// specify the path for storing the files (default: './')
-d nba, --dir nba specify the directory name for storing the files (default: '{board name}')
-t 10, --thread 10 specify how many threads to use (default: the number of CPU cores)
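
For readers who want to see how options like these fit together, here is a minimal sketch of an argparse parser that mirrors the list above. It is not taken from crawler.py, and the real implementation may differ. The names build_parser, parse_args, and the pages attribute are hypothetical, chosen for illustration only.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="crawler.py")
    parser.add_argument("-b", "--board", default="beauty",
                        help="board to download from")
    # nargs=2 collects the start and end page into a two-element list.
    # The attribute name "pages" is an assumption made for this sketch.
    parser.add_argument("-i", dest="pages", nargs=2, type=int, default=[0, 1],
                        metavar=("START", "END"),
                        help="start and end page in the given board")
    parser.add_argument("-p", "--path", default="./",
                        help="path for storing the files")
    # None is a placeholder here; it is replaced by the board name below.
    parser.add_argument("-d", "--dir", default=None,
                        help="directory name for storing the files")
    # os.cpu_count() can return None, so fall back to a single thread.
    parser.add_argument("-t", "--thread", type=int,
                        default=os.cpu_count() or 1,
                        help="number of threads to use")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse arguments and apply the '{board name}' default for --dir."""
    args = build_parser().parse_args(argv)
    if args.dir is None:
        args.dir = args.board
    return args
```

Calling parse_args([]) in this sketch yields the documented defaults, with the directory name falling back to the board name.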

Custom arguments example:

python crawler.py -b nba -i 50 100 -p ./ -d nba -t 10

P.S. If the number of threads is not specified, the number of CPU cores on the current system is used.
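
To illustrate the thread-count default described above, here is a small sketch that uses Python's standard concurrent.futures module. It is an assumption about how such a default could be applied, not the crawler's actual code. The functions run_in_threads and fake_download are hypothetical, and fake_download only stands in for a real image download.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def run_in_threads(task, items, thread_count=None):
    """Apply task to every item using a pool of worker threads.

    If thread_count is not given, fall back to the number of CPU cores
    (or 1 if that number cannot be determined).
    """
    workers = thread_count or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(task, items))


def fake_download(url):
    """Hypothetical stand-in for downloading one image; no network access."""
    filename = url.rsplit("/", 1)[-1]
    return f"saved {filename}"
```

For example, run_in_threads(fake_download, urls, thread_count=10) would process the URLs with ten worker threads, while omitting thread_count uses one thread per CPU core.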

Quick Start

python crawler.py

Issue reporting

If you discover an issue with ptt-image-crawler, please report it at https://github.com/JudeTe/ptt-image-crawler/issues. Thanks!

License

This project is licensed under the MIT License - see the LICENSE.txt file for details.
