misc: Prepare for release #188

Merged 1 commit on Apr 28, 2022

27 changes: 14 additions & 13 deletions Cargo.lock

Some generated files are not rendered by default.

6 changes: 3 additions & 3 deletions Cargo.toml
@@ -1,9 +1,9 @@
 [package]
 name = "suckit"
-version = "0.1.2"
+version = "0.2.0"
 edition = "2018"
 authors = ["Esteban \"Skallwar\" Blanc <[email protected]>",
-"Arthur \"CohenArthur\" Cohen <[email protected]>"]
+"Arthur \"CohenArthur\" Cohen <[email protected]>"]
 license = "MIT OR Apache-2.0"
 homepage = "https://github.com/skallwar/suckit"
 repository = "https://github.com/skallwar/suckit"
@@ -20,7 +20,7 @@ include = [
 ]

 [package.metadata]
-msrv = "1.44.1"
+msrv = "1.49.0"

 [lib]
 name = "suckit"
65 changes: 49 additions & 16 deletions README.md
@@ -5,7 +5,7 @@
 [![Deps](https://deps.rs/repo/github/Skallwar/suckit/status.svg)](https://deps.rs/repo/github/Skallwar/suckit)
 [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
 [![License](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
-![MSRV](https://img.shields.io/badge/MSRV-1.46.0-blue)
+![MSRV](https://img.shields.io/badge/MSRV-1.49.0-blue)

 # SuckIT

@@ -24,21 +24,54 @@ your disk.
 * [ ] Saves application state on CTRL-C for later pickup

 # Options
-
-|Option|Behavior|
-|---|---|
-|`-h, --help`|Displays help information|
-|`-v, --verbose`|Activate Verbose output|
-|`-d, --depth`|Specify the level of depth to go to when visiting the website. Default is -1 (infinity)|
-|`--ext-depth`|Specify the level of depth to go to when visiting websites that have a different domain name. Default is 0 (ignore external links), -1 is infinity|
-|`-j, --jobs`|Number of threads to use|
-|`-o, --output`|Output directory where the downloaded files are written|
-|`-t, --tries`|Number of times to retry when the downloading of a page fails|
-|`-u, --user-agent`|User agent to be used for sending requests|
-|`-i, --include`|Specify a regex to include pages that match this pattern|
-|`-e, --exclude`|Specify a regex to exclude pages that match this pattern|
-|`-a, --auth`|Provide usernames and passwords for the downloader to use|
-|`--dry-run`|Do everything without saving the files to the disk|
+```console
+USAGE:
+    suckit [FLAGS] [OPTIONS] <url>
+
+FLAGS:
+    -c, --continue-on-error                  Flag to enable or disable exit on error
+        --dry-run                            Do everything without saving the files to the disk
+    -h, --help                               Prints help information
+    -V, --version                            Prints version information
+    -v, --verbose                            Enable more information regarding the scraping process
+        --visit-filter-is-download-filter    Use the download filter in/exclude regexes for visiting as well
+
+OPTIONS:
+    -a, --auth <auth>...
+            HTTP basic authentication credentials space-separated as "username password host". Can be repeated for
+            multiple credentials as "u1 p1 h1 u2 p2 h2"
+        --delay <delay>
+            Add a delay in seconds between downloads to reduce the likelihood of getting banned [default: 0]
+
+    -d, --depth <depth>
+            Maximum recursion depth to reach when visiting. Default is -1 (infinity) [default: -1]
+
+    -e, --exclude-download <exclude-download>
+            Regex filter to exclude saving pages that match this expression [default: $^]
+
+        --exclude-visit <exclude-visit>
+            Regex filter to exclude visiting pages that match this expression [default: $^]
+
+        --ext-depth <ext-depth>
+            Maximum recursion depth to reach when visiting external domains. Default is 0. -1 means infinity [default:
+            0]
+    -i, --include-download <include-download>
+            Regex filter to limit to only saving pages that match this expression [default: .*]
+
+        --include-visit <include-visit>
+            Regex filter to limit to only visiting pages that match this expression [default: .*]
+
+    -j, --jobs <jobs>                Maximum number of threads to use concurrently [default: 1]
+    -o, --output <output>            Output directory
+        --random-range <random-range>
+            Generate an extra random delay between downloads, from 0 to this number. This is added to the base delay
+            seconds [default: 0]
+    -t, --tries <tries>              Maximum amount of retries on download failure [default: 20]
+    -u, --user-agent <user-agent>    User agent to be used for sending requests [default: suckit]
+
+ARGS:
+    <url>    Entry point of the scraping
+```

 # Example
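
The new help text describes two conventions that may be easier to follow in code: `--auth` takes a flat, repeatable list of values read as `username password host` triples, and `--depth` uses `-1` to mean "no limit". The following is a minimal sketch of both, not SuckIT's actual implementation. The names `Credential`, `group_credentials`, and `depth_allows` are hypothetical, and the sketch assumes the entry point counts as depth 0.

```rust
/// One set of HTTP basic-auth credentials.
/// Hypothetical type for illustration; not taken from the SuckIT sources.
#[derive(Debug, PartialEq)]
struct Credential {
    username: String,
    password: String,
    host: String,
}

/// Groups the flat list of values given to `--auth` into
/// (username, password, host) triples, e.g. "u1 p1 h1 u2 p2 h2".
/// Returns an error if the number of values is not a multiple of three.
fn group_credentials(values: &[&str]) -> Result<Vec<Credential>, String> {
    if values.len() % 3 != 0 {
        return Err(format!(
            "expected groups of 3 values (username password host), got {} values",
            values.len()
        ));
    }
    Ok(values
        .chunks_exact(3)
        .map(|triple| Credential {
            username: triple[0].to_string(),
            password: triple[1].to_string(),
            host: triple[2].to_string(),
        })
        .collect())
}

/// Returns true if a page at `current_depth` may still be visited.
/// Any negative `max_depth` (the help text uses -1) is treated as "no limit".
/// Assumption: the entry point is at depth 0.
fn depth_allows(current_depth: u32, max_depth: i32) -> bool {
    max_depth < 0 || current_depth <= max_depth as u32
}
```

Under these assumptions, `group_credentials(&["u1", "p1", "h1", "u2", "p2", "h2"])` yields two credentials, and `depth_allows(5, -1)` is true because a negative limit disables the check.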