Throw all URIs in a page onto the Wayback Machine's Save Page Now from the CLI.

victoriatomzik/wbsv-cli

wbsv

wbsv ("Wabisavi", "わびさび"; stands for "WayBack machine SavepageNow") is…

A CLI tool for saving webpages on the Wayback Machine forever. It lets you save a webpage, and all URIs found in it, on the Wayback Machine.
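
To make "all URIs in a webpage" concrete, here is a minimal sketch of the idea using only the standard library. It is not the wbsv source: `LinkCollector`, `extract_uris` and `save_request_url` are hypothetical names, and the assumption is that each URI is submitted through the public Save Page Now prefix `https://web.archive.org/save/`.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin

# Public Save Page Now prefix; that wbsv calls it exactly this way is an assumption.
SAVE_ENDPOINT = "https://web.archive.org/save/"


class LinkCollector(HTMLParser):
    """Collect raw href/src attribute values from a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def extract_uris(html, base_url):
    """Return absolute, de-duplicated http(s) URIs in order of appearance."""
    parser = LinkCollector()
    parser.feed(html)
    found = []
    for link in parser.links:
        # Resolve relative links and drop "#fragment" parts.
        absolute, _fragment = urldefrag(urljoin(base_url, link))
        if absolute.startswith(("http://", "https://")) and absolute not in found:
            found.append(absolute)
    return found


def save_request_url(uri):
    """Build the URL that asks the Wayback Machine to capture `uri`."""
    return SAVE_ENDPOINT + uri


page = '<a href="/about">About</a><img src="logo.png"><a href="mailto:x@example.com">Mail</a>'
for uri in extract_uris(page, "https://example.com/"):
    print(save_request_url(uri))
```

The sketch only builds the request URLs; actually sending them (and handling failures) is left out on purpose.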

Try now

You can try this tool on Google Cloud Shell. (First, run: sudo pip3 install -e .)

DEMO

demo.gif

Install

$ pip install wbsv # Python3.0+

Run & Examples

Help

$ wbsv -h
wbsv 0.1.5
CLI tool for save webpage on Wayback Machine forever.
Save webpage and one's all URI(s) on Wayback Machine.

Usage:
    wbsv [options] <url1> <url2> ... <urln>

Args:
    <urls>                      Saving pages in order.
    no arg                      Launch Interactive mode.
                                (To quit interactive mode,
                                 type "end", "exit", "exit()",
                                 "break", "bye", ":q" or "finish".)

Options:
    -h, --help                  Show help and exit.
    -v, --version               Show version and exit.
    -r, --retry <times>         Set a retry limit on failed save.
    -t, --only-target           Save just target webpage(s).
    -L, --level <depth>          Set maximum recursion depth.
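
For readers who want to script around these options, the following is a rough argparse sketch that mirrors the usage text above. It is not how wbsv itself parses arguments; the program name `wbsv-sketch` and the default values for `--retry` and `--level` are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        prog="wbsv-sketch",
        description="Sketch of a CLI that saves pages on the Wayback Machine.",
    )
    # Zero URLs would correspond to the interactive mode described above.
    parser.add_argument("urls", nargs="*", metavar="url",
                        help="Pages to save, in order.")
    parser.add_argument("-v", "--version", action="version",
                        version="wbsv-sketch 0.0")
    # Default of 3 is an assumption, not taken from wbsv.
    parser.add_argument("-r", "--retry", type=int, default=3, metavar="times",
                        help="Retry limit on failed save.")
    parser.add_argument("-t", "--only-target", action="store_true",
                        help="Save just the target webpage(s).")
    # Default of 1 is an assumption, not taken from wbsv.
    parser.add_argument("-L", "--level", type=int, default=1, metavar="depth",
                        help="Maximum recursion depth.")
    return parser


args = build_parser().parse_args(["-r", "10", "-t", "https://example.com"])
print(args.retry, args.only_target, args.level, args.urls)
```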

Interactive mode

$ wbsv
[[Input a target url (ex: https://google.com)]]
>>> https://tsukuba.ac.jp
[!]Now: https://tsukuba.ac.jp
[!]<class 'urllib.error.URLError'>
[!]<urlopen error [Errno -2] Name or service not known>
[!]<traceback object at 0x7eff0d207188>
[[Input a target url (ex: https://google.com)]]
>>> https://www.u.tsukuba.ac.jp
[+]Now: https://www.u.tsukuba.ac.jp
87 URI(s) found.
[01]: <NOW> https://web.archive.org/web/20200123135244/https://www.u.tsukuba.ac.jp/20180622terminals/
[02]: <NOW> https://web.archive.org/web/20200123135247/https://www.u.tsukuba.ac.jp/
[03]: <NOW> https://web.archive.org/web/20200123135250/https://www.u.tsukuba.ac.jp/anti-virus/
...
[85]: <NOW> https://web.archive.org/web/20200123140917/https://www.u.tsukuba.ac.jp/snapshot/
[86]: <FAIL> https://www.u.tsukuba.ac.jp/wp-json/oembed/1.0/embed?url=https%3A%2F%2Fwww.u.tsukuba.ac.jp%2F&format=xml
[87]: <FAIL> https://www.u.tsukuba.ac.jp/info_lit/tebiki.html
[+]FIN!: https://www.u.tsukuba.ac.jp
[+]ALL: 87 SAVE: 61 FAIL: 21
[+]To exit, use CTRL+C or type 'end'
[[Input a target url (ex: https://google.com)]]
>>> exit
$
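
The prompt loop in the session above can be sketched roughly as follows. The quit words are taken from the help text; everything else (`is_quit`, `interactive`, exact-match comparison, treating EOF like CTRL+C) is an assumption rather than wbsv's actual behaviour.

```python
# Quit words as listed in the help text.
QUIT_WORDS = {"end", "exit", "exit()", "break", "bye", ":q", "finish"}


def is_quit(line):
    """True if the line is one of the documented quit words (exact match assumed)."""
    return line.strip() in QUIT_WORDS


def interactive(read_line=input, handle=print):
    """Prompt for URLs until a quit word, EOF or CTRL+C; return how many were handled.

    `read_line` and `handle` are injectable so the loop can be tested
    without a terminal or network access.
    """
    handled = 0
    while True:
        try:
            line = read_line(">>> ")
        except (EOFError, KeyboardInterrupt):
            break
        if is_quit(line):
            break
        if line.strip():  # ignore empty input and prompt again
            handle(line.strip())
            handled += 1
    return handled
```

Calling `interactive()` with the defaults reads from the terminal and simply echoes each URL; a real tool would pass its own save function as `handle`.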

From command-line arguments

$ wbsv https://tsumanne.net https://tsumanne.net/ct
[+]Now: https://tsumanne.net
9 URI(s) found.
[1]: <NOW> https://web.archive.org/web/20200123194439/https://tsumanne.net
...
[9]: <FAIL> https://tsumanne.net/src/iphone.png
[+]FIN!: https://tsumanne.net
[+]ALL: 9 SAVE: 5 FAIL: 4
[+]Now: https://tsumanne.net/ct
7 URI(s) found.
[1]: <NOW> https://web.archive.org/web/20200123194602/https://tsumanne.net/ct/?cat=&of=25
...
[7]: <FAIL> https://tsumanne.net/src/site.js
[+]FIN!: https://tsumanne.net/ct
[+]ALL: 7 SAVE: 5 FAIL: 2
$
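
If you capture this output to a file, the per-URI result lines can be tallied with a few lines of Python. This is a sketch that assumes the line format shown above (`[n]: <STATUS> uri`); `RESULT_LINE` and `tally` are hypothetical names, and statuses other than NOW and FAIL are simply counted under whatever label appears.

```python
import re
from collections import Counter

# Matches lines such as "[9]: <FAIL> https://tsumanne.net/src/iphone.png".
RESULT_LINE = re.compile(r"^\[(\d+)\]: <(\w+)> (\S+)$")


def tally(lines):
    """Count result lines per status label, ignoring every other line."""
    counts = Counter()
    for line in lines:
        match = RESULT_LINE.match(line.strip())
        if match:
            counts[match.group(2)] += 1
    return counts


sample = """\
[+]Now: https://tsumanne.net
9 URI(s) found.
[1]: <NOW> https://web.archive.org/web/20200123194439/https://tsumanne.net
[9]: <FAIL> https://tsumanne.net/src/iphone.png
[+]FIN!: https://tsumanne.net
""".splitlines()

print(dict(tally(sample)))
```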

Increase the retry limit

$ wbsv https://tsumanne.net --retry 10
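
As an illustration of what a retry limit means in practice, here is a small sketch. It assumes the limit counts extra attempts after the first one and that failures surface as `OSError` (which `urllib.error.URLError` subclasses); wbsv's own semantics may differ, and `save_with_retry` is a hypothetical helper.

```python
import time


def save_with_retry(save, uri, retries=3, delay=0.0):
    """Call save(uri) until it succeeds or `retries` extra attempts are used up.

    Returns (succeeded, attempts_made).
    """
    attempts = 0
    for attempts in range(1, retries + 2):  # first try plus `retries` retries
        try:
            save(uri)
            return True, attempts
        except OSError:
            # Wait only if another attempt is still coming.
            if attempts <= retries and delay:
                time.sleep(delay)
    return False, attempts


# Demo with a stand-in save function that fails twice, then succeeds.
calls = {"n": 0}


def flaky_save(uri):
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("temporary failure")


print(save_with_retry(flaky_save, "https://example.com", retries=10))  # (True, 3)
```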

VERSION

wbsv 0.1.5

LICENSE

MIT

Author

eggplants (haruna)
