
simonfrey/save_to_web.archive.org



Description

Scrapes the given website for internal links and saves each link it finds to web.archive.org.

Installation

This guide assumes you have already installed Go (see the Go installation manual).

Dependencies

Download the dependencies via go get by executing the following two commands:

go get -u github.com/simonfrey/proxyfy
go get -u github.com/PuerkitoBio/goquery

Download tool

Clone the git repository:

git clone https://github.com/simonfrey/save_to_web.archive.org.git

Execution

Navigate into the directory of the git repo.

Execute with the command below, replacing http[s]://[yourwebsite.com] with the URL of the website you want to scrape and save:

go run main.go http[s]://[yourwebsite.com]

Additional command-line arguments:

-p to send the requests through a proxy

-i to also crawl internal URLs (e.g. /test/foo)

So, to also crawl internal links and send the requests through a proxy, use the following command:

go run main.go -p -i http[s]://[yourwebsite.com] 
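For readers curious how a command line like the one above can be interpreted, here is a minimal sketch using Go's standard `flag` package. It is an assumption about one reasonable way to handle `-p` and `-i`, not a copy of the tool's `main.go`; the `options` type and `parseArgs` function are hypothetical names.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// options (hypothetical type) holds the parsed command-line settings.
type options struct {
	useProxy      bool   // -p
	crawlInternal bool   // -i
	target        string // the website URL
}

// parseArgs (hypothetical helper) parses the arguments that follow the
// program name. Flags must come before the URL, as in the README command.
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("save_to_web.archive.org", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // report errors via the return value only
	fs.BoolVar(&o.useProxy, "p", false, "send the requests through a proxy")
	fs.BoolVar(&o.crawlInternal, "i", false, "also crawl internal URLs")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	if fs.NArg() != 1 {
		return o, errors.New("expected exactly one website URL")
	}
	o.target = fs.Arg(0)
	return o, nil
}

func main() {
	// Fixed example arguments; a real program would pass os.Args[1:].
	opts, err := parseArgs([]string{"-p", "-i", "https://example.com"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("proxy=%v internal=%v target=%s\n",
		opts.useProxy, opts.crawlInternal, opts.target)
}
```

Note that the `flag` package stops parsing at the first non-flag argument, which is why the URL goes last in this sketch.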
