Several improvements (#335)
- Add client timeout for telegra.ph
- Log pooling errors
- Warcraft supports timeout and return waiting error
- Telegra.ph performance improvement
- Upload artifact remotely with timeout
- Remotely file upload with separate function
- Throw a fatal error if the command-line flag value is not specified
- Replace os.Tempdir with testing.T.TempDir
- Replace ioutil.ReadAll with io.ReadAll
- Add storage testing
- Change default ipfs port to 5001
- Place ipfs related environments for testing
- Wrap testing using t.Run
- Add `chromedp.NoModifyURL` compatibility
waybackarchiver authored Mar 21, 2023
1 parent 142f095 commit 58f49f2
Showing 24 changed files with 324 additions and 216 deletions.
11 changes: 8 additions & 3 deletions .github/workflows/testing.yml
@@ -50,6 +50,9 @@ jobs:
SENDER_PWD: ${{ secrets.MATRIX_SENDER_PWD }}
RECVER_UID: ${{ secrets.MATRIX_RECVER_UID }}
RECVER_PWD: ${{ secrets.MATRIX_RECVER_PWD }}
+WAYBACK_IPFS_MODE: ${{ vars.WAYBACK_IPFS_MODE }}
+WAYBACK_IPFS_HOST: ${{ vars.WAYBACK_IPFS_HOST }}
+WAYBACK_IPFS_PORT: ${{ vars.WAYBACK_IPFS_PORT }}
steps:
- name: Harden Runner
uses: step-security/harden-runner@2e205a28d0e1da00c5f53b161f4067b052c61f34 # v1.5.0
@@ -92,7 +95,7 @@ jobs:
with:
args: -h

-- name: Install Packages
+- name: Install Packages for Linux
if: matrix.os == 'ubuntu-latest'
shell: bash
run: |
@@ -105,7 +108,7 @@ jobs:
you-get --version
ffmpeg -version
-- name: Install Packages
+- name: Install Packages for MacOS
if: matrix.os == 'macos-latest'
shell: bash
run: |
@@ -115,7 +118,7 @@ jobs:
you-get --version
ffmpeg -version
-- name: Install Packages
+- name: Install Packages for Windows
if: matrix.os == 'windows-latest'
shell: bash
run: |
@@ -129,6 +132,8 @@ jobs:
- name: Set environments
shell: bash
run: |
+ipfsMode="${{ vars.WAYBACK_IPFS_MODE }}"
+echo "WAYBACK_IPFS_MODE=${ipfsMode:-daemon}" >> $GITHUB_ENV
# Set env to enable reduxer
echo "WAYBACK_STORAGE_DIR=${{ runner.temp }}" >> $GITHUB_ENV
# Append paths to environment path
2 changes: 1 addition & 1 deletion README.md
@@ -225,7 +225,7 @@ You can also specify configuration options either via command flags or via envir
| - | `LOG_LEVEL` | `info` | Log level, supported level are `debug`, `info`, `warn`, `error`, `fatal`, defaults to `info` |
| - | `ENABLE_METRICS` | `false` | Enable metrics collector |
| - | `WAYBACK_LISTEN_ADDR` | `0.0.0.0:8964` | The listen address for the HTTP server |
-| - | `CHROME_REMOTE_ADDR` | - | Chrome/Chromium remote debugging address, for screenshot |
+| - | `CHROME_REMOTE_ADDR` | - | Chrome/Chromium remote debugging address, for screenshot, format: `host:port`, `wss://domain.tld` |
| - | `WAYBACK_POOLING_SIZE` | `3` | Number of worker pool for wayback at once |
| - | `WAYBACK_BOLT_PATH` | `./wayback.db` | File path of bolt database |
| - | `WAYBACK_STORAGE_DIR` | - | Directory to store binary file, e.g. PDF, html file |
2 changes: 1 addition & 1 deletion cmd/wayback/serve.go
@@ -181,7 +181,7 @@ func (srv *services) run(ctx context.Context, opts service.Options) *services {
name: s,
})
default:
-logger.Error("unrecognize %s in `--daemon`", s)
+logger.Fatal("unrecognize %s in `--daemon`", s)
}
}
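
Switching from `logger.Error` to `logger.Fatal` presumably makes an unrecognized `--daemon` value stop the program instead of being logged and skipped. The fail-fast idea can be sketched as a validation step that runs before any service starts. The `known` set and the `validateDaemons` name below are hypothetical and are not taken from `cmd/wayback/serve.go`.

```go
package main

import "fmt"

// known is a hypothetical set of service names, for illustration only.
var known = map[string]bool{"web": true, "telegram": true}

// validateDaemons returns an error for the first unrecognized name, so the
// caller can abort before starting any service.
func validateDaemons(names []string) error {
	for _, s := range names {
		if !known[s] {
			return fmt.Errorf("unrecognized %q in `--daemon`", s)
		}
	}
	return nil
}

func main() {
	for _, names := range [][]string{{"web", "telegram"}, {"web", "smtp"}} {
		if err := validateDaemons(names); err != nil {
			// A real entry point would log this and exit non-zero.
			fmt.Println("fatal:", err)
			continue
		}
		fmt.Println("ok:", names)
	}
}
```

Returning an error rather than exiting inside the helper keeps the check testable; only the entry point decides to terminate.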

2 changes: 1 addition & 1 deletion config/options.go
@@ -23,7 +23,7 @@ const (
defOverTor = false

defIPFSHost = "127.0.0.1"
-defIPFSPort = 4001
+defIPFSPort = 5001
defIPFSMode = "pinner"
defIPFSTarget = ""
defIPFSApikey = ""
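
For context on the port change: in a stock Kubo (go-ipfs) configuration, 4001 is the swarm (peer-to-peer) port, while 5001 serves the HTTP API, which is the port a client adding or pinning content would normally talk to. A minimal sketch of resolving the API address from the `WAYBACK_IPFS_HOST` and `WAYBACK_IPFS_PORT` variables with these defaults follows. The helper `ipfsAPIAddr` and its fallback rules are assumptions for illustration, not the project's actual option parsing.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strconv"
)

const (
	defIPFSHost = "127.0.0.1"
	defIPFSPort = 5001 // HTTP API port in a default Kubo setup
)

// ipfsAPIAddr is a hypothetical helper. It reads host and port through the
// supplied getenv function and falls back to the defaults when a value is
// missing or the port is not a valid TCP port number.
func ipfsAPIAddr(getenv func(string) string) string {
	host := getenv("WAYBACK_IPFS_HOST")
	if host == "" {
		host = defIPFSHost
	}
	port := defIPFSPort
	if p, err := strconv.Atoi(getenv("WAYBACK_IPFS_PORT")); err == nil && p > 0 && p <= 65535 {
		port = p
	}
	return net.JoinHostPort(host, strconv.Itoa(port))
}

func main() {
	fmt.Println(ipfsAPIAddr(os.Getenv))
}
```

Passing `getenv` as a parameter keeps the helper independent of the process environment, so it can be exercised with a plain map in tests.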
15 changes: 15 additions & 0 deletions docs/changelog.md
@@ -36,6 +36,21 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- No longer build image for `linux/s390x`
- Get rid of the Tor binary ([#336](https://github.com/wabarc/wayback/pull/336))
- Adjusting lux to pluggable mode ([#337](https://github.com/wabarc/wayback/pull/337))
+- Several improvements ([#335](https://github.com/wabarc/wayback/pull/335))
+  - Add client timeout for telegra.ph
+  - Log pooling errors
+  - Warcraft supports timeout and return waiting error
+  - Telegra.ph performance improvement
+  - Upload artifact remotely with timeout
+  - Remotely file upload with separate function
+  - Throw a fatal error if the command-line flag value is not specified
+  - Replace os.Tempdir with testing.T.TempDir
+  - Replace ioutil.ReadAll with io.ReadAll
+  - Add storage testing
+  - Change default ipfs port to 5001
+  - Place ipfs related environments for testing
+  - Wrap testing using t.Run
+  - Add `chromedp.NoModifyURL` compatibility

### Fixed
- Fix semgrep scan workflow ([#312](https://github.com/wabarc/wayback/pull/312))
5 changes: 5 additions & 0 deletions docs/resources.md
@@ -22,3 +22,8 @@ From the popular Wayback Machine to lesser-known platforms, there is something h
## Wiki

- [Web archiving](https://en.wikipedia.org/wiki/Web_archiving)

+## Tools
+
+- [Browserless](https://www.browserless.io/): Web Automation & Headless Browser Automation Tool.

20 changes: 10 additions & 10 deletions go.mod
@@ -5,16 +5,16 @@ module github.com/wabarc/wayback
go 1.18

require (
-github.com/PuerkitoBio/goquery v1.8.0
+github.com/PuerkitoBio/goquery v1.8.1
github.com/bwmarrin/discordgo v0.23.3-0.20210627161652-421e14965030
github.com/cretz/bine v0.2.0
github.com/davecgh/go-spew v1.1.1
github.com/dghubble/go-twitter v0.0.0-20201011215211-4b180d0cc78d
github.com/dghubble/oauth1 v0.7.1
github.com/dstotijn/go-notion v0.6.1
github.com/dustin/go-humanize v1.0.0
-github.com/fatih/color v1.13.0
-github.com/gabriel-vasile/mimetype v1.4.1
+github.com/fatih/color v1.15.0
+github.com/gabriel-vasile/mimetype v1.4.2
github.com/go-shiori/go-readability v0.0.0-20220215145315-dd6828d2f09b
github.com/go-shiori/obelisk v0.0.0-20221119111008-23c015a8fad7
github.com/google/go-github/v40 v40.0.0
@@ -38,15 +38,15 @@ require (
github.com/wabarc/archive.org v1.2.1-0.20210708220121-cb9b83ff9896
github.com/wabarc/go-anonfile v0.1.0
github.com/wabarc/go-catbox v0.1.0
-github.com/wabarc/helper v0.0.0-20230209075818-96584f1ebf9d
+github.com/wabarc/helper v0.0.0-20230318095659-969de9ddf4b6
github.com/wabarc/imgbb v1.0.0
github.com/wabarc/ipfs-pinner v1.1.1-0.20220126131044-16299c0dd43d
github.com/wabarc/logger v0.0.0-20210730133522-86bd3f31e792
github.com/wabarc/playback v0.0.0-20220715111526-90d0327d3f04
github.com/wabarc/rivet v0.1.4-0.20221226142645-ebc8a29d914f
-github.com/wabarc/screenshot v1.6.0
-github.com/wabarc/telegra.ph v0.0.0-20221226141851-edf1cc14c076
-github.com/wabarc/warcraft v0.2.2-0.20211107142816-7beea5a75ab5
+github.com/wabarc/screenshot v1.6.1-0.20230315004517-7587f8bc14e0
+github.com/wabarc/telegra.ph v0.0.0-20230318134541-a0922e1ace3a
+github.com/wabarc/warcraft v0.3.1-0.20230308125707-3daa5592ba52
go.etcd.io/bbolt v1.3.6
golang.org/x/net v0.8.0
golang.org/x/sync v0.1.0
@@ -68,8 +68,8 @@ require (
github.com/cenkalti/backoff/v4 v4.2.0 // indirect
github.com/cespare/xxhash/v2 v2.2.0 // indirect
github.com/cheggaaa/pb/v3 v3.0.8 // indirect
-github.com/chromedp/cdproto v0.0.0-20221126224343-3a0787b8dd28 // indirect
-github.com/chromedp/chromedp v0.8.6 // indirect
+github.com/chromedp/cdproto v0.0.0-20230310204135-a6d692f2c96d // indirect
+github.com/chromedp/chromedp v0.9.1 // indirect
github.com/chromedp/sysutil v1.0.0 // indirect
github.com/crackcomm/go-gitignore v0.0.0-20170627025303-887ab5e44cc3 // indirect
github.com/decred/dcrd/crypto/blake256 v1.0.0 // indirect
@@ -97,7 +97,7 @@ require (
github.com/itchyny/timefmt-go v0.1.3 // indirect
github.com/josharian/intern v1.0.0 // indirect
github.com/json-iterator/go v1.1.12 // indirect
-github.com/kallydev/telegraph-go v1.0.0 // indirect
+github.com/kallydev/telegraph-go v1.0.1-0.20230318133700-df034d9eed50 // indirect
github.com/kennygrant/sanitize v1.2.4 // indirect
github.com/kkdai/youtube/v2 v2.7.18 // indirect
github.com/klauspost/cpuid/v2 v2.2.2 // indirect