pywb-proxy-demo

This project provides a demo of using the pywb web replay tools in HTTP/S proxy mode.

It demonstrates the following new features in the upcoming release of pywb:

  • HTTPS proxy mode support (with trusted certificate)
  • Capture time switching in proxy mode
  • Archive collection switching in proxy mode.
  • POST request replay (see More Info)
  • Optional banner or unaltered replay (banner enabled in the demo).
  • HTTPS -> HTTP rewriting when accessing via http
  • Compatible with a standard WSGI container running on a single port (uWSGI preferred)

Usage

  1. Clone this repository and install the dependencies with pip install -r requirements.txt

  2. Start the demo by running ./run-proxy-demo.sh

  3. Configure the browser to use localhost:9080 (the default) as both its HTTP and HTTPS proxy.

  4. Browse to the special URL http://pywb.proxy/ to reach the pywb HTTPS certificate download page. Then follow the instructions on the page to download either the all-platform/Firefox version or the Windows-specific version of the pywb root certificate. Follow the browser's instructions and trust the certificate for authenticating websites.

    If there are issues with the process, the certificate pywb_ca.pem (located in ./ca in this repository) can also be added to the browser manually. This certificate must be added as a trusted certificate for verifying websites in order for HTTPS replay to work. In effect, it grants pywb the right to serve HTTPS sites for replay.

    (This step only needs to be done once per browser)

  5. This demo includes two URLs that can be browsed in multiple collections and at multiple capture times:

    https://twitter.com/netpreserve

    https://plus.google.com/communities/105126210690761809187

(Note: The sites can be accessed via https (if configured) or http, but the browser will likely force https, especially for the first URL. When accessing via http, all links are rewritten to be http only as well.)
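The proxy settings from the steps above can also be exercised from a script rather than a browser. The following is a minimal sketch using only the Python standard library; it is not part of the demo. The helper names (proxy_map, build_opener, fetch_status) are hypothetical, and it assumes the demo is running on the default localhost:9080 and that the certificate is at ./ca/pywb_ca.pem as described in step 4.

```python
import ssl
import urllib.request

PROXY = "localhost:9080"       # default proxy address from step 3
CA_FILE = "./ca/pywb_ca.pem"   # pywb root certificate from step 4


def proxy_map(proxy=PROXY):
    """Route both http and https requests through the same proxy."""
    return {"http": "http://" + proxy, "https": "http://" + proxy}


def build_opener(proxy=PROXY, ca_file=None):
    """Return a urllib opener that sends all traffic through the proxy.

    If ca_file is given, HTTPS responses are verified against that
    certificate file instead of the system trust store.
    """
    handlers = [urllib.request.ProxyHandler(proxy_map(proxy))]
    if ca_file is not None:
        context = ssl.create_default_context(cafile=ca_file)
        handlers.append(urllib.request.HTTPSHandler(context=context))
    return urllib.request.build_opener(*handlers)


def fetch_status(opener, url, timeout=30):
    """Fetch url through the opener and return (status, content type)."""
    with opener.open(url, timeout=timeout) as resp:
        return resp.status, resp.headers.get("Content-Type")
```

With the demo running, fetch_status(build_opener(ca_file=CA_FILE), "https://twitter.com/netpreserve") would request the archived page through the proxy. What a non-browser client receives on its first request may differ from a browser session, so treat the result as exploratory.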

When first accessing a url, the collection selection page will be shown. A link to this page will be available in the banner to allow switching of collections at any time.

In this demo, several collections are defined with different combinations of captures (a single site only, only the older captures, only the newer captures, and all captures). This lets the user try switching collections, and switching replay times within a collection, all while in proxy mode.

The collections are defined as follows in config.yaml:

collections:
    all: ./samples/cdx/

    older:
        - ./samples/cdx/twitter-1.cdx
        - ./samples/cdx/gplus-1.cdx

    newer:
        - ./samples/cdx/twitter-2.cdx
        - ./samples/cdx/gplus-2.cdx

    twitter-only:
        - ./samples/cdx/twitter-1.cdx
        - ./samples/cdx/twitter-2.cdx

    gplus-only:
        - ./samples/cdx/gplus-1.cdx
        - ./samples/cdx/gplus-2.cdx
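To see how these collections overlap, the configuration can be mirrored as a plain Python dictionary. This is only an illustrative sketch, not how pywb loads its configuration. The helper names are hypothetical, and it assumes that the directory entry for all contains exactly the four CDX files named by the other collections.

```python
# Mirror of the collections section of config.yaml shown above.
COLLECTIONS = {
    "all": "./samples/cdx/",
    "older": ["./samples/cdx/twitter-1.cdx", "./samples/cdx/gplus-1.cdx"],
    "newer": ["./samples/cdx/twitter-2.cdx", "./samples/cdx/gplus-2.cdx"],
    "twitter-only": ["./samples/cdx/twitter-1.cdx", "./samples/cdx/twitter-2.cdx"],
    "gplus-only": ["./samples/cdx/gplus-1.cdx", "./samples/cdx/gplus-2.cdx"],
}


def listed_files():
    """Return every CDX path named explicitly by some collection, sorted."""
    paths = set()
    for source in COLLECTIONS.values():
        if isinstance(source, list):
            paths.update(source)
    return sorted(paths)


def files_for(name):
    """Return the CDX paths a collection reads from.

    A list names its files explicitly. A string is treated as a directory,
    assumed here to hold exactly the explicitly listed files beneath it.
    """
    source = COLLECTIONS[name]
    if isinstance(source, list):
        return sorted(source)
    return [path for path in listed_files() if path.startswith(source)]


def collections_with(path):
    """Return the names of all collections that include the given CDX path."""
    return [name for name in COLLECTIONS if path in files_for(name)]


print(collections_with("./samples/cdx/twitter-1.cdx"))
# ['all', 'older', 'twitter-only']
```

Under that assumption, each CDX file appears in exactly three collections, which is what makes switching between collections change the set of available captures.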

Although it is not the intent of this demo, the archives can also be accessed in non-proxy mode when the HTTP/S proxy is disabled, e.g. http://localhost:9080/all/... or http://localhost:9080/older/...

More Info

Additional information about proxy mode may be found on the pywb project wiki: https://github.com/ikreymer/pywb/wiki/Pywb-Proxy-Mode-Usage

Additional info about POST request replay: https://github.com/ikreymer/pywb/wiki/POST-request-replay
