Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a proxy flag to customize URLs for rewriting #315

Closed
ibnesayeed opened this issue Dec 8, 2017 · 6 comments
Closed

Add a proxy flag to customize URLs for rewriting #315

ibnesayeed opened this issue Dec 8, 2017 · 6 comments
Assignees

Comments

@ibnesayeed
Copy link
Member

Inferring the base URL of the service from the [default or supplied] HOST and PORT is a good default, but there should be an ability to customize it independent of the HOST and PORT config. This is important when the service is running behind a reverse proxy or in a container. With a config like this, one should be able to generate URLs like http://example.com/foo even if the service is running on localhost:5000

In MemGator, we do it something like this:

	baseURL = strings.TrimRight(baseURL, "/")
	if *proxy == "http://{HOST}[:{PORT}]{ROOT}" {
		*proxy = baseURL
	} else {
		*proxy = strings.TrimRight(*proxy, "/")
	}
@ibnesayeed
Copy link
Member Author

Please feel free to add me in the reviewers when you create a PR for this. I want to make sure that this is done right as it could be a little tricky.

@machawk1
Copy link
Member

machawk1 commented Dec 9, 2017

Ping @ibnesayeed. This is mostly implemented. Please check the below example. I also need to apply this to the CDXJ TM endpoint and any other places you suggest.

$ ipwb index ipwb/samples/warcs/5mementos.warc | ipwb replay -p ipwb.matkelly.com:9000
Processing WARC records in 5mementos.warc complete
Proxying to ipwb.matkelly.com:9000
IPWB replay started on http://localhost:5000
$ curl -i localhost:5000/timemap/link/memento.us
HTTP/1.0 200 OK
Content-Type: application/link-format
Content-Length: 835
Server: InterPlanetary Wayback Replay/0.2017.12.09.0845
Date: Sat, 09 Dec 2017 14:57:30 GMT

<memento.us/>; rel="original",
<http://ipwb.matkelly.com:9000/timemap/link/memento.us>; rel="self timemap"; type="application/link-format",
<http://ipwb.matkelly.com:9000/timemap/cdxj/memento.us>; rel="timemap"; type="application/cdxj+ors",
<http://ipwb.matkelly.com:9000/20130202100000/memento.us/>; rel="first memento"; datetime="Sat, 02 Feb 2013 10:00:00 GMT",
<http://ipwb.matkelly.com:9000/20140114100000/memento.us/>; rel="memento"; datetime="Tue, 14 Jan 2014 10:00:00 GMT",
<http://ipwb.matkelly.com:9000/20140115101500/memento.us/>; rel="memento"; datetime="Wed, 15 Jan 2014 10:15:00 GMT",
<http://ipwb.matkelly.com:9000/20161231110000/memento.us/>; rel="memento"; datetime="Sat, 31 Dec 2016 11:00:00 GMT",
<http://ipwb.matkelly.com:9000/20161231110001/memento.us/>; rel="last memento"; datetime="Sat, 31 Dec 2016 11:00:01 GMT"

UPDATE: Above CI fixes were just re:pep8, since fixed in 1957f01

@ibnesayeed
Copy link
Member Author

ibnesayeed commented Dec 9, 2017

I would propose a few changes in the approach:

  • Accept proper URI with protocol (and paths or trailing slashes) as proxy rather than a combination of just the host name and port. This will allow a more flexible setup (such as HTTPS termination at reverse proxy).
  • I am not sure if there is any flag to change the PORT, if you plan to allow that, you might want to preserve -p flag for that and use -P (upper case) for proxy instead.
  • Make the default host 0.0.0.0 instead of localhost to make the service accessible from outside out of the box. This is almost essential when the service is running inside a container. (as per Change default replay listening IP to 0.0.0.0 #235)
  • Do not check whether proxy is None in various places, instead, immediately after parsing flags, create a variable, say, baseurl and populate it with the value of proxy flag if not None . Otherwise populate baseurl with a combination of protocol, HOST, and PORT (do not include PORT if it is the default for the corresponding protocol (e.g., 80 for HTTP)). Once this baseurl is set properly, it can be used everywhere in the code where service's URLs are generated (without repeatedly checking conditions).

@machawk1
Copy link
Member

machawk1 commented Dec 9, 2017

@ibnesayeed The first two bullets have been implemented.

#235 will not be completed for this ticket but I have mentally bumped it up in priority due to the dependency for Dockerization (#70).

The fourth bullet is mostly stylistic. An issue remains in that main.py does the argparsing and replay.py setting the host.

@ibnesayeed
Copy link
Member Author

The fourth bullet was to DRY the code to make it more maintainable. Repeating the same logic in many places would make it difficult to faithfully make changes in those areas in future without the fear of breaking it.

@machawk1
Copy link
Member

machawk1 commented Dec 9, 2017

Sounds like an optimization. This ticket is to introduce the proxy mode. Some rearchitecting is needed to DRY. Hopefully the new proxy mode will inform that rearchitecting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants