caching of results, operating on external resources #5

Closed
disruptek opened this issue Mar 14, 2014 · 4 comments

Comments

@disruptek

I've implemented a tool to solve the same problems (currently closed source) which uses ImageMagick to perform the transformations.

One feature that makes the service more powerful is caching the resulting image in an S3 bucket. That bucket can have lifecycle rules that auto-expire cached objects. The service can either return an HTTP 302 redirect to the cached object (or to a CloudFront distribution pointing at the cache) or stream the object to the client itself.
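A minimal sketch of that redirect-or-stream choice, assuming a dict-like `cache` standing in for the S3 bucket and a `render` callable that produces the transformed bytes. The names `serve`, `cache`, `render`, and `redirect_base` are hypothetical and not taken from either tool.

```python
def serve(key, cache, render, redirect_base=None):
    """Return (status, headers, body) for a transformed image.

    cache is any dict-like store standing in for the S3 bucket; render is a
    callable that produces the transformed image bytes on a cache miss.
    """
    if key not in cache:
        # Cache miss: transform once and store the result.
        cache[key] = render()
    if redirect_base is not None:
        # Point the client at the cached object (or at a CDN in front of it).
        return 302, {"Location": redirect_base + key}, b""
    # Otherwise stream the cached bytes back ourselves.
    return 200, {}, cache[key]
```

Passing `redirect_base` selects the redirect behaviour; leaving it as `None` streams the bytes from the service.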

My version takes an escaped URL as the source input. Because the URL can point to S3, CloudFront, or any other image on the web (or on the local host), it's quite versatile.

The following pseudocode builds the request URL from an escaped source URL, and the example after it shows the result. The resulting URL is more cacheable than one with a query string, yet it still carries all of the requested image transformations.

transforms = ['geometry=800x600', 'colors=256']
result = escape(imageurl)
for each operation in transforms
    append ';' + escape(operation) to result
return escape(result)

http://somewhere.com/halfshellesque/http%253A%252F%252Fwww.wallpaperswala.com%252Fwp-content%252Fgallery%252Fbill-gates%252Fcool-bill-gates.jpg%3Bgeometry%3D800x600%3Bcolors%3D256
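The pseudocode above might be sketched in Python as follows, assuming `escape` means percent-encoding every reserved character (`urllib.parse.quote` with `safe=""`). The function names are hypothetical. One detail to note: escaping each operation before the final escape, as the pseudocode does, yields `%253D` for `=`, whereas the example URL shows `%3D`. The sketch defaults to the form shown in the URL and offers the pseudocode's form behind the `escape_ops` flag.

```python
from urllib.parse import quote, unquote


def build_path(image_url, transforms, escape_ops=False):
    """Pack a source URL and its transforms into a single path segment."""
    # Escape the source URL first so that any ';' it contains cannot be
    # mistaken for a separator.
    result = quote(image_url, safe="")
    for op in transforms:
        result += ";" + (quote(op, safe="") if escape_ops else op)
    # Escape the whole string once more so it is a valid path segment.
    return quote(result, safe="")


def parse_path(segment):
    """Inverse of build_path(..., escape_ops=False)."""
    escaped_url, *ops = unquote(segment).split(";")
    return unquote(escaped_url), ops


url = "http://www.wallpaperswala.com/wp-content/gallery/bill-gates/cool-bill-gates.jpg"
path = build_path(url, ["geometry=800x600", "colors=256"])
print(path)
```

Because the source URL is escaped twice, splitting on `;` after a single unescape is safe even when the source URL itself contains a semicolon.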

The path info portion of the above URL is ideally hashed for use as the key in the S3 cache. A simple test of the source URL's domain then lets one choose the lifecycle of the cache item based on, for example, whether the image comes from your organization or from elsewhere on the web.
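As a rough illustration of that idea, the sketch below hashes the path info with SHA-256 and picks a key prefix from the source domain. It assumes lifecycle rules are attached per key prefix (S3 lifecycle rules can be filtered by prefix); the domain set, the prefix names, and the choice of SHA-256 are all hypothetical.

```python
import hashlib
from urllib.parse import unquote, urlparse

# Hypothetical values: domains treated as "ours" and the key prefixes that
# separate lifecycle rules would be attached to.
INTERNAL_DOMAINS = {"images.example.com"}
PREFIXES = {True: "internal/", False: "external/"}


def cache_key(path_info):
    """Derive a cache object key from the escaped path info."""
    digest = hashlib.sha256(path_info.encode("utf-8")).hexdigest()
    # Undo the outer escape, take the part before the first ';', then undo
    # the inner escape to recover the source URL.
    source_url = unquote(unquote(path_info).split(";", 1)[0])
    host = urlparse(source_url).hostname or ""
    return PREFIXES[host in INTERNAL_DOMAINS] + digest
```

Identical requests always map to the same key, so a repeated transformation becomes a cache hit, while the prefix lets images from elsewhere on the web expire on a different schedule than your own.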

Anyway, I don't mean to hijack the project; I just thought I could contribute some experience and offer some ideas...

@rafikk
Owner

rafikk commented Mar 16, 2014

Hi Andy,

Thanks for your feedback! I certainly don't think you're hijacking anything. I would like to provide some justification for the design. Portions of this should perhaps be moved into the README so the tradeoffs are made immediately clear.

Caching

The lack of a caching system is a feature. There are two possible caching options: (a) an in-memory/local cache or (b) an external/global store such as S3. Both have disadvantages: they limit throughput and complicate the retrieval logic. Further, it's trivial to set up a CDN and/or a proxy cache (e.g. Varnish) in front of Halfshell. Either would do a better job of caching than a cache built into Halfshell itself, since it wouldn't consume memory or network resources that could be used to serve other requests.

Source Versatility

The decision to require configuration of each source (be it a filesystem, S3, or a URL root) is a feature and quite deliberate. At Oyster, the images that are served from Halfshell are publicly accessible (via a CDN), and so many of the design decisions are intended to limit access/abuse. For instance, we considered using something like the following for the URL scheme:

http://localhost:8080/<s3_bucket>/<s3_url>

This would have been easier than setting up a separate route for each S3 bucket, but it would expose a resizing server that can be (ab)used to process images from any S3 bucket. The max_width and max_height settings are another example of imposing limits: we store high-resolution images in S3 that we're contractually barred from distributing.
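To make the tradeoff concrete, here is a minimal sketch of explicit routing with dimension limits. It is not Halfshell's actual code or configuration format: the `ROUTES` table, the bucket name, and the `resolve` function are hypothetical, and clamping oversized requests (rather than rejecting them) is an assumption. Only the `max_width` and `max_height` names come from the discussion above.

```python
# Hypothetical route table: only sources listed here can be processed.
ROUTES = {
    "/photos/": {"bucket": "public-photos", "max_width": 1000, "max_height": 1000},
}


def resolve(path, width, height):
    """Map a request path to a configured source and bounded dimensions."""
    for prefix, route in ROUTES.items():
        if path.startswith(prefix):
            # Never hand out more pixels than the route allows.
            w = min(width, route["max_width"])
            h = min(height, route["max_height"])
            return route["bucket"], path[len(prefix):], w, h
    # No configured route: refuse rather than fetch from an arbitrary bucket.
    raise LookupError("no route configured for " + path)
```

A request such as `resolve("/photos/a.jpg", 4000, 500)` is served from the configured bucket at a bounded size, while a path naming any other bucket has no route and is refused.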

I appreciate the feedback and I think this is a very worthwhile conversation. We've tried to design Halfshell to be quite versatile but have chosen to use explicit configuration to set the boundaries of that versatility rather than exposing a service where all configuration is established in the request.

I'd like to keep that philosophy intact, but continue to make Halfshell full-featured. If you have any feature suggestions or patches, feel free to contribute.

Thanks again.

@disruptek
Author

Your explanation hints at the reasons my solution is closed-source; it's really a back-end service that drops into existing infrastructure and so doesn't meet the same goals.

Have you given any thought to wrapping the ImageMagick bindings more thinly? I assume that was also a decision motivated by security concerns...

Perhaps I'll contribute an extension to Halfshell to meet my needs; I wouldn't mind piggy-backing on someone else's hard work. :-)

Thanks for the detailed response.

@rafikk
Owner

rafikk commented Mar 17, 2014

The ImageMagick bindings are currently the least considered part of the design. In fact, I'm considering doing away with ImageMagick altogether. I've started investigating using libvips or GraphicsMagick and have a preliminary implementation. What do you mean about wrapping it more lightly? Even if we continue to use ImageMagick, one of the things I'd like to do is use the MagickCore API instead of MagickWand.

If you have an opinion on the matter, I'd like to hear it.

Can we close this issue and open a new one to discuss the image bindings?

@rafikk
Owner

rafikk commented Mar 17, 2014

Closing and moving to #8.

@rafikk rafikk closed this as completed Mar 17, 2014