caching of results, operating on external resources #5

disruptek · 2014-03-14T21:02:02Z

I've implemented a tool to solve the same problems (currently closed source) which uses ImageMagick to perform the transformations.

One feature that makes the service more powerful is caching the resulting image in an S3 bucket. That bucket can have lifecycle rules that auto-expire the cached objects. The service can then either return an HTTP 302 redirect to the cached object (or to a CloudFront distribution pointing at the cache) or stream the object to the client itself.

My version takes an escaped URL as the source input. Because the URL can point to S3, CloudFront, or any other image on the web (or on the local host), it's quite versatile.

Here's an example URL built with the following pseudocode. The source URL is escaped and the requested transformations are appended to it in the path, so the result is more cacheable than a URL with a query string while still carrying all the image transformations requested.

transforms = ['geometry=800x600', 'colors=256']
result = escape(imageurl)
for each operation in transforms
    append ';' + operation to result
return escape(result)

http://somewhere.com/halfshellesque/http%253A%252F%252Fwww.wallpaperswala.com%252Fwp-content%252Fgallery%252Fbill-gates%252Fcool-bill-gates.jpg%3Bgeometry%3D800x600%3Bcolors%3D256
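The Go sketch below reproduces the path of the example URL, assuming `escape` means `url.QueryEscape`. It appends each operation without an inner escape, because that is what the example URL shows: the `=` in each operation appears as `%3D` (escaped once), not `%253D`. The names `buildPath` and `parsePath` are hypothetical, and operations are assumed never to contain `;`.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// buildPath escapes the source URL, appends each operation after a ';',
// then escapes the whole string once more. Operations must not contain ';'.
func buildPath(source string, ops []string) string {
	var b strings.Builder
	b.WriteString(url.QueryEscape(source))
	for _, op := range ops {
		b.WriteString(";")
		b.WriteString(op)
	}
	return url.QueryEscape(b.String())
}

// parsePath reverses buildPath: undo the outer escape, split on ';', and
// undo the inner escape of the source URL. Any ';' in the source URL was
// escaped before the separators were added, so splitting is unambiguous.
func parsePath(p string) (source string, ops []string, err error) {
	outer, err := url.QueryUnescape(p)
	if err != nil {
		return "", nil, err
	}
	parts := strings.Split(outer, ";")
	source, err = url.QueryUnescape(parts[0])
	if err != nil {
		return "", nil, err
	}
	return source, parts[1:], nil
}

func main() {
	src := "http://www.wallpaperswala.com/wp-content/gallery/bill-gates/cool-bill-gates.jpg"
	p := buildPath(src, []string{"geometry=800x600", "colors=256"})
	fmt.Println("http://somewhere.com/halfshellesque/" + p)
}
```

Inside a `net/http` handler, `r.URL.Path` has already had one level of escaping removed, so only the inner `url.QueryUnescape` of the source URL would still be needed there.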

The path info portion of the above URL would ideally be hashed for use as the key in the S3 cache, and a simple test of the source URL's domain could choose the lifecycle of the cache item based on, for example, whether the image comes from your organization or from elsewhere on the web.

Anyway, I don't mean to hijack the project; I just thought I could contribute some experience and offer some ideas...

rafikk · 2014-03-16T16:42:42Z

Hi Andy,

Thanks for your feedback! I certainly don't think you're hijacking anything. I would like to provide some justification for the design. Portions of this should perhaps be moved into the README so the tradeoffs are made immediately clear.

Caching

The lack of a caching system is a feature. There are two possible caching options: a) an in-memory/local cache or b) an external/global store, like S3. Both have disadvantages: they limit throughput and complicate the retrieval logic. Further, it's trivial to set up a CDN and/or a proxy cache (e.g. Varnish) in front of Halfshell. Either would do a better job of caching than a cache built into Halfshell itself, since it wouldn't consume memory or network resources that Halfshell could otherwise use to serve other requests.

Source Versatility

The decision to require configuration of each source (be it a filesystem, S3, or a URL root) is a feature and quite deliberate. At Oyster, the images that are served from Halfshell are publicly accessible (via a CDN), and so many of the design decisions are intended to limit access/abuse. For instance, we considered using something like the following for the URL scheme:

http://localhost:8080/<s3_bucket>/<s3_url>

This would have been easier than setting up a separate route for each S3 bucket but would have the disadvantage that we're now exposing a resizing server that can be (ab)used to process images from any S3 bucket. Other examples of imposing limits are the max_width and max_height settings. We store high resolution images in S3 that we're contractually barred from distributing.
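A Go sketch of the explicit-configuration approach, assuming a hypothetical `Source` type and route table (these are not Halfshell's actual configuration schema). Only configured route prefixes are accepted, so the server can't be pointed at arbitrary buckets, and requested dimensions are checked against per-source limits in the spirit of `max_width` and `max_height`.

```go
package main

import (
	"fmt"
	"strings"
)

// Source is a hypothetical, explicitly configured origin.
type Source struct {
	Bucket    string
	MaxWidth  int
	MaxHeight int
}

// sources maps the first path segment to a configured source; any other
// prefix is rejected.
var sources = map[string]Source{
	"listings": {Bucket: "example-listing-images", MaxWidth: 1024, MaxHeight: 768},
}

// resolve checks a request path such as "/listings/photo.jpg" and the
// requested dimensions against the configuration.
func resolve(path string, width, height int) (Source, string, error) {
	route, key, found := strings.Cut(strings.TrimPrefix(path, "/"), "/")
	if !found || key == "" {
		return Source{}, "", fmt.Errorf("malformed path %q", path)
	}
	src, ok := sources[route]
	if !ok {
		return Source{}, "", fmt.Errorf("unknown source %q", route)
	}
	if width > src.MaxWidth || height > src.MaxHeight {
		return Source{}, "", fmt.Errorf("%dx%d exceeds limit %dx%d",
			width, height, src.MaxWidth, src.MaxHeight)
	}
	return src, key, nil
}

func main() {
	src, key, err := resolve("/listings/photo.jpg", 800, 600)
	fmt.Println(src.Bucket, key, err)
}
```

This sketch rejects oversized requests; clamping them to the configured limit would be an equally reasonable policy.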

I appreciate the feedback and I think this is a very worthwhile conversation. We've tried to design Halfshell to be quite versatile but have chosen to use explicit configuration to set the boundaries of that versatility rather than exposing a service where all configuration is established in the request.

I'd like to keep that philosophy intact but continue to make Halfshell full-featured. If you have any feature suggestions or patches, feel free to contribute.

Thanks again.

disruptek · 2014-03-16T17:36:45Z

Your explanation hints at the reasons my solution is closed source: it's really a back-end service that drops into existing infrastructure, so it doesn't meet the same goals.

Have you given any thought to wrapping the ImageMagick bindings more thinly? I assume that was also a decision motivated by security concerns...

Perhaps I'll contribute an extension to Halfshell to meet my needs; I wouldn't mind piggy-backing on someone else's hard work. :-)

Thanks for the detailed response.

rafikk · 2014-03-17T16:11:39Z

The ImageMagick bindings are currently the least considered part of the design. In fact, I'm considering doing away with ImageMagick altogether. I've started investigating libvips and GraphicsMagick and have a preliminary implementation. What do you mean by wrapping it more lightly? Even if we continue to use ImageMagick, one of the things I'd like to do is use the MagickCore API instead of MagickWand.

If you have an opinion on the matter, I'd like to hear it.

Can we close this issue and open a new one to discuss the image bindings?

rafikk · 2014-03-17T17:21:26Z

Closing and moving to #8.

rafikk added the question label Mar 17, 2014

rafikk closed this as completed Mar 17, 2014
