Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Fix invalid image URLs not being cached and causing timeouts #8986

Merged
merged 8 commits into from
Sep 12, 2017

Conversation

aileen
Copy link
Member

@aileen aileen commented Sep 7, 2017

refs #8868

  • swapped the usage of got for requests with the request util, which validates a URL before doing a request and resolving the Promises in image-size so it gets saved in the image-size cache.
  • Uses catch predicates instead of conditionals in getImageSizeFromUrl
  • Return NotFoundError if applicable in getImageSizeFromFilePath as the caller function cachedImageSizeFromUrl is differentiating those between this error and others.
  • Using ImageObject as a global var resulted in having the url property being the same for all requests coming in.
  • The logic that checked for an existing protocol (e. g. gravatar URLs) was overly complicated. Refactored it to be more simple.
  • Passing the correct value to fetchDimensionsFromBuffer as the population of imageObject.url happens there. These are used in our structured data and need to be full URLs (in case of locally stored files) or the original URL (in case of URLs missing the protocol)
  • Added two more debug logs in getCachedImageSizeFromUrl so it's logged when an image is added to the cache even tho it was returned as error.
  • We're always resolving the result in getCachedImageSizeFromUrl, so there's no need to return the values with a Promise.resolve(). The caller fn uses waits for the Promises to be fulfilled.

Todos:

  • investigate and refactor usage of Promise.resolve() in cached image size
  • Use predicate catches in image-size
  • Add NotFoundError handling to storage.read request, as the calling fn cachedImageSizeFromUrl is differentiating those errors
  • check if global var imageObject interferes with image size cache
  • find and improve unnecessary code (nice to have)
  • solve catch predicates issue see 🐛 Fix invalid image URLs not being cached and causing timeouts #8986 (comment)

@aileen aileen force-pushed the use-request-util-for-image-size branch 6 times, most recently from 74220cd to 7e6e944 Compare September 7, 2017 13:53
@aileen
Copy link
Member Author

aileen commented Sep 8, 2017

@ErisDS one question I have on my mind is the handling of the different errors in cachedImageSizeFromUrl. Right now, we're not logging or producing the NotFoundErrors see here but everything else and I'm wondering why? Mostly the question if timeout errors should also be caught here, as they can be caused by a redirect which is indeed a bit slower now.

@aileen aileen force-pushed the use-request-util-for-image-size branch from 686706f to 77c2a1f Compare September 8, 2017 02:54
message: err.message,
code: 'IMAGE_SIZE',
err: err,
context: {

This comment was marked as abuse.

This comment was marked as abuse.

@aileen
Copy link
Member Author

aileen commented Sep 8, 2017

There seems to be still some issues with the errors returned from the got, is the catch predicates in getCachedImageSizeFromUrl don't handle it properly. It works totally fine with errors returned from reading the storage.

So here's what's happening:

Any filtered catch (e. g. NotFoundError) works in getImageSizeFromUrl but then it seems like this error (at least it is not the got error anymore, but the NotFoundError we returned) gets caught by the 'catch all' predicate and that's what get's returned to getCachedImageSizeFromUrl in the end so the predicates there won't work. I can't explain why this happens. It works fine when I remove the 'catch all' predicate in getImageSizeFromUrl, but this is definitely not a solution!

@aileen aileen changed the title [WIP] 🐛 Fix invalid image URLs not being cached and causing timeouts 🐛 Fix invalid image URLs not being cached and causing timeouts Sep 8, 2017
@ErisDS
Copy link
Member

ErisDS commented Sep 8, 2017

@ErisDS one question I have on my mind is the handling of the different errors in cachedImageSizeFromUrl. Right now, we're not logging or producing the NotFoundErrors see here but everything else and I'm wondering why? Mostly the question if timeout errors should also be caught here, as they can be caused by a redirect which is indeed a bit slower now.

We don't do anything with not found errors because they are an expected case. Images will 404 over time. We don't want to fill up the logs, nothing went wrong :)

Any filtered catch (e. g. NotFoundError) works in getImageSizeFromUrl but then it seems like this error (at least it is not the got error anymore, but the NotFoundError we returned) gets caught by the 'catch all' predicate and that's what get's returned to getCachedImageSizeFromUrl in the end so the predicates there won't work. I can't explain why this happens. It works fine when I remove the 'catch all' predicate in getImageSizeFromUrl, but this is definitely not a solution!

This is expected behaviour as far as I know. If you catch an error in a predicate and then re-throw (reject) it, it will next get caught by the catch all. The way to deal with this is to check in the catch-all for any errors which are not already GhostErrors. If not, wrap them, else, just reject again.

@aileen
Copy link
Member Author

aileen commented Sep 8, 2017

We don't do anything with not found errors because they are an expected case. Images will 404 over time. We don't want to fill up the logs, nothing went wrong :)

I am sorry, I expressed myself wrong :( I was referring to logging the errors. My concern was only, that also timeout errors might now increase (because of the ability to follow redirects and they actually take a bit longer and might hit the timeout) and will therefore be logged. My question was, if we want them to be logged as well?

@ErisDS
Copy link
Member

ErisDS commented Sep 8, 2017

@AileenCGN sounds like you might be onto something - my advice would be to experiment with what feels right.

@aileen
Copy link
Member Author

aileen commented Sep 8, 2017

@ErisDS This PR is ready for review and merge 🎉

@@ -143,6 +143,9 @@ getImageSizeFromUrl = function getImageSizeFromUrl(imagePath) {
context: err.url || imagePath
}));
}).catch(function (err) {
if (err instanceof errors.NotFoundError || err instanceof errors.InternalServerError) {

This comment was marked as abuse.

This comment was marked as abuse.

@@ -199,6 +202,9 @@ getImageSizeFromFilePath = function getImageSizeFromFilePath(imagePath) {
}
}));
}).catch(function (err) {
if (err instanceof errors.NotFoundError || err instanceof errors.InternalServerError) {

This comment was marked as abuse.

@ErisDS
Copy link
Member

ErisDS commented Sep 11, 2017

Just checking: as a result of these changes, what now happens in the original error case I had with the old-nonsubdirectory local path?

@aileen
Copy link
Member Author

aileen commented Sep 11, 2017

Just checking: as a result of these changes, what now happens in the original error case I had with the old-nonsubdirectory local path?

This case will result in an invalid or missing URL error thrown here and will be caught here: https://github.com/TryGhost/Ghost/pull/8986/files#diff-6d3ba398735146e7c9d8c64ac57dbc5bR34

The cache issue is resolved as well, so a second request results in reading from the cache.

@ErisDS
Copy link
Member

ErisDS commented Sep 11, 2017

Awesome 👍

There are a couple of minor amends needed. Will review again first thing in the am :)

@aileen aileen force-pushed the use-request-util-for-image-size branch from 1a01fc0 to 1fa3dbe Compare September 11, 2017 13:27
@aileen
Copy link
Member Author

aileen commented Sep 11, 2017

Summary of cases I successfully tested:

  • Local storage files
    • Missing local image
    • old non-subdirectory path
  • Image URL
    • redirected image
    • invalid image URL
    • image not found (404)
    • timeouts for image requests

Of course I tested positive cases as well ;)
For all of these cases, the cache works and the error handling does its job as well.

refs TryGhost#8868

- swapped the usage of `got` for requests with the request util
refs TryGhost#8868

- Uses catch predicates instead of conditionals in `getImageSizeFromUrl`
- Return `NotFoundError` if applicable in `getImageSizeFromFilePath` as the caller function `cachedImageSizeFromUrl` is differentiating those between this error and others.
- Using `ImageObject` as a global var resulted in having the `url` property being the same for all requests coming in.
- The logic that checked for an existing protocol (e. g. gravatar URLs) was overly complicated. Refactored it to be more simple.
- Passing the correct value to `fetchDimensionsFromBuffer` as the population of `imageObject.url` happens there. These are used in our structured data and need to be full URLs (in case of locally stored files) or the original URL (in case of URLs missing the protocol)
- Added two more debug logs in `getCachedImageSizeFromUrl` so it's logged when an image is added to the cache even tho it was returned as error.
We're always resolving the result in `getCachedImageSizeFromUrl`, so there's no need to return the values with a `Promise.resolve()`. The caller fn uses waits for the Promises to be fulfilled.
closes TryGhost/Support#361

- adds a guard that checks the image URL for `/assets/` in the beginning and passes a completed URL to the request util to try and fetch the image size
- adds tests
@aileen aileen force-pushed the use-request-util-for-image-size branch from 1fa3dbe to f7afb8b Compare September 12, 2017 10:05
@aileen
Copy link
Member Author

aileen commented Sep 12, 2017

@ErisDS fyi I left commits separate to make the review easier, but they should totally be squashed when being merged.

@ErisDS ErisDS merged commit a45a91c into TryGhost:master Sep 12, 2017
kirrg001 pushed a commit that referenced this pull request Oct 9, 2017
closes #8589, refs #8868

This PR ports most of the changes of #8892 and #8986 to improve the image-size util.

- Swapped `request` with `got` in `getImageSizeFromUrl` fn.
- wrapped `got` request library in its own `request.js` util that returns bluebird promises and validates URL before starting a request
- Used catch predicates instead of conditionals in `getImageSizeFromUrl`
- Returned `NotFoundError` if applicable as the caller function `cachedImageSizeFromUrl` is differentiating those between this error and others.
- The logic that checked for an existing protocol (e. g. gravatar URLs) was overly complicated. Refactored it to be more simple.
- We're always resolving the result in `getCachedImageSizeFromUrl`, so there's no need to return the values with a `Promise.resolve()`. The caller fn uses waits for the Promises to be fulfilled.
- added and improved tests
@aileen aileen deleted the use-request-util-for-image-size branch October 13, 2017 08:30
@ErisDS ErisDS removed their assignment Jun 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants