Performance improvement to assets/api/readFolder #337

Closed
wants to merge 2 commits

Conversation

mikenz
Contributor

@mikenz commented Dec 16, 2016

I have a few folders with 900+ images in them. This was the change I made to get reasonable performance in asset-admin.

Only get the full details about a file after we have decided to return it.
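A hypothetical plain-PHP sketch of that pattern (listFolderPage and the file accessors here are illustrative stand-ins, not the actual readFolder code): paginate on the lightweight records first, and only compute the expensive per-file details for the items that will actually be returned.

```php
<?php
// Hypothetical sketch of the optimisation (not the actual asset-admin code):
// decide which files make it into the response first, then build the
// expensive per-file detail array only for that subset.
function listFolderPage(array $files, int $offset, int $limit): array
{
    // Cheap pass: pagination works on the lightweight records only.
    $page = array_slice($files, $offset, $limit);

    // Expensive pass: permissions and file size are only computed for the
    // files we have already decided to return.
    return array_map(function ($file) {
        return [
            'name'       => $file->getFilename(),
            'size'       => $file->getSize(),     // hits the filesystem
            'canDelete'  => $file->canDelete(),   // may run extra ORM queries
            'canArchive' => $file->canArchive(),
        ];
    }, $page);
}
```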

@tractorcow
Contributor

+1 from me, although I have a feeling that this will shortly be superseded by the upcoming graphql work. @chillu is it ok for me to merge this?

@mikenz there is a tiny linting issue if you'd like to patch it. :)

@mikenz
Contributor Author

mikenz commented Dec 19, 2016

@tractorcow That's what happens when I use the GitHub online editor to create my PR :-)

@chillu
Member

chillu commented Dec 19, 2016

Thank you Mike! As @tractorcow suggested, this has been fixed in the GraphQL work - I've disabled sorting by the File->getSize() getter since that required us to retrieve all files before pagination, which is pretty much always a bad idea. I think this is a regression from recent work around the file table view @flamerohr?

Feel free to check out the GraphQL branches:
github.com/open-sausages/silverstripe-asset-admin/tree/pulls/4.0/graphql
github.com/open-sausages/silverstripe-framework/tree/pulls/4.0/graphql
silverstripe/silverstripe-graphql#24

For the moment, I don't think it's worth introducing partial "sort object data" complexities in the implementation if the GraphQL work fixes it through better pagination. Pull request incoming this week, see #316. I haven't tested it with large numbers of files yet, but in theory it should scale to 10k files per folder (which is our NFR for the feature). Would you be able to check out the branches and see if it helps your cause?

@chillu
Member

chillu commented Dec 19, 2016

Damian just pointed out that this optimisation might be about calculating file sizes, rather than pagination. It's probably less impactful to calculate file size on 15 paginated items than to load 900 items through the ORM? Anyway, GraphQL will only call File->getSize() when the field is actually requested by the GraphQL client - so we can decide that on a case-by-case basis (e.g. list vs. detail view).
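A minimal sketch of that per-field laziness, assuming a webonyx/graphql-php style type definition (the library the silverstripe-graphql module builds on); the File type and resolver below are illustrative, not the actual asset-admin schema.

```php
<?php
require 'vendor/autoload.php';

use GraphQL\Type\Definition\ObjectType;
use GraphQL\Type\Definition\Type;

// Illustrative File type: the 'size' resolver only runs when a query
// actually selects the "size" field, so a list view that omits it never
// pays for the expensive size calculation.
$fileType = new ObjectType([
    'name' => 'File',
    'fields' => [
        'id'   => Type::id(),
        'name' => Type::string(),
        'size' => [
            'type'    => Type::int(),
            'resolve' => function ($file) {
                // Deferred until the client requests it (e.g. detail view only).
                return $file->getSize();
            },
        ],
    ],
]);
```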

@mikenz
Contributor Author

mikenz commented Dec 19, 2016

For me the performance hit is on the canArchive()/canDelete() calls, which have some custom logic in my project (can't delete/archive if there's a published page using the file, plus some more permission checks).

I'll have a play with the GraphQL branches and see how it goes.

@tractorcow
Contributor

We've had a chat and decided to let this PR sit for a while, as we are finalising the GraphQL work. Sorry if this blocks your work in any way!

@flamerohr
Contributor

I wouldn't have imagined getSize() goes through all that trouble... :P

My first thought was the can*() calls, with their potential extension points being a bottleneck. Then after that, any calculated method calls like getSize().

@tractorcow closed this Dec 20, 2016
@tractorcow reopened this Dec 20, 2016
@tractorcow
Contributor

Oops. :D

@chillu
Member

chillu commented Jan 8, 2017

@mikenz I'm closing this in favour of silverstripe/silverstripe-framework#6458. I've tried it out with 1000 files in a folder (generated with silverstripe/silverstripe-frameworktest#26) and a 128M memory_limit. I'm getting 3.5s response times for 50 paginated items (with already generated thumbnails), so it's not exactly fast.

The logic is quite different now (with GraphQL), so this PR isn't really mergeable any more. The optimisations it implements (only getting the full details of a file after deciding to return it) aren't fully reflected in that work. Given it does the pagination before querying all records (as part of converting to an ArrayList for canView() checks), any unnecessary record fetches where canView() = false shouldn't be as impactful, right? We don't have a good way to "fill up" paginated lists where most or all records have canView() = false, but that's beyond the scope of this discussion.
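For reference, a rough sketch of that ordering using SilverStripe 4 ORM calls; this is simplified, $folderID, $limit and $offset are placeholders, and it is not the actual GraphQL resolver code.

```php
<?php
use SilverStripe\Assets\File;
use SilverStripe\Security\Security;

$member = Security::getCurrentUser();

// 1. Paginate at the database level so only one page of records is fetched.
$page = File::get()
    ->filter('ParentID', $folderID) // folder being browsed (placeholder)
    ->limit($limit, $offset);       // page size / offset (placeholders)

// 2. Run canView() only on that page's records; filterByCallback() converts
//    the DataList to an ArrayList containing just the visible files.
$visible = $page->filterByCallback(function (File $file) use ($member) {
    return $file->canView($member);
});
```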

Would you be able to switch to the PR branches mentioned above and check if that solves the problem for you?

@chillu closed this Jan 8, 2017
@mikenz
Contributor Author

mikenz commented Jan 8, 2017

The new GraphQL based implementation performs well for me. Thanks :-)
