Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REST API: Progressively load data into components through pagination #6694

Closed
nylen opened this issue May 11, 2018 · 50 comments
Closed

REST API: Progressively load data into components through pagination #6694

nylen opened this issue May 11, 2018 · 50 comments
Labels
[Priority] High Used to indicate top priority items that need quick attention REST API Interaction Related to REST API [Type] Bug An existing feature does not function as intended [Type] Performance Related to performance efforts

Comments

@nylen
Copy link
Member

nylen commented May 11, 2018

Follow-up to #6657 - cc @danielbachhuber. This PR introduced a magic value of -1 into a couple of API endpoints, which means to return an "unbounded" number of items:

gutenberg/lib/rest-api.php

Lines 512 to 513 in fb804e2

// Change from '1' to '-1', which means unlimited.
$query_params['per_page']['minimum'] = -1;

However, this value doesn't actually do what it says - it really means some arbitrarily high limit instead of unbounded:

gutenberg/lib/rest-api.php

Lines 532 to 536 in fb804e2

// Avoid triggering 'rest_post_invalid_page_number' error
// which will need to be addressed in https://core.trac.wordpress.org/ticket/43998.
if ( -1 === $prepared_args['posts_per_page'] ) {
$prepared_args['posts_per_page'] = 100000;
}

This is a bit unclear and requires more complicated code than necessary, as well as a core patch referenced on the initial PR.

Allowing unbounded (or arbitrarily high) requests is also basically guaranteed to break for some sites due to memory or execution time constraints. This is a pretty opaque failure mode which doesn't leave the average user much chance of recovery.

I think a better trade-off would be to increase the limit for users who can edit_posts to a larger, but still plausible value, such as 1000 or 2000. This value should also be filterable by hosting providers. This way a site with a very large number of items would still work, but the cut-off for when items stop appearing in the UI would be much higher. This will lead to simpler API code with more predictable failure modes.

Ideally a site with this high number of posts would also be tested so that the performance characteristics are roughly understood.

@nylen nylen added [Type] Bug An existing feature does not function as intended REST API Interaction Related to REST API labels May 11, 2018
@tofumatt
Copy link
Member

If we went this route (makes sense to me), it would be useful for the JS to log responses with the maximum number of results. console.warn("Maximum API response size encountered: [Link to GitHub issue where we want to paginate them]") or something.

@chrisvanpatten
Copy link
Contributor

Allowing unbounded (or arbitrarily high) requests is also basically guaranteed to break for some sites due to memory or execution time constraints. This is a pretty opaque failure mode which doesn't leave the average user much chance of recovery.

I may be wrong, but isn't this the case for the current core approach as well? As far as I can tell, the select rendered by wp_dropdown_pages doesn't have any enforced limits.

Is there a difference in memory usage between what gets rendered server-side vs. how it would be done with an API call and rendered through React?

@danielbachhuber
Copy link
Member

However, this value doesn't actually do what it says - it really means some arbitrarily high limit instead of unbounded:

Sorry for the confusion. To clarify, the arbitrarily high limit was a workaround to a current bug in WordPress core that will need to be addressed in the ticket. The end goal is to pass per_page=-1 all of the way through into the controller.

If you were to comment out the workaround and run test_get_pages_unbounded_per_page locally, you'd see:

1) Gutenberg_REST_API_Test::test_get_pages_unbounded_per_page
Failed asserting that 400 matches expected 200.

Ultimately, the problem is WP_REST_Posts_Controller isn't configured to properly handle posts_per_page=-1 right now because of:

$max_pages = ceil( $total_posts / (int) $posts_query->query_vars['posts_per_page'] );
if ( $page > $max_pages && $total_posts > 0 ) {
	return new WP_Error( 'rest_post_invalid_page_number', __( 'The page number requested is larger than the number of pages available.' ), array( 'status' => 400 ) );
}

https://github.com/WordPress/WordPress/blob/2f792d442bf771a3aade170cc9cae459f024c57b/wp-includes/rest-api/endpoints/class-wp-rest-posts-controller.php#L330-L334

I've updated the Trac ticket accordingly: https://core.trac.wordpress.org/ticket/43998#comment:2

I think a better trade-off would be to increase the limit for users who can edit_posts to a larger, but still plausible value, such as 1000 or 2000. This value should also be filterable by hosting providers. This way a site with a very large number of items would still work, but the cut-off for when items stop appearing in the UI would be much higher.

Can you explain what the user experience for this might be, particularly as it relates to issues like #6487?

I may be wrong, but isn't this the case for the current core approach as well? As far as I can tell, the select rendered by wp_dropdown_pages doesn't have any enforced limits.

Correct, and neither does wp_dropdown_users. See history in https://core.trac.wordpress.org/ticket/19867

For a summary of the path core went down, see #5921 (comment)

@tofumatt
Copy link
Member

Can you explain what the user experience for this might be, particularly as it relates to issues like #6487?

My suggestion from earlier: a number high enough (20,000 or whatever) will probably just crash most servers anyway. Returning the first 20,000 results and outputting a console warning might be better than 500ing.

@nylen
Copy link
Member Author

nylen commented May 12, 2018

Is there a difference in memory usage between what gets rendered server-side vs. how it would be done with an API call and rendered through React?

I would expect the REST API to perform significantly worse than wp_dropdown_pages in terms of both memory usage and execution time.

The end goal is to pass per_page=-1 all of the way through into the controller.

Right, this is the main part I disagree with. I think it is better to always set a reasonable limit to the number of results the REST API can return.

My suggestion from earlier: a number high enough (20,000 or whatever) will probably just crash most servers anyway. Returning the first 20,000 results and outputting a console warning might be better than 500ing.

I suspect that 20,000 is way too high for most servers.

@danielbachhuber danielbachhuber added this to the Merge Proposal: REST API milestone May 14, 2018
@adamsilverstein
Copy link
Member

Right, this is the main part I disagree with. I think it is better to always set a reasonable limit to the number of results the REST API can return.

I second this concern (and previously raised on #6657). We should not enable unbounded requests - even for authenticated users.

@danielbachhuber
Copy link
Member

@nylen @adamsilverstein To move the conversation forward, can you suggest an alternative implementation with design- and accessibility-approved UX?

@adamsilverstein
Copy link
Member

Great question. I still think my original concept of a select that also performs an ajax search both the best UX and the least resource intensive design:

Unfortunately, my original implementation in #5921 was with react-select which @afercia pointed out was not accessible. @afercia have you seen this selector, the author is at least very active and motivated to improve accessibility: https://github.com/alsoscotland/react-super-select - I still think a selector is the best UX here.

I've explored leveraging our autocompletion code here, this would work similar to the current tags selector (except you can't add authors, and you can only select one)... The goal is for it to would look something like this:

Unless we can find an accessible select component we like, this seems like a reasonable second choice.

@nylen
Copy link
Member Author

nylen commented May 14, 2018

Building a UI that uses the existing pagination features of the API would be ideal.

I realize this is difficult, and this is not the right issue to debate how this is accomplished. So my proposal here is not related to UI implementation at all, instead it is just a robustness improvement to the API code merged in #6657:

I think a better trade-off would be to increase the limit for users who can edit_posts to a larger, but still plausible value, such as 1000 or 2000. This value should also be filterable by hosting providers. This way a site with a very large number of items would still work, but the cut-off for when items stop appearing in the UI would be much higher. This will lead to simpler API code with more predictable failure modes.

@chriscct7
Copy link
Member

There was a time when WP Core was working with and auditing Select2 for compatibility, perhaps that could be used as an alternative (not sure how far along that project got before it seemed to stall out)?

In any case, I further agree, a limit, likely not more than 5k if we want to look at the average server of an average WP user should be used. The average WordPress site has resources that are often significantly less than an avg WP dev's localhost resources.

An ajax based approach has the ability to scale, meet accessibility requirements, provide for a significantly better user experience, and also meet the design of Gutenberg.

@danielbachhuber
Copy link
Member

danielbachhuber commented May 18, 2018

There was a time when WP Core was working with and auditing Select2 for compatibility, perhaps that could be used as an alternative (not sure how far along that project got before it seemed to stall out)?

See #5921 (comment), which I mentioned earlier.

In any case, I further agree, a limit, likely not more than 5k if we want to look at the average server of an average WP user should be used. The average WordPress site has resources that are often significantly less than an avg WP dev's localhost resources.

Can you share your data and methodology for coming to this amount?

An ajax based approach has the ability to scale, meet accessibility requirements, provide for a significantly better user experience, and also meet the design of Gutenberg.

Again, all of the prior research has not identified an AJAX-based approach that meets accessibility requirements. If you have a specific proposal to share, please do.

@danielbachhuber
Copy link
Member

Also, as I mentioned in Post Status Slack, this was a Hard decision, not an Easy one. Easy decisions make sense to everyone. Hard decisions involve trade-offs, and require a great deal of background information to understand fully.

#4622 sat for months before it was fixed with #6627. It was not a rushed decision, but rather weeks of discussion and consideration leveraging years of domain experience. The decision we came to is documented in #6180 (comment)

If someone wants to put in the effort towards producing an accessible and acceptable UI alternative, we welcome your participation. Keep in mind that it will be multiple weeks of research, documentation, and planning — not a quick pull request.

And, if you have time to spend on other Hard problems, we have an entire Back Compat milestone to work through.

@nylen
Copy link
Member Author

nylen commented May 19, 2018

Again, this issue is not the right place to discuss alternative UI implementations, because that task is orders of magnitude more complicated than what I am proposing here.

@danielbachhuber so far, I have not seen a response to the specific suggestion of moving to a tested and considered higher limit that is not essentially infinite. Is this something worth pursuing, or is there a concern with this approach that I am not aware of?

I created this issue because I think the current use of the -1 parameter will lead to unexplained and difficult-to-debug failures. I think what I'm proposing (a lower API limit with additional testing and calibration) will adequately address this concern, as well as leading to clearer and simpler code.

@tofumatt
Copy link
Member

Can you share your data and methodology for coming to this amount?

WordPress is run on plenty of shared hosts. Plenty of developers working on Gutenberg are using quad-core MacBook Pros with 16GB of RAM. I think it’s pretty safe assumption.

@danielbachhuber
Copy link
Member

Again, this issue is not the right place to discuss alternative UI implementations, because that task is orders of magnitude more complicated than what I am proposing here.

But that is the core of the problem: if you introduce some arbitrary upper limit to the request, you'll need to communicate (in some way) to the end user why they can't access all potential authors, categories, pages, etc.

so far, I have not seen a response to the specific suggestion of moving to a tested and considered higher limit that is not essentially infinite. Is this something worth pursuing, or is there a concern with this approach that I am not aware of?

If you can get design sign-off on the user experience, then by all means. Based on the best of my knowledge, this would be an inadequate user experience and unlikely to be approved from a design/UX perspective.

WordPress is run on plenty of shared hosts. Plenty of developers working on Gutenberg are using quad-core MacBook Pros with 16GB of RAM. I think it’s pretty safe assumption.

Just so it's stated, this is an assumption, not data and methodology.

@danielbachhuber
Copy link
Member

To take this a step forward...

The specific data I'd like to see is:

  1. Distribution of WordPress sites to # of authors and pages (e.g. X% have at least 500 authors, Y% have at least 1000 authors, etc.). This will provide a better sense of real-world impact.
  2. Performance characteristics (time, memory, and CPU utilization) of these REST API requests in real-world server environments (e.g. a GET /wp/v2/users?per_page=-1 takes X milliseconds on a shared server with Y and Z).

Some other mitigation technicals we can explore are:

  1. Making the authors, categories and pages UI pluggable, such that someone could write a plugin that injects an alternative, paginated UI for impacted sites.
  2. Ensuring per_page=-1 requests are as optimized as possible. For instance, I'm not positive that GET /wp/v2/pages?_fields=id,title&per_page=-1 isn't also fetching post meta and terms by default.
  3. Providing documentation / best practices for hosts on how to cache per_page=-1 requests for WordPress sites with poor performance.

@nylen
Copy link
Member Author

nylen commented May 19, 2018

How about moving from per_page=-1 to an explicit per_page=100000 in the meantime?

@adamsilverstein
Copy link
Member

Again, all of the prior research has not identified an AJAX-based approach that meets accessibility requirements. If you have a specific proposal to share, please do.

Actually, we already identified the Gutenberg autocomplete component that fills the need. It’s accessible, already in Gutenberg and performs Ajax searches.

@danielbachhuber
Copy link
Member

Actually, we already identified the Gutenberg autocomplete component that fills the need. It’s accessible, already in Gutenberg and performs Ajax searches.

Can you open a new issue with the specifics of your proposal, please?

@danielbachhuber
Copy link
Member

Can you open a new issue with the specifics of your proposal, please?

Sorry, I just realized you posted detail with #6694 (comment)

I'll defer to design as to whether autocomplete is an appropriate replacement for author, category, and page selection. It doesn't seem like it would solve the problem for shared blocks.

@adamsilverstein
Copy link
Member

I've posted some follow up work on enabling autocomplete as an author search mechanism, see #5921 (comment).

@afercia
Copy link
Contributor

afercia commented May 24, 2018

I've looked a bit at the super-select example http://alsoscotland.github.io/react-super-select/react-super-select-examples.html#basic_example and, although it doesn't fully meet the ARIA patterns, it doesn't differ so much from the Gutenberg autocompleters. From an UX perspective, the main difference is that it presents an initial set of results, as also mentioned by Adam on #5921 (comment)

Will comment further there.

@joehoyle
Copy link

I just ran into this in a pre-production deployment of Gutenberg. We have a few thousand pages, so trying to show the Page Attributes brings MySQL to it's knees. Of course there's other factors, this site is using Visual Composer which overload the_content which adds significant overhead.

In may not come as a surprise that my opinion is all unbound queries should be an absolute no-go, if we want WordPress to be able to scale. 100k pages, or 100k users should be perfectly possible with WordPress, and right now with Gutenberg that is not the case. I mainly just want to add one datapoint though: we have Gutenberg on a site with a moderate amount of content (maybe 6GB DB in total) and I'm now having to work out how to disable this Gutenberg behaviour.

@joehoyle
Copy link

joehoyle commented May 27, 2018

Ensuring per_page=-1 requests are as optimized as possible. For instance, I'm not positive that GET /wp/v2/pages?_fields=id,title&per_page=-1 isn't also fetching post meta and terms by default.

Taking this a little further, there are a lot optimisations that could be done in the REST API endpoints. In the above example from @danielbachhuber, the pages endpoint is running hooks for the content, author, and pretty much all other fields. The controllers I believe are not optimised for _fields, that's just a blanket field strip before the response is set. Of course I think that's focusing effort in the wrong area for this particular issue, but that would be at-least one thing we could do if we want to keep _doing_it_wrong

@danielbachhuber
Copy link
Member

danielbachhuber commented May 29, 2018

I just ran into this in a pre-production deployment of Gutenberg. We have a few thousand pages, so trying to show the Page Attributes brings MySQL to it's knees. Of course there's other factors, this site is using Visual Composer which overload the_content which adds significant overhead.

What you've identified is likely the true cause of the problem. I'd be curious to know if there's anything else.

Taking this a little further, there are a lot optimisations that could be done in the REST API endpoints. In the above example from @danielbachhuber, the pages endpoint is running hooks for the content, author, and pretty much all other fields. The controllers I believe are not optimised for _fields, that's just a blanket field strip before the response is set.

This is fixed in WordPress trunk: https://core.trac.wordpress.org/changeset/43087

Can you apply that change in your pre-prod environment to assess its impact?

@joehoyle
Copy link

https://core.trac.wordpress.org/changeset/43087

Nice patch! I'm leaving for paternity break but I'll see if I can pick up testing this after that.

What you've identified is likely the true cause of the problem. I'd be curious to know if there's anything else.

I don't know how much this line will be taken forever more to try justify to unbound queries, but as I said, I mainly wanted to added a data point of one reason why for practicality this is going to cause issues in the wild.

@davisshaver
Copy link
Contributor

Thanks @danielbachhuber! New ticket looks good.

@mtias mtias modified the milestones: Merge: REST API, WordPress 5.0 Oct 3, 2018
@mtias mtias removed the [Type] Bug An existing feature does not function as intended label Oct 3, 2018
@danielbachhuber danielbachhuber added [Type] Performance Related to performance efforts [Type] Bug An existing feature does not function as intended labels Oct 15, 2018
@danielbachhuber
Copy link
Member

danielbachhuber commented Oct 15, 2018

As of this morning, I think we have an alternative path forward:

  1. Keep the existing UI for Categories, Tags, Post Parent, Authors, and Shared Blocks.
  2. Load data into Gutenberg with ?per_page=100&page=1, ?per_page=100&page=2.
  3. Use Provide loading UX when UI is dependent on an API request #6723 to avoid UI inconsistency while the data is loading.

To resolve this particular issue, we can update the API fetching code to traverse pagination after a general purpose solution for #6723 lands. The JavaScript code for the former doesn't necessarily need to be blocked by the latter.

@danielbachhuber danielbachhuber changed the title REST API: Use large limit instead of allowing "unbounded" requests REST API: Progressively load data into components through pagination Oct 15, 2018
@aaronjorbin
Copy link
Member

So an idea:

WHat about using localstorage? For any collection, we could add a version number that increments when there is a change such as a new user or a deleted tag or a renamed category and that is used to trigger a sync that goes through all of the pagination? This way we don't need to always be going through the entire collection.

@kadamwhite
Copy link
Contributor

@aaronjorbin as a data cache? I think the version parameter for changed data could be confusing, but that would definitely speed things up on subsequent loads. I'd like to get the core functionality in, first, though, and enhance the performance on repeat visits later.

@danielbachhuber
Copy link
Member

I'd like to get the core functionality in, first, though, and enhance the performance on repeat visits later.

👍 to this. Let's fix the blocker first.

@kadamwhite
Copy link
Contributor

I've begun work on an implementation of the recursive page fetching, I will update by tomorrow at the latest with progress.

@danielbachhuber danielbachhuber added the [Priority] High Used to indicate top priority items that need quick attention label Oct 18, 2018
@danielbachhuber
Copy link
Member

Just so it's noted, this is a blocker for 5.0 beta1. Because https://core.trac.wordpress.org/ticket/43998 was closed, this issue needs to be resolved prior to beta1. Without a solution to this issue, Gutenberg the editor will query with ?per_page=-1 against the existing core endpoints and get REST API failures.

@kadamwhite
Copy link
Contributor

Notes from @joehoyle on this paginated approach:

I’m not really sure if [assembling the full response by hitting each page of API data] is even a bandaid, it’s really worse [than -1], because in “total compute” this does way more: as well as generating data for 5k items, it’s going to also bootstrap WP 50 times. Speaking as someone who keeps the servers on, I’m not sure if I’d be more or less happy about this solution versus per_page=-1

This is good feedback, and this approach raises a few questions:

  1. Do we arbitrarily cap the number of results at a certain number of pages, to avoid letting editors trigger so many hits on our API?
  2. What happens if we encounter errors on one page out of the whole? Do we just omit it? Throw an error for the entire set, even if we've already assembled most of it?

In the core REST-API channel I've proposed upping the per_page cap to 500 or 1000 to more efficiently handle the majority of sites; I don't know we are going to come up with a perfect solution to this in time for 5.0. I feel the long-term correct path forward continues to be working to implement an accessible typeahead search dropdown.

@danielbachhuber
Copy link
Member

@kadamwhite Our options at this point are:

  1. Introduce support for per_page=-1 in the REST API; leave the current Gutenberg implementation as-is.
  2. Have Gutenberg iterate the paginated responses; leave the REST API upper bound at per_page=100.

Arbitrarily capping the number of results is not an option.

I feel the long-term correct path forward continues to be working to implement an accessible typeahead search dropdown.

Like I've mentioned several times before, an accessible typeahead search dropdown doesn't solve for all five UI elements that are impacted by this:

  1. Categories.
  2. Tags.
  3. Authors.
  4. Page Parent.
  5. Shared Blocks (or whatever they're called these days).

@kadamwhite
Copy link
Contributor

Arbitrarily capping the results is not an option, as in raising the cap to 500? May I ask why? It feels like that would be a nicely complementary modification to any other changes we make here.

@danielbachhuber
Copy link
Member

danielbachhuber commented Oct 18, 2018

Arbitrarily capping the results is not an option, as in raising the cap to 500?

Just to clarify, if we raise the cap to 500, would I be able to access my 501st item?

If yes, then we're talking about option 2. If no, then it's not a viable option.

It feels like that would be a nicely complementary modification to any other changes we make here.

We'd need to evaluate performance / memory consumption. My gut says 200 would be fine, 350 might be ok, and 500 is where we get to a magically bad number for WP_Query: https://github.com/WordPress/WordPress/blob/47d32decd6668b3ccf100acc8a40aa0a9fc35003/wp-includes/class-wp-query.php#L2916

@kadamwhite
Copy link
Contributor

I was specifically proposing upping the limit in conjunction with paginating over each page response to assemble the complete set. The goal would be to balance the cost of bootstrapping WordPress with the cost of requesting too many items; so for example if we cap at 350, a request for 800 items will generate three requests to the server.

@danielbachhuber
Copy link
Member

I was specifically proposing upping the limit in conjunction with paginating over each page response to assemble the complete set. The goal would be to balance the cost of bootstrapping WordPress with the cost of requesting too many items; so for example if we cap at 350, a request for 800 items will generate three requests to the server.

Ok. I'd be fine with 350 unless @pento has opinions otherwise.

@danielbachhuber
Copy link
Member

To be clear though, we should land #10762 first and foremost. We can more easily raise the limit after beta1.

@joehoyle
Copy link

Like I've mentioned several times before, an accessible typeahead search dropdown doesn't solve for all five UI elements that are impacted by this:

For ideal implementations of these, do you think that there is ever a need to load all content? I'm not sure if it's widely accepted that it's pretty much always a bad idea to have to load all of x where x can be of infinite length. I know it is a lot of work to create such UI, I'm not saying that has to be done, but there does seem to be a undercurrent here of just hoping we can load all data, when I think that's just a fundamental broken approach.

@danielbachhuber
Copy link
Member

For ideal implementations of these, do you think that there is ever a need to load all content?

We're not blue-sky thinking right now. We're a day away from 5.0 beta1, this issue is blocking it, and we're producing the most practical solution possible given constraints and circumstances.

@nylen
Copy link
Member Author

nylen commented Oct 19, 2018

FYI: I am unsubscribing from this issue because despite repeating myself multiple times and asking follow-up questions, there was one single comment from a Gutenberg team member where the original goal of the issue was understood, and the rest are either unrelated to what I was proposing, worse solutions, or orders of magnitude more difficult.

we're producing the most practical solution possible given constraints and circumstances.

This was my goal from the start with this issue. If you'd like to discuss what this could look like, pick a Slack instance and ping me on it.

@kadamwhite
Copy link
Contributor

I have opened #10845 implementing paging handling directly within apiFetch instead of using Middleware. After burning a number of hours working on the prior PR, I never found a clean solution for ensuring the error handling is correctly applied in all cases when using middleware. The system was not designed assuming additional asynchronous requests would be generated from within a middleware method, and I ran into many different issues trying to work around that; applying this logic inside of apiFetch feels cleaner, requires much less duplication of logic, and is now my recommendation.

@danielbachhuber
Copy link
Member

Fixed by #10762

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Priority] High Used to indicate top priority items that need quick attention REST API Interaction Related to REST API [Type] Bug An existing feature does not function as intended [Type] Performance Related to performance efforts
Projects
None yet
Development

No branches or pull requests