Implement fetchAllMiddleware to handle per_page=-1 through pagination #10762

kadamwhite · 2018-10-18T21:14:39Z

Description

Introduce a new middleware function to iterate through all available pages for a large collection, replacing any encountered per_page=-1 requests with a series of sequential requests which are assembled into a final merged array of all available results.

This middleware method has been implemented inline within the apiFetch
module's index for expediency, given some dependencies on other apiFetch
methods; this initial commit should be regarded as a proof-of-concept.

See #6694

Checklist:

My code is tested.
My code follows the WordPress code style.
My code follows the accessibility standards.
My code has proper inline documentation.

chrisvanpatten · 2018-10-18T21:17:54Z

packages/api-fetch/src/index.js

+	// Swap back to requesting the max of 100 items per page.
+	// TODO: This feels brittle. Is there a better way to manage request parameters?
+	path: options.path && `${ options.path.replace( /(\&?per_page)=-1/, '$1=100' ) }&page=1`,
+	url: options.url && `${ options.url.replace( /(\&?per_page)=-1/, '$1=100' ) }&page=1`,


Can you use @wordpress/url's addQueryArgs?

what about just setting a flag on the options for turning on paging?

per_page seems to be too specific to WP queries. This package is meant to be wp-agnostic I think? Maybe options could accept something like:

{ page: 'page' }

Where the presence of that key "turns on" paging and the value is the paging arg that gets appended to the request url?

I was not aware this package was intended to be wp-agnostic; no solution we come up with will be guaranteed to work with an arbitrary API. We can use another option here, but we will have to be making assumptions about how the WordPress REST API works in order for this to address our specific issue.

It was my impression that published packages were intended to be wp-agnostic but I'll leave for others to confirm. If so, maybe the middleware itself would be a part of wp-core then?

nvm my points, I see that apiFetch itself is wp-agnostic but the bundled middlewares don't have to be.

That said, that does represent an argument in favor of having WP apply this middleware rather than having it applied by default. I'll take that into consideration 👍

kadamwhite · 2018-10-18T21:23:36Z

My idea was to leave perPage: -1 specifications alone wherever they occur, and to add a middleware object which would intercept those requests and convert them to a sequence of API requests for each page in the specified collection. There's a couple limitations of the specifics of this patch, but I hope to work through it with the team over the coming day.

Chief among the issues: this doesn't actually work. While inspecting the API requests and the response object demonstrates that the returned collection is properly assembled, when testing with many categories I received this error:

Warning: Can't call setState (or forceUpdate) on an unmounted component. (click to see full error)


Warning: Can't call setState (or forceUpdate) on an unmounted component. This is a no-op, but it indicates a memory leak in your application. To fix, cancel all subscriptions and asynchronous tasks in the componentWillUnmount method.
    in HierarchicalTermSelector (created by WithFilters(HierarchicalTermSelector))
    in WithFilters(HierarchicalTermSelector) (created by WithInstanceId(WithFilters(HierarchicalTermSelector)))
    in WithInstanceId(WithFilters(HierarchicalTermSelector)) (created by WithSpokenMessages(WithInstanceId(WithFilters(HierarchicalTermSelector))))
    in WithSpokenMessages(WithInstanceId(WithFilters(HierarchicalTermSelector))) (created by _class)
    in _class (created by RemountOnPropChange(_class))
    in RemountOnPropChange(_class)
    in Unknown (created by WithDispatch(WithSpokenMessages(WithInstanceId(WithFilters(HierarchicalTermSelector)))))
    in WithDispatch(WithSpokenMessages(WithInstanceId(WithFilters(HierarchicalTermSelector)))) (created by _class)
    in _class (created by RemountOnPropChange(_class))
    in RemountOnPropChange(_class)
    in WithSelect(WithDispatch(WithSpokenMessages(WithInstanceId(WithFilters(HierarchicalTermSelector))))) (created by PostTaxonomies)
    in div (created by PanelBody)
    in PanelBody (created by TaxonomyPanel)
    in TaxonomyPanel (created by _class)
    in _class (created by RemountOnPropChange(_class))
    in RemountOnPropChange(_class)
    in Unknown (created by WithDispatch(TaxonomyPanel))
    in WithDispatch(TaxonomyPanel) (created by _class)
    in _class (created by RemountOnPropChange(_class))
    in RemountOnPropChange(_class)
    in WithSelect(WithDispatch(TaxonomyPanel)) (created by PostTaxonomies)

Known things I'd like to improve, which may be better addressed through a solution other than middleware:

Manually parsing query strings feels less than ideal. Per @chrisvanpatten the query-args package may help.
Extracting this into a middleware module would introduce a circular dependency with apiFetch, since apiFetch is needed to trigger additional requests -- we can't use next repeatedly because it pops things off its own internal stack and runs out of middleware methods.

While a better option than middleware did not suggest itself, if there's a higher-level method that consumes apiFetch which I could build this into, I could use a pointer about where to find it.

TimothyBJacobs · 2018-10-18T21:38:06Z

Can we follow the links here instead of messing with the URL params at all? Those should carry over the existing query params.

kadamwhite · 2018-10-18T21:41:39Z

@TimothyBJacobs We have to mess with the params for the first request (to unset -1), but yes, we can do that for all subsequent requests; good suggestion.

kadamwhite · 2018-10-19T02:10:07Z

@TimothyBJacobs I believe we should stick with the manual page property generation for now. Parsing link headers would require introducing another dependency which would be used nowhere else, or else rolling our own implementation; it's not complex code but it's a non-trivial thing to add.

If we assume ?page= and manually request each one we also have the option of considering more robust failure recovery such as omitting failed pages (if only one out of many requests fails), rather than throwing away everything we did successfully fetch from the server.

youknowriad · 2018-10-19T10:13:25Z

packages/api-fetch/src/index.js

@@ -26,9 +27,17 @@ function checkCloudflareError( error ) {
 	}
 }

+function parseResponse( response, parse = true ) {


Any particular reason for this change, doesn't seem related to the newly added middleware?

Good catch. The middleware started off in this file, so moving this out allowed code re-use; I haven't fully backed the change back out.

youknowriad · 2018-10-19T10:13:59Z

packages/api-fetch/src/middlewares/fetch-all-middleware.js

+// function will recursively assemble a full response by paging over all available
+// pages of API data.
+const fetchAllMiddleware = async ( options, next ) => {
+	try {


Any particular reason for this try/catch?

Otherwise there's no handling for any rejections; this was a guard against errors when we invoke the method, to make sure it still returns a rejecting promise. I can remove it if you feel it is unnecessary.

Isn't this try/case exactly what will happen by default with async/await though?

youknowriad · 2018-10-19T10:16:32Z

packages/api-fetch/src/middlewares/fetch-all-middleware.js

+// pages of API data.
+const fetchAllMiddleware = async ( options, next ) => {
+	try {
+		if ( options.url && options.url.indexOf( 'per_page=-1' ) < 0 ) {


Sometimes it's path and not url (according to the raw implementation), should we do something like url || path before testing?

Makes me think we could introduce a new mandatory middleware normalizing these two to a single property before reaching the raw middleware.

I'd be in favor of that

youknowriad · 2018-10-19T10:22:04Z

packages/api-fetch/src/middlewares/fetch-all-middleware.js

+
+		return remainingPageRequests.reduce(
+			// Request each remaining page in sequence, and return a merged array.
+			async ( previousPageRequest, nextPageOptions ) => {


Reduce with asyncnot sure this works properly as the accumulator function is meant to return the next value synchronously? Maybe just a for loop.

Actually, this can work as a pattern for creating a chain of promises which resolves in series.

(I've done such before myself, before the fancy async/await)

A for loop -- perhaps for await (ES2018) with async generator -- could work as well.

really smart code :)

youknowriad · 2018-10-19T15:57:58Z

Should we ignore the middleware if parse: true because we don’t have a unique response object to return

youknowriad · 2018-10-19T17:19:25Z

lib/client-assets.php

@@ -206,6 +206,11 @@ function gutenberg_register_scripts_and_styles() {
 		),
 		'after'
 	);
+	wp_add_inline_script(
+		'wp-api-fetch',
+		'wp.apiFetch.use( wp.apiFetch.fetchAllMiddleware );',


Should this be an inline script? I mean this feels like a mandatory middleware which should be always added as the last middleware before raw so we can always add it in js?

youknowriad · 2018-10-19T19:44:54Z

packages/api-fetch/src/middlewares/fetch-all-middleware.js

+
+	if ( ! Array.isArray( results ) ) {
+		// We have no reliable way of merging non-array results.
+		return results;


I don't think this is a good solution. As we changed the intent of the request but didn't took it into consideration when providing the result.

What would you suggest doing in this case? Endpoints which return an object will almost never return a next link header anyway, so there's not a huge impact to this that I can see.

Endpoints which return an object will almost never return a next link header anyway

It's the "almost" that's worrying me why can't we just merge the objects?

I have not seen an endpoint following our REST API's conventions which would both expose pagination headers and return an object, so I don't know how I should begin to attempt to merge object responses. If we assume that each subsequent page should overwrite the values from the prior, there may be data loss. If we assume that we should only merge array children, that could work, but again, there could be data loss.

Given that we do not have an immediate need to support this for core WordPress endpoints (that I know of), I suggest we look for prior art and revisit the question to add object merging logic once we encounter a specific situation where this becomes an issue.

I have not seen an endpoint following our REST API's conventions which would both expose pagination headers and return an object,

Right now in Gutenberg we pass per_page=-1 in a request to fetch taxonomies. This might be wrong though?

Not wrong so much as unneedd -- the taxonomies controller does not accept a per_page argument, so that query parameter gets ignored. We use get_object_taxonomies and get_taxonomies within wp-rest-taxonomies-controller.php, niether of which accepts a limiting argument.

kadamwhite · 2018-10-19T19:45:02Z

I have update the PR accounting for the above feedback:

parse: false is now detected and the middleware is not applied.
The links header is now used for pagination. (note: I attempted to include parse-link-header but was not familiar enough with the repo setup to properly add the module, so I wrote a shim which accomplishes the same logic to keep us moving forwardnp. We can swap out for the more robust, tested module later on.)
The middleware module has been refactored for clarity.
Per @youknowriad and @gziolo the middleware is now registered by default rather than being added in client-scripts.php

TODO:

Fix the unit test I broke: errors are not being correctly passed back through to the calling functions.

youknowriad · 2018-10-19T19:46:32Z

packages/api-fetch/src/middlewares/fetch-all-middleware.js

+const fetchAllMiddleware = async ( options, next ) => {
+	if ( options.parse === false ) {
+		// If a consumer has opted out of parsing, do not apply middleware.
+		return next( options );


Should we throw an error here if we detect per_page: -1 in the request as the server won't know how to deal with it?

I feel like we should let it through; we don't do that anywhere ourselves, and the server itself will throw the error, which should make the issue pretty clear.

youknowriad · 2018-10-19T19:47:37Z

I'm a bit concerned about this being a middleware because if there's a custom endpoint supporting per_page: -1, it won't work with this middleware. (This refactoring: moving to the resolvers) can be achieved separately.

kadamwhite · 2018-10-19T19:48:48Z

That's a valid point, but I agree moving this to the resolvers can wait; I'll need some help from y'all on that anyway :)

Currently I'm trying to figure out the best way to pass errors through so that this does not impact any existing tests, then I think we should be in decent shape, unless you have any other concerns.

Primary need at this time: other people than me to pull this down and test.

kadamwhite · 2018-10-21T03:30:15Z

Superseded by #10845, which proposes adding this to the logic of apiFetch itself instead of attempting to apply it as a middleware. After three days of experimentation the only way I can consistently get the pagination to behave the way we want is to keep it as close to the core window.fetch call as possible.

youknowriad · 2018-10-21T13:56:36Z

I personally still prefer this over #10845 so I'm going to reopen it and see what was wrong here.

Introduce a new middleware function to iterate through all available pages for a large collection, replacing any encountered `per_page=-1` requests with a series of sequential requests which are assembled into a final merged array of all available results.

…leware

…ne script

- Remove unneeded modification to apiFetch parsing logic. - Utilize the links header to traverse collection pages. - Remove confusing try/catch behavior. - Add more escape hatches for cases where middleware should not apply.

youknowriad · 2018-10-21T14:57:34Z

@danielbachhuber I'd appreciate a review for this one, I still think this is not a great solution but we need to move forward now and this is the best of the alternatives on the table at the moment if we don't want to keep the argument server-side.

youknowriad · 2018-10-21T14:58:20Z

lib/client-assets.php

@@ -231,7 +231,7 @@ function gutenberg_register_scripts_and_styles() {
 	gutenberg_override_script(
 		'wp-api-fetch',
 		gutenberg_url( 'build/api-fetch/index.js' ),
-		array( 'wp-polyfill', 'wp-hooks', 'wp-i18n' ),
+		array( 'wp-polyfill', 'wp-hooks', 'wp-i18n', 'wp-url' ),


@atimmer Something we should update upstream.

packages/edit-post/src/hooks/components/media-upload/index.js

danielbachhuber

Looks good once we can get a test for 93eeecc in

youknowriad · 2018-10-21T16:15:26Z

@danielbachhuber So, there was an issue with the middleware chain because we were not able to call next twice inside the same middleware which should possible in theory. We were mutating the chain. I pushed a fix and this also allows us to just use next instead of apiFetch

youknowriad · 2018-10-21T17:12:26Z

Thanks everyone 👍 Team job

…WordPress#10762)

kadamwhite added the [Type] Technical Prototype Offers a technical exploration into an idea as an example of what's possible label Oct 18, 2018

chrisvanpatten reviewed Oct 18, 2018

View reviewed changes

danielbachhuber mentioned this pull request Oct 18, 2018

REST API: Progressively load data into components through pagination #6694

Closed

youknowriad reviewed Oct 19, 2018

View reviewed changes

kadamwhite mentioned this pull request Oct 21, 2018

Support recursively fetching paginated collections within apiFetch module #10845

Closed

4 tasks

kadamwhite closed this Oct 21, 2018

youknowriad deleted the try/6694-paginate-large-collection-requests branch October 21, 2018 08:35

youknowriad mentioned this pull request Oct 21, 2018

Try implementing the auto-pagination in Core Data #10866

Closed

youknowriad restored the try/6694-paginate-large-collection-requests branch October 21, 2018 13:56

youknowriad reopened this Oct 21, 2018

kadamwhite added 7 commits October 21, 2018 14:59

Move fetchAllMiddleware to its own module

a3f3517

Use @wordpress/url module to modify query strings within fetchAllMidd…

0b13284

…leware

Make fetchAllMiddleware optional within apiFetch and add it with inli…

c54aa57

…ne script

Improve comments for fetchAllMiddleware module

ed69cb5

Improve fetchAllMiddleware: use links for pagination.

49d80cd

- Remove unneeded modification to apiFetch parsing logic. - Utilize the links header to traverse collection pages. - Remove confusing try/catch behavior. - Add more escape hatches for cases where middleware should not apply.

Remove leftover debugging comment

c615efd

kadamwhite and others added 2 commits October 21, 2018 14:59

Fix improperly reverted parsing code change

7703ce2

Fix unit tests

c2c2f6c

youknowriad force-pushed the try/6694-paginate-large-collection-requests branch from 94de136 to c2c2f6c Compare October 21, 2018 14:31

Avoid circular dependency

158f088

youknowriad mentioned this pull request Oct 21, 2018

Add user locale api-fetch middleware #10862

Merged

4 tasks

youknowriad added 2 commits October 21, 2018 15:50

Adding some unit tests

5c7a2f2

Remove useless per_page argument

b7bb2cd

youknowriad reviewed Oct 21, 2018

View reviewed changes

youknowriad added 2 commits October 21, 2018 16:06

Update the lock file

5436e18

Update api-fetch CHANGELOG

8803b88

danielbachhuber reviewed Oct 21, 2018

View reviewed changes

packages/edit-post/src/hooks/components/media-upload/index.js Show resolved Hide resolved

danielbachhuber added 2 commits October 21, 2018 08:20

Fetch as many attachments as we can, even if its not all of them

0a08388

Remove per_page=-1 support shim in REST API

1ce1f2b

youknowriad approved these changes Oct 21, 2018

View reviewed changes

Ensure the next page of results is actually fetched

93eeecc

danielbachhuber added this to the 4.1 - UI freeze milestone Oct 21, 2018

danielbachhuber self-requested a review October 21, 2018 16:01

danielbachhuber approved these changes Oct 21, 2018

View reviewed changes

Tweak the middleware chain to avoid mutations

08e8ca8

Testing fetching consequential pages

562250d

youknowriad merged commit dee3dcf into master Oct 21, 2018

youknowriad deleted the try/6694-paginate-large-collection-requests branch October 21, 2018 17:12

atimmer mentioned this pull request Oct 24, 2018

Primary category picker isn't working (anymore) Yoast/wordpress-seo#11394

Closed

danielbachhuber mentioned this pull request Oct 24, 2018

[4.1.0-rc.2] Categories are not displayed any more #10993

Closed

antpb pushed a commit to antpb/gutenberg that referenced this pull request Oct 26, 2018

Implement fetchAllMiddleware to handle per_page=-1 through pagination (…

2ec8deb

…WordPress#10762)

danielbachhuber mentioned this pull request Nov 9, 2018

Fetch all categories to display in Latest Posts dropdown #11654

Merged

aduth mentioned this pull request Mar 20, 2019

Add X-WP-Total response to getEntityRecords() #12083

Closed

Implement fetchAllMiddleware to handle per_page=-1 through pagination #10762

Implement fetchAllMiddleware to handle per_page=-1 through pagination #10762

Conversation

kadamwhite commented Oct 18, 2018

Description

Checklist:

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kadamwhite commented Oct 18, 2018

TimothyBJacobs commented Oct 18, 2018

kadamwhite commented Oct 18, 2018

kadamwhite commented Oct 19, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

youknowriad commented Oct 19, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kadamwhite commented Oct 19, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

youknowriad commented Oct 19, 2018

kadamwhite commented Oct 19, 2018

kadamwhite commented Oct 21, 2018

youknowriad commented Oct 21, 2018

youknowriad commented Oct 21, 2018

Choose a reason for hiding this comment

danielbachhuber left a comment

Choose a reason for hiding this comment

youknowriad commented Oct 21, 2018

youknowriad commented Oct 21, 2018