Implement fetchAllMiddleware to handle per_page=-1 through pagination #10762
packages/api-fetch/src/index.js
Outdated
// Swap back to requesting the max of 100 items per page.
// TODO: This feels brittle. Is there a better way to manage request parameters?
path: options.path && `${ options.path.replace( /(\&?per_page)=-1/, '$1=100' ) }&page=1`,
url: options.url && `${ options.url.replace( /(\&?per_page)=-1/, '$1=100' ) }&page=1`,
Can you use `@wordpress/url`'s `addQueryArgs`?
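To illustrate the suggestion, here is a minimal sketch of rewriting the query arguments without a regex. It uses the built-in `URLSearchParams` as a stand-in for `addQueryArgs`; the `setQueryArgs` helper and the example path are hypothetical, and any hash fragment is ignored for brevity.

```javascript
// Hypothetical helper: set query arguments on a path or URL without a regex.
// URLSearchParams is a built-in stand-in here for @wordpress/url's addQueryArgs.
function setQueryArgs( pathOrUrl, args ) {
	const [ base, query = '' ] = pathOrUrl.split( '?' );
	const params = new URLSearchParams( query );
	// set() overwrites an existing key in place and appends a missing one.
	Object.keys( args ).forEach( ( key ) => params.set( key, String( args[ key ] ) ) );
	const serialized = params.toString();
	return serialized ? `${ base }?${ serialized }` : base;
}

const firstPage = setQueryArgs( '/wp/v2/categories?per_page=-1&orderby=name', {
	per_page: 100,
	page: 1,
} );
console.log( firstPage );
// prints "/wp/v2/categories?per_page=100&orderby=name&page=1"
```

Unlike appending `&page=1` to the string, this also produces a valid result when the original path has no query string at all.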
What about just setting a flag on the `options` for turning on paging?
`per_page` seems to be too specific to WP queries. This package is meant to be wp-agnostic, I think? Maybe `options` could accept something like:
{
page: 'page'
}
Where the presence of that key "turns on" paging and the value is the paging arg that gets appended to the request url?
I was not aware this package was intended to be wp-agnostic; no solution we come up with will be guaranteed to work with an arbitrary API. We can use another option here, but we will have to be making assumptions about how the WordPress REST API works in order for this to address our specific issue.
It was my impression that published packages were intended to be wp-agnostic but I'll leave for others to confirm. If so, maybe the middleware itself would be a part of wp-core then?
nvm my points, I see that apiFetch itself is wp-agnostic but the bundled middlewares don't have to be.
That said, that does represent an argument in favor of having WP apply this middleware rather than having it applied by default. I'll take that into consideration 👍
My idea was to leave Chief among the issues: this doesn't actually work. While inspecting the API requests and the response object demonstrates that the returned collection is properly assembled, when testing with many categories I received this error:
|
Can we follow the links here instead of messing with the URL params at all? Those should carry over the existing query params.
@TimothyBJacobs We have to mess with the params for the first request (to unset |
@TimothyBJacobs I believe we should stick with the manual page property generation for now. Parsing link headers would require introducing another dependency which would be used nowhere else, or else rolling our own implementation; it's not complex code but it's a non-trivial thing to add. If we assume |
packages/api-fetch/src/index.js
Outdated
@@ -26,9 +27,17 @@ function checkCloudflareError( error ) {
}
}

function parseResponse( response, parse = true ) {
Any particular reason for this change, doesn't seem related to the newly added middleware?
Good catch. The middleware started off in this file, so moving this out allowed code re-use; I haven't fully backed the change back out.
// function will recursively assemble a full response by paging over all available
// pages of API data.
const fetchAllMiddleware = async ( options, next ) => {
try {
Any particular reason for this try/catch?
Otherwise there's no handling for any rejections; this was a guard against errors when we invoke the method, to make sure it still returns a rejecting promise. I can remove it if you feel it is unnecessary.
Isn't this try/catch exactly what will happen by default with `async`/`await` though?
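A small sketch of the point being made: an `async` function already turns a thrown error, or a rejection from an awaited or returned promise, into a rejected promise, so no wrapping try/catch is needed for that. The `exampleMiddleware` name and its check are hypothetical and are not the PR's code.

```javascript
// Hypothetical middleware with no try/catch: a throw still surfaces as a rejection.
const exampleMiddleware = async ( options, next ) => {
	if ( ! options.path ) {
		throw new Error( 'No path given' );
	}
	// A rejection from next() propagates to the caller in the same way.
	return next( options );
};

// The caller always receives a promise, and can handle the failure with catch().
exampleMiddleware( {}, ( options ) => Promise.resolve( options ) )
	.catch( ( error ) => console.log( `Rejected: ${ error.message }` ) );
```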
// pages of API data.
const fetchAllMiddleware = async ( options, next ) => {
try {
if ( options.url && options.url.indexOf( 'per_page=-1' ) < 0 ) {
Sometimes it's `path` and not `url` (according to the raw implementation); should we do something like `url || path` before testing?
Makes me think we could introduce a new mandatory middleware normalizing these two to a single property before reaching the `raw` middleware.
I'd be in favor of that
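As a rough sketch of the `url || path` idea, the check could read the request target from whichever property is set before testing for `per_page=-1`. The helper names `getRequestTarget` and `requestsAllItems` are hypothetical and do not exist in the package.

```javascript
// Hypothetical helper: read the request target from either `url` or `path`.
function getRequestTarget( options ) {
	return options.url || options.path || '';
}

// Hypothetical check: does this request ask for an unbounded collection?
function requestsAllItems( options ) {
	return getRequestTarget( options ).indexOf( 'per_page=-1' ) !== -1;
}

console.log( requestsAllItems( { path: '/wp/v2/categories?per_page=-1' } ) );
// prints true
console.log( requestsAllItems( { url: 'https://example.com/wp-json/wp/v2/posts?per_page=10' } ) );
// prints false
```

A normalizing middleware, as proposed above, would make a helper like this unnecessary by guaranteeing a single property downstream.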
return remainingPageRequests.reduce(
// Request each remaining page in sequence, and return a merged array.
async ( previousPageRequest, nextPageOptions ) => {
Reduce with `async`: not sure this works properly, as the accumulator function is meant to return the next value synchronously? Maybe just a `for` loop.
Actually, this can work as a pattern for creating a chain of promises which resolves in series. (I've done such before myself, before the fancy `async`/`await`.) A `for` loop, perhaps `for await` (ES2018) with an async generator, could work as well.
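The two approaches discussed here can be sketched side by side. This is an illustration under assumptions, not the PR's code: `fakeNext`, `fetchInSeriesWithReduce` and `fetchInSeriesWithLoop` are hypothetical names, and `fakeNext` stands in for the real `next` handler. In the reduce version the accumulator is itself a promise, so each callback awaits the previous page before requesting its own.

```javascript
// Stand-in for the `next` handler: resolves with one page of results.
const fakeNext = ( options ) => Promise.resolve( [ `item-from-page-${ options.page }` ] );

// Pattern 1: reduce over an async callback. The accumulator is a promise,
// and each step awaits it before requesting the next page, so requests run in series.
function fetchInSeriesWithReduce( firstPageResults, pageOptions, next ) {
	return pageOptions.reduce(
		async ( previousPages, nextPageOptions ) => {
			const collected = await previousPages;
			const pageResults = await next( nextPageOptions );
			return collected.concat( pageResults );
		},
		Promise.resolve( firstPageResults )
	);
}

// Pattern 2: the same behavior written as a plain loop with await.
async function fetchInSeriesWithLoop( firstPageResults, pageOptions, next ) {
	let collected = firstPageResults;
	for ( const nextPageOptions of pageOptions ) {
		collected = collected.concat( await next( nextPageOptions ) );
	}
	return collected;
}

fetchInSeriesWithReduce( [ 'item-from-page-1' ], [ { page: 2 }, { page: 3 } ], fakeNext )
	.then( ( all ) => console.log( all.length ) );
// prints 3
```

Both resolve with the same merged array; the loop is arguably easier to read, while the reduce form avoids a mutable variable.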
really smart code :)
Should we ignore the middleware if |
lib/client-assets.php
Outdated
@@ -206,6 +206,11 @@ function gutenberg_register_scripts_and_styles() {
),
'after'
);
wp_add_inline_script(
'wp-api-fetch',
'wp.apiFetch.use( wp.apiFetch.fetchAllMiddleware );',
Should this be an inline script? I mean, this feels like a mandatory middleware which should always be added as the last middleware before `raw`, so could we always add it in JS?
if ( ! Array.isArray( results ) ) {
// We have no reliable way of merging non-array results.
return results;
I don't think this is a good solution, as we changed the intent of the request but didn't take that into consideration when providing the result.
What would you suggest doing in this case? Endpoints which return an object will almost never return a `next` link header anyway, so there's not a huge impact to this that I can see.
> Endpoints which return an object will almost never return a next link header anyway
It's the "almost" that's worrying me. Why can't we just merge the objects?
I have not seen an endpoint following our REST API's conventions which would both expose pagination headers and return an object, so I don't know how I should begin to attempt to merge object responses. If we assume that each subsequent page should overwrite the values from the prior, there may be data loss. If we assume that we should only merge array children, that could work, but again, there could be data loss.
Given that we do not have an immediate need to support this for core WordPress endpoints (that I know of), I suggest we look for prior art and revisit the question to add object merging logic once we encounter a specific situation where this becomes an issue.
> I have not seen an endpoint following our REST API's conventions which would both expose pagination headers and return an object,
Right now in Gutenberg we pass `per_page=-1` in a request to fetch taxonomies. This might be wrong though?
Not wrong so much as unneeded: the taxonomies controller does not accept a `per_page` argument, so that query parameter gets ignored. We use `get_object_taxonomies` and `get_taxonomies` within wp-rest-taxonomies-controller.php, neither of which accepts a limiting argument.
I have updated the PR to account for the above feedback:
TODO:
|
const fetchAllMiddleware = async ( options, next ) => {
if ( options.parse === false ) {
// If a consumer has opted out of parsing, do not apply middleware.
return next( options );
Should we throw an error here if we detect `per_page: -1` in the request, as the server won't know how to deal with it?
I feel like we should let it through; we don't do that anywhere ourselves, and the server itself will throw the error, which should make the issue pretty clear.
I'm a bit concerned about this being a middleware because if there's a custom endpoint supporting per_page: -1, it won't work with this middleware. (This refactoring: moving to the resolvers) can be achieved separately.
That's a valid point, but I agree moving this to the resolvers can wait; I'll need some help from y'all on that anyway :) Currently I'm trying to figure out the best way to pass errors through so that this does not impact any existing tests, then I think we should be in decent shape, unless you have any other concerns. Primary need at this time: other people than me to pull this down and test.
Superseded by #10845, which proposes adding this to the logic of |
I personally still prefer this over #10845 so I'm going to reopen it and see what was wrong here.
Introduce a new middleware function to iterate through all available pages for a large collection, replacing any encountered `per_page=-1` requests with a series of sequential requests which are assembled into a final merged array of all available results.
- Remove unneeded modification to apiFetch parsing logic.
- Utilize the links header to traverse collection pages.
- Remove confusing try/catch behavior.
- Add more escape hatches for cases where middleware should not apply.
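For readers unfamiliar with link-header traversal, a minimal sketch follows. It is not the code from this commit: `getNextPageUrl`, `fetchAllPages` and `fetchPage` are hypothetical names, `fetchPage` is assumed to resolve with a `fetch`-style response object, and the regex assumes `rel="next"` is the first parameter after the URL.

```javascript
// Hypothetical helper: pull the rel="next" URL out of a Link header value.
// Assumes rel is the first parameter after the URL, e.g. <https://...>; rel="next"
function getNextPageUrl( linkHeader ) {
	if ( ! linkHeader ) {
		return undefined;
	}
	const match = linkHeader.match( /<([^>]+)>;\s*rel="next"/ );
	return match ? match[ 1 ] : undefined;
}

// Hypothetical traversal: keep requesting pages until no rel="next" link remains.
// fetchPage is assumed to resolve with a fetch-style response (json(), headers.get()).
async function fetchAllPages( firstUrl, fetchPage ) {
	let results = [];
	let url = firstUrl;
	while ( url ) {
		const response = await fetchPage( url );
		results = results.concat( await response.json() );
		url = getNextPageUrl( response.headers.get( 'link' ) );
	}
	return results;
}

const exampleHeader =
	'<https://example.com/wp-json/wp/v2/posts?page=1>; rel="prev", ' +
	'<https://example.com/wp-json/wp/v2/posts?page=3>; rel="next"';
console.log( getNextPageUrl( exampleHeader ) );
// prints "https://example.com/wp-json/wp/v2/posts?page=3"
```

Because the server builds the next URL, the existing query parameters carry over without any client-side string manipulation.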
94de136 to c2c2f6c
@danielbachhuber I'd appreciate a review for this one. I still think this is not a great solution, but we need to move forward now, and this is the best of the alternatives on the table at the moment if we don't want to keep the argument server-side.
@@ -231,7 +231,7 @@ function gutenberg_register_scripts_and_styles() {
gutenberg_override_script(
'wp-api-fetch',
gutenberg_url( 'build/api-fetch/index.js' ),
array( 'wp-polyfill', 'wp-hooks', 'wp-i18n' ),
array( 'wp-polyfill', 'wp-hooks', 'wp-i18n', 'wp-url' ),
@atimmer Something we should update upstream.
Looks good once we can get a test for 93eeecc in
@danielbachhuber So, there was an issue with the middleware chain because we were not able to call |
Thanks everyone 👍 Team job
Description
Introduce a new middleware function to iterate through all available pages for a large collection, replacing any encountered `per_page=-1` requests with a series of sequential requests which are assembled into a final merged array of all available results.
This middleware method has been implemented inline within the `apiFetch` module's index for expediency, given some dependencies on other `apiFetch` methods; this initial commit should be regarded as a proof-of-concept.
See #6694
Checklist: