Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement fetchAllMiddleware to handle per_page=-1 through pagination #10762

Merged
merged 21 commits into from
Oct 21, 2018

Conversation

kadamwhite
Copy link
Contributor

Description

Introduce a new middleware function to iterate through all available pages for a large collection, replacing any encountered per_page=-1 requests with a series of sequential requests which are assembled into a final merged array of all available results.

This middleware method has been implemented inline within the apiFetch
module's index for expediency, given some dependencies on other apiFetch
methods; this initial commit should be regarded as a proof-of-concept.

See #6694

Checklist:

  • My code is tested.
  • My code follows the WordPress code style.
  • My code follows the accessibility standards.
  • My code has proper inline documentation.

@kadamwhite kadamwhite added the [Type] Technical Prototype Offers a technical exploration into an idea as an example of what's possible label Oct 18, 2018
// Swap back to requesting the max of 100 items per page.
// TODO: This feels brittle. Is there a better way to manage request parameters?
path: options.path && `${ options.path.replace( /(\&?per_page)=-1/, '$1=100' ) }&page=1`,
url: options.url && `${ options.url.replace( /(\&?per_page)=-1/, '$1=100' ) }&page=1`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use @wordpress/url's addQueryArgs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about just setting a flag on the options for turning on paging?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per_page seems to be too specific to WP queries. This package is meant to be wp-agnostic I think? Maybe options could accept something like:

{
     page: 'page'
}

Where the presence of that key "turns on" paging and the value is the paging arg that gets appended to the request url?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not aware this package was intended to be wp-agnostic; no solution we come up with will be guaranteed to work with an arbitrary API. We can use another option here, but we will have to be making assumptions about how the WordPress REST API works in order for this to address our specific issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was my impression that published packages were intended to be wp-agnostic but I'll leave for others to confirm. If so, maybe the middleware itself would be a part of wp-core then?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm my points, I see that apiFetch itself is wp-agnostic but the bundled middlewares don't have to be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, that does represent an argument in favor of having WP apply this middleware rather than having it applied by default. I'll take that into consideration 👍

@kadamwhite
Copy link
Contributor Author

My idea was to leave perPage: -1 specifications alone wherever they occur, and to add a middleware object which would intercept those requests and convert them to a sequence of API requests for each page in the specified collection. There's a couple limitations of the specifics of this patch, but I hope to work through it with the team over the coming day.

Chief among the issues: this doesn't actually work. While inspecting the API requests and the response object demonstrates that the returned collection is properly assembled, when testing with many categories I received this error:

Warning: Can't call setState (or forceUpdate) on an unmounted component. (click to see full error)

Warning: Can't call setState (or forceUpdate) on an unmounted component. This is a no-op, but it indicates a memory leak in your application. To fix, cancel all subscriptions and asynchronous tasks in the componentWillUnmount method.
    in HierarchicalTermSelector (created by WithFilters(HierarchicalTermSelector))
    in WithFilters(HierarchicalTermSelector) (created by WithInstanceId(WithFilters(HierarchicalTermSelector)))
    in WithInstanceId(WithFilters(HierarchicalTermSelector)) (created by WithSpokenMessages(WithInstanceId(WithFilters(HierarchicalTermSelector))))
    in WithSpokenMessages(WithInstanceId(WithFilters(HierarchicalTermSelector))) (created by _class)
    in _class (created by RemountOnPropChange(_class))
    in RemountOnPropChange(_class)
    in Unknown (created by WithDispatch(WithSpokenMessages(WithInstanceId(WithFilters(HierarchicalTermSelector)))))
    in WithDispatch(WithSpokenMessages(WithInstanceId(WithFilters(HierarchicalTermSelector)))) (created by _class)
    in _class (created by RemountOnPropChange(_class))
    in RemountOnPropChange(_class)
    in WithSelect(WithDispatch(WithSpokenMessages(WithInstanceId(WithFilters(HierarchicalTermSelector))))) (created by PostTaxonomies)
    in div (created by PanelBody)
    in PanelBody (created by TaxonomyPanel)
    in TaxonomyPanel (created by _class)
    in _class (created by RemountOnPropChange(_class))
    in RemountOnPropChange(_class)
    in Unknown (created by WithDispatch(TaxonomyPanel))
    in WithDispatch(TaxonomyPanel) (created by _class)
    in _class (created by RemountOnPropChange(_class))
    in RemountOnPropChange(_class)
    in WithSelect(WithDispatch(TaxonomyPanel)) (created by PostTaxonomies)

Known things I'd like to improve, which may be better addressed through a solution other than middleware:

  • Manually parsing query strings feels less than ideal. Per @chrisvanpatten the query-args package may help.
  • Extracting this into a middleware module would introduce a circular dependency with apiFetch, since apiFetch is needed to trigger additional requests -- we can't use next repeatedly because it pops things off its own internal stack and runs out of middleware methods.

While a better option than middleware did not suggest itself, if there's a higher-level method that consumes apiFetch which I could build this into, I could use a pointer about where to find it.

@TimothyBJacobs
Copy link
Member

Can we follow the links here instead of messing with the URL params at all? Those should carry over the existing query params.

@kadamwhite
Copy link
Contributor Author

@TimothyBJacobs We have to mess with the params for the first request (to unset -1), but yes, we can do that for all subsequent requests; good suggestion.

@kadamwhite
Copy link
Contributor Author

@TimothyBJacobs I believe we should stick with the manual page property generation for now. Parsing link headers would require introducing another dependency which would be used nowhere else, or else rolling our own implementation; it's not complex code but it's a non-trivial thing to add.

If we assume ?page= and manually request each one we also have the option of considering more robust failure recovery such as omitting failed pages (if only one out of many requests fails), rather than throwing away everything we did successfully fetch from the server.

@@ -26,9 +27,17 @@ function checkCloudflareError( error ) {
}
}

function parseResponse( response, parse = true ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason for this change, doesn't seem related to the newly added middleware?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. The middleware started off in this file, so moving this out allowed code re-use; I haven't fully backed the change back out.

// function will recursively assemble a full response by paging over all available
// pages of API data.
const fetchAllMiddleware = async ( options, next ) => {
try {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason for this try/catch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise there's no handling for any rejections; this was a guard against errors when we invoke the method, to make sure it still returns a rejecting promise. I can remove it if you feel it is unnecessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this try/case exactly what will happen by default with async/await though?

// pages of API data.
const fetchAllMiddleware = async ( options, next ) => {
try {
if ( options.url && options.url.indexOf( 'per_page=-1' ) < 0 ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes it's path and not url (according to the raw implementation), should we do something like url || path before testing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes me think we could introduce a new mandatory middleware normalizing these two to a single property before reaching the raw middleware.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be in favor of that


return remainingPageRequests.reduce(
// Request each remaining page in sequence, and return a merged array.
async ( previousPageRequest, nextPageOptions ) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reduce with asyncnot sure this works properly as the accumulator function is meant to return the next value synchronously? Maybe just a for loop.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this can work as a pattern for creating a chain of promises which resolves in series.

(I've done such before myself, before the fancy async/await)

A for loop -- perhaps for await (ES2018) with async generator -- could work as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really smart code :)

@youknowriad
Copy link
Contributor

Should we ignore the middleware if parse: true because we don’t have a unique response object to return

@@ -206,6 +206,11 @@ function gutenberg_register_scripts_and_styles() {
),
'after'
);
wp_add_inline_script(
'wp-api-fetch',
'wp.apiFetch.use( wp.apiFetch.fetchAllMiddleware );',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an inline script? I mean this feels like a mandatory middleware which should be always added as the last middleware before raw so we can always add it in js?


if ( ! Array.isArray( results ) ) {
// We have no reliable way of merging non-array results.
return results;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a good solution. As we changed the intent of the request but didn't took it into consideration when providing the result.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would you suggest doing in this case? Endpoints which return an object will almost never return a next link header anyway, so there's not a huge impact to this that I can see.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Endpoints which return an object will almost never return a next link header anyway

It's the "almost" that's worrying me why can't we just merge the objects?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not seen an endpoint following our REST API's conventions which would both expose pagination headers and return an object, so I don't know how I should begin to attempt to merge object responses. If we assume that each subsequent page should overwrite the values from the prior, there may be data loss. If we assume that we should only merge array children, that could work, but again, there could be data loss.

Given that we do not have an immediate need to support this for core WordPress endpoints (that I know of), I suggest we look for prior art and revisit the question to add object merging logic once we encounter a specific situation where this becomes an issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not seen an endpoint following our REST API's conventions which would both expose pagination headers and return an object,

Right now in Gutenberg we pass per_page=-1 in a request to fetch taxonomies. This might be wrong though?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not wrong so much as unneedd -- the taxonomies controller does not accept a per_page argument, so that query parameter gets ignored. We use get_object_taxonomies and get_taxonomies within wp-rest-taxonomies-controller.php, niether of which accepts a limiting argument.

@kadamwhite
Copy link
Contributor Author

I have update the PR accounting for the above feedback:

  • parse: false is now detected and the middleware is not applied.
  • The links header is now used for pagination. (note: I attempted to include parse-link-header but was not familiar enough with the repo setup to properly add the module, so I wrote a shim which accomplishes the same logic to keep us moving forwardnp. We can swap out for the more robust, tested module later on.)
  • The middleware module has been refactored for clarity.
  • Per @youknowriad and @gziolo the middleware is now registered by default rather than being added in client-scripts.php

TODO:

  • Fix the unit test I broke: errors are not being correctly passed back through to the calling functions.

const fetchAllMiddleware = async ( options, next ) => {
if ( options.parse === false ) {
// If a consumer has opted out of parsing, do not apply middleware.
return next( options );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we throw an error here if we detect per_page: -1 in the request as the server won't know how to deal with it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we should let it through; we don't do that anywhere ourselves, and the server itself will throw the error, which should make the issue pretty clear.

@youknowriad
Copy link
Contributor

I'm a bit concerned about this being a middleware because if there's a custom endpoint supporting per_page: -1, it won't work with this middleware. (This refactoring: moving to the resolvers) can be achieved separately.

@kadamwhite
Copy link
Contributor Author

That's a valid point, but I agree moving this to the resolvers can wait; I'll need some help from y'all on that anyway :)

Currently I'm trying to figure out the best way to pass errors through so that this does not impact any existing tests, then I think we should be in decent shape, unless you have any other concerns.

Primary need at this time: other people than me to pull this down and test.

@kadamwhite
Copy link
Contributor Author

Superseded by #10845, which proposes adding this to the logic of apiFetch itself instead of attempting to apply it as a middleware. After three days of experimentation the only way I can consistently get the pagination to behave the way we want is to keep it as close to the core window.fetch call as possible.

@kadamwhite kadamwhite closed this Oct 21, 2018
@youknowriad youknowriad deleted the try/6694-paginate-large-collection-requests branch October 21, 2018 08:35
@youknowriad
Copy link
Contributor

I personally still prefer this over #10845 so I'm going to reopen it and see what was wrong here.

@youknowriad youknowriad restored the try/6694-paginate-large-collection-requests branch October 21, 2018 13:56
@youknowriad youknowriad reopened this Oct 21, 2018
Introduce a new middleware function to iterate through all available pages
for a large collection, replacing any encountered `per_page=-1` requests
with a series of sequential requests which are assembled into a final
merged array of all available results.
- Remove unneeded modification to apiFetch parsing logic.
- Utilize the links header to traverse collection pages.
- Remove confusing try/catch behavior.
- Add more escape hatches for cases where middleware should not apply.
@youknowriad youknowriad force-pushed the try/6694-paginate-large-collection-requests branch from 94de136 to c2c2f6c Compare October 21, 2018 14:31
@youknowriad
Copy link
Contributor

@danielbachhuber I'd appreciate a review for this one, I still think this is not a great solution but we need to move forward now and this is the best of the alternatives on the table at the moment if we don't want to keep the argument server-side.

@@ -231,7 +231,7 @@ function gutenberg_register_scripts_and_styles() {
gutenberg_override_script(
'wp-api-fetch',
gutenberg_url( 'build/api-fetch/index.js' ),
array( 'wp-polyfill', 'wp-hooks', 'wp-i18n' ),
array( 'wp-polyfill', 'wp-hooks', 'wp-i18n', 'wp-url' ),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@atimmer Something we should update upstream.

@danielbachhuber danielbachhuber added this to the 4.1 - UI freeze milestone Oct 21, 2018
@danielbachhuber danielbachhuber self-requested a review October 21, 2018 16:01
Copy link
Member

@danielbachhuber danielbachhuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good once we can get a test for 93eeecc in

@youknowriad
Copy link
Contributor

@danielbachhuber So, there was an issue with the middleware chain because we were not able to call next twice inside the same middleware which should possible in theory. We were mutating the chain. I pushed a fix and this also allows us to just use next instead of apiFetch

@youknowriad youknowriad merged commit dee3dcf into master Oct 21, 2018
@youknowriad youknowriad deleted the try/6694-paginate-large-collection-requests branch October 21, 2018 17:12
@youknowriad
Copy link
Contributor

Thanks everyone 👍 Team job

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Type] Technical Prototype Offers a technical exploration into an idea as an example of what's possible
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants