Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make pull list table views include/exclude more performant #431

Merged
merged 6 commits into from
Jul 18, 2019

Conversation

helen
Copy link
Contributor

@helen helen commented Jul 17, 2019

Description of the Change

As reported in #418, getting content from external connections for the pull list table when there are a lot of pulled/skipped posts ends up failing due to header size. This PR truncates the list of posts found in the sync log to include/exclude so as to avoid that failure and adjusts the UI to accommodate the change.

Alternate Designs

Initial suggestion was to strip the referer, which might help if you're right above the limit as it would let a few more characters through, but would eventually stop working. In the future, not as an alternate, we should explore generating/storing the sync log for authenticated connections on the remote side so a full include/exclude list doesn't have to be passed.

Benefits

Besides that you will no longer get a failed authentication message, this makes the pulled and skipped lists more performant, and IMO all the lists are more sensibly ordered. For pulled and skipped, only the necessary IDs for the current page are passed and in reverse order of when things were pulled/skipped (e.g. latest acted upon show up first, regardless of date), negating the need to run extra queries to sort by date on the remote site and, again IMO, match user expectations/needs for referring to those screens better. For new content, it is now sorted by ID descending instead of date to match the sync log truncation method, which means newly created content shows up at the top, as opposed to just by date.

Possible Drawbacks

Posts that were pulled/skipped a while back will start showing up in the new list again, although they are marked/styled appropriately and cannot be pulled. For sites that pull some but not all/most content, these should not show up until later pages. For sites that pull most or all content from a connection, these items will show up sooner and could lead to some weirdness where you're just getting pages and pages of already-processed content. Not great, but better than not being able to connect at all.

We can discuss changing skipped to be able to be pulled, but for the moment, it looks like this:

Screen Shot 2019-07-17 at 8 56 31 AM

Verification Process

For pulled/skipped, I set the posts per page in screen options to 1, pulled/skipped posts in mixed chronology, and made sure that pagination and ordering worked as expected.

For new, I commented out the post__not_in argument to see pulled/skipped posts in place. I also reproduced the originally reported failure by inserting $this->sync_log = array_combine ( range( 1, 500 ), range( 1, 500 ) ).

Checklist:

  • I have read the CONTRIBUTING document.
  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my change.
  • All new and existing tests passed.

Applicable Issues

Fixes #418

helen added 5 commits July 16, 2019 12:47
Above this runs the risk of exceeding header size limits, leading to failed authentication. Even further leads to a `max_input_vars` warning.

Not accounted for yet: pagination of included values (we can truncate smartly here), and making sure we are truncating from newest first.
This is being done in order to support truncating the post include/exclude argument that is passed, as that can only sanely be ordered numerically. We may be able to support ASC at some point, but no need to explore that right now. It would not really be correct to keep the column named "Date" at that point but it's possibly close enough for most cases.

This removes the sortable column headers which do not appear to have ever worked.
For new items, this puts newest IDs first and will only exclude the first 200 pulled/skipped items. Still to come: disabling pull for already-pulled/skipped posts in later pages when they show up.

For skipped and pulled, this puts the latest skipped/pulled first, which seems more sensible. Also I believe this makes the request more performant, as it's providing specific IDs, avoiding the first query that grabs IDs before building out the post objects, but I could be wrong.
Ideally they wouldn't show at all but until we have a better method of referring to an exclude list that doesn't rely on headers of limited size, this is what we have.
@helen helen added this to the 1.5.0 milestone Jul 17, 2019
@jeffpaul jeffpaul added the type:bug Something isn't working. label Jul 17, 2019
@helen helen merged commit c74a7e2 into develop Jul 18, 2019
@helen helen deleted the fix/sync-log-header-length branch July 18, 2019 21:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:bug Something isn't working.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Authorization Failing // Truncate Referrer URLs
2 participants