Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add warning to search response when pages have shifted #3590

Closed
lmsurpre opened this issue Apr 22, 2022 · 6 comments
Closed

Add warning to search response when pages have shifted #3590

lmsurpre opened this issue Apr 22, 2022 · 6 comments
Assignees
Labels
enhancement New feature or request P2 Priority 2 - Should Have

Comments

@lmsurpre
Copy link
Member

lmsurpre commented Apr 22, 2022

Is your feature request related to a problem? Please describe.
Many FHIR Server implementations hold search session state to provide stable paging. That hurts performance/scalability but may be more appropriate for certain use cases.

Describe the solution you'd like
In cases where we cannot provide stable paging, we can at least detect that the pages have shifted and provide a warning.

  1. add the "changeId" of the first result to the 'next' link with a param name like 'prevChangeId'
  2. add the "changeId" of the last result to the 'prev' link with a param name like 'nextChangeId'
  3. when we get a search request with a 'prevChangeId' value, fetch an extra resource just before the page and confirm that its id matches the expected value
  4. when we get a search request with a 'nextChangeId' value, fetch an extra resource just after the page and confirm that its id matches the expected value

For 3 and 4, if the id doesn't match the expected value then add an OperationOutcome to the response bundle with a warning like:

"Pages have shifted; check prior pages for changed results"

Worked example:

GET [base]/Patient?_count=3

Bundle
links:
* next: /Patient?_page=2&_count=3&prevChangeId=3
entry:
* id=1
* id=2
* id=3
GET [base]/Patient?_page=2&_count=3&prevChangeId=3

Bundle
links:
* prev: /Patient?page=1&_count=3&nextChangeId=4
* next: /Patient?page=3&_count=3&prevChangeId=6
entry:
* id=4
* id=5
* id=6

Describe alternatives you've considered
Fetch full pages on either side of the current page and hash those, then use that hash.
This would cover cases where there was an insert and a delete on either side...so the page looks stable but there is some changed content.

Acceptance Criteria

  1. GIVEN a search with multiple pages of results
    AND a user has requested a page (e.g. page 1)
    WHEN a resource from that page (or prior) is deleted
    AND the user then follows the provided (now-stale) "next" link (e.g. to page 2)
    THEN the response bundle contains an entry with mode=outcome and an OperationOutcome resource that includes an appropriate warning

  2. GIVEN a search with multiple pages of results
    AND a user has requested a page (e.g. page 2)
    WHEN a resource that would land on that page (or prior) is added
    AND the user then follows the provided (now-stale) "prev" link (e.g. to page 1)
    THEN the response bundle contains an entry with mode=outcome and an OperationOutcome resource that includes an appropriate warning

  3. GIVEN a search with multiple pages of results
    AND a user has requested a page (e.g. page 1)
    WHEN no resource from that page (or prior) is deleted (either no deletes or a delete of a resource on a latter page)
    AND the user then follows the provided "next" link (e.g. to page 2)
    THEN the response bundle does not contain an entry with mode=outcome and an OperationOutcome resource about shifted pages

  4. GIVEN a search with multiple pages of results
    AND a user has requested a page (e.g. page 2)
    WHEN no resource that would land on that page (or prior) is added
    AND the user then follows the provided "prev" link (e.g. to page 1)
    THEN the response bundle does not contain an entry with mode=outcome and an OperationOutcome resource about shifted pages

Additional context
If we do it with #3476 I think we could provide more stable paging when _sort=none (e.g. instead of issuing a warning when pages shift, we could potentially just start the next page at the most appropriate place)

@lmsurpre lmsurpre added the enhancement New feature or request label Apr 22, 2022
@lmsurpre lmsurpre added the P2 Priority 2 - Should Have label Sep 20, 2022
@punktilious
Copy link
Collaborator

punktilious commented Sep 28, 2022

Here's my thought on this. Let's assume we have a search result with 8 items and we use count=3, so we get 3 pages:

PAGE 1                                           PAGE 2                                     PAGE 3             
id=1                                             id=4                                         id=7
id=2                                             id=5                                         id=8
id=3                                             id=6
next:_page=2&_firstId=4
self:_page=1&_firstId=1&_lastId=3
                                            next:_page=3&_firstId=7
                                            prev:_page=1&_lastId=3
                                            self:_page=2&_firstId=4&_lastId=6
                                                                                     prev:_page=2&_lastId=6
                                                                                     self:_page=3&_firstId=7&_lastId=8

I'm being careful with the parameter names to try to make it clear which parameters relate to the page, and which relate to a neighbor page (prev or next).

For the first page, we need to set the limit to be (pageSize + 1).

When we fetch any other page, we need to adjust the offset to be (startOfPage - 1) and the limit to be (pageSize + 2). This will give us one item before and one item after. These extras are not to be included in the result bundle,. Only their ids are used.

firstId: the id of the first entry expected on the target page (referenced by the next link)
lastId: the id of the last entry expected on the target page (referenced by the prev link)

Using the above example, on page 1 we add a next link with the firstId set to be the id obtained from the last item in the ResultSet (because LIMIT was pageSize+1).

For page 2, we add a next link with the firstId set to be the id obtained from the last item in the ResultSet. We add a prev link with the lastId set to be the id of the first item in the ResultSet (which is not included in the result bundle).

When we process a page, if we are given firstId, we check that it matches the first item included in the result bundle. If we are given lastId, we check that it matches the last item included in the result bundle.

@PrasannaHegde1 PrasannaHegde1 self-assigned this Sep 29, 2022
@lmsurpre
Copy link
Member Author

lmsurpre commented Oct 26, 2022

Once difference between what was implemented and what I sort of expected is that I thought that the values of the _firstId and _lastId parameters would be the internal, numeric row ids for the X_RESOURCES table (something that is not meant to be meaningful to the end user).

Instead, the current implementation uses the actual Resource.id value for the first and last entry on the page.
One downside to that is that these ids can be pretty long, so we end up with self urls like

https://localhost:9443/fhir-server/api/v4/Patient?_count=10&_page=2&_firstId=1832f27c296-37254f91-0382-4bb7-bb66-bb840fcc97b2&_lastId=1832f28f1c5-af934227-9088-40fc-860f-6baa58b8094b

instead of

https://localhost:9443/fhir-server/api/v4/Patient?_count=10&_page=2&_firstId=12345678&_lastId=12345688

@lmsurpre
Copy link
Member Author

With that said, I verified the implementation by issuing a Patient search that yeilded over 100 results (using the default page size of 10).
I then proceeded to

  1. copy the self and next urls; and
  2. delete the first entry of the result set (via HTTP DELETE to the server)

After this, I confirmed that dereferencing the self url AND the next url now result in an 11th entry with the "pages have shifted" warning:

    {
      "resource": {
        "resourceType": "OperationOutcome",
        "issue": [
          {
            "severity": "warning",
            "code": "conflict",
            "details": {
              "text": "Pages have shifted; check pages for changed results."
            }
          }
        ]
      },
      "search": {
        "mode": "outcome"
      }
    }

@lmsurpre
Copy link
Member Author

I also confirmed that this warning is not added to the pages when the _firstId or _lastId parameters are omitted or unchanged (i.e. the actual id matches the expected id)

@lmsurpre
Copy link
Member Author

Talked with Robin and he was expecting us to use the primary key of the x_RESOURCES table for these entries as well.

One possible downside to that is that resource updates.
On the other hand, this could be helpful if users were expecting the pages to reflect the state of the system when the original search was performed.

Team decision: mostly just for consistency with the history API, lets use the database primary key of the RESOURCES table (X_RESOURCE.RESOURCE_ID) as the value for these parameters.

@lmsurpre lmsurpre reopened this Oct 26, 2022
@lmsurpre
Copy link
Member Author

lmsurpre commented Nov 8, 2022

I confirm that we're now using the internal resource ids in these urls. For example:
previous: Practitioner?_page=1&_lastId=1037
self: Practitioner?_count=10&_page=2&_firstId=1041&_lastId=1077
next: Practitioner?_page=3&_firstId=1081

I also reconfirmed that the warning shows up on each of those links after deleting the first entry of the first page.
Looks good.

@lmsurpre lmsurpre closed this as completed Nov 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request P2 Priority 2 - Should Have
Projects
None yet
Development

No branches or pull requests

3 participants