
Project pagination (backend) #2268

Merged
merged 46 commits into from
Apr 23, 2021

Conversation

sanderegg
Member

@sanderegg sanderegg commented Apr 13, 2021

What do these changes do?

  • brings limit/offset style pagination to the /v0/projects endpoint:
    • uses pydantic to validate/document pagination for aiohttp-based services
    • returns a _meta object containing count, offset, limit, total (i.e. the number of items returned, the current offset, the page limit, and the total number of records)
    • returns a _link object containing standard links to the self, first, prev, next, last pages
    • NOTE: this is not page-number based but offset-based pagination over the records
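As an illustration of the envelope described above, here is a minimal sketch of how the _meta fields and the page links could be computed. This uses plain dataclasses rather than the PR's actual pydantic models, and all names (PageMeta, build_links) are hypothetical, not the identifiers introduced by this PR:

```python
from dataclasses import dataclass

# Illustrative sketch of a limit/offset pagination envelope, loosely
# following the _meta/_link layout described in this PR.
@dataclass
class PageMeta:
    count: int   # number of items returned in this page
    offset: int  # offset of the first item in this page
    limit: int   # requested page size
    total: int   # total number of records available


def build_links(base_url: str, offset: int, limit: int, total: int) -> dict:
    """Compute the standard self/first/prev/next/last links."""
    def url(off: int) -> str:
        return f"{base_url}?offset={off}&limit={limit}"

    # offset of the last page: largest multiple of `limit` below `total`
    last_offset = max(0, ((total - 1) // limit) * limit) if total else 0
    return {
        "self": url(offset),
        "first": url(0),
        "last": url(last_offset),
        "prev": url(max(0, offset - limit)) if offset > 0 else None,
        "next": url(offset + limit) if offset + limit < total else None,
    }
```

For example, the third page of 100 records with a page size of 20 would link next to offset 60 and last to offset 80.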

Bonus: removed some very old things.

Related issue/s

How to test

Checklist

@sanderegg sanderegg added the a:webserver issue related to the webserver service label Apr 13, 2021
@sanderegg sanderegg added this to the Schwarznasenschaf milestone Apr 13, 2021
@sanderegg sanderegg self-assigned this Apr 13, 2021
@codecov

codecov bot commented Apr 13, 2021

Codecov Report

Merging #2268 (d0285d2) into master (486f316) will increase coverage by 0.0%.
The diff coverage is 97.0%.

Impacted file tree graph

@@          Coverage Diff           @@
##           master   #2268   +/-   ##
======================================
  Coverage    71.5%   71.6%           
======================================
  Files         486     488    +2     
  Lines       19270   19286   +16     
  Branches     1902    1897    -5     
======================================
+ Hits        13792   13817   +25     
+ Misses       5015    5007    -8     
+ Partials      463     462    -1     
Flag Coverage Δ
integrationtests 62.0% <93.0%> (-0.1%) ⬇️
unittests 66.8% <97.0%> (+<0.1%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
...kages/service-library/src/servicelib/rest_utils.py 65.7% <ø> (ø)
.../simcore_service_webserver/projects/projects_db.py 90.7% <93.3%> (+0.5%) ⬆️
...ce-library/src/servicelib/rest_pagination_utils.py 96.4% <96.4%> (ø)
...s/service-library/src/servicelib/rest_responses.py 80.3% <100.0%> (+0.3%) ⬆️
.../simcore_service_webserver/catalog_api_handlers.py 35.6% <100.0%> (ø)
...simcore_service_webserver/projects/module_setup.py 83.7% <100.0%> (+3.7%) ⬆️
...mcore_service_webserver/projects/project_models.py 100.0% <100.0%> (ø)
...re_service_webserver/projects/projects_handlers.py 90.2% <100.0%> (-0.3%) ⬇️
...server/src/simcore_service_webserver/rest_utils.py 100.0% <100.0%> (ø)
.../director/src/simcore_service_director/producer.py 60.8% <0.0%> (-0.7%) ⬇️
... and 4 more

@sanderegg sanderegg force-pushed the project_pagination branch 5 times, most recently from 73a8ce7 to e4cd7a5 Compare April 21, 2021 07:31
@sanderegg sanderegg changed the title WIP: Project pagination Project pagination (backend) Apr 21, 2021
@sanderegg sanderegg marked this pull request as ready for review April 21, 2021 16:31
@sanderegg sanderegg force-pushed the project_pagination branch 2 times, most recently from 105f799 to 55f2356 Compare April 21, 2021 21:26
Member

@pcrespov pcrespov left a comment


Very nice! Just some suggestions!

Contributor

@GitHK GitHK left a comment


Let's say this will be used to paginate the files in storage (which can easily be composed of thousands of elements).

Would it not be better to have a simpler pagination method which does not need to scroll over the entire dataset each time a page of data is requested? In my opinion this pagination method only solves the problem halfway. The backend will get even more load now.

I think that by leveraging the SQL LIMIT command this would be much more efficient, and it would properly implement infinite scrolling: you never know the length of the entire dataset, except when you hit the last page and no longer have a "next page" link.
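The keyset ("cursor") approach suggested here can be sketched as follows. This is an in-memory sqlite example with made-up table and column names, not code from this PR: each page is fetched with `WHERE id > last_seen_id LIMIT n`, so there is no OFFSET scan and no need to know the dataset size up front.

```python
import sqlite3

# Hypothetical keyset-pagination sketch (table/column names are invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO files (name) VALUES (?)",
                 [(f"file-{i}",) for i in range(7)])


def page_after(cursor_id: int, page_size: int):
    """Return one page of rows after `cursor_id`, plus the next cursor."""
    rows = conn.execute(
        "SELECT id, name FROM files WHERE id > ? ORDER BY id LIMIT ?",
        (cursor_id, page_size),
    ).fetchall()
    # When fewer rows than page_size come back, there is no next page.
    next_cursor = rows[-1][0] if len(rows) == page_size else None
    return rows, next_cursor


page1, cursor = page_after(0, 4)       # first 4 files, cursor = last id seen
page2, cursor = page_after(cursor, 4)  # remaining 3 files, no next cursor
```

The trade-off versus limit/offset is that keyset pagination cannot jump to an arbitrary page or report a total, which is exactly the "you never know the dataset length" behaviour described above.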

@sanderegg
Member Author

sanderegg commented Apr 22, 2021

@GitHK

Let's say this will be used to paginate the files in storage (which can easily be composed of thousands of elements).

No, for this I would not use limit/offset pagination but cursor-based pagination. See here or here for all the different kinds of pagination.

Would it not be better to have a simpler pagination method which does not need to scroll over the entire dataset each time a page of data is requested? In my opinion this pagination method only solves the problem halfway. The backend will get even more load now.

I am not sure I understand what you mean by "need to scroll over the entire dataset" and "the backend will get more load now".
Calling the database with OFFSET is not something to use when you have millions of entries to go through (which is not the case with projects), but it has the advantage of being simple. It is also a first step: the next one will be cursor-based pagination (see the links above), which will be preferred especially for cases with a huge number of entries.
As for your comment about the load on the backend, I don't get it. Currently we get all the entries from the database, then call the director-v2 on each of them, which in turn accesses the database/Celery/RabbitMQ, triggering hundreds of REST calls plus DB calls. Now it will be limited to 20 by default (50 max) per /v0/projects call. So please explain.
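The limit/offset scheme defended here can be sketched with an in-memory sqlite example (the table/column names and the list_projects helper are illustrative, not this PR's code): one query returns the requested page, and a cheap COUNT(*) gives the total used to fill the _meta object.

```python
import sqlite3

# Illustrative limit/offset pagination sketch (names are invented).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO projects (name) VALUES (?)",
               [(f"project-{i}",) for i in range(45)])


def list_projects(offset: int = 0, limit: int = 20) -> dict:
    """Return one page of projects plus the _meta pagination info."""
    total = db.execute("SELECT COUNT(*) FROM projects").fetchone()[0]
    rows = db.execute(
        "SELECT id, name FROM projects ORDER BY id LIMIT ? OFFSET ?",
        (min(limit, 50), offset),  # 20 by default, capped at 50
    ).fetchall()
    return {
        "_meta": {"count": len(rows), "offset": offset,
                  "limit": limit, "total": total},
        "data": rows,
    }


page = list_projects(offset=40, limit=20)  # last page: 5 of 45 projects
```

The key point of the argument above is that only the LIMIT-sized page crosses the wire and triggers downstream calls, even though the total is still known.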

I think that by leveraging the SQL LIMIT command this would be much more efficient, and it would properly implement infinite scrolling: you never know the length of the entire dataset, except when you hit the last page and no longer have a "next page" link.

Which is, I think, precisely what is implemented... see test_projects_01.py and let me know what I am missing here; even the test follows the next link...

@GitHK
Contributor

GitHK commented Apr 22, 2021

@sanderegg

@GitHK

Let's say this will be used to paginate the files in storage (which can easily be composed of thousands of elements).

No, for this I would not use limit/offset pagination but cursor-based pagination. See here or here for all the different kinds of pagination.

OK

Would it not be better to have a simpler pagination method which does not need to scroll over the entire dataset each time a page of data is requested? In my opinion this pagination method only solves the problem halfway. The backend will get even more load now.

I am not sure I understand what you mean by "need to scroll over the entire dataset" and "the backend will get more load now".
Calling the database with OFFSET is not something to use when you have millions of entries to go through (which is not the case with projects), but it has the advantage of being simple. It is also a first step: the next one will be cursor-based pagination (see the links above), which will be preferred especially for cases with a huge number of entries.
As for your comment about the load on the backend, I don't get it. Currently we get all the entries from the database, then call the director-v2 on each of them, which in turn accesses the database/Celery/RabbitMQ, triggering hundreds of REST calls plus DB calls. Now it will be limited to 20 by default (50 max) per /v0/projects call. So please explain.

I was trying to say that if there are 7 projects and the pagination API lists 4 projects per page, the API will be called twice: the first call returns the first 4 projects and the second returns the last 3. In both situations the backend will fetch all 7 projects from the database.
Is my understanding of this correct?

I think that by leveraging the SQL LIMIT command this would be much more efficient, and it would properly implement infinite scrolling: you never know the length of the entire dataset, except when you hit the last page and no longer have a "next page" link.

Which is, I think, precisely what is implemented... see test_projects_01.py and let me know what I am missing here; even the test follows the next link...

OK (minor thing: here you already know and expect the size of the dataset, but it's fine for now)

@sanderegg
Member Author

@sanderegg

@GitHK

Let's say this will be used to paginate the files in storage (which can easily be composed of thousands of elements).

No, for this I would not use limit/offset pagination but cursor-based pagination. See here or here for all the different kinds of pagination.

OK

Would it not be better to have a simpler pagination method which does not need to scroll over the entire dataset each time a page of data is requested? In my opinion this pagination method only solves the problem halfway. The backend will get even more load now.

I am not sure I understand what you mean by "need to scroll over the entire dataset" and "the backend will get more load now".
Calling the database with OFFSET is not something to use when you have millions of entries to go through (which is not the case with projects), but it has the advantage of being simple. It is also a first step: the next one will be cursor-based pagination (see the links above), which will be preferred especially for cases with a huge number of entries.
As for your comment about the load on the backend, I don't get it. Currently we get all the entries from the database, then call the director-v2 on each of them, which in turn accesses the database/Celery/RabbitMQ, triggering hundreds of REST calls plus DB calls. Now it will be limited to 20 by default (50 max) per /v0/projects call. So please explain.

I was trying to say that if there are 7 projects and the pagination API lists 4 projects per page, the API will be called twice: the first call returns the first 4 projects and the second returns the last 3. In both situations the backend will fetch all 7 projects from the database.
Is my understanding of this correct?

No. This is a web API, and the problem does not arise with 7 projects but rather once you get over 100 projects.
So let's put it this way:

  • say you have 500 projects
  • before: the frontend would ask for the projects and get all of them: 1 webserver API call -> fetching 500 projects from the DB, then calling the director-v2 500 times, which in turn calls the DB another 500 times plus Celery and whatever else, making the webserver and the frontend wait for all of this, then sending a big load of data back to the frontend, which would sort and display it even though the user only sees about 20 projects.
  • after: the frontend asks for 20 projects -> this goes the same way through the database but is limited to the 20 projects asked for.
  • after: if the frontend needs more projects it will ask for them; if not, that is 480 projects never requested.

So I really do not see where the backend load gets higher.
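The arithmetic of that 500-project example can be written out explicitly. These are the rough fan-out counts quoted in the thread, not measured figures:

```python
# Back-of-the-envelope comparison of the request fan-out described above.
TOTAL_PROJECTS = 500
PAGE_SIZE = 20  # default page size per /v0/projects call

# before: one /v0/projects call touches every project
calls_before = {"db_rows": TOTAL_PROJECTS, "director_v2": TOTAL_PROJECTS}

# after: one paginated call touches only the requested page
calls_after = {"db_rows": PAGE_SIZE, "director_v2": PAGE_SIZE}

# 25x fewer downstream calls for the initial page load
fanout_reduction = TOTAL_PROJECTS // PAGE_SIZE
```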

I think that by leveraging the SQL LIMIT command this would be much more efficient. This will properly implement infinte scrolling. You never have any idea what the entire dataset length is, except when you hit the last page and you no longer have any "next page" links.

Which is, I think, precisely what is implemented... see test_projects_01.py and let me know what I am missing here; even the test follows the next link...

OK (minor thing: here you already know and expect the size of the dataset, but it's fine for now)

Yes, I always know the size of the dataset (i.e. the number of projects a user can see). But why would I return all of it if the frontend does not need it?

Contributor

@GitHK GitHK left a comment


👍 I'm fine with it.

@odeimaiz odeimaiz mentioned this pull request Apr 22, 2021
Member

@odeimaiz odeimaiz left a comment


It works like a charm

@sanderegg sanderegg force-pushed the project_pagination branch from a97c530 to 618e5bb Compare April 22, 2021 12:15
@sanderegg sanderegg merged commit 4714c68 into ITISFoundation:master Apr 23, 2021
@sanderegg sanderegg deleted the project_pagination branch April 23, 2021 08:03