
Project pagination (backend) #2268

Merged
merged 46 commits into from
Apr 23, 2021

Conversation

sanderegg
Member

@sanderegg sanderegg commented Apr 13, 2021

What do these changes do?

  • brings limit/offset style pagination to the /v0/projects endpoint:
    • uses pydantic to validate/document pagination for aiohttp-based services
    • returns a _meta object containing count, offset, limit, total (i.e. the number of items returned, the current offset, the page limit, and the total number of records)
    • returns a _link object containing standard links to the self, first, prev, next, last pages
    • NOTE: this is not page-number based but offset-based pagination over the records
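As an illustration of the envelope described above, here is a minimal sketch of how the _meta fields and the page links could be computed. This uses plain dataclasses rather than the PR's actual pydantic models, and all names (PageMeta, build_links) are hypothetical, not the identifiers introduced by this PR:

```python
from dataclasses import dataclass

# Illustrative sketch of a limit/offset pagination envelope, loosely
# following the _meta/_link layout described in this PR.
@dataclass
class PageMeta:
    count: int   # number of items returned in this page
    offset: int  # offset of the first item in this page
    limit: int   # requested page size
    total: int   # total number of records available


def build_links(base_url: str, offset: int, limit: int, total: int) -> dict:
    """Compute the standard self/first/prev/next/last links."""
    def url(off: int) -> str:
        return f"{base_url}?offset={off}&limit={limit}"

    # offset of the last page: largest multiple of `limit` below `total`
    last_offset = max(0, ((total - 1) // limit) * limit) if total else 0
    return {
        "self": url(offset),
        "first": url(0),
        "last": url(last_offset),
        "prev": url(max(0, offset - limit)) if offset > 0 else None,
        "next": url(offset + limit) if offset + limit < total else None,
    }
```

For example, the third page of 100 records with a page size of 20 would link next to offset 60 and last to offset 80.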

Bonus: removed some very old things.

Related issue/s

How to test

Checklist

@sanderegg sanderegg added the a:webserver issue related to the webserver service label Apr 13, 2021
@sanderegg sanderegg added this to the Schwarznasenschaf milestone Apr 13, 2021
@sanderegg sanderegg self-assigned this Apr 13, 2021
@codecov

codecov bot commented Apr 13, 2021

Codecov Report

Merging #2268 (d0285d2) into master (486f316) will increase coverage by 0.0%.
The diff coverage is 97.0%.

Impacted file tree graph

@@          Coverage Diff           @@
##           master   #2268   +/-   ##
======================================
  Coverage    71.5%   71.6%           
======================================
  Files         486     488    +2     
  Lines       19270   19286   +16     
  Branches     1902    1897    -5     
======================================
+ Hits        13792   13817   +25     
+ Misses       5015    5007    -8     
+ Partials      463     462    -1     
Flag Coverage Δ
integrationtests 62.0% <93.0%> (-0.1%) ⬇️
unittests 66.8% <97.0%> (+<0.1%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
...kages/service-library/src/servicelib/rest_utils.py 65.7% <ø> (ø)
.../simcore_service_webserver/projects/projects_db.py 90.7% <93.3%> (+0.5%) ⬆️
...ce-library/src/servicelib/rest_pagination_utils.py 96.4% <96.4%> (ø)
...s/service-library/src/servicelib/rest_responses.py 80.3% <100.0%> (+0.3%) ⬆️
.../simcore_service_webserver/catalog_api_handlers.py 35.6% <100.0%> (ø)
...simcore_service_webserver/projects/module_setup.py 83.7% <100.0%> (+3.7%) ⬆️
...mcore_service_webserver/projects/project_models.py 100.0% <100.0%> (ø)
...re_service_webserver/projects/projects_handlers.py 90.2% <100.0%> (-0.3%) ⬇️
...server/src/simcore_service_webserver/rest_utils.py 100.0% <100.0%> (ø)
.../director/src/simcore_service_director/producer.py 60.8% <0.0%> (-0.7%) ⬇️
... and 4 more

@sanderegg sanderegg force-pushed the project_pagination branch 5 times, most recently from 73a8ce7 to e4cd7a5 Compare April 21, 2021 07:31
@sanderegg sanderegg changed the title WIP: Project pagination Project pagination (backend) Apr 21, 2021
@sanderegg sanderegg marked this pull request as ready for review April 21, 2021 16:31
@sanderegg sanderegg force-pushed the project_pagination branch 2 times, most recently from 105f799 to 55f2356 Compare April 21, 2021 21:26
Member

@pcrespov pcrespov left a comment


Very nice! Just some suggestions!

Contributor

@GitHK GitHK left a comment


Let's say this will be used to paginate the files in storage (which can easily be composed of thousands of elements).

Would it not be better to have a simpler pagination method which does not need to scroll over the entire dataset each time a page of data is requested? In my opinion this pagination method only solves the problem halfway. The backend will get even more load now.

I think that by leveraging the SQL LIMIT command this would be much more efficient, and it would properly implement infinite scrolling: you never know the length of the entire dataset, except when you hit the last page and no longer have a "next page" link.
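The keyset ("cursor") approach suggested here can be sketched as follows. This is an in-memory sqlite example with made-up table and column names, not code from this PR: each page is fetched with `WHERE id > last_seen_id LIMIT n`, so there is no OFFSET scan and no need to know the dataset size up front.

```python
import sqlite3

# Hypothetical keyset-pagination sketch (table/column names are invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO files (name) VALUES (?)",
                 [(f"file-{i}",) for i in range(7)])


def page_after(cursor_id: int, page_size: int):
    """Return one page of rows after `cursor_id`, plus the next cursor."""
    rows = conn.execute(
        "SELECT id, name FROM files WHERE id > ? ORDER BY id LIMIT ?",
        (cursor_id, page_size),
    ).fetchall()
    # When fewer rows than page_size come back, there is no next page.
    next_cursor = rows[-1][0] if len(rows) == page_size else None
    return rows, next_cursor


page1, cursor = page_after(0, 4)       # first 4 files, cursor = last id seen
page2, cursor = page_after(cursor, 4)  # remaining 3 files, no next cursor
```

The trade-off versus limit/offset is that keyset pagination cannot jump to an arbitrary page or report a total, which is exactly the "you never know the dataset length" behaviour described above.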

@sanderegg
Member Author

sanderegg commented Apr 22, 2021

@GitHK

Let's say this will be used to paginate the files in storage (which can easily be composed of thousands of elements).

No, for this I would not use limit/offset pagination but cursor-based pagination. See here or here for all the different kinds of pagination.

Would it not be better to have a simpler pagination method which does not need to scroll over the entire dataset each time a page of data is requested? In my opinion this pagination method only solves the problem halfway. The backend will get even more load now.

I am not sure I understand what you mean by "need to scroll over the entire dataset" and "the backend will get more load now".
Calling the database with OFFSET is not something to use when you have millions of entries to go through (which is not the case with projects), but it has the advantage of being simple. It is also a first step: the next one will be cursor-based pagination (see the links above), which will be preferred especially for cases with a huge number of entries.
As for your comment about the load on the backend, I don't get it. Currently we get all the entries from the database, then call the director-v2 on each of them, which in turn accesses the database/Celery/RabbitMQ, triggering hundreds of REST calls plus DB calls. Now it will be limited to 20 by default (50 max) per /v0/projects call. So please explain.
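The limit/offset scheme defended here can be sketched with an in-memory sqlite example (the table/column names and the list_projects helper are illustrative, not this PR's code): one query returns the requested page, and a cheap COUNT(*) gives the total used to fill the _meta object.

```python
import sqlite3

# Illustrative limit/offset pagination sketch (names are invented).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO projects (name) VALUES (?)",
               [(f"project-{i}",) for i in range(45)])


def list_projects(offset: int = 0, limit: int = 20) -> dict:
    """Return one page of projects plus the _meta pagination info."""
    total = db.execute("SELECT COUNT(*) FROM projects").fetchone()[0]
    rows = db.execute(
        "SELECT id, name FROM projects ORDER BY id LIMIT ? OFFSET ?",
        (min(limit, 50), offset),  # 20 by default, capped at 50
    ).fetchall()
    return {
        "_meta": {"count": len(rows), "offset": offset,
                  "limit": limit, "total": total},
        "data": rows,
    }


page = list_projects(offset=40, limit=20)  # last page: 5 of 45 projects
```

The key point of the argument above is that only the LIMIT-sized page crosses the wire and triggers downstream calls, even though the total is still known.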

I think that by leveraging the SQL LIMIT command this would be much more efficient, and it would properly implement infinite scrolling: you never know the length of the entire dataset, except when you hit the last page and no longer have a "next page" link.

Which is, I think, precisely what is implemented... see test_projects_01.py and let me know what I am missing here; even the test follows the next link...

@GitHK
Contributor

GitHK commented Apr 22, 2021

@sanderegg

@GitHK

Let's say this will be used to paginate the files in storage (which can easily be composed of thousands of elements).

No, for this I would not use limit/offset pagination but cursor-based pagination. See here or here for all the different kinds of pagination.

OK

Would it not be better to have a simpler pagination method which does not need to scroll over the entire dataset each time a page of data is requested? In my opinion this pagination method only solves the problem halfway. The backend will get even more load now.

I am not sure I understand what you mean by "need to scroll over the entire dataset" and "the backend will get more load now".
Calling the database with OFFSET is not something to use when you have millions of entries to go through (which is not the case with projects), but it has the advantage of being simple. It is also a first step: the next one will be cursor-based pagination (see the links above), which will be preferred especially for cases with a huge number of entries.
As for your comment about the load on the backend, I don't get it. Currently we get all the entries from the database, then call the director-v2 on each of them, which in turn accesses the database/Celery/RabbitMQ, triggering hundreds of REST calls plus DB calls. Now it will be limited to 20 by default (50 max) per /v0/projects call. So please explain.

I was trying to say that if there are 7 projects and the pagination API lists 4 projects per page, the API will be called twice: the first call returns the first 4 projects and the second returns the last 3. In both situations the backend will fetch all 7 projects from the database.
Is my understanding of this correct?

I think that by leveraging the SQL LIMIT command this would be much more efficient, and it would properly implement infinite scrolling: you never know the length of the entire dataset, except when you hit the last page and no longer have a "next page" link.

Which is, I think, precisely what is implemented... see test_projects_01.py and let me know what I am missing here; even the test follows the next link...

OK (minor thing: here you already know and expect the size of the dataset, but it's fine for now)

@sanderegg
Member Author

@sanderegg

@GitHK

Let's say this will be used to paginate the files in storage (which can easily be composed of thousands of elements).

No, for this I would not use limit/offset pagination but cursor-based pagination. See here or here for all the different kinds of pagination.

OK

Would it not be better to have a simpler pagination method which does not need to scroll over the entire dataset each time a page of data is requested? In my opinion this pagination method only solves the problem halfway. The backend will get even more load now.

I am not sure I understand what you mean by "need to scroll over the entire dataset" and "the backend will get more load now".
Calling the database with OFFSET is not something to use when you have millions of entries to go through (which is not the case with projects), but it has the advantage of being simple. It is also a first step: the next one will be cursor-based pagination (see the links above), which will be preferred especially for cases with a huge number of entries.
As for your comment about the load on the backend, I don't get it. Currently we get all the entries from the database, then call the director-v2 on each of them, which in turn accesses the database/Celery/RabbitMQ, triggering hundreds of REST calls plus DB calls. Now it will be limited to 20 by default (50 max) per /v0/projects call. So please explain.

I was trying to say that if there are 7 projects and the pagination API lists 4 projects per page, the API will be called twice: the first call returns the first 4 projects and the second returns the last 3. In both situations the backend will fetch all 7 projects from the database.
Is my understanding of this correct?

No. This is a web API, and the problem does not arise with 7 projects but rather once you get over 100 projects.
So let's put it this way:

  • say you have 500 projects
  • before: the frontend would ask for the projects and get all of them: 1 webserver API call -> fetching 500 projects from the DB, then calling the director-v2 500 times, which in turn calls the DB another 500 times plus Celery and whatever else, making the webserver and the frontend wait for all of this, then sending a big load of data back to the frontend, which would sort and display it even though the user only sees about 20 projects.
  • after: the frontend asks for 20 projects -> this goes the same way through the database but is limited to the 20 projects asked for.
  • after: if the frontend needs more projects it will ask for them; if not, that is 480 projects never requested.

So I really do not see where the backend load gets higher.
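The arithmetic of that 500-project example can be written out explicitly. These are the rough fan-out counts quoted in the thread, not measured figures:

```python
# Back-of-the-envelope comparison of the request fan-out described above.
TOTAL_PROJECTS = 500
PAGE_SIZE = 20  # default page size per /v0/projects call

# before: one /v0/projects call touches every project
calls_before = {"db_rows": TOTAL_PROJECTS, "director_v2": TOTAL_PROJECTS}

# after: one paginated call touches only the requested page
calls_after = {"db_rows": PAGE_SIZE, "director_v2": PAGE_SIZE}

# 25x fewer downstream calls for the initial page load
fanout_reduction = TOTAL_PROJECTS // PAGE_SIZE
```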

I think that by leveraging the SQL LIMIT command this would be much more efficient. This will properly implement infinte scrolling. You never have any idea what the entire dataset length is, except when you hit the last page and you no longer have any "next page" links.

Which is, I think, precisely what is implemented... see test_projects_01.py and let me know what I am missing here; even the test follows the next link...

OK (minor thing: here you already know and expect the size of the dataset, but it's fine for now)

Yes, I always know the size of the dataset (i.e. the number of projects a user can see). But why would I return all of it if the frontend does not need it?

Contributor

@GitHK GitHK left a comment


👍 I'm fine with it.

@odeimaiz odeimaiz mentioned this pull request Apr 22, 2021
Member

@odeimaiz odeimaiz left a comment


It works like a charm

@sanderegg sanderegg force-pushed the project_pagination branch from a97c530 to 618e5bb Compare April 22, 2021 12:15
@sanderegg sanderegg merged commit 4714c68 into ITISFoundation:master Apr 23, 2021
@sanderegg sanderegg deleted the project_pagination branch April 23, 2021 08:03