Fix performance on previewing large tables #168

Closed
trengrj opened this issue Jul 19, 2016 · 5 comments


trengrj commented Jul 19, 2016

When a table is viewed, the function TableRowsCount is called to get the size of the table for pagination. Unfortunately it does this by running SELECT COUNT(1) FROM <table>, which is a slow operation in Postgres on large tables and will cause the application to hang on tables in the many-gigabytes range.

I've got a working commit here which uses the estimated table size (when it is available). I removed the "Page X of Y" at top right and replaced it with just "Page X", because sometimes the estimated size is not available (e.g. when the relation is a view).

trengrj@b8cc628
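
For reference, the estimate can be read straight from Postgres's own statistics. A minimal sketch (the package, helper name, and signature here are illustrative, not what the commit above uses):

```go
package pgstats

import "database/sql"

// estimatedRowCount returns the planner's row-count estimate for a table,
// read from pg_class.reltuples. The value is maintained by VACUUM, ANALYZE
// and autovacuum, so it can lag behind the true count, and it is not
// meaningful for views (which have no storage).
func estimatedRowCount(db *sql.DB, schema, table string) (int64, error) {
	var estimate int64
	err := db.QueryRow(`
		SELECT reltuples::bigint
		FROM pg_class c
		JOIN pg_namespace n ON n.oid = c.relnamespace
		WHERE n.nspname = $1 AND c.relname = $2`,
		schema, table).Scan(&estimate)
	return estimate, err
}
```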

I think it is important to be able to preview any table quickly, but I agree the pagination is useful. What are your views on fixing this issue? Maybe some sort of hybrid approach?


sosedoff commented Sep 3, 2016

Thanks for bringing this up, and sorry for the late response. Indeed, COUNT(*) is very slow on large data sets, so a hybrid approach would (at least partially) solve the issue. Ideally we would first check the estimated row count, and if it's below 100k rows (just an example) run the real COUNT, otherwise use the estimate for pagination. I would also prefer keeping the pagination where possible instead of removing it altogether; I know this could be a tricky feature, but still.
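
A rough sketch of that hybrid check, building on the estimatedRowCount helper above (the threshold and all names are illustrative):

```go
package pgstats

import (
	"database/sql"

	"github.com/lib/pq"
)

// countThreshold is the estimate below which an exact COUNT is still cheap.
const countThreshold = 100000

// tableRowsCount uses the cheap planner estimate to decide whether an exact
// COUNT is affordable. Small tables (and anything without a usable estimate,
// such as a view or a never-analyzed table) get an exact count; large tables
// get the estimate, flagged as inexact so the UI can render "Page X"
// instead of "Page X of Y".
func tableRowsCount(db *sql.DB, schema, table string) (count int64, exact bool, err error) {
	estimate, err := estimatedRowCount(db, schema, table)
	if err != nil {
		return 0, false, err
	}
	if estimate > countThreshold {
		return estimate, false, nil
	}
	// Identifiers cannot be bound as query parameters, so quote them;
	// pq.QuoteIdentifier guards against odd table names.
	q := "SELECT COUNT(1) FROM " + pq.QuoteIdentifier(schema) + "." + pq.QuoteIdentifier(table)
	err = db.QueryRow(q).Scan(&count)
	return count, true, err
}
```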

felixbuenemann commented

It's also worth noting that OFFSET pagination gets slower the further you paginate, because Postgres has to fetch all rows preceding the offset and throw them away. A faster way is keyset pagination: something like WHERE id > (last row id from previous page) LIMIT n. Not sure if that's practical to implement here, because it requires knowledge of the underlying table.
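
In query form, the idea looks roughly like this (a sketch; the items table and its indexed, unique id column are assumptions):

```go
package pgstats

import "database/sql"

// keysetPage fetches the next page by resuming after the last key seen,
// so Postgres can seek directly into the index instead of scanning and
// discarding all the rows an OFFSET would skip over.
func keysetPage(db *sql.DB, lastID int64, pageSize int) (*sql.Rows, error) {
	return db.Query(`
		SELECT id, name
		FROM items
		WHERE id > $1
		ORDER BY id
		LIMIT $2`, lastID, pageSize)
}
```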

Also see "We need tool support for keyset pagination" for more info.

sosedoff commented

Does anyone have any interest in tackling the problem?

sosedoff commented

Coming back to this. I'm thinking of having a map (per established connection) that will hold information about tables that are somewhat large (> 100k rows). For table browse requests we would first check whether that map contains the table, and if it does, fetch the row stats using count estimates. On database switch the map would be cleared out.
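
Something along these lines, as a sketch (all names are illustrative):

```go
package pgstats

import "sync"

// largeTables tracks, per established connection, which tables are known
// to be large, so browse requests can skip the exact COUNT for them and
// use the planner estimate instead.
type largeTables struct {
	mu     sync.Mutex
	tables map[string]bool // "schema.table" -> known to exceed the threshold
}

func newLargeTables() *largeTables {
	return &largeTables{tables: make(map[string]bool)}
}

func (l *largeTables) isLarge(name string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.tables[name]
}

func (l *largeTables) mark(name string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.tables[name] = true
}

// reset drops everything, e.g. when the user switches databases.
func (l *largeTables) reset() {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.tables = make(map[string]bool)
}
```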

Any thoughts?

felixbuenemann commented

The DZone article "Faster PostgreSQL Counting" has some useful tips on getting good estimate counts on large tables.

Instead of keeping a map of large tables in memory, you could just look at the table stats in Postgres on the fly; I don't think caching that info is worth the trouble.
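
For example, the stats collector's live-tuple count is available per request with a single cheap catalog lookup (a sketch; like reltuples, n_live_tup is only an approximation):

```go
package pgstats

import "database/sql"

// liveTupleCount reads the stats collector's running estimate of live rows.
// The lookup is a cheap catalog query, so it can run on every browse
// request, avoiding the need to cache table sizes in application memory.
func liveTupleCount(db *sql.DB, schema, table string) (int64, error) {
	var n int64
	err := db.QueryRow(`
		SELECT n_live_tup
		FROM pg_stat_user_tables
		WHERE schemaname = $1 AND relname = $2`,
		schema, table).Scan(&n)
	return n, err
}
```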
