Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(core-database-postgres): faster count method #2832

Closed
wants to merge 4 commits into from
Closed

Conversation

faustbrian
Copy link
Contributor

@faustbrian faustbrian commented Jul 18, 2019

Use a faster query that performs better against larger datasets like on mainnet while the benefit on devnet is still visible but a lot smaller as the database is 1/3 the size. This is obviously not a permanent solution but a nice little improvement for now.

EXPLAIN ANALYZE SELECT COUNT(id) FROM blocks;

 Finalize Aggregate  (cost=298772.77..298772.78 rows=1 width=8) (actual time=2079.334..2079.334 rows=1 loops=1)
   ->  Gather  (cost=298772.55..298772.76 rows=2 width=8) (actual time=2079.232..2087.596 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=297772.55..297772.56 rows=1 width=8) (actual time=2074.550..2074.550 rows=1 loops=3)
               ->  Parallel Index Only Scan using blocks_pkey on blocks  (cost=0.56..288371.32 rows=3760492 width=24) (actual time=0.068..1678.611 rows=3009111 loops=3)
                     Heap Fetches: 311
 Planning time: 0.149 ms
 Execution time: 2087.720 ms
(9 rows)

EXPLAIN ANALYZE SELECT COUNT(*) FROM (SELECT id FROM blocks) AS count;

 Finalize Aggregate  (cost=192153.71..192153.72 rows=1 width=8) (actual time=1095.040..1095.041 rows=1 loops=1)
   ->  Gather  (cost=192153.49..192153.70 rows=2 width=8) (actual time=1094.962..1101.561 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=191153.49..191153.50 rows=1 width=8) (actual time=1086.913..1086.914 rows=1 loops=3)
               ->  Parallel Index Only Scan using blocks_timestamp_key on blocks  (cost=0.43..181752.26 rows=3760492 width=0) (actual time=0.085..719.661 rows=3009114 loops=3)
                     Heap Fetches: 366
 Planning time: 0.135 ms
 Execution time: 1101.631 ms
(9 rows)

Tested on a 4 CPU / 16 GB RAM / 160GB NVME server so results will vary between servers.

What kind of change does this PR introduce?

  • Bugfix
  • Feature
  • Refactor
  • Performance
  • Tests
  • Build
  • Documentation
  • Code style update
  • Continuous Integration
  • Other, please describe:

Does this PR introduce a breaking change?

  • Yes
  • No

Does this PR release a new version?

  • Yes
  • No

If yes, please describe the impact and migration path for existing applications:

The PR fulfills these requirements:

  • It's submitted to the develop branch, not the master branch
  • All tests are passing
  • New/updated tests are included

If adding a new feature, the PR's description includes:

  • A convincing reason for adding this feature (to avoid wasting your time, it's best to open a suggestion issue first and wait for approval before working on it)

@ghost ghost added Complexity: Low labels Jul 18, 2019
@vasild
Copy link
Contributor

vasild commented Jul 18, 2019

I am afraid that there is no silver bullet for the problem with the slow COUNT(*).

SELECT COUNT(*) FROM (SELECT * FROM table); is not faster than SELECT COUNT(*) FROM table;. The former is either slower, or with a smart enough optimizer, equivalent to the later.

@@ -103,8 +103,8 @@ export abstract class Repository implements Database.IRepository {
return { rows, count: Math.max(count, rows.length), countIsEstimate: true };
}

const countRow = await this.find(selectQueryCount);
const { count } = await this.db.one(`SELECT COUNT(*) FROM (${selectQueryCount.toString()}) AS count;`);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are going to do SELECT COUNT(*) FROM (... subquery ...) then you do not need a separate selectQueryCount variable and can use selectQuery for this. But I specifically did not do that when I introduced selectQueryCount because the straight-forward SELECT COUNT(*) FROM table WHERE ... is to be preferred over the convoluted SELECT COUNT(*) FROM (SELECT ... FROM table WHERE ...) AS count; because with the latter you are asking the database to do more stuff and depend on it being smart enough to detect that both queries are the same and actually do the first one under the hood. In other words, the variant with the subquery could also be slower!

@codecov-io
Copy link

Codecov Report

Merging #2832 into master will not change coverage.
The diff coverage is 0%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #2832   +/-   ##
=======================================
  Coverage   65.33%   65.33%           
=======================================
  Files         363      363           
  Lines        8193     8193           
  Branches      393      393           
=======================================
  Hits         5353     5353           
  Misses       2806     2806           
  Partials       34       34
Impacted Files Coverage Δ
...database-postgres/src/repositories/transactions.ts 27.08% <0%> (ø) ⬆️
...e-database-postgres/src/repositories/repository.ts 16.12% <0%> (ø) ⬆️
.../core-database-postgres/src/repositories/blocks.ts 35.29% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1f5c1d4...19734cb. Read the comment docs.

@faustbrian faustbrian marked this pull request as ready for review July 25, 2019 01:46
@spkjp
Copy link
Contributor

spkjp commented Jul 26, 2019

Given that the impact varies greatly between nodes, I'll close this PR.

@spkjp spkjp closed this Jul 26, 2019
@ghost ghost removed the Status: In Progress label Jul 26, 2019
@spkjp spkjp deleted the nfsmw branch July 26, 2019 02:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants