
[Dashboard] Performance Review #14615

Closed
alonsogarciapablo opened this issue Jan 16, 2019 · 20 comments

@alonsogarciapablo
Contributor

We have to make sure the New Dashboard performs 👌 in accounts with loads of maps and datasets. @andy-esch recently explained how the performance of the current Dashboard is pretty bad in his account (~1000 datasets).

The goal of this issue is to assess the performance of the New Dashboard in accounts like @andy-esch 's and investigate what's causing the UI to be slow, so that we can decide how to fix some or all of those issues in the near future.

A more performant Dashboard will add a lot of value to heavy CARTO users 💪

cc: @javitonino @ethervoid

@javitonino
Contributor

Randomly came across this which should help: #14621

@javitonino
Contributor

Some more info that might be relevant: #13953

@gonzaloriestra
Contributor

Another place where with_presenter_cache could help:

VisualizationPresenter.new(dependent_visualization, @current_viewer, @context).to_summarized_poro
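For context, the idea behind a presenter cache is to memoize presenter instances per model, so repeated visualizations in a single response don't rebuild them. A minimal, hypothetical sketch of that idea (not the project's actual with_presenter_cache implementation):

  # Hypothetical memoization sketch; the real cache lives in the CartoDB codebase
  class PresenterCache
    def initialize
      @presenters = {}
    end

    # Build a presenter once per (class, id) pair and reuse it afterwards
    def get_poro(model)
      presenter = (@presenters[[model.class, model.id]] ||= yield)
      presenter.to_summarized_poro
    end
  end

  # Usage (same call as above, but going through the cache):
  # cache.get_poro(dependent_visualization) do
  #   VisualizationPresenter.new(dependent_visualization, @current_viewer, @context)
  # end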

ethervoid self-assigned this Jan 22, 2019
@alonsogarciapablo
Contributor Author

@jesusbotella sent me this issue about the (poor) performance of the datasets view:
https://groups.google.com/forum/#!msg/cartodb/K7dG4p0Tpyw/NP5ncVshCQAJ
This is similar to what @andy-esch was describing. I see @javitonino already replied. If we can fix this in the New Dashboard, it will be a super-win for our users 💪

@alonsogarciapablo
Contributor Author

@ethervoid do you have any insights regarding this so far? Thanks!

@ethervoid
Contributor

@alonsogarciapablo I'm reading all the issues, understanding the measures taken so far and so on. I finished the Likes cleanup yesterday, so I'm on it :)

@alonsogarciapablo
Contributor Author

Perfect! Thanks for the update! 💪

@ethervoid
Contributor

ethervoid commented Jan 29, 2019

Update

I've been uploading data and trying to reproduce the problem in my local environment, but it's not the same and I don't see the same latency problems as in production. Why?

  • The metadata and the user database are in the same machine
  • We don't use pgbouncer
  • We don't have hundreds/thousands of users in the same database

Then I tested it in the staging cloud with similar results, which could be due to the same factors as in the local environment.

The next step is to provision a new RUI, redirect @andy-esch 's user to that RUI, and start profiling.

For example, for this request: https://team.carto.com/u/eschbacher/api/v1/viz?tag_name=&q=&page=1&type=derived&exclude_shared=false&per_page=1&locked=true&shared=no&only_liked=false&api_key=apikey the average response time is 4 seconds, so it's a good place to start measuring.
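For reference, a quick way to time that request from a Ruby console (standard Net::HTTP and Benchmark, nothing CARTO-specific):

  require 'benchmark'
  require 'net/http'

  # Same request as above (api_key is a placeholder)
  url = URI('https://team.carto.com/u/eschbacher/api/v1/viz?tag_name=&q=&page=1&type=derived&exclude_shared=false&per_page=1&locked=true&shared=no&only_liked=false&api_key=apikey')

  # Wall-clock time of a single request, in seconds
  puts Benchmark.measure { Net::HTTP.get_response(url) }.real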

@alonsogarciapablo
Contributor Author

Thanks for the update @ethervoid! Keep 🕵️ and you will find something! 💪

@ethervoid
Contributor

ethervoid commented Jan 29, 2019

Update

I've been benchmarking and here are my findings so far.

TL;DR: It looks like either the connection to the database or the size of the database is hurting performance. We have to take into account that the dbd-team pg data disk is 2.7TB.

I've added benchmarking in every visualization object to check where the time goes:

"timeviz": [
                {
                    "label": "to_poro",
                    "real": 0.0069547109997074585,
                    "cstime": 0,
                    "cutime": 0,
                    "stime": 0,
                    "utime": 0,
                    "total": 0
                },
                {
                    "label": "load_related",
                    "real": 0.000002701999619603157,
                    "cstime": 0,
                    "cutime": 0,
                    "stime": 0,
                    "utime": 0,
                    "total": 0
                },
                {
                    "label": "load_dependent",
                    "real": 0.0000027089990908280015,
                    "cstime": 0,
                    "cutime": 0,
                    "stime": 0,
                    "utime": 0,
                    "total": 0
                },
                {
                    "label": "liked",
                    "real": 0.001709437001409242,
                    "cstime": 0,
                    "cutime": 0,
                    "stime": 0,
                    "utime": 0.009999999999999787,
                    "total": 0.009999999999999787
                },
                {
                    "label": "permission",
                    "real": 3.5094704250004725,
                    "cstime": 0,
                    "cutime": 0,
                    "stime": 0.15999999999999992,
                    "utime": 2.4799999999999995,
                    "total": 2.6399999999999997
                },
                {
                    "label": "stats",
                    "real": 0.003290009999545873,
                    "cstime": 0,
                    "cutime": 0,
                    "stime": 0,
                    "utime": 0,
                    "total": 0
                },
                {
                    "label": "user_table",
                    "real": 0.00007167700096033514,
                    "cstime": 0,
                    "cutime": 0,
                    "stime": 0,
                    "utime": 0,
                    "total": 0
                }
            ]
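For reference, a minimal sketch of how per-step timings like the ones above can be collected with Ruby's standard Benchmark module (the labels match the output above; exactly which presenter methods each label wraps is an assumption):

  require 'benchmark'

  # Collects one entry per step, mirroring the "timeviz" structure above
  def measure_step(timings, label)
    result = nil
    tms = Benchmark.measure { result = yield }
    timings << { 'label' => label, 'real' => tms.real,
                 'cstime' => tms.cstime, 'cutime' => tms.cutime,
                 'stime' => tms.stime, 'utime' => tms.utime, 'total' => tms.total }
    result
  end

  timings = []
  poro       = measure_step(timings, 'to_poro')    { visualization.to_poro }    # hypothetical presenter calls
  permission = measure_step(timings, 'permission') { visualization.permission }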

As you can see, a huge amount of time, 3.5s, goes to the permission part. Why? Because permissions is the first place where the user presenter is loaded, without cache, and that's where the db_size call comes in.

So it looks like we still have the same problem. Measuring the time it takes to retrieve the db size for @andy-esch 's user, you can see where the time goes (2.96s):

irb(main):002:0> Benchmark.measure { Carto::User.find_by_username('eschbacher').db_size_in_bytes }
[pool] Creating a new connection for 10.60.255.215:cartodb_user_6b304b59-2c86-4d7b-a8a2-dc98a5a35d48_db:6432:postgres:ar (1)
[pool] Found a connection for 10.60.255.215:cartodb_user_6b304b59-2c86-4d7b-a8a2-dc98a5a35d48_db:6432:postgres:ar (1)
=> #<Benchmark::Tms:0x0000000c8f7378 @label="", @real=3.636054564267397, @cstime=0.0, @cutime=0.0, @stime=0.15000000000000002, @utime=2.8100000000000005, @total=2.9600000000000004>

Interestingly, @javitonino was taking a look too and he noticed that, from the console, just the connection to the db takes ~3s:

Benchmark.measure { db = u.in_database(as: :superuser) } -> 3s
Benchmark.measure { u.db_size_in_bytes } -> 0.25s

@javitonino also ran another test: ActiveRecord::ConnectionAdapters::PostgreSQL::OID::TypeMapInitializer#run <4.620301>. This method loads all the different types in the DB and took 4.6s.

Tomorrow I'll keep investigating :)

@ethervoid
Contributor

Update

Well, we finally have a winner here and a patch for it

ActiveRecord in Rails 4.0 loads all the types from the database when a connection is made, and in our team database this takes ~2.5 seconds because it loads 145k types.
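For reference, the type count can be checked directly against the pg_type catalog of the affected user database (standard PostgreSQL, no CARTO specifics):

  -- Total number of types, plus how many are composite (table row) types and array types
  SELECT count(*)                                  AS total_types,
         count(CASE WHEN typtype = 'c' THEN 1 END) AS composite_types,
         count(CASE WHEN typelem <> 0 THEN 1 END)  AS array_types
  FROM pg_type;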

I did a test using two different user models, one from Sequel and the other from AR, doing the same operation:

  0.000000   0.000000   0.000000 (  0.235332) --> Sequel
  1.920000   0.150000   2.070000 (  2.689540) --> AR
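A rough sketch of how that comparison can be run from a Rails console, assuming the Sequel-backed User model and the AR-backed Carto::User both expose db_size_in_bytes:

  require 'benchmark'

  Benchmark.bm(8) do |x|
    # Sequel-backed model (lookup syntax assumed)
    x.report('Sequel:') { User.where(username: 'eschbacher').first.db_size_in_bytes }
    # ActiveRecord-backed model (same call as in the measurement above)
    x.report('AR:')     { Carto::User.find_by_username('eschbacher').db_size_in_bytes }
  end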

You can see the problematic method is this one:

ActiveRecord::ConnectionAdapters::PostgreSQL::OID::TypeMapInitializer#run <4.615232>
                                                ActiveRecord::ConnectionAdapters::PostgreSQLAdapter#execute_and_clear <5.136224>
                                              ActiveRecord::ConnectionAdapters::PostgreSQLAdapter#load_additional_types <5.136321>
                                            ActiveRecord::ConnectionAdapters::PostgreSQLAdapter#initialize_type_map <5.136989>
                                          ActiveRecord::ConnectionAdapters::PostgreSQLAdapter#initialize <5.142473>
                                        Class#new <5.142497>
                                      ActiveRecord::ConnectionHandling#postgresql_connection <5.142617>
                                    ActiveRecord::ConnectionAdapters::ConnectionPool#new_connection <5.142634>
                                  ActiveRecord::ConnectionAdapters::ConnectionPool#checkout_new_connection <5.142647>
                                ActiveRecord::ConnectionAdapters::ConnectionPool#acquire_connection <5.142675>
                              MonitorMixin#mon_synchronize <5.143279>
                            ActiveRecord::ConnectionAdapters::ConnectionPool#checkout <5.143303>
                          MonitorMixin#mon_synchronize <5.143356>
                        ActiveRecord::ConnectionAdapters::ConnectionPool#connection <5.143382>
                      Carto::Db::Connection.get_database <5.144334>
                    CartoDB::ConnectionPool#create_new_connection <5.144612>
                  CartoDB::ConnectionPool#fetch <5.144657>
                Carto::Db::Connection.connect <5.144780>
              Carto::UserService#in_database <5.145184>
            Carto::UserService#cartodb_extension_version <5.146027>
          Carto::UserService#cartodb_extension_version_pre_mu? <5.146076>
        Carto::UserService#db_size_in_bytes <5.672939>
      Kernel#eval <5.833719>
    IRB::WorkSpace#evaluate <5.833742>
  IRB::Context#evaluate <5.833836>
IRB::Irb#signal_status <5.833899>

As you can see, the AR model takes 2.6 seconds to do the same operation as Sequel. Why? Basically, when AR opens a connection to a database it fetches all the types from pg_type, which in our database means 145k different types.

We can divide the problem into data gathering and processing:

Executing the query and retrieving all the data takes 1.3 seconds:

  1.240000   0.090000   1.330000 (  1.331408)

Processing the data takes 0.5s:

  0.530000   0.010000   0.540000 (  0.537235)

We could improve the data processing a bit, but that wouldn't address the real problem: the huge number of types.

But why do we have that huge number of types? Taking a look at the table, we can see most of the types are table record types (one composite type per table) and arrays of those record types. So we have decided to filter those out in this query; if our app ever needs one of them, there is no problem, because AR will fetch it with an extra query.

But... are you saying that every time AR asks for a type we don't have in the cache it's going to run a query? Yes, but those types are not commonly asked for. Our app mostly deals with the metadata database, which has only a few custom types (207), so we're probably not going to notice that overhead.

It's not a perfect solution, but it's going to save us ~2 seconds on operations against huge user databases.

@ethervoid
Contributor

With the patch applied, we're now doing it in 0.6s:

ActiveRecord::ConnectionAdapters::PostgreSQLAdapter#execute_and_clear(name="SCHEMA", binds=[]) <0.598322>, with sql:

  SELECT t.oid, t.typname, t.typelem, t.typdelim, t.typinput, r.rngsubtype, t.typtype, t.typbasetype
  FROM pg_type as t
  LEFT JOIN pg_range as r ON oid = rngtypid
  LEFT JOIN pg_type as tt ON (tt.typtype = 'c' AND tt.typarray = t.oid AND tt.typinput = 'record_in(cstring,oid,integer)'::regprocedure)
  WHERE
    tt.oid is null
    AND (t.typname IN ('int2', 'int4', 'int8', 'oid', 'float4', 'float8', 'text', 'varchar', 'char', 'name', 'bpchar', 'bool', 'bit', 'varbit', 'timestamptz', 'date', 'time', 'money', 'bytea', 'point', 'hstore', 'json', 'jsonb', 'cidr', 'inet', 'uuid', 'xml', 'tsvector', 'macaddr', 'citext', 'ltree', 'interval', 'path', 'line', 'polygon', 'circle', 'lseg', 'box', 'timestamp', 'numeric')
    OR t.typtype IN ('r', 'e', 'd')
    OR t.typinput = 'array_in(cstring,oid,integer)'::regprocedure
    OR t.typelem != 0)

I've run some tests, and when a new type is needed AR asks for it without any problem.

@ethervoid
Contributor

ethervoid commented Feb 4, 2019

Update

We've re-routed team users to our test platform in production and here are the results using the patch:

Without the patch activated

screenshot from 2019-02-04 11-28-52

With the patch activated

screenshot from 2019-02-04 11-39-58

As you can see we have improved the time for the dashboard:

  • The /me endpoint has improved from 4.3s to 1.4s
  • The /viz endpoints have improved from ~5s to ~3s

In summary, we have gone from ~26s to load the dashboard down to ~12.8s.

I'll leave the patch active and keep the team users routed through it in order to test for undesired side effects. If we don't see any issue in a couple of days (2-3), I'll put the patch into production.

Hope this improvement is good enough :)

@ethervoid
Contributor

Now I'm going to start looking at the order-by-size part, which could maybe be improved to avoid making N queries for size data. Stay tuned.
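One possible direction (just a sketch, not the implemented fix): get all table sizes in a single catalog query instead of one query per table, e.g.:

  -- Size of every ordinary table, biggest first (standard PostgreSQL catalogs)
  SELECT c.relname AS table_name,
         pg_total_relation_size(c.oid) AS size_bytes
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
  WHERE c.relkind = 'r'
    AND n.nspname NOT IN ('pg_catalog', 'information_schema')
  ORDER BY size_bytes DESC;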

@alonsogarciapablo
Contributor Author

Great news @ethervoid! It's a great improvement!

@ethervoid
Contributor

@alonsogarciapablo asked about the performance with the new dashboard. I've switched @andy-esch 's user to the new dashboard and here you can see the results:

Without the new patch

screenshot from 2019-02-04 14-17-30

With the new patch

screenshot from 2019-02-04 14-15-33

More or less the same improvement: from ~12s to ~6s.

@ethervoid
Contributor

Update

About the size sort, we have a bit of a problem there:

screenshot from 2019-02-04 14-24-57

In accounts with a huge number of tables we get a 504 (timeout) after 180s

@ethervoid
Contributor

The ordering issue is going to continue here

ethervoid added a commit that referenced this issue Feb 6, 2019
* Patch for load times with pg_type loading in AR

- See #14615
- See rails/rails#19578
@ethervoid
Contributor

Deployed this part. I'll continue here

@alonsogarciapablo
Contributor Author

Great @ethervoid 👏👏👏
