
Benchmark: Use arrays to store json properties #7863

Closed
wants to merge 1 commit into from


macobo
Contributor

@macobo macobo commented Jan 3, 2022

Changes

https://eng.uber.com/logging/ describes an alternative way of storing properties: as parallel arrays of keys and values.

This PR (not to be merged) benchmarks how that layout performs compared with querying the JSON properties string directly.
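As an illustration only, here is a minimal Python sketch of the parallel-array idea. `to_parallel_arrays` and `lookup` are hypothetical helpers, not code from this PR or from the Uber post: a property's key and value share the same index, so reading a property means finding the key's position in the keys array and reading the value at that position.

```python
from typing import Dict, List, Optional, Tuple


def to_parallel_arrays(props: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split a property dict into two aligned arrays (hypothetical helper)."""
    keys = list(props.keys())
    # values[i] is the value of keys[i]; the two lists always have equal length.
    values = [props[key] for key in keys]
    return keys, values


def lookup(keys: List[str], values: List[str], name: str) -> Optional[str]:
    """Return the value for `name`, or None when the key is absent."""
    # Linear scan over the keys array, then read the value at the same index.
    for i, key in enumerate(keys):
        if key == name:
            return values[i]
    return None


keys, values = to_parallel_arrays({"$browser": "Chrome", "$os": "Mac OS X"})
print(keys)                           # ['$browser', '$os']
print(lookup(keys, values, "$os"))    # Mac OS X
print(lookup(keys, values, "$city"))  # None
```

The point of the layout is that a filter on one property only has to scan the key array and read one value, rather than parse a whole JSON document per row.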

Data was populated via the following queries on the benchmark server:

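-- Add key/value array columns derived from the JSON properties column; the no-op UPDATE mutations then rewrite existing parts so old rows get the computed values stored on disk.
ALTER TABLE events ADD COLUMN property_keys Array(String) DEFAULT arrayMap(x -> x.1, JSONExtractKeysAndValues(properties, 'String'));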
ALTER TABLE events ADD COLUMN property_values Array(String) DEFAULT arrayMap(x -> x.2, JSONExtractKeysAndValues(properties, 'String'));
ALTER TABLE events UPDATE property_values = property_values, property_keys = property_keys WHERE 1=1;

ALTER TABLE person ADD COLUMN property_keys Array(String) DEFAULT arrayMap(x -> x.1, JSONExtractKeysAndValues(properties, 'String'));
ALTER TABLE person ADD COLUMN property_values Array(String) DEFAULT arrayMap(x -> x.2, JSONExtractKeysAndValues(properties, 'String'));
ALTER TABLE person UPDATE property_values = property_values, property_keys = property_keys WHERE 1=1;
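To show what ends up in the two columns, the DEFAULT expressions above can be approximated in Python. This is a rough sketch, not ClickHouse's implementation: `json_extract_keys_and_values` is a hypothetical stand-in for JSONExtractKeysAndValues(properties, 'String'), and it assumes that pairs whose value is not a JSON string are dropped. Verify that behaviour against ClickHouse before relying on it.

```python
import json
from typing import List, Tuple


def json_extract_keys_and_values(properties: str) -> List[Tuple[str, str]]:
    """Hypothetical stand-in for JSONExtractKeysAndValues(properties, 'String')."""
    obj = json.loads(properties)
    # ASSUMPTION: pairs whose value is not a JSON string are dropped.
    return [(key, value) for key, value in obj.items() if isinstance(value, str)]


def property_keys(properties: str) -> List[str]:
    # Mirrors arrayMap(x -> x.1, ...): first element of each (key, value) pair.
    return [pair[0] for pair in json_extract_keys_and_values(properties)]


def property_values(properties: str) -> List[str]:
    # Mirrors arrayMap(x -> x.2, ...): second element of each (key, value) pair.
    return [pair[1] for pair in json_extract_keys_and_values(properties)]


row = '{"$browser": "Chrome", "$os": "Mac OS X", "$screen_width": 1440}'
print(property_keys(row))    # ['$browser', '$os']
print(property_values(row))  # ['Chrome', 'Mac OS X']
```

Both columns come from the same list of pairs, so property_keys[i] and property_values[i] always describe the same property. ClickHouse tuple indexes (x.1, x.2) are 1-based, while the Python tuple indexes are 0-based.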

@macobo macobo added the performance Has to do with performance. For PRs, runs the clickhouse query performance suite label Jan 3, 2022
@macobo macobo temporarily deployed to clickhouse-benchmarks January 3, 2022 13:23 Inactive
@macobo macobo marked this pull request as draft January 3, 2022 13:28
@github-actions
Contributor

github-actions bot commented Jan 3, 2022

ClickHouse query benchmark results from GitHub Actions

Lower numbers are good, higher numbers are bad. A ratio less than 1
means a speedup and greater than 1 means a slowdown. Green lines
beginning with + are slowdowns (the PR is slower than master or
master is slower than the previous release). Red lines beginning
with - are speedups. Blank means no change.
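The ratio column is after / before, and you can check this against the results table. For track_session_recordings_list_person_property_filter, 3578.0 / 4331.0 ≈ 0.826, which is shown as 0.83. The short sketch below reproduces that arithmetic. `ratio` and `describe` are hypothetical helpers. They report direction only and do not model how the benchmark tool decides which changes count as significant.

```python
def ratio(before: float, after: float) -> float:
    # "after / before": below 1.0 means the PR is faster, above 1.0 slower.
    return after / before


def describe(before: float, after: float) -> str:
    r = ratio(before, after)
    # Direction only; the benchmark tool's significance check is not modeled here.
    if r < 1.0:
        label = "speedup"
    elif r > 1.0:
        label = "slowdown"
    else:
        label = "no change"
    return f"{r:.2f} ({label})"


# Values taken from the benchmark results table.
print(describe(4331.0, 3578.0))    # 0.83 (speedup)
print(describe(12499.5, 14054.5))  # 1.12 (slowdown)
```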

Significantly changed benchmark results (PR vs master)

       before           after         ratio
     [3d8bcdfa]       [6bf09777]
-       4331.0±61        3578.0±64     0.83  benchmarks.QuerySuite.track_session_recordings_list_person_property_filter
All benchmarks:

     before           after         ratio
   [3d8bcdfa]       [6bf09777]
37481.5±6.6e+03  30301.5±1.6e+03    ~0.81  benchmarks.QuerySuite.track_correlations_by_event_properties
  12499.5±2e+02    14054.5±2e+02     1.12  benchmarks.QuerySuite.track_correlations_by_event_properties_materialized
10469.5±1.2e+02  11274.0±1.2e+02     1.08  benchmarks.QuerySuite.track_correlations_by_events
      5544.0±73        5872.5±90     1.06  benchmarks.QuerySuite.track_correlations_by_properties
      4835.5±41        5240.0±38     1.08  benchmarks.QuerySuite.track_correlations_by_properties_materialized
  764.0±1.6e+02    874.0±1.7e+02     1.14  benchmarks.QuerySuite.track_earliest_timestamp
      3150.0±95   3183.5±1.1e+02     1.01  benchmarks.QuerySuite.track_funnel_normal
28612.5±3.1e+02  30106.5±1.6e+03     1.05  benchmarks.QuerySuite.track_lifecycle
14024.5±1.7e+02       14529.5±72     1.04  benchmarks.QuerySuite.track_retention
  18254.0±2e+02       16888.5±66     0.93  benchmarks.QuerySuite.track_retention_filter_by_person_property
16493.0±1.5e+03  16951.0±1.7e+02     1.03  benchmarks.QuerySuite.track_retention_filter_by_person_property_materialized
54847.5±7.1e+02  53257.0±3.1e+02     0.97  benchmarks.QuerySuite.track_retention_with_person_breakdown
  789.0±2.2e+02    719.5±1.4e+02     0.91  benchmarks.QuerySuite.track_session_recordings_list
 4202.5±1.4e+02   3217.0±1.2e+02    ~0.77  benchmarks.QuerySuite.track_session_recordings_list_event_filter
-       4331.0±61        3578.0±64     0.83  benchmarks.QuerySuite.track_session_recordings_list_person_property_filter
      3937.0±20        3925.0±22     1.00  benchmarks.QuerySuite.track_stickiness
      5586.0±35        4743.5±18     0.85  benchmarks.QuerySuite.track_stickiness_filter_by_person_property
      4731.0±32        4690.5±57     0.99  benchmarks.QuerySuite.track_stickiness_filter_by_person_property_materialized
      4318.0±32        4320.5±45     1.00  benchmarks.QuerySuite.track_trends_dau
      5949.5±81        5263.0±69     0.88  benchmarks.QuerySuite.track_trends_dau_person_property_filter
      5105.0±40        5241.0±27     1.03  benchmarks.QuerySuite.track_trends_dau_person_property_filter_materialized
20359.5±7.6e+02  18810.0±3.2e+02     0.92  benchmarks.QuerySuite.track_trends_event_property_filter
      4208.5±71        4127.0±70     0.98  benchmarks.QuerySuite.track_trends_event_property_filter_materialized
  41310.0±2e+03    32029.0±6e+02    ~0.78  benchmarks.QuerySuite.track_trends_filter_by_action_current_url
 4305.5±1.4e+02   4431.0±1.2e+02     1.03  benchmarks.QuerySuite.track_trends_filter_by_action_current_url_materialized
10089.0±2.9e+02       10858.0±96     1.08  benchmarks.QuerySuite.track_trends_filter_by_action_with_person_filters
      5060.0±18        5278.5±61     1.04  benchmarks.QuerySuite.track_trends_filter_by_action_with_person_filters_materialized
      5737.0±43        5202.5±14     0.91  benchmarks.QuerySuite.track_trends_filter_by_cohort
      4956.0±41        5262.5±16     1.06  benchmarks.QuerySuite.track_trends_filter_by_cohort_materialized
 4445.5±1.1e+02   5954.0±1.4e+03    ~1.34  benchmarks.QuerySuite.track_trends_filter_by_cohort_precalculated
      1979.0±36       2075.5±3.5     1.05  benchmarks.QuerySuite.track_trends_no_filter
 5607.5±1.1e+02   5258.5±1.3e+02     0.94  benchmarks.QuerySuite.track_trends_person_property_filter
      4872.0±77        5246.5±64     1.08  benchmarks.QuerySuite.track_trends_person_property_filter_materialized

@macobo macobo force-pushed the benchmarks-json-as-arrays branch from 6305e46 to 681d547 January 3, 2022 13:58
@macobo macobo temporarily deployed to clickhouse-benchmarks January 3, 2022 13:58 Inactive
@macobo
Contributor Author

macobo commented Jan 4, 2022

This was opened for analysis purposes only; see the linked issue for details and conclusions.

@macobo macobo closed this Jan 4, 2022