Feature blocky benchmark #541
Conversation
The benchmarking code looks pretty good to me. I'd like to see some indication of the number and size of blocks used in each experiment. Ideally we could configure the experiments to run with different sizes, but I realize that would be tricky.
Please update `docs/benchmarking.rst` before merging; in particular it would be good to see more about how the blocks were created.
A feature request - we could show some form of progress in the benchmark container's output. If we printed the result token, the user could attach a rest_client to watch progress if they really wanted.
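Roughly what I have in mind is something like the sketch below. It assumes a status endpoint of the form `/api/v1/projects/{project_id}/runs/{run_id}/status` and that the result token is passed in the `Authorization` header; treat the path, header and response field names as assumptions for illustration, not the actual API:

```python
import time

import requests


def watch_run(server, project_id, run_id, result_token, poll_seconds=5):
    """Poll a run's status until it finishes.

    Illustrative only: the endpoint path, auth header and response fields
    below are assumptions, not something defined in this PR.
    """
    url = f"{server}/api/v1/projects/{project_id}/runs/{run_id}/status"
    while True:
        resp = requests.get(url, headers={"Authorization": result_token})
        resp.raise_for_status()
        status = resp.json()
        print(f"run {run_id}: state={status.get('state')}")
        if status.get("state") in ("completed", "error"):
            return status
        time.sleep(poll_seconds)
```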
The performance it reveals is another story 🤢
Running on my desktop I see uploading 100k `clknblocks` takes ~45s versus ~6s for binary encodings!
During upload I see that I log the size of each block (oops) and that most(?) blocks have just 1 element.
For the 100k x 100k experiment it creates 112551 chunks. Creating the chunks appears to take almost 3 minutes. I scaled up to 10 workers to give it a chance of finishing. On my machine one chunk takes as much as 50ms, although I saw some at ~10ms. My CPU cores are all <20% active during this process :-/
benchmarking/benchmark.py
Outdated
and `clk_{user}_{size_data}.json` where $user is a letter starting from `a` indexing the data owner, and `size_data`
is a integer representing the number of data rows in the dataset (e.g. 10000). Note that the csv usually has a header.
the 3 party linkage), and then a number a file following the format `PII_{user}_{size_data}.csv`,
`clk_{user}_{size_data}_v2.bin`, `clk_{user}_{size_data}.json` and `clknblocks_{user}_{size_data}.json` where $user |
Suggested change:
`clk_{user}_{size_data}_v2.bin`, `clk_{user}_{size_data}.json` and `clknblocks_{user}_{size_data}.json` where $user
`clk_{user}_{size_data}_v2.bin`, `clk_{user}_{size_data}.json` and `clknblocks_{user}_{size_data}.json` where `user`
benchmarking/benchmark.py
Outdated
is a integer representing the number of data rows in the dataset (e.g. 10000). Note that the csv usually has a header.
the 3 party linkage), and then a number a file following the format `PII_{user}_{size_data}.csv`,
`clk_{user}_{size_data}_v2.bin`, `clk_{user}_{size_data}.json` and `clknblocks_{user}_{size_data}.json` where $user
is a letter starting from `a` indexing the data owner, and `size_data` is a integer representing the number of data |
Suggested change:
is a letter starting from `a` indexing the data owner, and `size_data` is a integer representing the number of data
is a letter starting from `a` indexing the data providers, and `size_data` is an integer representing the number of
I left it running last night with a 12-hour timeout, 6 workers, and these worker settings:
Benchmark Logs
The gist is one 100k x 100k run failed with a timeout, and the other took
Re: binary encodings, we should definitely look into allowing binary CLKs and block info to be uploaded in separate files for big jobs.
This extends the benchmark script to be able to run experiments which use blocking.
An experiment definition for blocking looks like this:
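Something along these lines (an illustrative sketch only; the key names below are placeholders rather than the exact schema used by the benchmark script):

```json
{
  "sizes": ["100K", "100K"],
  "threshold": 0.85,
  "use_blocking": true
}
```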
The corresponding 'clknblocks' files are uploaded to S3.
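As far as I understand it, each of those files follows the usual `clknblocks` JSON layout, where every entry starts with the base64-encoded CLK followed by the IDs of the blocks it belongs to. A minimal sketch with made-up values:

```json
{
  "clknblocks": [
    ["BASE64_ENCODED_CLK_1", "block_1", "block_7"],
    ["BASE64_ENCODED_CLK_2", "block_7"]
  ]
}
```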
For now I haven't changed the `default-experiements.json` file, as the blocked experiments take a very long time and will most likely trigger a timeout. Once we have addressed that issue in the entity service, we can replace `default-experiements.json` with `default-experiements-wawo-blocking.json`.