Closes #1300 - Improve Performance of DataFrame Display #1334

Ethan-DeBandi99 · 2022-04-27T15:32:28Z

Updates the _get_head_tail() method to send a single server message instead of one per column in the DataFrame. The new logic lives in _get_head_tail_server(), and the old function has been retained so the new code can be benchmarked against it.
Adds DataFrameIndexingMsg.chpl to parse and process the server message for indexing the columns. The message carries each column's type, its name in the DataFrame, and its server-side object name, so the server can index each column to the appropriate head/tail.
Benchmarking is configured to test _repr_html_(), which now calls _get_head_tail_server(). The benchmark also times _get_head_tail_server() and _get_head_tail() directly to allow for easy comparison.
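
To illustrate the batching idea described above, here is a minimal sketch of packing every column into one request. It is not the code in this PR: the JSON layout and the helper names `head_tail_indices` and `build_request` are hypothetical, and the real wire format is whatever DataFrameIndexingMsg.chpl parses.

```python
import json


def head_tail_indices(nrows: int, n: int = 5) -> list:
    """Row indices for the first n and last n rows, without duplicates."""
    if nrows <= 2 * n:
        return list(range(nrows))
    return list(range(n)) + list(range(nrows - n, nrows))


def build_request(columns: dict, nrows: int, n: int = 5) -> str:
    """Pack all columns into a single payload (hypothetical format).

    `columns` maps the column name in the DataFrame to a
    (column type, server-side object name) pair.
    """
    payload = {
        "idx": head_tail_indices(nrows, n),
        "columns": [
            {"type": ctype, "name": name, "object": obj}
            for name, (ctype, obj) in columns.items()
        ],
    }
    return json.dumps(payload)


# One request for the whole frame, instead of one request per column.
cols = {"a": ("pdarray", "id_1"), "b": ("Strings", "id_2")}
msg = build_request(cols, nrows=10_000)
```

The point of the sketch is only that the number of round trips stays at one regardless of how many columns the DataFrame has.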

Per the issue description, @joshmarshall1 and I did this without using aggregation because the datasets should be relatively small.

Benchmarking on a single node already shows a performance improvement, but we will need to benchmark this on a multi-node system. Results from a single node:

array size = 10,000
number of trials =  5
>>> arkouda dataframe display
numLocales = 1, N = 10,000
  _repr_html_ Average time = 0.0401 sec
  _repr_html_ Average rate = 0.03 GiB/sec
  _get_head_tail_server Average time = 0.0406 sec
  _get_head_tail_server Average rate = 0.03 GiB/sec
  _get_head_tail Average time = 0.0825 sec
  _get_head_tail Average rate = 0.01 GiB/sec
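
For a quick read of these numbers, the ratio of the two average times gives a rough speedup. This is only arithmetic on the figures reported above from five single-node trials, so it should not be taken as a multi-node result.

```python
# Average times (seconds) copied from the single-node run above.
old_time = 0.0825  # _get_head_tail
new_time = 0.0406  # _get_head_tail_server

speedup = old_time / new_time
print(f"_get_head_tail_server is about {speedup:.2f}x faster")
```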

@reuster986 - I will leave it up to you if you would like to request an out of band benchmark.

Ethan-DeBandi99 · 2022-04-27T15:40:50Z

Added @joshmarshall1 as a reviewer even though he helped write this code. There are some elements that I handled that would be good for him to review as well.

joshmarshall1

As a participant in developing this, I won't approve it, but after a second review I think we should go through and add comments throughout the chpl code, since there are almost none, just to make future maintenance easier. I also noticed a few places where TODO tags were left in, and a stray #.

arkouda/dataframe.py

src/DataFrameIndexingMsg.chpl

reuster986

Looks good overall! Just a couple requested changes in the benchmark, and a suggestion about when/how we switch to the new implementation.

benchmarks/dataframe.py

arkouda/dataframe.py

Ethan-DeBandi99 · 2022-04-28T11:41:21Z

Updated benchmark results on single node with requested updates from @reuster986 implemented.

array size = 10,000
number of trials =  5
>>> arkouda dataframe display
numLocales = 1, N = 10,000
  _get_head_tail_server Average time = 0.0261 sec
  _get_head_tail_server Average rate = 0.05 GiB/sec
  _get_head_tail Average time = 0.0547 sec
  _get_head_tail Average rate = 0.02 GiB/sec

reuster986

Looks great, thanks!

stress-tess

A couple of comments, nothing major. The only thing that might need to be updated is the "support uint for segarray values" question. But the logic all looks good to me.

arkouda/dataframe.py

benchmarks/dataframe.py

src/DataFrameIndexingMsg.chpl

stress-tess

Looks good!

Ethan-DeBandi99 requested review from reuster986, stress-tess and mhmerrill April 27, 2022 15:32

Ethan-DeBandi99 and others added 5 commits April 27, 2022 11:39

Added benchmark to test improvements once server message is added.

fdf24f6

Server processing to index dataframes.

6dff1e8

Clean-up and outlining code to prep for josh to pull.

1f2862c

Pushing segarray updates for Ethans review

eab9c56

Code clean-up and configuration to properly utilize the server execution. Benchmarks configured to time _repr_html_(), and compare the head tail methods.

19b6916

Ethan-DeBandi99 force-pushed the 1300_dataframe_display_perf branch from 8d07ef8 to 19b6916 Compare April 27, 2022 15:39

Ethan-DeBandi99 requested a review from joshmarshall1 April 27, 2022 15:39

joshmarshall1 reviewed Apr 27, 2022

arkouda/dataframe.py (outdated, resolved)

src/DataFrameIndexingMsg.chpl (outdated, resolved)

src/DataFrameIndexingMsg.chpl (outdated, resolved)

Ethan-DeBandi99 added 2 commits April 27, 2022 12:01

Cleanup

c430802

Fixing tabs to spaces.

fbe5b07

reuster986 requested changes Apr 27, 2022

benchmarks/dataframe.py (outdated, resolved)

benchmarks/dataframe.py (outdated, resolved)

benchmarks/dataframe.py (outdated, resolved)

arkouda/dataframe.py (resolved)

Updating benchmark per bills comments.

41f1479

Ethan-DeBandi99 requested a review from reuster986 April 28, 2022 11:41

reuster986 approved these changes Apr 28, 2022

stress-tess reviewed Apr 28, 2022

Addressing code review comments from Pierce.

1d7a15c

stress-tess approved these changes Apr 28, 2022

mhmerrill merged commit b11e22b into Bears-R-Us:master Apr 29, 2022

Ethan-DeBandi99 deleted the 1300_dataframe_display_perf branch May 2, 2022 17:36
