feat: DH-18351: Add CumCountWhere() and RollingCountWhere() features to UpdateBy #373

Open
deephaven-internal opened this issue Jan 21, 2025 · 0 comments


This issue was auto-generated

PR: deephaven/deephaven-core#6566
Author: lbooker42

Original PR Body

Groovy Examples

table = emptyTable(1000).update("key=randomInt(0,10)", "intCol=randomInt(0,1000)")

// zero-key
t_summary = table.updateBy([
    CumCountWhere("running_gt_500", "intCol > 500"),
    RollingCountWhere(50, "windowed_gt_500", "intCol > 500"),
    ])

// bucketed
t_summary_bucketed = table.updateBy([
    CumCountWhere("running_gt_500", "intCol > 500"),
    RollingCountWhere(50, "windowed_gt_500", "intCol > 500"),
    ], "key")
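To make the expected output of CumCountWhere concrete, here is a plain-Python model of the counting semantics. It is a sketch under stated assumptions, not Deephaven's implementation: the function name `cum_count_where_model` is hypothetical, the filter string (e.g. `"intCol > 500"`) is modeled as a single Python predicate, the running count is assumed to include the current row, and null handling is not modeled. Passing `keys` models the bucketed form, where each key keeps its own count while rows stay in their original order.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Optional, Sequence


def cum_count_where_model(
    values: Sequence[int],
    predicate: Callable[[int], bool],
    keys: Optional[Sequence[Hashable]] = None,
) -> List[int]:
    """Hypothetical reference model of a cumulative count-where column.

    keys=None models the zero-key case; otherwise each key keeps an
    independent running count (bucketed case). Output stays in row order.
    """
    counts: Dict[Hashable, int] = defaultdict(int)
    out: List[int] = []
    for i, v in enumerate(values):
        k = None if keys is None else keys[i]
        if predicate(v):
            counts[k] += 1  # this row passes the filter
        out.append(counts[k])  # running count includes the current row
    return out


vals = [600, 100, 700, 501, 500]
print(cum_count_where_model(vals, lambda x: x > 500))
# [1, 1, 2, 3, 3]
print(cum_count_where_model(vals, lambda x: x > 500, keys=["a", "b", "a", "b", "a"]))
# [1, 0, 2, 1, 2]
```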

Python Examples

from deephaven import empty_table
from deephaven.updateby import cum_count_where, rolling_count_where_tick

table = empty_table(1000).update(["key=randomInt(0,10)", "intCol=randomInt(0,1000)"])

# zero-key
t_summary = table.update_by([
    cum_count_where(col="running_gt_500", filters="intCol > 500"),
    rolling_count_where_tick(rev_ticks=50, col="windowed_gt_500", filters="intCol > 500"),
    ])

# bucketed
t_summary_bucketed = table.update_by([
    cum_count_where(col="running_gt_500", filters="intCol > 500"),
    rolling_count_where_tick(rev_ticks=50, col="windowed_gt_500", filters="intCol > 500"),
    ], by="key")
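The rolling variant counts matching rows in a trailing tick window. The sketch below assumes Deephaven's tick-window convention in which `rev_ticks` includes the current row, so `rev_ticks=50` covers the current row and the 49 rows before it; forward windows (`fwd_ticks`) are not modeled. The name `rolling_count_where_tick_model` is hypothetical. It keeps a running count that is incremented when a row enters the window and decremented when it leaves, a general sliding-window technique with constant per-row cost; it is not a description of the PR's code.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Hashable, List, Optional, Sequence


def rolling_count_where_tick_model(
    values: Sequence[int],
    predicate: Callable[[int], bool],
    rev_ticks: int,
    keys: Optional[Sequence[Hashable]] = None,
) -> List[int]:
    """Hypothetical model: matching rows among the current row and the
    previous rev_ticks - 1 rows. With keys, each key has its own window."""
    if rev_ticks < 1:
        raise ValueError("this sketch only models rev_ticks >= 1")
    windows: Dict[Hashable, Deque[bool]] = defaultdict(deque)
    counts: Dict[Hashable, int] = defaultdict(int)
    out: List[int] = []
    for i, v in enumerate(values):
        k = None if keys is None else keys[i]
        window = windows[k]
        hit = predicate(v)
        window.append(hit)  # current row enters the window
        counts[k] += hit
        if len(window) > rev_ticks:  # oldest row leaves the window
            counts[k] -= window.popleft()
        out.append(counts[k])
    return out


vals = [600, 100, 700, 501, 500]
print(rolling_count_where_tick_model(vals, lambda x: x > 500, rev_ticks=2))
# [1, 1, 1, 2, 1]
print(rolling_count_where_tick_model(vals, lambda x: x > 500, rev_ticks=2, keys=["a", "b", "a", "b", "a"]))
# [1, 0, 2, 1, 1]
```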

Performance Notes

TL;DR: Performance compares very well with the existing UpdateBy operations.

RollingCountWhere() has near-identical performance to the comparison benchmarks (RollingCount() and RollingSum()) and can be faster, depending on the complexity of the filter. CumCountWhere() also compares well to Ema() but can't catch up to zero-key CumSum(), which is remarkably fast.

Comparing CumCountWhere to CumSum and Ema (120000000 rows, average of 2 runs):

Operation	ZeroKey	Bucketed (250 buckets)	Bucketed (640 buckets)
CumSum	137.36250	2979.1730005	3827.299833
Ema	449.5528125	3024.152458	3880.2538125
CumCountWhereConstant	475.9980005	2569.7280835	3416.4387715
CumCountWhereMatch	649.9689995	3031.6534795	3906.691333
CumCountWhereRange	654.322250	3030.5433335	3902.3064375
CumCountWhereMultiple	695.4477915	3052.597625	3967.1584795
CumCountWhereMultipleOr	704.900583	3059.911729	3925.0775205
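The CumCountWhere timings are easier to compare as ratios. This snippet copies the averages from the CumCountWhere benchmark and divides each by Ema's time for the same scenario; it is plain arithmetic on the reported numbers, with units left as reported. `TIMINGS`, `SCENARIOS` and `ratios_vs` are hypothetical names used only for this illustration.

```python
# Hypothetical helper: averages copied from the CumCountWhere benchmark.
# Columns: ZeroKey, Bucketed (250), Bucketed (640). Units are as reported.
TIMINGS = {
    "CumSum": (137.36250, 2979.1730005, 3827.299833),
    "Ema": (449.5528125, 3024.152458, 3880.2538125),
    "CumCountWhereConstant": (475.9980005, 2569.7280835, 3416.4387715),
    "CumCountWhereMatch": (649.9689995, 3031.6534795, 3906.691333),
    "CumCountWhereRange": (654.322250, 3030.5433335, 3902.3064375),
    "CumCountWhereMultiple": (695.4477915, 3052.597625, 3967.1584795),
    "CumCountWhereMultipleOr": (704.900583, 3059.911729, 3925.0775205),
}
SCENARIOS = ("ZeroKey", "Bucketed (250)", "Bucketed (640)")


def ratios_vs(baseline: str) -> dict:
    """Each operation's time divided by the baseline's time, per scenario."""
    base = TIMINGS[baseline]
    return {
        name: tuple(t / b for t, b in zip(times, base))
        for name, times in TIMINGS.items()
        if name != baseline
    }


print(f"{'Operation':<25}" + "".join(f"{s:>16}" for s in SCENARIOS))
for name, rs in ratios_vs("Ema").items():
    print(f"{name:<25}" + "".join(f"{r:>15.2f}x" for r in rs))
```

With these numbers, zero-key CumCountWhere takes roughly 1.06x to 1.57x Ema's time, while in both bucketed runs every variant is within about 3% of Ema or faster.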

Comparing RollingCountWhere to RollingCount and RollingSum (120000000 rows, average of 2 runs):

Operation	ZeroKey	Bucketed (250 buckets)	Bucketed (640 buckets)
RollingCount	1511.7957295	3468.7696665	4310.4265835
RollingSum	1513.6013545	3326.047792	4286.427479
RollingCountWhereConstant	1403.2817915	2858.677771	3869.1892705
RollingCountWhereMatch	1453.9323125	3327.958604	4333.8479375
RollingCountWhereRange	1764.2137915	3347.961083	4269.3454375
RollingCountWhereMultiple	1576.4896255	3429.413562	4290.0618545
RollingCountWhereMultipleOr	1541.5631455	3364.244104	4346.8478535
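The claim that RollingCountWhere can be faster depending on the filter can be checked directly against the reported averages. This snippet lists, per scenario, which RollingCountWhere variants ran faster than RollingCount; `TIMINGS`, `SCENARIOS` and `faster_than` are hypothetical names for this illustration, and the numbers are copied unchanged from the RollingCountWhere benchmark.

```python
# Hypothetical helper: averages copied from the RollingCountWhere benchmark.
# Columns: ZeroKey, Bucketed (250), Bucketed (640). Units are as reported.
TIMINGS = {
    "RollingCount": (1511.7957295, 3468.7696665, 4310.4265835),
    "RollingSum": (1513.6013545, 3326.047792, 4286.427479),
    "RollingCountWhereConstant": (1403.2817915, 2858.677771, 3869.1892705),
    "RollingCountWhereMatch": (1453.9323125, 3327.958604, 4333.8479375),
    "RollingCountWhereRange": (1764.2137915, 3347.961083, 4269.3454375),
    "RollingCountWhereMultiple": (1576.4896255, 3429.413562, 4290.0618545),
    "RollingCountWhereMultipleOr": (1541.5631455, 3364.244104, 4346.8478535),
}
SCENARIOS = ("ZeroKey", "Bucketed (250)", "Bucketed (640)")


def faster_than(baseline: str) -> dict:
    """Per scenario, the RollingCountWhere variants faster than baseline."""
    result = {}
    for col, scenario in enumerate(SCENARIOS):
        base = TIMINGS[baseline][col]
        result[scenario] = sorted(
            name
            for name, times in TIMINGS.items()
            if name.startswith("RollingCountWhere") and times[col] < base
        )
    return result


for scenario, names in faster_than("RollingCount").items():
    print(f"{scenario}: {', '.join(names) or 'none'}")
```

On these numbers, Constant and Match beat RollingCount in the zero-key run, all five variants do with 250 buckets, and Constant, Multiple and Range do with 640 buckets.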