-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wrap rolling sum updateBy operation #3399
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
@@ -267,3 +266,85 @@ def forward_fill(cols: Union[str, List[str]]) -> UpdateByOperation: | |||
return UpdateByOperation(j_updateby_op=_JUpdateByOperation.Fill(cols)) | |||
except Exception as e: | |||
raise DHError(e, "failed to create a forward fill UpdateByOperation.") from e | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seeing this made me wonder if we made an API mistake. For the rolling sum, ema, etc, we have separate methods for different flavors (e.g. ema_tick_decay
and ema_time_decay
). It seems like it might be more pythonic to have ema
and rolling_sum
, where the flavor is determined by the inputs. I'm not sure how strong my thoughts are, so this is more of a discussion item.
@jmao-denver @jjbrosnan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see the benefits of folding them into one function, but I prefer to have separate functions (especially when the parameter sets are quite different), as it is more explicit, cleaner, less chance for users to make mistake as far as arguments are concerned, also easier for us to document/maintain the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind the explicitness of the ticks
/time
signatures but I have near-zero python experience in a professional role.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure the parameter sets need to be that different. If these were combined into one function, you could have the first argument ts_col
be optional with a default value of None
. The rest of the arguments could look like they do in the time-based rolling sum functions, where each argument must be an int or string. In a tick case, you give ints. In the time case, if you give an int, it assumes seconds, and if you give a string, it can be whatever.
Then, the try/except block at the end can be separated into two parts, which depend on whether or not a timestamp column is given.
I think the benefits of combining each into one function outweigh the drawbacks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Essentially, for the rolling sum, it would be (without comments):
def rolling_sum(ts_col: str = None, cols: Union[str, List[str]], rev_time: Union[int, str],
fwd_time: Union[int, str] = 0) -> UpdateByOperation:
if ts_col:
try:
cols = to_sequence(cols)
rev_time = _JDateTimeUtils.expressionToNanos(rev_time) if isinstance(rev_time, str) else rev_time
fwd_time = _JDateTimeUtils.expressionToNanos(fwd_time) if isinstance(fwd_time, str) else fwd_time
return UpdateByOperation(j_updateby_op=_JUpdateByOperation.RollingSum(ts_col, rev_time, fwd_time, *cols))
except Exception as e:
raise DHError(e, "failed to create a rolling sum (time) UpdateByOperation.") from e
else:
try:
cols = to_sequence(cols)
return UpdateByOperation(j_updateby_op=_JUpdateByOperation.RollingSum(rev_ticks, fwd_ticks, *cols))
except Exception as e:
raise DHError(e, "failed to create a rolling sum (tick) UpdateByOperation.") from e
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jjbrosnan some extra testing in the ticks
case will be needed. Time strings need to be rejected as they make no sense for row-based computations. You will get an exception from the call, so as long as the error is clear this looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is doable (we'll need to add checks for invalid argument combinations) but to me the extra cost and the potential cognitive burden on a user to understand the difference outweighs the gains.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also having ts_col as optional pretty much forces the users to use keyword arguments in the tick case, which may not be a bad thing but then it is not consistent with the time case.
@@ -267,3 +266,85 @@ def forward_fill(cols: Union[str, List[str]]) -> UpdateByOperation: | |||
return UpdateByOperation(j_updateby_op=_JUpdateByOperation.Fill(cols)) | |||
except Exception as e: | |||
raise DHError(e, "failed to create a forward fill UpdateByOperation.") from e | |||
|
|||
|
|||
def rolling_sum_tick(cols: Union[str, List[str]], rev_ticks: int, fwd_ticks: int = 0) -> UpdateByOperation: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jmao-denver thoughts on the argument ordering vs other methods in this module? e.g. did we do ema_tick_decay
in the wrong order?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parameter ordering of ema_tick_decay/ema_time_decay matches the Java ones more closely, which is nice. But it is not something we can do with rolling sums. I have no objection to making them consistent if you feel strongly about it. Even though it is a breaking change, I doubt we'll affect users so much as our own docs/examples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also would prefer a closer match with Java/Groovy for rolling_sum
, but I think the argument is about optional parameter(s). If we want to harmonize, it makes as much sense to me to update the Java signatures and move these optional parameters to the end of the signature.
Labels indicate documentation is required. Issues for documentation have been opened: How-to: https://github.com/deephaven/deephaven.io/issues/2155 |
Fixes #3388