Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrap the Table slice op #3110

Merged
merged 2 commits into from
Nov 30, 2022
Merged

Conversation

jmao-denver
Copy link
Contributor

@jmao-denver jmao-denver commented Nov 23, 2022

Fixes #3089

rcaudy
rcaudy previously approved these changes Nov 28, 2022
py/server/tests/test_table.py Show resolved Hide resolved
Comment on lines +809 to +810
with self.assertRaises(DHError):
rt = t.slice(3, 2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really the behavior we want? It seems like we may want to just return an empty table. In fact, if an index is negative, that is what we do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Java doc is very explicit about this behavior. It does seem inconsistent though but I would rather have an exception than an empty table which would take up some resource and would not be of any use. @rcaudy your thoughts?

Copy link
Member

@rcaudy rcaudy Nov 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not particularly interested in re-litigating the semantics of this operation while wrapping from Python, and I'm not unhappy with the current behavior.

We throw an exception for non-sensical inputs in 2 scenarios:

  • firstPositionInclusive < 0 && lastPositionExclusive > 0 : This is a range counted from the end for first and from the beginning for last.
  • Both positive or both negative and lastPositionExclusive < firstPositionInclusive : This is an "inverted" range that can never be non-empty.

We could just call these impossible ranges "empty", but that seems user-hostile in (likely, IMO) cases where the user made a mistake in preparing their inputs. In the current approach, they can choose to handle the exception.

The "exceptional case" is that we always allow:

  • firstPositionInclusive >= 0 && lastPositionExclusive < 0 : This is a range defined relative to the end and the beginning, but not necessarily an empty range. This is just as valid as ranges larger than the current table size in either direction, and the output is only empty if there aren't currently any rows in it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no relitigation going on. The API was never discussed with me in the first place. It is inconsistent with other data science software.

If you look at Python and NumPy, both gracefully handle all cases. They both return the empty case when the indexing range is a null set.
image
image

Matlab / Octave behaves the same as Python / NumPy and also returns the empty case.
image

Q takes a slightly different tact. It returns null values when indexing outside of the range. Yet Q still gracefully returns a result.
https://code.kx.com/q4m3/3_Lists/#343-indexing-domain

Copy link
Member

@rcaudy rcaudy Nov 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't call giving someone a useless result when they almost certainly screwed up their inputs "graceful". It's also just strictly worse when you imagine a scenario with a ticking parent table; the user could proceed to set up a whole scaffold of child operations that wait forever for data that can never change. This is the kind of thing enterprise users hate us for when we implement them the way you're proposing.

Regardless, if you want to discuss changing this, it can be a separate issue targeting the query engine; the Python wrapper should stay very thin here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is similar to TableTools.merge with no non-null inputs. We throw an exception, rather than giving the users an empty result that will never, ever tick.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the Python wrapper should have the same semantics as calling from Java, and I like the exception for inverted ranges, but I do think we should support the x[-7:8] case to be consistent with other software.

Returning empty for inverted ranges I think sets you up for confusion...returning null I could almost get behind, except I think in practice for us it just sets you up for an NPE down the road. Like if it were slicing an array in the query language, I'd say we should return null for those cases (since other functions down the road would handle the null), but if slicing a table, you really need a non-null result for it to be useful anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Raffi's position as stated above. I do see it as reasonable to support a start relative to end and a last relative to start.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Raffi.

  • I've never really viewed it as a benefit that Python returns an empty object when you give it bad indices to slice.
  • I have always viewed it as a benefit to be able to slice relative to the end of an object.

I don't think throwing an error when incorrectly attempting to slice a table will detract from the user experience, so long as the error makes it obvious.

Copy link
Member

@chipkent chipkent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaving this open because of the ongoing discussion item.

@jmao-denver jmao-denver merged commit 368a6d5 into deephaven:main Nov 30, 2022
@github-actions github-actions bot locked and limited conversation to collaborators Nov 30, 2022
@deephaven-internal
Copy link
Contributor

@jmao-denver jmao-denver deleted the 3089-wrap-slice-op branch March 31, 2023 03:27
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Table slice operation to the Python Table wrapper
6 participants