Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrap the Table slice op #3110

Merged
merged 2 commits into from
Nov 30, 2022
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
32 changes: 32 additions & 0 deletions py/server/deephaven/table.py
Original file line number Diff line number Diff line change
Expand Up @@ -1625,6 +1625,38 @@ def update_by(self, ops: Union[UpdateByOperation, List[UpdateByOperation]],
except Exception as e:
raise DHError(e, "table update-by operation failed.") from e

def slice(self, start: int, stop: int) -> Table:
"""Extracts a subset of a table by row positions into a new Table.

If both the start and the stop are positive, then both are counted from the beginning of the table.
The start is inclusive, and the stop is exclusive. slice(0, N) is equivalent to :meth:`~Table.head(N)`
The start must be less than or equal to the stop.

If the start is positive and the stop is negative, then the start is counted from the beginning of the
table, inclusively. The stop is counted from the end of the table. For example, slice(1, -1) includes all
rows but the first and last. If the stop is before the start, the result is an empty table.

If the start is negative, and the stop is zero, then the start is counted from the end of the table,
and the end of the slice is the size of the table. slice(-N, 0) is equivalent to :meth:`~Table.tail(N)`.

If the start is negative and the stop is negative, they are both counted from the end of the
table. For example, slice(-2, -1) returns the second to last row of the table.

Args:
start (int): the first row position to include in the result
stop (int): the last row position to include in the result

Returns:
a new Table

Raises:
DHError
"""
try:
return Table(j_table=self.j_table.slice(start, stop))
except Exception as e:
raise DHError(e, "table slice operation failed.") from e


class PartitionedTable(JObjectWrapper):
"""A partitioned table is a table containing tables, known as constituent tables.
Expand Down
27 changes: 27 additions & 0 deletions py/server/tests/test_table.py
Original file line number Diff line number Diff line change
Expand Up @@ -782,6 +782,33 @@ def test_python_field_access(self):
self.assertIn("AAPL-GOOG", html_output)
self.assertIn("2000", html_output)

def test_slice(self):
with ugp.shared_lock():
t = time_table("00:00:00.01")
rt = t.slice(0, 3)
self.assert_table_equals(t.head(3), rt)

self.wait_ticking_table_update(t, row_count=5, timeout=5)
with ugp.exclusive_lock():
rt = t.slice(t.size, -2)
self.assertEqual(0, rt.size)
self.wait_ticking_table_update(rt, row_count=1, timeout=5)
self.assertGreaterEqual(rt.size, 1)

rt = t.slice(-3, 0)
self.assert_table_equals(t.tail(3), rt)
jmao-denver marked this conversation as resolved.
Show resolved Hide resolved

rt = t.slice(-3, -2)
self.wait_ticking_table_update(rt, row_count=1, timeout=5)
self.assertEqual(1, rt.size)

rt = t.slice(1, 3)
self.wait_ticking_table_update(rt, row_count=2, timeout=5)
self.assertEqual(2, rt.size)

with self.assertRaises(DHError):
rt = t.slice(3, 2)
Comment on lines +809 to +810
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really the behavior we want? It seems like we may want to just return an empty table. In fact, if an index is negative, that is what we do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Java doc is very explicit about this behavior. It does seem inconsistent though but I would rather have an exception than an empty table which would take up some resource and would not be of any use. @rcaudy your thoughts?

Copy link
Member

@rcaudy rcaudy Nov 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not particularly interested in re-litigating the semantics of this operation while wrapping from Python, and I'm not unhappy with the current behavior.

We throw an exception for non-sensical inputs in 2 scenarios:

  • firstPositionInclusive < 0 && lastPositionExclusive > 0 : This is a range counted from the end for first and from the beginning for last.
  • Both positive or both negative and lastPositionExclusive < firstPositionInclusive : This is an "inverted" range that can never be non-empty.

We could just call these impossible ranges "empty", but that seems user-hostile in (likely, IMO) cases where the user made a mistake in preparing their inputs. In the current approach, they can choose to handle the exception.

The "exceptional case" is that we always allow:

  • firstPositionInclusive >= 0 && lastPositionExclusive < 0 : This is a range defined relative to the end and the beginning, but not necessarily an empty range. This is just as valid as ranges larger than the current table size in either direction, and the output is only empty if there aren't currently any rows in it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no relitigation going on. The API was never discussed with me in the first place. It is inconsistent with other data science software.

If you look at Python and NumPy, both gracefully handle all cases. They both return the empty case when the indexing range is a null set.
image
image

Matlab / Octave behaves the same as Python / NumPy and also returns the empty case.
image

Q takes a slightly different tact. It returns null values when indexing outside of the range. Yet Q still gracefully returns a result.
https://code.kx.com/q4m3/3_Lists/#343-indexing-domain

Copy link
Member

@rcaudy rcaudy Nov 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't call giving someone a useless result when they almost certainly screwed up their inputs "graceful". It's also just strictly worse when you imagine a scenario with a ticking parent table; the user could proceed to set up a whole scaffold of child operations that wait forever for data that can never change. This is the kind of thing enterprise users hate us for when we implement them the way you're proposing.

Regardless, if you want to discuss changing this, it can be a separate issue targeting the query engine; the Python wrapper should stay very thin here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is similar to TableTools.merge with no non-null inputs. We throw an exception, rather than giving the users an empty result that will never, ever tick.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the Python wrapper should have the same semantics as calling from Java, and I like the exception for inverted ranges, but I do think we should support the x[-7:8] case to be consistent with other software.

Returning empty for inverted ranges I think sets you up for confusion...returning null I could almost get behind, except I think in practice for us it just sets you up for an NPE down the road. Like if it were slicing an array in the query language, I'd say we should return null for those cases (since other functions down the road would handle the null), but if slicing a table, you really need a non-null result for it to be useful anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Raffi's position as stated above. I do see it as reasonable to support a start relative to end and a last relative to start.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Raffi.

  • I've never really viewed it as a benefit that Python returns an empty object when you give it bad indices to slice.
  • I have always viewed it as a benefit to be able to slice relative to the end of an object.

I don't think throwing an error when incorrectly attempting to slice a table will detract from the user experience, so long as the error makes it obvious.



def global_fn() -> str:
return "global str"
Expand Down