-
Notifications
You must be signed in to change notification settings - Fork 323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Table.add_group_number
, with operations Unique
and Equal_Count
#11818
Conversation
|
||
## Create the specified number of buckets with the same number of rows in | ||
each bucket (except possibly the last one). | ||
Equal_Count bucket_count:Integer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@radeusgd Suggested we use the term 'bucket' here, such as Equal_Size_Bucket
. I agree that 'bucket' might be a good word here, but since this is just one of several grouping methods, I'm not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this whole component should be talking about buckets instead of groups. I think group is too closely assocaited with aggregating,
GROUP Standard.Base.Values | ||
ICON column_add | ||
Adds a new column to the table enumerating groups of rows, assigning each | ||
row to one group number. All rows in each group will get the sane number. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
row to one group number. All rows in each group will get the sane number. | |
row to one group number. All rows in each group will get the same number. |
Are we sure that Looking at #11782, it seems that it is only used in |
- name: The name of the new column. Defaults to "Group". | ||
- from: The starting value for the enumeration. Defaults to 0. | ||
- step: The amount to increment the enumeration by. Defaults to 1. | ||
- group_by: Specifies the columns to group by, for grouping methods that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might make more sense to put this into Grouping_Method.Unique
rather than as a separate argument -- and perhaps the same with order_by
-- because each method has different requirements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot to send this -- @radeusgd you also suggested it, so I did it.
Moving
But then it doesn't have the Is there an example of something like this? EDIT: never mind, I am looking at the |
@from (Widget.Numeric_Input display=..Always) | ||
@group_by (Widget_Helpers.make_column_name_multi_selector display=..When_Modified) | ||
@order_by (Widget_Helpers.make_order_by_selector display=..When_Modified) | ||
add_group_number self (grouping_method:Grouping_Method=..Unique) (name:Text="Group") (from:Integer=0) (step:Integer=1) (on_problems:Problem_Behavior=..Report_Warning) -> Table = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add_group_number self (grouping_method:Grouping_Method=..Unique) (name:Text="Group") (from:Integer=0) (step:Integer=1) (on_problems:Problem_Behavior=..Report_Warning) -> Table = | |
add_group_number self (grouping_method:Grouping_Method) (name:Text="Group") (from:Integer=0) (step:Integer=1) (on_problems:Problem_Behavior=..Report_Warning) -> Table = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to Missing_Argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we want to default to the Unique some grouping method?
Perhaps Equal_Count
is a good candidate for a default as it works regardless of what columns are there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do have a default, I would say Unique
because it's used much more.
In general, do we prefer to have defaults for things, or is it also ok to have no default, which forces the user to choose, but also it shows up as a warning, which the user might not want to see every time they use this method. I'm fine either way -- no default, or Unique [first column]
.
@AdRiley I had forgotten to push my widget changes. I'm not sure about bucket; I think it also connotes collapsing things into groups. This method just adds a column, so I think @jdunkerley I know this is step 1 in implementing something like Alteryx's "tile" tool, which sounds more like arranging and labeling then combining, what do you think? |
Renamed to |
I'm confused what the difference is between group and bucket? Is there a difference? |
This reverts commit 7ada84c.
Revered to "group" after discussion with @jdunkerley and @AdRiley |
_prepare_group_by table problem_builder group_by = | ||
table.columns_helper.select_columns_helper group_by Case_Sensitivity.Default True problem_builder . map c->c.java_column | ||
|
||
_prepare_ordering table problem_builder order_by = | ||
ordering = Table_Helpers.resolve_order_by table.columns order_by problem_builder | ||
ordering_columns = ordering.map c->c.column.java_column | ||
directions = ordering.map c->c.associated_selector.direction.to_sign | ||
[ordering_columns, directions] | ||
|
||
_illegal_if b msg ~cont = if b then Error.throw (Illegal_Argument.Error msg) else cont |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should either mark the whole module private
(if applicable) or mark the internal helper methods private
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Marked module as private.
group_builder.specify "must specify group_by with Unique" <| | ||
t = table_builder_from_rows ['x', 'y', 'z'] [[1, 0, 2], [0, 1, 0], [1, 2, 0], [0, 1, 1], [1, 0, 1], [1, 2, 1]] | ||
|
||
t.add_group_number ..Unique "g" . should_fail_with Missing_Argument |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to also test edge case where on
is specified but is empty (on=[]
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
group_builder.specify "should add group number by equal counts" <| | ||
t = table_builder [['x', [1, 2, 3, 4, 5]], ['y', [5, 4, 3, 2, 1]], ['z', [1, 5, 4, 2, 3]]] | ||
|
||
g0 = t.add_group_number (..Equal_Count 3) "g" | ||
g0.at 'g' . to_vector . should_equal [0, 0, 1, 1, 2] | ||
|
||
g1 = t.add_group_number (..Equal_Count 3 order_by=['x']) "g" | ||
g1.at 'g' . to_vector . should_equal [0, 0, 1, 1, 2] | ||
|
||
g2 = t.add_group_number (..Equal_Count 3 order_by=['y']) "g" | ||
g2.at 'g' . to_vector . should_equal [2, 1, 1, 0, 0] | ||
|
||
g3 = t.add_group_number (..Equal_Count 3 order_by='z') "g" | ||
g3.at 'g' . to_vector . should_equal [0, 2, 1, 0, 1] | ||
|
||
g4 = t.add_group_number (..Equal_Count 2) "g" | ||
g4.at 'g' . to_vector . should_equal [0, 0, 0, 1, 1] | ||
|
||
g5 = t.add_group_number (..Equal_Count 2 order_by=['x']) "g" | ||
g5.at 'g' . to_vector . should_equal [0, 0, 0, 1, 1] | ||
|
||
g6 = t.add_group_number (..Equal_Count 2 order_by=['y']) "g" | ||
g6.at 'g' . to_vector . should_equal [1, 1, 0, 0, 0] | ||
|
||
g7 = t.add_group_number (..Equal_Count 2 order_by='z') "g" | ||
g7.at 'g' . to_vector . should_equal [0, 1, 1, 0, 0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to also test edge cases:
..Equal_Count 0
-> should fail with illegal argument I imagine..Equal_Count -1
-> also should fail, cannot be negative..Equal_Count 1
-> should work and just assign the same group number to all entries - not very useful on its own, but may be useful when the number of groups is a parameter coming from somewhere - splitting into 1 group is technically a valid thing to do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
group_builder.specify "should rename existing column upon a name clash, and attach a warning" <| | ||
t = table_builder_from_rows ['x', 'y', 'z'] [[1, 0, 2], [0, 1, 0], [1, 2, 0], [0, 1, 1], [1, 0, 1], [1, 2, 1]] | ||
|
||
g0 = t.add_group_number (..Unique on=['x', 'y']) "y" | ||
g0.at 'y 1' . to_vector . should_equal [0, 1, 2, 1, 0, 2] | ||
Problems.expect_warning Duplicate_Output_Column_Names g0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the values in columns were something else the numbers 0, 1, 2 (e.g. a, b, c) it would be far easier to tell what this test does - as I wasn't really sure if it was checking if y 1
is the new column or existing column - both will have values 0, 1, 2 just in maybe different order.
Also it would still be good to check not only y 1
but also y
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth addressing Radek's test comments and lets make the helper private for now
distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Add_Group_Number.enso
Show resolved
Hide resolved
private static class StepIterator { | ||
private final long step; | ||
private long current; | ||
|
||
public StepIterator(long start, long step) { | ||
this.step = step; | ||
this.current = start; | ||
} | ||
|
||
public long next() { | ||
var toReturn = current; | ||
current = Math.addExact(current, step); | ||
return toReturn; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be more or less equivalent to AddRowNumber.RangeIterator
. Is there any reason to keep these 2 separate or maybe we shall merge them? I like that StepIterator
avoid boxing, perhaps that is enough to keep it separate - but then we may add a comment on it explaining the why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They behave differently -- RangeIterator
allows you to call currentValue
before next
and is perhaps in an uninitialized state -- and it seems like that was possibly done on purpose; and both next
and currentValue
return value.
Arrays.stream(orderingColumns).map(Column::getStorage).toArray(Storage[]::new); | ||
List<OrderedMultiValueKey> keys = | ||
new ArrayList<>( | ||
IntStream.range(0, (int) numRows) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are doing casts like this all over the place, but I always feel a bit uneasy, shall we prefer Math.toIntExact
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
private static class EqualCountGenerator { | ||
private final long start; | ||
private final long step; | ||
private long current = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd call this currentIndex
. Otherwise it is hard to say if this is current index or value and these are different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
…al_Count` (#11818) * wip * one test * wip * wip * more tests * changelog * docs, more tests * switch in enso * counter overflow * run under db tests * fmt * merge * move group + order into method * wip * bucket * Revert "bucket" This reverts commit 7ada84c. * aliases * review (cherry picked from commit 739ee3a)
Pull Request Description
Important Notes
Checklist
Please ensure that the following checklist has been satisfied before submitting the PR:
Scala,
Java,
TypeScript,
and
Rust
style guides. In case you are using a language not listed above, follow the Rust style guide.
or the Snowflake database integration, a run of the Extra Tests has been scheduled.