[feature] Add 'expand_table' feature #1475

lavigne958 · 2024-06-02T22:39:17Z

Add a new feature that allows a user to expand a cell range
into a table.

The expand looks for the rightmost cell with an adjacent value.
The expand looks for the bottommost cell with an adjacent value.
The expand then takes the table from the top-left cell of the range down to the bottom-right
value.

closes #1414

Signed-off-by: Alexandre Lavigne [email protected]

alifeee

this is a nice self-contained change. good work here :)

I think we need more tests, as I foresee a lot of edge cases, and it would be good to catch those before the feature is in gspread.

For example, what is the expectation for expanding this table from the top left?

1	1	-	1
1	1	1	1
-	-	1	-
1	1	1	1

This is not clear to me. Also, it is unclear what the internal (private) utils expanding functions do.

We should also have a utils test and a worksheet test. Currently there is only a utils test.

gspread/utils.py

gspread/worksheet.py

tests/utils_test.py

lavigne958 · 2024-06-13T22:00:04Z

> this is a nice self-contained change. good work here :)
>
> I think we need more tests, as I foresee a lot of edge cases, and it would be good to catch those before the feature is in gspread.

I thought about it, and I think we need to put limits on what we want to handle here:
- Do we want to handle empty cells in the middle of the table?
- Do we want to handle cases where the given table is not square?

> For example, what is the expectation for expanding this table from the top left?
>
> 1	1	-	1
> 1	1	1	1
> -	-	1	-
> 1	1	1	1

I would expect something like:

Assuming the first row in the above example is data and part of the table, and the top-left 1 is at coordinate A1:

| 1 | 1 |
| 1 | 1 |

Assuming the first row is headers, we don't take them:

| 1 | 1 |

> This is not clear to me. Also, it is unclear what the internal (private) utils expanding functions do.
>
> We should also have a utils test and a worksheet test. Currently there is only a utils test.

So far the method in the Worksheet class does nothing in particular, but I'll add a test just in case, so we have a real test with the data we get.

alifeee · 2024-06-18T00:41:15Z

> I thought about it, and I think we need to put limits on what we want to handle here:
> - Do we want to handle empty cells in the middle of the table?
> - Do we want to handle cases where the given table is not square?

I think a reasonable implementation is "the same as what happens when you CTRL+RIGHT and CTRL+DOWN", i.e., to stop just before empty cells.

As for whether that is what other people will expect... that's another question... if people think it should work a certain way, they may think there is a bug... which is why very clear docstrings are good

> Assuming the first row is headers, we don't take them:
>
> | 1 | 1 |

wait, huh? why are we ignoring the headers? the function should just return a List[List] with all items in the table starting from the called cell?

I will look back at the code now :)

alifeee

you are doing good work on this change :)

I have left some comments. I am a little worried that a new feature will result in something we do not consider, causing errors. But I think we will be ok now.

thanks!!!

gspread/utils.py

gspread/worksheet.py

alifeee · 2024-06-29T15:58:56Z

I have added some new tests. In particular, they are "what should happen" tests. They are:

1 - missing header item

        values = [
            ["A1", "", "C1", ""],
            ["A2", "B2", "C2", ""],
            ["A3", "B3", "C3", ""],
            ["", "", "", ""],
        ]
        expected_table = [
            ["A1"],
            ["A2"],
            ["A3"],
        ]

2 - missing initial cell

        values = [
            ["", "B1", "C1", ""],
            ["A2", "B2", "C2", ""],
            ["A3", "B3", "C3", ""],
            ["", "", "", ""],
        ]
        expected_table = [
            ["", "B1", "C1"],
            ["A2", "B2", "C2"],
            ["A3", "B3", "C3"],
        ]

3 - missing first column item

        values = [
            ["A1", "B1", "C1", ""],
            ["", "B2", "C2", ""],
            ["A3", "B3", "C3", ""],
            ["", "", "", ""],
        ]
        expected_table = [
            ["A1", "B1", "C1"],
            ["", "B2", "C2"],
            ["A3", "B3", "C3"],
        ]

4 - missing last column item

        values = [
            ["A1", "B1", "C1", ""],
            ["A2", "B2", "", ""],
            ["A3", "B3", "C3", ""],
            ["", "", "", ""],
        ]
        expected_table = [
            ["A1", "B1", "C1"],
        ]

Conclusion

Are these what you expect? Personally I might want case 4 to return the entire table, as I reckon I am more likely to have a full first column, and empty gaps in the final column.

However, if I use the CTRL+RIGHT and CTRL+DOWN shortcuts in most spreadsheet software, then I get the same tables as above. So I am happy with the behaviour above.

I am ready to merge this 👍. However, we should also add it to the list of examples and documentation somewhere.

alifeee

include in examples docs. otherwise approve :)

lavigne958 · 2024-07-03T22:27:21Z

thank you for the extra tests ! ⭐

I am a bit confused by the tests and what I see when I try it online using keyboard controls 😞

1. In this case, I agree with you. When I run the same data set with CTRL+RIGHT, the cell jumps to C1... which is not expected at all... I believe we should keep our behavior here; it makes more sense to me according to our documentation and what I would expect from the function.
2. I disagree: the first cell is empty, so the array stops here and returns an empty array 🙄 I checked online with keyboard controls and for some reason it jumps to cell B1 🤦 then when you go down it goes to B3 (as expected)... I don't know which choice is best here 😞
3. I completely agree with it. As expected, and like the documentation says, we can't scan for empty cells in the middle of the table.
4. I completely agree with it, though when I try it online it gets to C1 (nice), then when I use CTRL+DOWN it jumps to C3 🤦 I believe our example is more logical and provides the expected result.

So for 1., 3. and 4. I agree with your tests and the result is what I would expect.
What is left is case 2. If we start on a blank cell, do you think we should check only whether the next cell right/down has a value, or look right/down for the first non-empty cell and start from there? 🤔

alifeee · 2024-07-05T14:00:57Z

I have made this sheet

https://docs.google.com/spreadsheets/d/1huMwgagFCGTMVBf5mONk_W74-qHZUnpASHuVHnU6Hsk/edit?usp=sharing

I think maybe we should not "do the same as CTRL+arrow keys does", as this is confusing. I think we should find another rule to follow (perhaps "the same as what xlwings does")

lavigne958 · 2024-07-05T22:01:50Z

Hey, thanks for the working sample! I completely agree with you: some behaviors are completely unexpected 😞

To me, we should go for what feels logical to us, which seems to be what is described in the xlwings library:

- go right until the first empty cell is found
- then go down until the first empty cell is found
- ignore an empty starting cell (like you mentioned)
- table direction should be down, then right, as mentioned in their documentation here: expand parameters

Does that sound good to you? 🤔
After all, we can choose the implementation we like as well, regardless of what the Spreadsheet UI in a web browser does 😆

gspread/utils.py

alifeee · 2024-07-09T00:29:28Z

I believe my preference is to "find a table" by "complete row 1 and column 1"

so, to look at these h columns

h	h	h	h
h	-	-	-
h	-	-	-
h	-	-	-

...and if they are empty, terminate the table there.

I have updated the tests to do this (please have a look at the tests to see if you agree). What do you think?

lavigne958 · 2024-07-09T21:50:31Z

> I believe my preference is to "find a table" by "complete row 1 and column 1"
>
> so, to look at these h columns
>
> h h h h
> h - - -
> h - - -
> h - - -
>
> ...and if they are empty, terminate the table there.
>
> I have updated the tests to do this (please have a look at the tests to see if you agree). What do you think?

I agree with this. The only detail would be: look down first, then right,
to follow what xlwings does. It's completely arbitrary, but it keeps us consistent with the xlwings behavior.

lavigne958 · 2024-07-31T18:16:29Z

I did quite a big rework of the iteration functions.

All tests now pass, though the existing tests needed to be adjusted:

- we now look down first
- then right

so the expected results change.

It works as expected, like xlwings.

alifeee · 2024-08-03T09:28:22Z

why does xlwings use the left column and bottom row?

it makes more sense to me that one would desire the top row?

lavigne958 · 2024-08-03T09:33:29Z

I know, that's what I did first, but we agreed on making it consistent with xlwings.

I don't mind changing it, I just want us to agree on one implementation.

alifeee · 2024-08-03T11:56:46Z

I don't think that's how xlwings works. I think xlwings uses the top row and left column.

See here for their implementation: https://github.com/xlwings/xlwings/blob/a159cc9cb7ef5f168ad0d902934404bce4a76386/xlwings/expansion.py#L31-L52

(I can't parse it with my mind fully but it looks to use the top row, not the bottom)

see also this comment: xlwings/xlwings#557 (comment)

lavigne958 · 2024-08-05T16:38:03Z

> I don't think that's how xlwings works. I think xlwings uses the top row and left column.
>
> See here for their implementation: https://github.com/xlwings/xlwings/blob/a159cc9cb7ef5f168ad0d902934404bce4a76386/xlwings/expansion.py#L31-L52
>
> (I can't parse it with my mind fully but it looks to use the top row, not the bottom)
>
> see also this comment: xlwings/xlwings#557 (comment)

You're right!
Same here, I can't fully parse the code in my head, but I understand it the same way and it makes sense: start from the given cell and go right until an empty cell; then, from the starting cell again, go down until an empty cell.

This provides the right and down boundaries; the rest is just extracting the matrix from there.
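The right-then-down boundary search described above can be sketched as follows. This is only an illustration, not the code in this PR: `find_bounds` and `extract_table` are hypothetical names, and the sketch assumes the starting cell is non-empty and lies inside the matrix.

```python
from typing import List, Tuple


def find_bounds(values: List[List[str]], row: int, col: int) -> Tuple[int, int]:
    """Return (last_row, last_col) of the table whose top-left cell is (row, col).

    Hypothetical helper for illustration; assumes (row, col) is inside
    `values` and that the starting cell is not empty.
    """
    # Walk right along the starting row until the next cell is empty or missing.
    last_col = col
    while last_col + 1 < len(values[row]) and values[row][last_col + 1] != "":
        last_col += 1

    # Walk down the starting column, again beginning at the starting cell.
    last_row = row
    while (
        last_row + 1 < len(values)
        and col < len(values[last_row + 1])
        and values[last_row + 1][col] != ""
    ):
        last_row += 1

    return last_row, last_col


def extract_table(values: List[List[str]], row: int, col: int) -> List[List[str]]:
    """Slice out the rectangle between the starting cell and the bounds."""
    last_row, last_col = find_bounds(values, row, col)
    return [r[col : last_col + 1] for r in values[row : last_row + 1]]


values = [
    ["A1", "B1", "C1", ""],
    ["A2", "B2", "", ""],
    ["A3", "B3", "C3", ""],
    ["", "", "", ""],
]
table = extract_table(values, 0, 0)
```

Only the first row and first column decide the boundaries here, so the gap at C2 stays inside the extracted rectangle rather than cutting the table short.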

I'll update the code, expect potentially a big code change/refactoring then !

lavigne958 · 2024-08-19T19:24:09Z

Update

- reworked the way we look for the table: look right from the starting cell, then look down from the starting cell
- updated the test suite to reflect the behavior
  - an empty starting cell returns an empty list []
  - an empty cell on the way right or down stops the search
  - an empty cell in the middle of the matrix is ignored
  - in the assert functions in the tests, iterate over the expected result and expect the values to match
    - I noticed tests passing when they should not, because we iterated over the result and it was empty, so we didn't compare anything
- updated the doc a bit
- added extra checks on the size of the matrix and the starting cell coordinates, in order to raise an Exception if the starting cell is outside the range of the given matrix

I hope this covers any wrong inputs and the behavior expected by all (most?) users.
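The assertion pitfall mentioned in the list above can be shown with a small sketch. The helper names are hypothetical and are not the ones used in the test suite; the length checks in the strict variant are an extra assumption beyond what the comment describes.

```python
def assert_table_loose(result, expected):
    # Pitfall: iterating over `result` compares nothing when `result` is empty,
    # so a wrong (empty) result passes silently.
    for r, row in enumerate(result):
        for c, value in enumerate(row):
            assert value == expected[r][c]


def assert_table_strict(result, expected):
    # Iterate over `expected`, so every expected cell must exist in `result`.
    assert len(result) == len(expected)
    for r, expected_row in enumerate(expected):
        assert len(result[r]) == len(expected_row)
        for c, expected_value in enumerate(expected_row):
            assert result[r][c] == expected_value


expected = [["A1", "B1"], ["A2", "B2"]]

# Passes although the result is clearly wrong.
assert_table_loose([], expected)

# The strict variant rejects the same wrong result.
try:
    assert_table_strict([], expected)
    strict_caught_it = False
except AssertionError:
    strict_caught_it = True
```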

I'll rebase my branch before merge

alifeee · 2024-08-21T13:08:49Z

this looks all good to me!

only final thing is that I would expect a table like

	1	2	3
a	0.342	0.145	0.166
b	0.982	0.576	0.600
c	0.340	0.285	0.585

to expand properly, whereas it seems you suggest with

> empty starting cell will return an empty list []

that it would return an empty list

lavigne958 · 2024-08-21T17:21:30Z

Yes, to me this function returns the table starting from the given coordinates. The starting cell is part of the resulting table, so if the starting cell is indeed empty then the table will be empty.

So far, that seems to be the last key point where we can't agree on what is best. Should we just make it an option like "ignore empty starting cell"? If true, we follow your choice; if false, we follow mine. This way the user decides the behavior.

What do you think?

alifeee · 2024-08-30T13:58:44Z

apologies for the delay

while I think a special case for a blank row 0 column 0 would be nice (e.g., for a labelled adjacency matrix), it is probably not worth the complication (e.g., if the row 0 column 0 is empty, but also row 1 column 0 and row 0 column 1 are empty, it should return [] not [[""]])

Thus, I think we return [] as you say. If people want their table to be parsed, they should fill the first cell with e.g., "-", or only call the function on the table contents, and use get_row and get_column (e.g.,) to get the headings.
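The two workarounds suggested above can be sketched on the labelled matrix from the earlier comment. `expand` below is a hypothetical stand-in for the behaviour agreed in this thread, not gspread's implementation; it assumes a rectangular matrix, and the headings are read by plain slicing rather than through any worksheet method.

```python
from typing import List


def expand(values: List[List[str]], row: int, col: int) -> List[List[str]]:
    """Hypothetical stand-in: right-then-down expansion from (row, col)."""
    if values[row][col] == "":
        return []  # an empty starting cell yields an empty table
    right = col
    while right + 1 < len(values[row]) and values[row][right + 1] != "":
        right += 1
    down = row
    while down + 1 < len(values) and values[down + 1][col] != "":
        down += 1
    return [r[col : right + 1] for r in values[row : down + 1]]


matrix = [
    ["", "1", "2", "3"],
    ["a", "0.342", "0.145", "0.166"],
    ["b", "0.982", "0.576", "0.600"],
    ["c", "0.340", "0.285", "0.585"],
]

# Starting on the empty corner cell returns an empty list.
from_corner = expand(matrix, 0, 0)

# Workaround 1: start on the table contents and read the headings separately.
body = expand(matrix, 1, 1)
column_labels = matrix[0][1:]
row_labels = [r[0] for r in matrix[1:]]

# Workaround 2: fill the corner cell with a placeholder such as "-".
filled = [r[:] for r in matrix]
filled[0][0] = "-"
full = expand(filled, 0, 0)
```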

However, there should be a test to validate this behaviour.

After that, I think this is ready to merge!

lavigne958 · 2024-09-08T16:19:47Z

I added a new commit with a test to validate the empty top-left cell scenario.
If the top-left cell is empty, we return an empty list [].
Is this the test you referred to in your example? If not, let me know and I'll add extra tests if need be.

I also added an extra case to the basic test for find_table, to check what happens when we reach the end of the list and the end of the matrix. Just in case.

alifeee · 2024-09-10T18:34:02Z

this is good with me!

with the amount of work spent on a new feature... let's put reference to it in all the relevant places in the documentation :)

lavigne958 · 2024-09-22T22:26:04Z

> this is good with me!
>
> with the amount of work spent on a new feature... let's put reference to it in all the relevant places in the documentation :)

great call, it's done ✔️
I tried to make it minimal with all the important details.


lavigne958 · 2024-09-22T22:29:55Z

before merging I rebased the whole thing in order to solve conflicts, if any.

alifeee

seems good to me ! I am assuming you have manually tested the worksheet function (as there is only a utils function)

I hope you have had fun with this one ;]

I say we can merge this.

lavigne958 · 2024-09-25T16:05:32Z

yes I did 😉 it works like a charm. I just pushed a quick fix on the docstring to make the link to the raised exception work in the documentation.

lavigne958 self-assigned this Jun 2, 2024

lavigne958 requested a review from alifeee June 2, 2024 22:39

lavigne958 force-pushed the feature/find_table branch from 3df98d7 to 93ffc1e Compare June 2, 2024 22:41

alifeee requested changes Jun 3, 2024

alifeee reviewed Jun 18, 2024

alifeee added the Feature Request label Jun 29, 2024

alifeee added this to the 6.2.0 milestone Jun 29, 2024

alifeee requested changes Jun 29, 2024

lavigne958 commented Jul 6, 2024

lavigne958 requested a review from alifeee August 19, 2024 19:25

lavigne958 and others added 5 commits September 23, 2024 00:29

add some more tests for "edge" cases

c4ed289

update tests to reflect expected behaviour (failing)

4d7c2e9

update find_table to return correct behaviour

3296c59

doc: add documentation on how to use worksheet.expand().

858d77d

lavigne958 force-pushed the feature/find_table branch from 532f7aa to 858d77d Compare September 22, 2024 22:29

alifeee approved these changes Sep 24, 2024

fix docstring

6bb5a1a

lavigne958 merged commit c6001a2 into master Sep 25, 2024
10 checks passed

lavigne958 deleted the feature/find_table branch September 25, 2024 16:07
