feat: implement row and cell model classes #753

daniel-sanche · 2023-03-16T17:01:33Z

This PR implements the RowResponse and CellResponse classes, which represent the data returned by read_rows queries.

RowResponse implements Sequence, so it can be treated like a list of cells (len, iteration, indexing, etc.).

It also implements some Mapping behavior: it can be indexed by family or by (family, qualifier) keys, and it can list its keys, values, items, etc.
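The combination described above (a real Sequence plus Mapping-style lookups) can be sketched as follows. This is a minimal illustration, not the PR's implementation: the `Cell` dataclass and its fields (`row_key`, `family`, `qualifier`, `timestamp_micros`, `value`) are hypothetical stand-ins for CellResponse. Subclassing `collections.abc.Sequence` and defining only `__len__` and `__getitem__` is enough to get iteration, `in`, `index()`, `count()`, and `reversed()` as mixin methods.

```python
from __future__ import annotations

from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    # Hypothetical minimal cell; field names are assumptions for this sketch.
    row_key: bytes
    family: str
    qualifier: bytes
    timestamp_micros: int
    value: bytes


class Row(Sequence):
    """Sketch of a row that behaves like a list of cells, with
    mapping-style lookups by family or (family, qualifier)."""

    def __init__(self, key: bytes, cells: list[Cell]):
        self.row_key = key
        self._cells = list(cells)

    def __len__(self) -> int:
        return len(self._cells)

    def __getitem__(self, index):
        # row["family"] -> all cells in that family
        if isinstance(index, str):
            return [c for c in self._cells if c.family == index]
        # row["family", "qualifier"] -> all cells in that column
        if isinstance(index, tuple):
            family, qualifier = index
            if isinstance(qualifier, str):
                qualifier = qualifier.encode()
            return [
                c for c in self._cells
                if c.family == family and c.qualifier == qualifier
            ]
        # int or slice -> plain list behavior (a slice returns a list, not a Row)
        return self._cells[index]
```

With this sketch, `len(row)`, `for cell in row`, `row[0]`, `row[0:5]`, `row["my_family"]`, and `row["my_family", "my_qualifier"]` all work; the Sequence mixin's iteration stops when the underlying list raises `IndexError`.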

Note: this PR merges into a new v3 branch, not main. I plan to target all PRs related to this rewrite at v3, and then merge v3 into main when development is complete.

daniel-sanche · 2023-03-16T17:03:08Z

google/cloud/bigtable/row_response.py

    ) -> list[CellResponse]:
        """
        Returns cells sorted in Bigtable native order:
            - Family lexicographically ascending
-            - Qualifier lexicographically ascending
+            - Qualifier ascending


I had in my notes that qualifiers should be sorted lexicographically, but I assume this is a mistake, since the qualifier is bytes?
Instead, I implemented this by comparing the bytes values directly.

java-bigtable uses a ByteStringComparator: https://github.com/googleapis/java-bigtable/blob/ef7d7419293f9f046dcf941148a86da713feae17/google-cloud-bigtable/src/main/java/com/google/cloud/bigtable/data/v2/internal/ByteStringComparator.java

https://github.com/googleapis/java-bigtable/blob/ef7d7419293f9f046dcf941148a86da713feae17/google-cloud-bigtable/src/main/java/com/google/cloud/bigtable/data/v2/models/RowCell.java#L51

OK, so it looks like it compares them as unsigned byte arrays, which I believe is the same operation as Python's bytes comparison. I'll add some tests to make sure they are consistent, though.
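A test along the lines suggested above could re-implement the unsigned-byte-array rule and check it against Python's built-in `bytes` ordering. This sketch is not a port of the Java class; it assumes, as is usual for such comparators, that the first differing byte decides and that a proper prefix sorts first. The helper names are hypothetical.

```python
import functools
import random


def unsigned_lexicographic_compare(a: bytes, b: bytes) -> int:
    """Compare byte strings as unsigned byte arrays: the first differing
    byte decides; otherwise the shorter string sorts first."""
    for x, y in zip(a, b):
        # Iterating over bytes yields ints in 0..255, i.e. unsigned values.
        if x != y:
            return -1 if x < y else 1
    return (len(a) > len(b)) - (len(a) < len(b))


def signed_byte(v: int) -> int:
    # How a signed 8-bit byte (as in Java's `byte`) would see the same value.
    return v - 256 if v > 127 else v


def sort_qualifiers(qualifiers):
    # Sort using the explicit unsigned comparator.
    return sorted(qualifiers, key=functools.cmp_to_key(unsigned_lexicographic_compare))


def consistent_with_builtin(samples: int = 1000, seed: int = 0) -> bool:
    """Randomized check that the comparator agrees with Python's bytes ordering."""
    rng = random.Random(seed)
    for _ in range(samples):
        a = bytes(rng.randrange(256) for _ in range(rng.randrange(4)))
        b = bytes(rng.randrange(256) for _ in range(rng.randrange(4)))
        builtin = (a > b) - (a < b)
        if unsigned_lexicographic_compare(a, b) != builtin:
            return False
    return True
```

The `signed_byte` helper shows why signedness matters: as unsigned bytes, `0xFF` sorts after `0x01`, but as signed bytes it would be `-1` and sort first.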

mutianf · 2023-03-24T18:13:55Z


java-bigtable uses a ByteStringComparator: https://github.com/googleapis/java-bigtable/blob/ef7d7419293f9f046dcf941148a86da713feae17/google-cloud-bigtable/src/main/java/com/google/cloud/bigtable/data/v2/internal/ByteStringComparator.java

https://github.com/googleapis/java-bigtable/blob/ef7d7419293f9f046dcf941148a86da713feae17/google-cloud-bigtable/src/main/java/com/google/cloud/bigtable/data/v2/models/RowCell.java#L51


mutianf · 2023-03-24T18:22:59Z

google/cloud/bigtable/row_response.py

        """
-        raise NotImplementedError
+        Returns a list of (family, qualifier) pairs associated with the cells


I would expect keys() to return the row key of the response. Maybe there's a better name for this? Or maybe I'm misunderstanding something?

Yeah, so for convenience, I made RowResponse conform to the Sequence base type, so it can be treated as a standard sequence:

    len(row_response)
    for cell in row_response: ...
    first = row_response[0]
    sample = row_response[0:5]

etc.

I also wanted to make it partially compatible with the Mapping base type, to make indexing convenient. In this context, the row_response can be treated as a key/value dictionary, where the keys are families or family/qualifier pairs, and the values are the associated cells:

    cells = row_response["my_family"]
    cells = row_response["my_family", "my_qualifier"]

I also added implementations of Mapping's keys(), values(), and items() methods to make iteration easier:

    for family, qualifier in row_response.keys():
        cells = row_response[family, qualifier]
    for cells in row_response.values():
        print(cells)
    for (family, qualifier), cells in row_response.items():
        print(cells)
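The keys()/values()/items() helpers described above can be sketched by grouping cells by (family, qualifier). This is an illustrative sketch with a hypothetical `Cell` dataclass and class name `MappingLikeRow`; it deliberately does not inherit from `Mapping`, in line with the later commit that removed the explicit Mapping inheritance. Column order follows first appearance because Python dicts preserve insertion order (3.7+).

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class Cell:
    # Hypothetical minimal cell for this sketch.
    family: str
    qualifier: bytes
    value: bytes


class MappingLikeRow:
    """Sketch of Mapping-style helpers keyed by (family, qualifier)."""

    def __init__(self, cells: List[Cell]):
        # Group cells by column, preserving the order columns first appear in.
        self._columns: Dict[Tuple[str, bytes], List[Cell]] = {}
        for cell in cells:
            self._columns.setdefault((cell.family, cell.qualifier), []).append(cell)

    def __getitem__(self, key: Tuple[str, bytes]) -> List[Cell]:
        return list(self._columns[key])

    def keys(self) -> List[Tuple[str, bytes]]:
        return list(self._columns.keys())

    def values(self) -> List[List[Cell]]:
        return [list(cells) for cells in self._columns.values()]

    def items(self) -> Iterator[Tuple[Tuple[str, bytes], List[Cell]]]:
        for key, cells in self._columns.items():
            yield key, list(cells)
```

Returning copies of the per-column lists keeps callers from mutating the row's internal state.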

TL;DR: keys() here refers to dict.keys(), to make iteration over the stored data easier. But I can see how that may be confusing, and I'm OK with making changes here.

Gotcha, thanks for the explanation! Maybe we can leave out the Mapping keys(), values(), and items() implementations for now? I can see how they're convenient, but maybe we can keep it simple in the first iteration?

Yeah, sounds good. I renamed keys() to get_column_components() and removed the others.


mutianf · 2023-03-24T18:27:09Z

google/cloud/bigtable/row_response.py

+        for cell in sorted(cells):
+            if cell.row_key != self.row_key:
+                raise ValueError(
+                    f"CellResponse row_key ({cell.row_key!r}) does not match RowResponse key ({self.row_key!r})"


Is this check necessary?

Our backend should never create rows that hit this, but end users could try to construct one.

And it doesn't hurt to include it as a sanity check either way.
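The sanity check in the quoted hunk can be sketched in isolation. This is an illustration with a hypothetical minimal `Cell` and `Row`, not the PR's code; it also omits the `sorted(cells)` call from the hunk, since cells are assumed to arrive already in backend order.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cell:
    # Hypothetical minimal cell; only the fields the check needs.
    row_key: bytes
    family: str
    qualifier: bytes
    value: bytes


class Row:
    def __init__(self, key: bytes, cells: List[Cell]):
        self.row_key = key
        # Sanity check: every cell must belong to this row.
        for cell in cells:
            if cell.row_key != self.row_key:
                raise ValueError(
                    f"Cell row_key ({cell.row_key!r}) does not match "
                    f"Row key ({self.row_key!r})"
                )
        self.cells = list(cells)
```

The check costs one comparison per cell, so it adds little overhead even if rows are only ever built internally.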


igorbernstein2 · 2023-03-31T20:24:30Z

google/cloud/bigtable/row_response.py

+        cells: list[CellResponse]
+        | dict[tuple[family_id, qualifier], list[dict[str, Any]]],


I think there should only be one way to represent data in a response.

Maybe make this internal as well.

igorbernstein2 · 2023-03-31T20:30:27Z

google/cloud/bigtable/row_response.py

+        self._cells_map: dict[
+            family_id, dict[qualifier, list[CellResponse]]


I think a better data structure would be OrderedDict[family, list[Cell]], using binary search (bsearch) to find a qualifier in the cell list.

Can you remind me of the motivation for this?
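The suggested layout can be sketched as an `OrderedDict` from family to that family's cells, with `bisect` locating a qualifier's run of cells. This assumes each family's cells already arrive sorted by qualifier (native order); the `Cell` fields and `FamilyIndexedRow` name are hypothetical. A parallel list of qualifiers is kept so the sketch does not depend on `bisect`'s `key=` parameter, which only exists in Python 3.10+.

```python
import bisect
from collections import OrderedDict
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cell:
    # Hypothetical minimal cell for this sketch.
    family: str
    qualifier: bytes
    timestamp_micros: int
    value: bytes


class FamilyIndexedRow:
    """Sketch of OrderedDict[family, list[Cell]] plus binary search by qualifier.
    Assumes cells within each family are already sorted by qualifier."""

    def __init__(self, cells: List[Cell]):
        self._families: "OrderedDict[str, List[Cell]]" = OrderedDict()
        for cell in cells:
            self._families.setdefault(cell.family, []).append(cell)
        # Parallel, sorted qualifier lists used as bisect targets.
        self._qualifiers = {
            fam: [c.qualifier for c in fam_cells]
            for fam, fam_cells in self._families.items()
        }

    def get_cells(self, family: str, qualifier: bytes) -> List[Cell]:
        fam_cells = self._families.get(family, [])
        quals = self._qualifiers.get(family, [])
        # [lo, hi) is the contiguous run of cells with this qualifier.
        lo = bisect.bisect_left(quals, qualifier)
        hi = bisect.bisect_right(quals, qualifier)
        return fam_cells[lo:hi]
```

Compared with a nested family -> qualifier -> cells dict, this keeps a single flat list per family in backend order, and a lookup costs O(log n) within the family.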

igorbernstein2 · 2023-03-31T20:31:31Z

google/cloud/bigtable/row_response.py

+        for cell in sorted(cells):
+            if cell.row_key != self.row_key:
+                raise ValueError(
+                    f"CellResponse row_key ({cell.row_key!r}) does not match RowResponse key ({self.row_key!r})"


As discussed, end users should never construct these.

igorbernstein2 · 2023-03-31T20:37:20Z

google/cloud/bigtable/row_response.py

+        this_ordering = (
+            self.family,
+            self.column_qualifier,
+            -self.timestamp_micros,


Please don't fight native ordering.

My understanding was that newer cells should come first? By default, sorting by ascending timestamp would put older cells first.

That said, comparison matters less if we just use cells in the order they come from the backend. If we don't have a way to determine the proper ordering on the client side, maybe we should remove these comparison implementations entirely?
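For reference, the ordering tuple from the quoted hunk can be expressed as a standalone sort key instead of comparison methods on the cell class, which is one way to keep it available for tests without cells defining their own order. This is a sketch with a hypothetical `Cell`; per the review, the client would normally keep cells in the order the backend returns them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    # Hypothetical minimal cell for this sketch.
    family: str
    qualifier: bytes
    timestamp_micros: int
    value: bytes = b""


def native_order_key(cell: Cell):
    """Family ascending, qualifier ascending, then timestamp descending:
    negating the timestamp makes newer cells sort first within a column."""
    return (cell.family, cell.qualifier, -cell.timestamp_micros)


def naive_order_key(cell: Cell):
    # Plain ascending timestamps: older cells would come first.
    return (cell.family, cell.qualifier, cell.timestamp_micros)
```

Sorting the same cells with `naive_order_key` shows the problem raised above: within a column, the oldest version would be returned first.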

* feat: add new v3.0.0 API skeleton (#745)
* feat: improve rows filters (#751)
* feat: read rows query model class (#752)
* feat: implement row and cell model classes (#753)
* feat: add pooled grpc transport (#748)
* feat: implement read_rows (#762)
* feat: implement mutate rows (#769)
* feat: literal value filter (#767)
* feat: row_exists and read_row (#778)
* feat: read_modify_write and check_and_mutate_row (#780)
* feat: sharded read rows (#766)
* feat: ping and warm with metadata (#810)
* feat: mutate rows batching (#770)
* chore: restructure module paths (#816)
* feat: improve timeout structure (#819)
* fix: api errors apply to all bulk mutations
* chore: reduce public api surface (#820)
* feat: improve error group tracebacks on < py11 (#825)
* feat: optimize read_rows (#852)
* chore: add user agent suffix (#842)
* feat: optimize retries (#854)
* feat: add test proxy (#836)
* chore(tests): add conformance tests to CI for v3 (#870)
* chore(tests): turn off fast fail for conformance tests (#882)
* feat: add TABLE_DEFAULTS enum for table method arguments (#880)
* fix: pass None for retry in gapic calls (#881)
* feat: replace internal dictionaries with protos in gapic calls (#875)
* chore: optimize gapic calls (#863)
* feat: expose retryable error codes to users (#879)
* chore: update api_core submodule (#897)
* chore: merge main into experimental_v3 (#900)
* chore: pin conformance tests to v0.0.2 (#903)
* fix: bulk mutation eventual success (#909)

---------

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>

daniel-sanche added 2 commits March 16, 2023 09:48

implemented row and cell response

ec47f91

added tests for row and cell response

a40c00c

daniel-sanche requested review from a team as code owners March 16, 2023 17:01

product-auto-label bot added the size: xl and api: bigtable labels Mar 16, 2023

daniel-sanche commented Mar 16, 2023


daniel-sanche added 4 commits March 16, 2023 10:59

removed explicit Mapping inheritance relationship

2dbd4ad

improved comments

58024b0

fixed values implementation

d4cfd28

removed python7 incompatible statement

587945c

daniel-sanche added the kokoro:force-run label Mar 23, 2023

yoshi-kokoro removed the kokoro:force-run label Mar 23, 2023

mutianf reviewed Mar 24, 2023


fixed docstring

2f316ce

Co-authored-by: Mattie Fu <[email protected]>

Mariatta approved these changes Mar 30, 2023


daniel-sanche added 4 commits March 30, 2023 16:06

removed keys, values, items

8873e9d

removed nanosecond timestamps

ff7dcbb

ran black

39da24d

removed from_dict

3a6fff1

igorbernstein2 reviewed Mar 31, 2023


daniel-sanche and others added 5 commits April 2, 2023 10:56

renamed RowResponse and CellResponse to Row and Cell

9429244

fixed tests

1aa7424

simplified row construction

a603649

Merge branch 'v3' into v3_row_response

d9765fd

updated import paths

c662d94

daniel-sanche merged commit c55099f into googleapis:v3 Apr 5, 2023
