Introduce simple reporting client with support for single metric historical data #16

cwasicki · 2024-03-15T20:49:19Z

This initial version of the reporting API client offers focused functionality to retrieve historical data for a single metric from a single component.

Features of this version:

Pagination Handling: Seamlessly request time series data of a single metric from a single component, with integrated pagination management.
Data Transformation: Utilize a wrapper data class that retains the raw protobuf response while offering transformation capabilities. A generator function is provided for iterating over individual page values.
Structured Data: Streamline data representation through named tuples for timestamp and value pairs, eliminating ambiguity for single component and metric scenarios.
Usage Examples: Code examples demonstrate usage, along with experimental code for upcoming features (considered for removal).
Unit Testing: Basic unit tests are included.

Limitations of this version:

Single Metric Focus: Initially supporting queries for individual metrics to support most common use cases, with an extensible design for multiple metrics/components in future. ->
- Support multiple components and microgrids #19
States and bounds: In line with focus on widely used functionalities, the approach to integrating states and bounds within the data structures is still under exploration. ->
- Support request option for bounds #20
- Support request include option for states #25
Metric Sample Variants: Currently supports SimpleMetricSample exclusively, with decision on how to integrate AggregatedMetricSample pending. ->
- Support for full metric sample variant #21.
Resampling: not yet exposed (service-side). ->
- Support for resampling option #22
Streaming: Functions not yet available (service-side). Current generator implementations on pages and entries can be aligned with streaming output in future. ->
- Support real-time streaming requests #23
Aggregation: Support for formulas still missing (service-side) ->
- Support user-defined metric formulas #24

Signed-off-by: cwasicki <[email protected]>

This is the initial release of the Reporting API client, streamlined for retrieving single metric historical data for a single component. It incorporates pagination handling and utilizes a wrapper data class that retains the raw protobuf response while offering transformation capabilities limited here to generators of structured data representation via named tuples. Current limitations include a single metric focus with plans for extensibility, ongoing development for states and bounds integration, as well as support for service-side features like resampling, streaming, and formula aggregations. Signed-off-by: cwasicki <[email protected]>

Signed-off-by: cwasicki <[email protected]>

cwasicki · 2024-03-15T20:53:06Z

pyproject.toml

@@ -29,6 +29,8 @@ requires-python = ">= 3.11, < 4"
 # TODO(cookiecutter): Remove and add more dependencies if appropriate
 dependencies = [
  "typing-extensions >= 4.5.0, < 5",
+  "frequenz-api-reporting >= 0.1.1, < 1",
+  "frequenz-client-common @ git+https://github.com/frequenz-floss/[email protected]",


This needs to be updated with the upcoming release.

Signed-off-by: cwasicki <[email protected]>

flora-hofmann-frequenz

LGTM 👍

examples/client.py

src/frequenz/client/reporting/_client.py

tiyash-basu-frequenz

LGTM

llucax

I arrived quite late to the party, sorry, but it is actually a good thing: I wouldn't have wanted to block the PR with this feedback, since at least some of it is more long-term.

I will create separate issues for things that need further discussion.

llucax · 2024-04-02T15:29:03Z

src/frequenz/client/reporting/_client.py

+Sample = namedtuple("Sample", ["timestamp", "value"])
+"""Type for a sample of a time series."""
+
+MetricSample = namedtuple(
+    "MetricSample", ["timestamp", "microgrid_id", "component_id", "metric", "value"]
+)
+"""Type for a sample of a time series incl. metric type, microgrid and component ID"""


Why not a dataclass? Performance reasons, or just because it is more idiomatic for data science? If the reason is the former, it would be good to provide a benchmark to show the difference; if it is the latter, it might be good to add a comment, and eventually explain what the design principles for this library are in the README or something. For now, as we keep learning about what we really need, I would just put a comment in the code here.

This one I think should be answered here as it is not worth a separate issue, unless this triggers a bigger discussion.

The idea was to keep it as low-level as possible, since this is the fundamental data structure that is repeated for every single sample. So for me the question is, why should we use a dataclass instead? For further analysis the data is typically transformed into an array-like (e.g. pandas data frame) data type anyway, which nicely works with this data type (originally it was just a tuple until @shsms passed by).

Yeah, agreed, that was my main question. As for why dataclasses, just because they are a higher level abstraction, and are more flexible, but if this is targeting data science and for that use case tuples make more sense, then I agree we should use that. I just think we should make it more clear, as said above, a comment in the code should be enough.

llucax · 2024-04-02T15:30:31Z

src/frequenz/client/reporting/_client.py

+    _data_pb: PBListMicrogridComponentsDataResponse
+    """The underlying protobuf message."""


I would call this _pb_data instead (or _pb_message). We call it _raw in the SDK. Maybe it would be nice to agree on some standard so all clients use the same, although it is not exposed so it is not that important.

Standard name for raw pb message/data in wrapper classes frequenz-api-common#215

Good idea to standardize, I am fine with all suggestions.

llucax · 2024-04-02T15:30:45Z

src/frequenz/client/reporting/_client.py

+            return True
+        return False
+
+    def iterate_metric_samples(self) -> Generator[MetricSample, None, None]:


I would just make the class iterable directly (also Iterator[MetricSample]¹ is a shorter and more idiomatic alias for Generator[MetricSample, None, None]²)

def __iter__(self) -> Iterator[MetricSample]: ...

Then you can just do for sample in page: instead of for sample in page.iterate_metric_samples():.

Footnotes

¹ Technically a Generator is more than an Iterator: it "extends iterators with the send(), throw() and close() methods". But in this context I think we only care about the iterator part; we are using a generator out of convenience, not because we are actually using the extra methods provided by a generator.

² From typing.Generator: "Alternatively, annotate your generator as having a return type of either Iterable[YieldType] or Iterator[YieldType]".

This is a small change, so we can also briefly discuss here and could be quickly changed in the current repo. The broader discussion is now here:

Iterating over paginated API methods frequenz-client-base-python#41

I would just make the class iterable directly (also Iterator[MetricSample]¹ is a shorter and more idiomatic alias for Generator[MetricSample, None, None]²)

Agree, will change this. In general a generator was used for lazy iteration, but that doesn't need to be reflected in the type annotation.

Iterators also provide lazy iteration. Generators are just more powerful, as they also let you pass messages between the iterator and the outside world.
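A minimal sketch of the suggested change, assuming a simplified page wrapper: the real `ComponentsDataPage` wraps a protobuf message, while this stand-in holds already-flattened rows, and the metric name is only a placeholder string.

```python
from collections import namedtuple
from collections.abc import Iterator

MetricSample = namedtuple(
    "MetricSample", ["timestamp", "microgrid_id", "component_id", "metric", "value"]
)


class ComponentsDataPage:
    """Simplified stand-in for the page wrapper discussed in the review."""

    def __init__(self, rows: list[tuple]) -> None:
        # Hypothetical storage: rows already flattened into field order.
        self._rows = rows

    def __iter__(self) -> Iterator[MetricSample]:
        # The body is still a generator, so iteration stays lazy;
        # only the annotation uses the shorter Iterator form.
        for row in self._rows:
            yield MetricSample(*row)


page = ComponentsDataPage(
    [(0, 1, 2, "PLACEHOLDER_METRIC", 10.0), (1, 1, 2, "PLACEHOLDER_METRIC", 11.0)]
)
values = [sample.value for sample in page]
```

With `__iter__` defined, `for sample in page:` works directly and each loop gets a fresh generator.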

llucax · 2024-04-02T15:32:32Z

src/frequenz/client/reporting/_client.py

+                    )
+
+    @property
+    def next_page_token(self) -> Any:


This should return str, right? If you want it to return an opaque type, then use object instead, as Any completely disables type checking: mistakes like t = p.next_page_token; t.i_dont_exist() will not be detected by mypy.

I suggest also discussing this one here, and only create a separate issue if it is not trivial.

You are right, should be string or None: https://github.com/frequenz-floss/frequenz-api-common/blob/20cdc478555f3ba55bd86ccfd19dbc0beaf80edc/proto/frequenz/api/common/v1/pagination/pagination_info.proto#L26.

OK, should we create an issue or do you plan to do a quick fix directly?

llucax · 2024-04-02T15:35:01Z

src/frequenz/client/reporting/_client.py

+        return self._data_pb.pagination_info.next_page_token
+
+
+class ReportingClient:


We should probably also standardize this naming scheme. I think in the microgrid API I used just ApiClient, as it is expected to be used as microgrid.ApiClient. I don't mind using xxx.ApiClient or XxxApiClient, but I think we should call it ApiClient because otherwise it might be confusing for some APIs if the term client has some other meaning.

I suggest discussing this in a meeting, I guess it should be quicker in a more interactive way.

Decision was to use XxxApiClient, right?

llucax · 2024-04-02T15:39:19Z

src/frequenz/client/reporting/_client.py

+        self._stub = ReportingStub(self._grpc_channel)
+
+    # pylint: disable=too-many-arguments
+    async def iterate_single_metric(


I wouldn't name these methods iterate, not sure if they are a common idiom in data science but otherwise I'd say it is not a common idiom in general in python. I think client methods should map more or less one to one to the gRPC API methods. In this case I think the name should be: list_microgrid_components_data(). I think it is fine to do the page unwrapping and still use the same name as the gRPC API method.

Again, I think we should have a common approach in all APIs.

PS: I guess the extra explicitness in names might come from untyped python, in that case it makes more sense, but when using type hints you can already see you are iterating (AsyncIterator) over a single metric ([Sample]).

IMHO it is pretty clear that Python API client method names should match the gRPC API, so I suggest creating a separate issue about this only if anyone thinks differently.

As for how to iterate, please refer to:

Iterating over paginated API methods frequenz-client-base-python#41

I wouldn't name these methods iterate,

PS: I guess the extra explicitness in names might come from untyped python, in that case it makes more sense, but when using type hints you can already see you are iterating (AsyncIterator) over a single metric ([Sample]).

Agree, can remove this. Actually the explicitness is to distinguish it from other access methods that I considered, e.g. as an alternative to the iterator, a dictionary representing all nesting levels. At the moment I am not sure whether that will be added.

In this case I think the name should be: list_microgrid_components_data()

Not sure about this. The method offers only a small subset of the potential functionality of the gRPC method, and the current name reflects this limitation. Also, since the current version is already sufficient for many applications, I am not sure whether this method will be extended or whether another method will be added that provides full support for the gRPC method.

Mmmm, OK, I see. In general I think clients should only be a thin wrapper over the gRPC calls, and other more convenient functions could be built on top. I mean, I'm completely fine to keep it as it is for an early version, especially if we need something fast, but in the long term I don't think it is the right approach. If the API is more flexible, then some users must need that flexibility, otherwise the gRPC API we have is more complex than needed.

If the API is more flexible, then some users must need that flexibility, otherwise the gRPC API we have is more complex than needed.

Yes, the flexibility of the API will be exposed, there are issues for all of the missing features. Question is whether we have only one method that exposes all or multiple. I would postpone the decision until we address some of the open issues, e.g. #19.

Partly addressed here: #33.

On whether to keep the existing single component function, see #34.

On the iterator, see #31.

llucax · 2024-04-02T15:50:23Z

src/frequenz/client/reporting/_client.py

+                yield Sample(timestamp=entry.timestamp, value=entry.value)
+
+    # pylint: disable=too-many-arguments
+    async def _iterate_components_data_pages(


I would probably expose this publicly, in case someone wants to get pages for some reason, I don't see a reason to hide it.

Even more, now with the code in front of my eyes, I think the best approach would be for list_microgrid_components_data() to return an object with a few iterable properties:

    async for page in client.list_microgrid_components_data(...).pages:
        ...
    async for sample in client.list_microgrid_components_data(...).samples:
        ...

So list_microgrid_components_data() could return a Response object with properties like (pseudo-code using the current method names):

    @property
    def samples(self) -> AsyncIterator[Sample]:
        # A plain (non-async) property, so that the result can be used in `async for`.
        return self.client._iterate_single_metric(...)

    @property
    def pages(self) -> AsyncIterator[ComponentsDataPage]:
        return self.client._iterate_components_data_pages(...)

This design might need a bit more thought, so maybe we should wait on it and discuss it in a separate issue.

Iterating over paginated API methods frequenz-client-base-python#41

I would probably expose this publicly, in case someone wants to get pages for some reason, I don't see a reason to hide it.

Honestly I don't see a reason to expose it now. If we realize this becomes important, it's easy to expose. Because I don't see a use-case where you would want the pages that you cannot achieve with the iterator.

Even more, now with the code in front of my eyes, I think the best approach would be to list_microgrid_components_data()

That could be an option. But do you have a good example where having pages is better than an iterator over the samples?

Because I don't see a use-case where you would want the pages that you cannot achieve with the iterator.

Performance. We tested this and iterating over items was considerably slower than going through pages.

Again, maybe because of how data is used when doing data science, in the end it makes no difference for that use case, but I think the API client should be pretty flexible and general purpose. And again, I'm thinking more long term, not just about what we need right now. For right now I agree it is fine and we can improve later, but if we agree about this, I would create issues about it so:

We don't forget

It is clear for everyone what's the future direction we want to take, so issues with that future direction can also be raised early if anyone sees any.

llucax · 2024-04-02T15:52:56Z

src/frequenz/client/reporting/_client.py

+    async def close(self) -> None:
+        """Close the client and cancel any pending requests immediately."""
+        await self._grpc_channel.close(grace=None)
+
+    async def __aenter__(self) -> "ReportingClient":
+        """Enter the async context."""
+        return self
+
+    async def __aexit__(
+        self,
+        _exc_type: Type[BaseException] | None,
+        _exc_val: BaseException | None,
+        _exc_tb: Any | None,
+    ) -> bool | None:
+        """
+        Exit the asynchronous context manager.
+
+        Note that exceptions are not handled here, but are allowed to propagate.
+
+        Args:
+            _exc_type: Type of exception raised in the async context.
+            _exc_val: Exception instance raised.
+            _exc_tb: Traceback object at the point where the exception occurred.
+
+        Returns:
+            None, allowing any exceptions to propagate.
+        """
+        await self.close()
+        return None


This might be a good foundation for a GrpcApiClient class that can live in https://github.com/frequenz-floss/frequenz-client-base-python/!

It could also standardize the constructor. I think right now some clients take a gRPC channel and some take a connection string. I think we should go with the latter to remove the explicit dependency on grpc for downstream projects.

Create a GrpcApiClient base class frequenz-client-base-python#42

Sounds good

cwasicki requested a review from a team as a code owner March 15, 2024 20:49

github-actions bot added the part:docs, part:tests and part:tooling labels Mar 15, 2024

cwasicki added 2 commits March 15, 2024 21:50

Add common client and reporting API dependencies

d6460ac

Signed-off-by: cwasicki <[email protected]>

cwasicki self-assigned this Mar 15, 2024

cwasicki added 2 commits March 15, 2024 22:00

Add code example for reporting client usage

8697782

Signed-off-by: cwasicki <[email protected]>

Add initial unit tests

d2a98f6

Signed-off-by: cwasicki <[email protected]>

cwasicki force-pushed the client branch from 0a63a48 to d2a98f6 Compare March 15, 2024 21:04

cwasicki commented Mar 15, 2024

Update release notes

1190207

Signed-off-by: cwasicki <[email protected]>

cwasicki force-pushed the client branch from 8b20fac to 1190207 Compare March 15, 2024 21:12

flora-hofmann-frequenz approved these changes Mar 18, 2024

tiyash-basu-frequenz reviewed Mar 18, 2024

examples/client.py

src/frequenz/client/reporting/_client.py

tiyash-basu-frequenz approved these changes Mar 20, 2024

cwasicki merged commit 79726f9 into frequenz-floss:v0.x.x Mar 20, 2024
14 checks passed

cwasicki deleted the client branch March 20, 2024 14:52

llucax reviewed Apr 2, 2024

llucax mentioned this pull request Apr 2, 2024

Create a GrpcApiClient base class frequenz-floss/frequenz-client-base-python#42

Closed

cwasicki mentioned this pull request Apr 12, 2024

Fix return type for next_page_token method #28

Merged

github-merge-queue bot pushed a commit that referenced this pull request Apr 12, 2024

Fix return type for next_page_token method (#28)

0281586

From #16 (comment)

This was referenced Apr 12, 2024

Rename ReportingClient to ReportingApiClient #29

Merged

Rename raw pb message attribute #32

Open

Make ComponentsDataPage an iterator #36

Merged

github-merge-queue bot pushed a commit that referenced this pull request Apr 12, 2024

Make ComponentsDataPage an iterator (#36)

267955a

Raised here: #16 (comment)

github-merge-queue bot pushed a commit that referenced this pull request Apr 18, 2024

Rename ReportingClient to ReportingApiClient (#29)

4baac8b

From #16 (comment)
