-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce simple reporting client with support for single metric historical data #16
Conversation
Signed-off-by: cwasicki <[email protected]>
This is the initial release of the Reporting API client, streamlined for retrieving single metric historical data for a single component. It incorporates pagination handling and utilizes a wrapper data class that retains the raw protobuf response while offering transformation capabilities limited here to generators of structured data representation via named tuples. Current limitations include a single metric focus with plans for extensibility, ongoing development for states and bounds integration, as well as support for service-side features like resampling, streaming, and formula aggregations. Signed-off-by: cwasicki <[email protected]>
Signed-off-by: cwasicki <[email protected]>
Signed-off-by: cwasicki <[email protected]>
@@ -29,6 +29,8 @@ requires-python = ">= 3.11, < 4" | |||
# TODO(cookiecutter): Remove and add more dependencies if appropriate | |||
dependencies = [ | |||
"typing-extensions >= 4.5.0, < 5", | |||
"frequenz-api-reporting >= 0.1.1, < 1", | |||
"frequenz-client-common @ git+https://github.com/frequenz-floss/[email protected]", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be updated with the upcoming release.
Signed-off-by: cwasicki <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I arrived quite late to the party, sorry, but it is actually a good think, I wouldn't have wanted to block the PR with this feedback, at least some of it is more long term.
I will create separate issues for things that need further discussion.
Sample = namedtuple("Sample", ["timestamp", "value"]) | ||
"""Type for a sample of a time series.""" | ||
|
||
MetricSample = namedtuple( | ||
"MetricSample", ["timestamp", "microgrid_id", "component_id", "metric", "value"] | ||
) | ||
"""Type for a sample of a time series incl. metric type, microgrid and component ID""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not a dataclass? Performance reasons or just because it is more idiomatic for data science? If the reason is the former, it would be good to provide a benchmark to show the difference; if it is the later, it might be good to add a comment, and eventually explain what are the design principles for this library in the README or something, but for now, as we keep learning about what we really need, I would just put a comment in the code here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one I think should be answered here as it is not worth a separate issue, unless this triggers a bigger discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was to keep it as low-level as possible, since this is the fundamental data structure that is repeated for every single sample. So for me the question is, why should we use a dataclass instead? For further analysis the data is typically transformed into an array-like (e.g. pandas data frame) data type anyway, which nicely works with this data type (originally it was just a tuple until @shsms passed by).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, agreed, that was my main question. As for why dataclasses, just because they are a higher level abstraction, and are more flexible, but if this is targeting data science and for that use case tuples make more sense, then I agree we should use that. I just think we should make it more clear, as said above, a comment in the code should be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_data_pb: PBListMicrogridComponentsDataResponse | ||
"""The underlying protobuf message.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would call this _pb_data
instead (or _pb_message
). We call it _raw
in the SDK. Maybe it would be nice to agree on some standard so all clients use the same, although it is not exposed so it is not that important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea to standardize, I am fine with all suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return True | ||
return False | ||
|
||
def iterate_metric_samples(self) -> Generator[MetricSample, None, None]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just make the class iterable directly (also Iterator[MetricSample]
1 is a shorter and more idiomatic alias for Generator[MetricSample, None, None]
2)
def __iter__(self) -> Iterator[MetricSample]:
...
Then you can just do for sample in page:
instead of for sample in page.iterate_metric_samples():
.
Footnotes
-
Technically a
Generator
is more than anIterator
:extends iterators with the send(), throw() and close() methods
But in this context I think we only care about the iterator part, we are using a generator out of convenience, not because we are actually using the extra methods provided by a generator. ↩
-
From
typing.Generator
:Alternatively, annotate your generator as having a return type of either Iterable[YieldType] or Iterator[YieldType]:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a small change, so we can also briefly discuss here and could be quickly changed in the current repo. The broader discussion is now here:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iterators also provides lazy iteration. Generators are just more powerful as they also let you pass messages between the iterator and the outside world.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
) | ||
|
||
@property | ||
def next_page_token(self) -> Any: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should return str
, right? If you want it to return an opaque type, then use object
instead, as Any
will completely disable type checking, doing wrong stuff like t = p.next_page_token; t.i_dont_exist()
will not be detected by mypy
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest also discussing this one here, and only create a separate issue if it is not trivial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, should we create an issue or do you plan to do a quick fix directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return self._data_pb.pagination_info.next_page_token | ||
|
||
|
||
class ReportingClient: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably also standarize this name scheme. I think in the microgrid API I used just ApiClient
, as it is expected to be used as microgrid.ApiClient
. I don't mind using xxx.ApiClient
or XxxApiClient
but I think we should call it ApiClient
because otherwise it might be confusing for some APIs if the term client have some other meaning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest discussing this in a meeting, I guess it should be quicker in a more interactive way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decision was to use XxxApiClient
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self._stub = ReportingStub(self._grpc_channel) | ||
|
||
# pylint: disable=too-many-arguments | ||
async def iterate_single_metric( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't name these methods iterate
, not sure if they are a common idiom in data science but otherwise I'd say it is not a common idiom in general in python. I think client methods should map more or less one to one to the gRPC API methods. In this case I think the name should be: list_microgrid_components_data()
. I think it is fine to do the page unwrapping and still use the same name as the gRPC API method.
Again, I think we should have a common approach in all APIs.
PS: I guess the extra explicitness in names might come from untyped python, in that case it makes more sense, but when using type hints you can already see you are iterating (AsyncIterator
) over a single metric ([Sample]
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO it is pretty clear that Python API client method names should match the gRPC API, so I suggest to create a separate issue about this only if anyway thinks differently.
As for how to iterate, please refer to:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't name these methods iterate,
PS: I guess the extra explicitness in names might come from untyped python, in that case it makes more sense, but when using type hints you can already see you are iterating (AsyncIterator) over a single metric ([Sample]).
Agree, can remove this. Actually the explicitness is to distinguish from different access methods that I considered, e.g. alternative to the iterator a dictionary representing all nesting levels. At the moment I am not sure whether it will come.
In this case I think the name should be: list_microgrid_components_data()
Not sure about this. The method offers only a small subset of the potential functionality of the gRPC function. And the current name reflects this limitation. Also, since the current version is sufficient for many applications already, I am not sure whether this method will be extended or another method will be added, that provides full support of the gRPC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmmm, OK, I see. In general I think clients should only be a thin wrapper over the gRPC calls, and other more convenient functions could be built on top. I mean, I'm completely fine to keep it as it is for an early version, especially if we need something fast, but in the long term I don't think it is the right approach. If the API is more flexible, then some users must need that flexibility, otherwise the gRPC API we have is more complex than needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the API is more flexible, then some users must need that flexibility, otherwise the gRPC API we have is more complex than needed.
Yes, the flexibility of the API will be exposed, there are issues for all of the missing features. Question is whether we have only one method that exposes all or multiple. I would postpone the decision until we address some of the open issues, e.g. #19.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yield Sample(timestamp=entry.timestamp, value=entry.value) | ||
|
||
# pylint: disable=too-many-arguments | ||
async def _iterate_components_data_pages( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably expose this publicly, in case someone wants to get pages for some reason, I don't see a reason to hide it.
Even more, now with the code in front of my eyes, I think the best approach would be to list_microgrid_components_data()
return an object with a few iterable properties:
async for page in client.list_microgrid_components_data(...).pages:
...
async for sample in client.list_microgrid_components_data(...).samples:
...
So list_microgrid_components_data()
could return a Response
object with properties like (pseudo-code using the current method names):
@property
async def samples(self) -> AsyncIterator[Sample]:
return self.client._iterate_single_metric(...)
@property
async def pages(self) -> AsyncIterator[ComponentsDataPage]:
return self.client._iterate_components_data_pages(...)
This design might need a bit more thought, so maybe we should wait on it and discuss it in a separate issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably expose this publicly, in case someone wants to get pages for some reason, I don't see a reason to hide it.
Honestly I don't see a reason to expose it now. If we realize this becomes important, it's easy to expose. Because I don't see a use-case where you would want the pages that you cannot achieve with the iterator.
Even more, now with the code in front of my eyes, I think the best approach would be to list_microgrid_components_data()
That could be an option. But do you have a good example where having pages are better than an iterator over the samples?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I don't see a use-case where you would want the pages that you cannot achieve with the iterator.
Performance. We tested this and iterating over items was considerably slower than going through pages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, maybe because of data is used when doing data science, at the end is the same for that use case, but I think the API client should be pretty flexible and general purpose. And again, I'm thinking more long term, not just about what we need right now. For the right now I agree it is fine and we can improve later, but if we agree about this, I would create issues about it so:
- We don't forget
- It is clear for everyone what's the future direction we want to take, so issues with that future direction can also be raised early if anyone sees any.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
async def close(self) -> None: | ||
"""Close the client and cancel any pending requests immediately.""" | ||
await self._grpc_channel.close(grace=None) | ||
|
||
async def __aenter__(self) -> "ReportingClient": | ||
"""Enter the async context.""" | ||
return self | ||
|
||
async def __aexit__( | ||
self, | ||
_exc_type: Type[BaseException] | None, | ||
_exc_val: BaseException | None, | ||
_exc_tb: Any | None, | ||
) -> bool | None: | ||
""" | ||
Exit the asynchronous context manager. | ||
|
||
Note that exceptions are not handled here, but are allowed to propagate. | ||
|
||
Args: | ||
_exc_type: Type of exception raised in the async context. | ||
_exc_val: Exception instance raised. | ||
_exc_tb: Traceback object at the point where the exception occurred. | ||
|
||
Returns: | ||
None, allowing any exceptions to propagate. | ||
""" | ||
await self.close() | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be a good foundation for a GrpcApiClient
class that can live in https://github.com/frequenz-floss/frequenz-client-base-python/!
It could also standarize the constructor, I think right now some clients take a grpc channel and some take a connection string. I think we should go with the later to remove the explicit dependency on grpc for downstream projects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This initial version of the reporting API client offers focused functionality to retrieve historical data for a single metric from a single component.
Features of this version:
Limitations of this version:
SimpleMetricSample
exclusively, with decision on how to integrateAggregatedMetricSample
pending. ->