
Introduce simple reporting client with support for single metric historical data #16

Merged
merged 5 commits into from
Mar 20, 2024

Conversation

cwasicki
Contributor

@cwasicki cwasicki commented Mar 15, 2024

This initial version of the reporting API client offers focused functionality to retrieve historical data for a single metric from a single component.

Features of this version:

  • Pagination Handling: Seamlessly request time series data of a single metric from a single component, with integrated pagination management.
  • Data Transformation: Utilize a wrapper data class that retains the raw protobuf response while offering transformation capabilities. A generator function is provided for iterating over individual page values.
  • Structured Data: Streamline data representation through named tuples for timestamp and value pairs, eliminating ambiguity for single component and metric scenarios.
  • Usage Examples: Code examples demonstrate usage, along with experimental code for upcoming features (considered for removal).
  • Unit Testing: Basic unit tests are included.

Limitations of this version:

@cwasicki cwasicki requested a review from a team as a code owner March 15, 2024 20:49
@github-actions github-actions bot added part:docs Affects the documentation part:tests Affects the unit, integration and performance (benchmarks) tests part:tooling Affects the development tooling (CI, deployment, dependency management, etc.) labels Mar 15, 2024
This is the initial release of the Reporting API client, streamlined for
retrieving single metric historical data for a single component. It incorporates
pagination handling and utilizes a wrapper data class that retains the raw
protobuf response while offering transformation capabilities limited here
to generators of structured data representation via named tuples.

Current limitations include a single metric focus with plans for extensibility,
ongoing development for states and bounds integration, as well as support for
service-side features like resampling, streaming, and formula aggregations.

Signed-off-by: cwasicki <[email protected]>
@cwasicki cwasicki self-assigned this Mar 15, 2024
@@ -29,6 +29,8 @@ requires-python = ">= 3.11, < 4"
# TODO(cookiecutter): Remove and add more dependencies if appropriate
dependencies = [
"typing-extensions >= 4.5.0, < 5",
"frequenz-api-reporting >= 0.1.1, < 1",
"frequenz-client-common @ git+https://github.com/frequenz-floss/[email protected]",
Contributor Author

This needs to be updated with the upcoming release.

Signed-off-by: cwasicki <[email protected]>
@flora-hofmann-frequenz flora-hofmann-frequenz left a comment

LGTM 👍

@tiyash-basu-frequenz tiyash-basu-frequenz left a comment

LGTM

@cwasicki cwasicki merged commit 79726f9 into frequenz-floss:v0.x.x Mar 20, 2024
14 checks passed
@cwasicki cwasicki deleted the client branch March 20, 2024 14:52
@llucax llucax left a comment

I arrived quite late to the party, sorry, but it is actually a good thing: I wouldn't have wanted to block the PR with this feedback, since at least some of it is more long-term.

I will create separate issues for things that need further discussion.

Comment on lines +34 to +40
Sample = namedtuple("Sample", ["timestamp", "value"])
"""Type for a sample of a time series."""

MetricSample = namedtuple(
"MetricSample", ["timestamp", "microgrid_id", "component_id", "metric", "value"]
)
"""Type for a sample of a time series incl. metric type, microgrid and component ID"""

Why not a dataclass? Performance reasons, or just because it is more idiomatic for data science? If the reason is the former, it would be good to provide a benchmark to show the difference; if it is the latter, it might be good to add a comment, and eventually explain the design principles for this library in the README or somewhere. For now, as we keep learning about what we really need, I would just put a comment in the code here.


This one I think should be answered here as it is not worth a separate issue, unless this triggers a bigger discussion.

Contributor Author

The idea was to keep it as low-level as possible, since this is the fundamental data structure that is repeated for every single sample. So for me the question is rather: why should we use a dataclass instead? For further analysis the data is typically transformed into an array-like data type (e.g. a pandas data frame) anyway, which works nicely with this type (originally it was just a tuple until @shsms passed by).
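To illustrate the trade-off being discussed, here is a hedged sketch; the `SampleDataclass` variant and the size comparison are hypothetical and only here for contrast, not part of this PR:

```python
import sys
from collections import namedtuple
from dataclasses import dataclass
from datetime import datetime

# Re-creation of the Sample type from the diff above.
Sample = namedtuple("Sample", ["timestamp", "value"])

# Hypothetical dataclass alternative, for comparison only.
@dataclass
class SampleDataclass:
    timestamp: datetime
    value: float

ts = datetime(2024, 3, 15)
nt = Sample(ts, 42.0)
dc = SampleDataclass(ts, 42.0)

# A namedtuple is a real tuple: it compares and unpacks positionally,
# which is what array-like consumers rely on, and the field names
# become column names for free (e.g. pandas.DataFrame([nt])).
timestamp, value = nt
assert nt == (ts, 42.0)

# Per-instance footprint: the dataclass instance also carries a __dict__.
assert sys.getsizeof(nt) < sys.getsizeof(dc) + sys.getsizeof(dc.__dict__)
```

The positional-tuple behavior is the part that matters for the data-science pipeline mentioned above; the size check is only a rough proxy, not the benchmark llucax asked for.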


Yeah, agreed, that was my main question. As for why dataclasses: just because they are a higher-level abstraction and more flexible. But if this is targeting data science, and for that use case tuples make more sense, then I agree we should use them. I just think we should make that clearer; as said above, a comment in the code should be enough.

Contributor Author

#35

Comment on lines +47 to +48
_data_pb: PBListMicrogridComponentsDataResponse
"""The underlying protobuf message."""

I would call this _pb_data instead (or _pb_message). We call it _raw in the SDK. Maybe it would be nice to agree on some standard so all clients use the same, although it is not exposed so it is not that important.

Contributor Author

Good idea to standardize, I am fine with all suggestions.

Contributor Author

#32

return True
return False

def iterate_metric_samples(self) -> Generator[MetricSample, None, None]:

I would just make the class iterable directly (also, Iterator[MetricSample] [1] is a shorter and more idiomatic alias for Generator[MetricSample, None, None] [2])

    def __iter__(self) -> Iterator[MetricSample]:
        ...

Then you can just do for sample in page: instead of for sample in page.iterate_metric_samples():.

Footnotes

  1. Technically a Generator is more than an Iterator:

    extends iterators with the send(), throw() and close() methods

    But in this context I think we only care about the iterator part, we are using a generator out of convenience, not because we are actually using the extra methods provided by a generator.

  2. From typing.Generator:

    Alternatively, annotate your generator as having a return type of either Iterable[YieldType] or Iterator[YieldType]:


This is a small change, so we can also briefly discuss here and could be quickly changed in the current repo. The broader discussion is now here:

Contributor Author

I would just make the class iterable directly (also, Iterator[MetricSample] is a shorter and more idiomatic alias for Generator[MetricSample, None, None])

Agree, will change this. In general a generator was used for lazy iteration, but that doesn't need to be reflected in the type annotation.
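A minimal sketch of the suggested `__iter__` approach; `ComponentsDataPage`'s constructor and the entry tuples here are illustrative stand-ins for the real protobuf-backed page class, not its actual interface:

```python
from collections import namedtuple
from collections.abc import Iterator

MetricSample = namedtuple(
    "MetricSample", ["timestamp", "microgrid_id", "component_id", "metric", "value"]
)

class ComponentsDataPage:
    """Hypothetical minimal stand-in for the real page wrapper."""

    def __init__(self, entries: list[tuple]) -> None:
        self._entries = entries

    def __iter__(self) -> Iterator[MetricSample]:
        # Iteration stays lazy: this generator body only advances as the
        # caller consumes samples, same as iterate_metric_samples() did.
        for entry in self._entries:
            yield MetricSample(*entry)

page = ComponentsDataPage(
    [("2024-03-20T00:00:00Z", 1, 7, "AC_ACTIVE_POWER", 42.0)]
)
samples = [sample for sample in page]  # `for sample in page:` now just works
assert samples[0].component_id == 7
```

Annotating the return as Iterator[MetricSample] while implementing it as a generator is exactly the pattern the typing docs endorse.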


Iterators also provide lazy iteration. Generators are just more powerful, as they also let you pass messages between the iterator and the outside world.

Contributor Author

#36

)

@property
def next_page_token(self) -> Any:

This should return str, right? If you want it to be an opaque type, then use object instead, as Any will completely disable type checking: wrong code like t = p.next_page_token; t.i_dont_exist() will not be detected by mypy.
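A small sketch of the difference; the `PageWithAny`/`PageWithObject` classes are hypothetical illustrations, not code from this PR:

```python
from typing import Any

class PageWithAny:
    """Hypothetical page: -> Any silently disables all checking."""

    @property
    def next_page_token(self) -> Any:
        return "token-123"

class PageWithObject:
    """Hypothetical page: -> object forces callers to narrow first."""

    @property
    def next_page_token(self) -> object:
        return "token-123"

t = PageWithAny().next_page_token
# mypy would happily accept t.i_dont_exist() here, and it would
# only blow up at runtime.

u = PageWithObject().next_page_token
# mypy would reject u.startswith("token") until the type is narrowed:
assert isinstance(u, str) and u.startswith("token")
```

With -> str the narrowing step disappears entirely, which is why it is the simplest option if the token really is always a string.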


I suggest also discussing this one here, and only create a separate issue if it is not trivial.


OK, should we create an issue or do you plan to do a quick fix directly?

Contributor Author

#28

return self._data_pb.pagination_info.next_page_token


class ReportingClient:

We should probably also standardize this naming scheme. I think in the microgrid API I used just ApiClient, as it is expected to be used as microgrid.ApiClient. I don't mind using xxx.ApiClient or XxxApiClient, but I think we should call it ApiClient because otherwise it might be confusing for some APIs where the term client has some other meaning.


I suggest discussing this in a meeting, I guess it should be quicker in a more interactive way.

Contributor Author

Decision was to use XxxApiClient, right?

Contributor Author

#29

self._stub = ReportingStub(self._grpc_channel)

# pylint: disable=too-many-arguments
async def iterate_single_metric(

I wouldn't name these methods iterate; I'm not sure if it is a common idiom in data science, but it is not a common idiom in Python in general. I think client methods should map more or less one-to-one to the gRPC API methods. In this case I think the name should be list_microgrid_components_data(). I think it is fine to do the page unwrapping and still use the same name as the gRPC API method.

Again, I think we should have a common approach in all APIs.

PS: I guess the extra explicitness in names might come from untyped python, in that case it makes more sense, but when using type hints you can already see you are iterating (AsyncIterator) over a single metric ([Sample]).


IMHO it is pretty clear that Python API client method names should match the gRPC API, so I suggest creating a separate issue about this only if anyone thinks differently.

As for how to iterate, please refer to:

Contributor Author

I wouldn't name these methods iterate,

PS: I guess the extra explicitness in names might come from untyped python, in that case it makes more sense, but when using type hints you can already see you are iterating (AsyncIterator) over a single metric ([Sample]).

Agree, can remove this. Actually the explicitness is to distinguish from different access methods that I considered, e.g. alternative to the iterator a dictionary representing all nesting levels. At the moment I am not sure whether it will come.

In this case I think the name should be: list_microgrid_components_data()

Not sure about this. The method offers only a small subset of the potential functionality of the gRPC function, and the current name reflects this limitation. Also, since the current version is already sufficient for many applications, I am not sure whether this method will be extended or another method will be added that provides full support of the gRPC API.


Mmmm, OK, I see. In general I think clients should only be a thin wrapper over the gRPC calls, and other more convenient functions could be built on top. I mean, I'm completely fine with keeping it as it is for an early version, especially if we need something fast, but in the long term I don't think it is the right approach. If the API is more flexible, then some users must need that flexibility; otherwise the gRPC API we have is more complex than needed.

Contributor Author

If the API is more flexible, then some users must need that flexibility, otherwise the gRPC API we have is more complex than needed.

Yes, the flexibility of the API will be exposed, there are issues for all of the missing features. Question is whether we have only one method that exposes all or multiple. I would postpone the decision until we address some of the open issues, e.g. #19.

Contributor Author

@cwasicki cwasicki Apr 12, 2024

Partly addressed here: #33.

On whether to keep the existing single component function, see #34.

On the iterator, see #31.

yield Sample(timestamp=entry.timestamp, value=entry.value)

# pylint: disable=too-many-arguments
async def _iterate_components_data_pages(

I would probably expose this publicly, in case someone wants to get pages for some reason, I don't see a reason to hide it.

Even more, now with the code in front of my eyes, I think the best approach would be for list_microgrid_components_data() to return an object with a few iterable properties:

async for page in client.list_microgrid_components_data(...).pages:
    ...

async for sample in client.list_microgrid_components_data(...).samples:
    ...

So list_microgrid_components_data() could return a Response object with properties like (pseudo-code using the current method names):

@property
async def samples(self) -> AsyncIterator[Sample]:
    return self.client._iterate_single_metric(...)

@property
async def pages(self) -> AsyncIterator[ComponentsDataPage]:
    return self.client._iterate_components_data_pages(...)

This design might need a bit more thought, so maybe we should wait on it and discuss it in a separate issue.
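As a starting point for that discussion, here is a runnable sketch of the proposed shape; it uses plain lists of floats instead of real ComponentsDataPage and Sample objects, and all names are illustrative, not a final design:

```python
import asyncio
from collections.abc import AsyncIterator

class Response:
    """Hypothetical response wrapper; pages are plain float lists here."""

    def __init__(self, raw_pages: list[list[float]]) -> None:
        self._raw_pages = raw_pages

    @property
    def pages(self) -> AsyncIterator[list[float]]:
        async def _iter() -> AsyncIterator[list[float]]:
            # Real code would fetch each page lazily via its page token.
            for page in self._raw_pages:
                yield page
        return _iter()

    @property
    def samples(self) -> AsyncIterator[float]:
        async def _iter() -> AsyncIterator[float]:
            # Sample iteration is just a flattening of page iteration.
            async for page in self.pages:
                for sample in page:
                    yield sample
        return _iter()

async def main() -> list[float]:
    resp = Response([[1.0, 2.0], [3.0]])
    return [s async for s in resp.samples]

assert asyncio.run(main()) == [1.0, 2.0, 3.0]
```

Note the properties are plain `def`s returning async generators; making the property itself `async def` (as in the pseudo-code above) would require awaiting it before iterating.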

Contributor Author

I would probably expose this publicly, in case someone wants to get pages for some reason, I don't see a reason to hide it.

Honestly I don't see a reason to expose it now. If we realize this becomes important, it's easy to expose. I don't see a use case where you would want the pages that you cannot cover with the iterator.

Even more, now with the code in front of my eyes, I think the best approach would be to list_microgrid_components_data()

That could be an option. But do you have a good example where having pages is better than an iterator over the samples?


Because I don't see a use-case where you would want the pages that you cannot achieve with the iterator.

Performance. We tested this and iterating over items was considerably slower than going through pages.


Again, maybe that is because of how data is used when doing data science; in the end it is the same for that use case, but I think the API client should be pretty flexible and general purpose. And again, I'm thinking more long term, not just about what we need right now. For right now I agree it is fine and we can improve later, but if we agree about this, I would create issues about it so:

  1. We don't forget
  2. It is clear for everyone what's the future direction we want to take, so issues with that future direction can also be raised early if anyone sees any.

Contributor Author

#31

Comment on lines +261 to +289
async def close(self) -> None:
"""Close the client and cancel any pending requests immediately."""
await self._grpc_channel.close(grace=None)

async def __aenter__(self) -> "ReportingClient":
"""Enter the async context."""
return self

async def __aexit__(
self,
_exc_type: Type[BaseException] | None,
_exc_val: BaseException | None,
_exc_tb: Any | None,
) -> bool | None:
"""
Exit the asynchronous context manager.

Note that exceptions are not handled here, but are allowed to propagate.

Args:
_exc_type: Type of exception raised in the async context.
_exc_val: Exception instance raised.
_exc_tb: Traceback object at the point where the exception occurred.

Returns:
None, allowing any exceptions to propagate.
"""
await self.close()
return None
@llucax llucax Apr 2, 2024

This might be a good foundation for a GrpcApiClient class that can live in https://github.com/frequenz-floss/frequenz-client-base-python/!

It could also standardize the constructor; I think right now some clients take a gRPC channel and some take a connection string. I think we should go with the latter, to remove the explicit dependency on grpc for downstream projects.

Contributor Author

Sounds good

Contributor Author

#30
