
feat: SDK support for model monitoring #1249

Merged
merged 83 commits into main from model-monitoring on Jul 28, 2022

Conversation


@rosiezou rosiezou commented May 23, 2022

This patch adds a model monitoring implementation only for models deployed to an endpoint. The batch prediction use case will be addressed separately in future PRs.

To-do list:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes b/231988321 🦕

@product-auto-label product-auto-label bot added size: l Pull request size is large. api: vertex-ai Issues related to the googleapis/python-aiplatform API. labels May 23, 2022
@rosiezou rosiezou marked this pull request as draft May 23, 2022 04:02
@rosiezou rosiezou force-pushed the model-monitoring branch from 6481291 to 68b6fa4 Compare May 26, 2022 17:34
@rosiezou rosiezou force-pushed the model-monitoring branch from a154c08 to 327611c Compare June 7, 2022 18:44
@sasha-gitg sasha-gitg added do not merge Indicates a pull request not ready for merge, due to either quality or timing. and removed do not merge Indicates a pull request not ready for merge, due to either quality or timing. labels Jun 9, 2022
@rosiezou rosiezou marked this pull request as ready for review June 15, 2022 18:33
@rosiezou rosiezou requested review from a team as code owners June 16, 2022 01:33


class RandomSampleConfig(_SamplingStrategy):
    def __init__(self, sample_rate: Optional[float] = 1):
@dizcology (Contributor) commented Jul 19, 2022:

Please change this or clarify the behavior when sample_rate is None.
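A minimal sketch of one way the `None` case could be made explicit (this is illustrative only, not the PR's actual fix; the `_SamplingStrategy` base class is omitted here):

```python
from typing import Optional


class RandomSampleConfig:
    """Illustrative stand-in for the SDK class under review."""

    def __init__(self, sample_rate: Optional[float] = None):
        # Treat None as "sample everything" and validate the range,
        # so the behavior for sample_rate=None is explicit rather than implied.
        if sample_rate is None:
            sample_rate = 1.0
        if not 0.0 < sample_rate <= 1.0:
            raise ValueError(f"sample_rate must be in (0, 1], got {sample_rate!r}")
        self.sample_rate = sample_rate
```

With this shape, `RandomSampleConfig()` and `RandomSampleConfig(sample_rate=None)` both mean "sample at rate 1.0", and out-of-range values fail loudly instead of silently.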

(Additional review threads in google/cloud/aiplatform/jobs.py were marked resolved.)
return (
    gca_model_deployment_monitoring_job.ModelDeploymentMonitoringScheduleConfig(
        monitor_interval=duration_pb2.Duration(
            seconds=self.monitor_interval * 3600
        )
    )
)
A reviewer (Contributor) commented:
This conversion can be surprising. If the original schedule config (defined in the service protocol) expresses the interval in seconds, why not keep using seconds (and rename the variable to something like monitor_interval_seconds) instead of integer hours? How would a user express "every 10 minutes" with the ScheduleConfig class here?

@rosiezou (Contributor, Author) replied:

Model monitoring only supports hourly schedules. Even if the user specifies a value like 1.6, it will be rounded up to the next whole hour.

@rosiezou (Contributor, Author) added:

So under the original protocol, even if the user passes something like seconds = 3500, it gets rounded up to 3600 behind the scenes.
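The rounding behavior described in this thread can be illustrated with a small standalone sketch (the helper name is hypothetical; the real rounding happens on the service side, not in the SDK):

```python
import math


def round_up_to_full_hours(seconds: int) -> int:
    """Round an interval up to the next whole hour, mirroring the
    service-side behavior described above (hypothetical helper)."""
    return max(1, math.ceil(seconds / 3600)) * 3600


# The example from the thread: 3500 seconds becomes one hour.
print(round_up_to_full_hours(3500))  # 3600
# 1.6 hours (5760 seconds) becomes 2 hours.
print(round_up_to_full_hours(5760))  # 7200
```

This is why the SDK's `monitor_interval` is expressed in whole hours: any finer-grained value would be rounded up by the service anyway.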

@dizcology (Contributor) replied Jul 19, 2022:

Given the current behavior of the service (rounding to hours), this makes sense. Do we know that the service will not be updated to support more fine-grained monitor intervals? If that happens, do we have an easy path for updating the library to support it?

@rosiezou (Contributor, Author) replied:

I have synced with @qijing93 offline; there is no additional support planned for fine-grained monitor intervals.

@rosiezou rosiezou requested a review from a team as a code owner July 22, 2022 23:49
@nayaknishant nayaknishant added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Jul 27, 2022
@jaycee-li jaycee-li removed the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Jul 27, 2022
@rosiezou rosiezou added the owlbot:run Add this label to trigger the Owlbot post processor. label Jul 27, 2022
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Jul 27, 2022
@rosiezou rosiezou merged commit 18c88d1 into main Jul 28, 2022
@rosiezou rosiezou deleted the model-monitoring branch July 28, 2022 00:26
Labels: api: vertex-ai (Issues related to the googleapis/python-aiplatform API), size: xl (Pull request size is extra large)

8 participants