Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Metricbeat][vSphere] Support for configurable IntervalId for performance API #40678

Conversation

kush-elastic
Copy link
Collaborator

@kush-elastic kush-elastic commented Sep 3, 2024

  • Enhancement

The vSphere Performance API offers support for Intervals and Levels, allowing users to collect various types of counters and metrics from vSphere environments. Depending on the configured Interval, the API provides detailed or aggregated performance data from vSphere.

For more information, refer to the following resources:

We can add support for configuration that help users collect the specific performance data they need.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Related issues

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Sep 3, 2024
@kush-elastic kush-elastic self-assigned this Sep 3, 2024
Copy link
Contributor

mergify bot commented Sep 3, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @kush-elastic? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@kush-elastic kush-elastic added enhancement module Metricbeat Metricbeat Module:beat The beat Metricbeat module Team:Obs-InfraObs Label for the Observability Infrastructure Monitoring team labels Sep 3, 2024
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Sep 3, 2024
@kush-elastic kush-elastic marked this pull request as ready for review September 5, 2024 05:06
@kush-elastic kush-elastic requested a review from a team as a code owner September 5, 2024 05:06
@kush-elastic kush-elastic requested a review from a team as a code owner September 5, 2024 05:42
CHANGELOG.next.asciidoc Outdated Show resolved Hide resolved
metricbeat/docs/modules/vsphere.asciidoc Outdated Show resolved Hide resolved
m.Logger().Errorf("failed to convert performance data to metric series: %v", err)
}

for _, result := range results[0].Value {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should make sure that results has at least one value like we have done for samples. Or it could be a potential panic.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is another check before it if there are zero samples for results then it will return from there itself and if there is any issue while converting it to series we will return with error.

Copy link
Contributor

@devamanv devamanv Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got your point. However, looking at the definition of ToMetricSeries, I still see two potential issues with it

  • If the type assertion s, ok := series[i].(*types.PerfEntityMetric) fails, the function panics. This could lead to a crash unless you want this behavior. Should we return an error instead of panicking to handle unexpected input more gracefully?
  • The line v := s.Value[j].(*types.PerfMetricIntSeries) assumes that all metrics in s.Value are of type *types.PerfMetricIntSeries. If that's not guaranteed (or there are other possible types), this will panic.

Both cases could potentially result in a crash. Unless we intend to abort the execution or expect this to happen, we should be handling it.

Copy link
Member

@shmsr shmsr Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Aman. The only case it is returning an error is when CounterInfoByKey fails. Even if it passes; it doesn't guarantee that the returned slice from ToMetricSeries will have at least 1 element.

So, we should ideally handle it here just to be on the safe side.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right about panic.
We should probably handle it.
Either we handle panic or we need to manually convert to the series.
I can create new method as well because we are already doing CounterInfoByName. I can just reuse it.
And for other stuff we can return error instead of Panic.
What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding recover to make sure code does not fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kush-elastic I'm thinking if instead of just logging, would setting the returned error value to this would be more useful? From the official Go blog about recover:

The convention in the Go libraries is that even when a package uses panic internally, its external API still presents explicit error return values

One such example would be this. Here it explicitly sets the error value for the caller to handle it.
Either way, use of recover looks like a good way to handle. I will let you decide on how to treat this error.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it.

Copy link
Member

@shmsr shmsr Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked this pretty quick; will request you to verify as well as I am busy in some other work:

Query: https://github.com/vmware/govmomi/blob/c1151f859ba649771c4ab60992953019904355fe/performance/manager.go#L268

Query calls QueryPerf and that makes sure that the types are right: https://github.com/vmware/govmomi/blob/c1151f859ba649771c4ab60992953019904355fe/simulator/performance_manager.go#L219

Except for length, I don't think we need to check for types.

Copy link
Member

@shmsr shmsr Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can only iff https://github.com/vmware/govmomi/blob/c1151f859ba649771c4ab60992953019904355fe/simulator/performance_manager.go#L249 is done. But you are not setting the format in here:

spec := types.PerfQuerySpec{
		Entity:     hst.Reference(),
		MetricId:   metricIds,
		MaxSample:  1,
		IntervalId: refreshRate,
	}

So safe.


# Real-time data collection – An ESXi Server collects data for each performance counter every 20 seconds by default.
# Supported Periods:
# The Datastore and Host metricsets support performance data collection using the vSphere performance API.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we also mention that this will not impact the metrics collection other than perf metrics.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice idea. let me do that.

Copy link
Contributor

mergify bot commented Sep 10, 2024

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b 36-vsphere-support-for-configurable-intervalid-for-performance-api upstream/36-vsphere-support-for-configurable-intervalid-for-performance-api
git merge upstream/main
git push upstream 36-vsphere-support-for-configurable-intervalid-for-performance-api

Copy link
Contributor

@ishleenk17 ishleenk17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once you incorporate the documentation change, GTG

@kush-elastic kush-elastic merged commit c75a7a4 into elastic:main Sep 11, 2024
29 checks passed
mergify bot pushed a commit that referenced this pull request Sep 11, 2024
…ance API (#40678)

* initial commit for intervalId supports for performance metrics

* update docs and fix CI

* Add changelog entry

* fix CI

* resolve review comments

* fix loggers

* resolved review comments

* update versions

* update UTs

* update integration tests

* 10s -> 20s

* Update CHANGELOG.next.asciidoc

Co-authored-by: Aman <[email protected]>

* Update metricbeat/docs/modules/vsphere.asciidoc

Co-authored-by: Aman <[email protected]>

* make update

* add recover for ToMetricSeries panic

* return error instead just logging it.

* remove restriction of interval IDs

* remove unnecessary validations

* remove recover and add empty condition

* update changelog entry

* Fix wrapping of errors in loggers

* update data.json

* update data.json

* fix CI and loggers

* update changelog entries

* make update

* fix changelog entries

* update changelog entry

---------

Co-authored-by: Aman <[email protected]>
(cherry picked from commit c75a7a4)
@ishleenk17 ishleenk17 added the backport-8.15 Automated backport to the 8.15 branch with mergify label Sep 15, 2024
mergify bot pushed a commit that referenced this pull request Sep 15, 2024
…ance API (#40678)

* initial commit for intervalId supports for performance metrics

* update docs and fix CI

* Add changelog entry

* fix CI

* resolve review comments

* fix loggers

* resolved review comments

* update versions

* update UTs

* update integration tests

* 10s -> 20s

* Update CHANGELOG.next.asciidoc

Co-authored-by: Aman <[email protected]>

* Update metricbeat/docs/modules/vsphere.asciidoc

Co-authored-by: Aman <[email protected]>

* make update

* add recover for ToMetricSeries panic

* return error instead just logging it.

* remove restriction of interval IDs

* remove unnecessary validations

* remove recover and add empty condition

* update changelog entry

* Fix wrapping of errors in loggers

* update data.json

* update data.json

* fix CI and loggers

* update changelog entries

* make update

* fix changelog entries

* update changelog entry

---------

Co-authored-by: Aman <[email protected]>
(cherry picked from commit c75a7a4)
ishleenk17 added a commit that referenced this pull request Sep 15, 2024
…e IntervalId for performance API (#40833)

* [Metricbeat][vSphere] Support for configurable IntervalId for performance API (#40678)

* initial commit for intervalId supports for performance metrics

* update docs and fix CI

* Add changelog entry

* fix CI

* resolve review comments

* fix loggers

* resolved review comments

* update versions

* update UTs

* update integration tests

* 10s -> 20s

* Update CHANGELOG.next.asciidoc

Co-authored-by: Aman <[email protected]>

* Update metricbeat/docs/modules/vsphere.asciidoc

Co-authored-by: Aman <[email protected]>

* make update

* add recover for ToMetricSeries panic

* return error instead just logging it.

* remove restriction of interval IDs

* remove unnecessary validations

* remove recover and add empty condition

* update changelog entry

* Fix wrapping of errors in loggers

* update data.json

* update data.json

* fix CI and loggers

* update changelog entries

* make update

* fix changelog entries

* update changelog entry

---------

Co-authored-by: Aman <[email protected]>
(cherry picked from commit c75a7a4)

* Update CHANGELOG.next.asciidoc

---------

Co-authored-by: Kush Rana <[email protected]>
Co-authored-by: Ishleen Kaur <[email protected]>
@kush-elastic kush-elastic added the backport-8.x Automated backport to the 8.x branch with mergify label Sep 18, 2024
kush-elastic added a commit that referenced this pull request Oct 1, 2024
… IntervalId for performance API (#40897)

* [Metricbeat][vSphere] Support for configurable IntervalId for performance API (#40678)

* initial commit for intervalId supports for performance metrics

* update docs and fix CI

* Add changelog entry

* fix CI

* resolve review comments

* fix loggers

* resolved review comments

* update versions

* update UTs

* update integration tests

* 10s -> 20s

* Update CHANGELOG.next.asciidoc

Co-authored-by: Aman <[email protected]>

* Update metricbeat/docs/modules/vsphere.asciidoc

Co-authored-by: Aman <[email protected]>

* make update

* add recover for ToMetricSeries panic

* return error instead just logging it.

* remove restriction of interval IDs

* remove unnecessary validations

* remove recover and add empty condition

* update changelog entry

* Fix wrapping of errors in loggers

* update changelog entry

---------

Co-authored-by: Aman <[email protected]>
(cherry picked from commit c75a7a4)

* fix changelog entry

* fix changelog entry

* remove extra entries
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-8.x Automated backport to the 8.x branch with mergify backport-8.15 Automated backport to the 8.15 branch with mergify enhancement Metricbeat Metricbeat Module:beat The beat Metricbeat module module Team:Obs-InfraObs Label for the Observability Infrastructure Monitoring team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants