Add server OS information to telemetry stats #23793

Merged: 5 commits into elastic:master on Oct 20, 2018

Conversation

joshdover
Contributor

@joshdover joshdover commented Oct 3, 2018

Summary

Fixes #21978

Adds 4 new keys to the kibana_stats.os object in the monitoring index that tracks the Kibana server OS. I'm using the getos package that the Reporting plugin also uses to get more detailed distribution data when running on Linux.

"platform": "linux",
"platformRelease": "4.4.0-134-generic",

// only present when running on linux
"distro": "Ubuntu Linux",
"distroRelease": "Ubuntu Linux-12.04"

The release keys include both the name and the version number. I put these in the same key so that the data still makes sense when we roll up these values for telemetry purposes. Rolling up the version number by itself is meaningless at best and could lead to inaccurate analysis at worst. If two Kibana instances are both running on version 10 of two different OSes (e.g. Windows and Linux), rolling these up to say there are two 10.0 versions running doesn't tell us what we want to know.
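To illustrate the key layout described above, here is a minimal sketch using only Node's built-in `os` module. The helper name `buildOsStats` and the shape of its `distro` argument are assumptions made for this example; the PR itself gets distribution data from the getos package, which is not called here.

```javascript
const os = require('os');

// Hypothetical helper (not the PR's code): builds the four keys described above.
// `distro` stands in for whatever a distro lookup reports on Linux,
// assumed here to look like { name, release }.
function buildOsStats(platform, release, distro) {
  const stats = {
    platform,
    // Prefix the release with the platform so rolled-up values stay meaningful.
    platformRelease: `${platform}-${release}`,
  };
  if (distro) {
    // Only present when distribution data is available (Linux).
    stats.distro = distro.name;
    stats.distroRelease = `${distro.name}-${distro.release}`;
  }
  return stats;
}

// Values for the machine running this script (no distro lookup performed).
console.log(buildOsStats(os.platform(), os.release()));

// The Ubuntu example from the description.
console.log(
  buildOsStats('linux', '4.4.0-134-generic', { name: 'Ubuntu Linux', release: '12.04' })
);
```

Keeping the platform or distro name inside the release value means each rolled-up bucket identifies itself without needing a second field.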

Also adds a new os key to the kibana_stats key of the data included in telemetry stats. It holds rolled-up values, similar to how we roll up version counts.

{
  "_source": {
    "kibana_stats": {
      // ...
      "os": {
        "platforms": [{"platform": "linux", "count": 1}],
        "platformReleases": [{"platformRelease": "linux-4.0", "count": 1}],
        "distros": [{"distro": "Ubuntu Linux", "count": 1}],
        "distroReleases": [{"distroRelease": "Ubuntu Linux-14.04", "count": 1}]
      }
    }
  }
}
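The roll-up shown above could be produced along these lines. This is a sketch under assumptions: `incrementByKey` here is a stand-in with the same name as the helper in the diff, not its actual implementation, and `mapToList` and `rollUpOs` are hypothetical names.

```javascript
// Stand-in for the incrementByKey helper seen in the diff (assumed behavior).
function incrementByKey(map, key) {
  if (!key) return; // skip instances that did not report this field
  map.set(key, (map.get(key) || 0) + 1);
}

// Convert a Map of counts into the [{ <keyName>: value, count }] list shape.
function mapToList(map, keyName) {
  return Array.from(map, ([key, count]) => ({ [keyName]: key, count }));
}

// Hypothetical roll-up over per-instance stats objects.
function rollUpOs(instances) {
  const platforms = new Map();
  const platformReleases = new Map();
  const distros = new Map();
  const distroReleases = new Map();

  for (const { os } of instances) {
    if (!os) continue; // products that do not report an os object are ignored
    incrementByKey(platforms, os.platform);
    incrementByKey(platformReleases, os.platformRelease);
    incrementByKey(distros, os.distro);
    incrementByKey(distroReleases, os.distroRelease);
  }

  return {
    platforms: mapToList(platforms, 'platform'),
    platformReleases: mapToList(platformReleases, 'platformRelease'),
    distros: mapToList(distros, 'distro'),
    distroReleases: mapToList(distroReleases, 'distroRelease'),
  };
}

// Two instances on "version 10.0" of different platforms stay in separate buckets.
console.log(
  JSON.stringify(
    rollUpOs([
      { os: { platform: 'win32', platformRelease: 'win32-10.0' } },
      { os: { platform: 'linux', platformRelease: 'linux-10.0' } },
      {}, // no os object reported
    ]),
    null,
    2
  )
);
```

Because the platform name is part of the release value, the two "10.0" instances are counted separately instead of collapsing into one misleading bucket.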

@joshdover joshdover added the WIP, Team:Operations, v7.0.0, and v6.5.0 labels Oct 3, 2018
@elasticmachine
Contributor

💔 Build Failed

@AlonaNadler

AlonaNadler commented Oct 3, 2018

In general, LGTM.
How will non-Linux systems be mapped, for example Microsoft Server 2008 R2?

I want to make sure that when it's not Linux, we cover all of the information (and if not, that we know what will not be captured):

  • Microsoft Windows 7 (64-bit)
  • Microsoft Server 2008 R2
  • iMac/MacBook computers 2009
  • OSX 10.11

@joshdover
Contributor Author

joshdover commented Oct 4, 2018

@AlonaNadler Yes, you'll be able to tell which Windows version they're running by checking the "release" version against this table: https://en.wikipedia.org/wiki/Comparison_of_Microsoft_Windows_versions
For Mac, use the Darwin release table: https://en.wikipedia.org/wiki/Darwin_(operating_system)
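As a rough illustration of that lookup, the sketch below maps a few well-known release values to product names. The function name `describeRelease` and the tables are assumptions for this example and are deliberately not exhaustive; it also assumes the `platform-release` format (for example `win32-6.1.7601`), where `win32` and `darwin` are the values Node's `os.platform()` reports on Windows and macOS.

```javascript
// Illustrative lookup tables with a handful of well-known entries only.
// Windows: NT kernel major.minor version -> product name(s).
const WINDOWS_NT = new Map([
  ['6.1', 'Windows 7 / Windows Server 2008 R2'],
  ['6.2', 'Windows 8 / Windows Server 2012'],
  ['6.3', 'Windows 8.1 / Windows Server 2012 R2'],
  ['10.0', 'Windows 10 / Windows Server 2016 or later'],
]);

// macOS: Darwin major version -> product name.
const DARWIN_MAJOR = new Map([
  ['15', 'OS X 10.11 (El Capitan)'],
  ['16', 'macOS 10.12 (Sierra)'],
  ['17', 'macOS 10.13 (High Sierra)'],
]);

// Hypothetical helper: takes a "platform-release" string and returns a
// human-readable name, or null when the value is not in the tables.
function describeRelease(platformRelease) {
  const idx = platformRelease.indexOf('-');
  if (idx < 0) return null;
  const platform = platformRelease.slice(0, idx);
  const parts = platformRelease.slice(idx + 1).split('.');

  if (platform === 'win32') {
    return WINDOWS_NT.get(`${parts[0]}.${parts[1]}`) || null;
  }
  if (platform === 'darwin') {
    return DARWIN_MAJOR.get(parts[0]) || null;
  }
  return null;
}

console.log(describeRelease('win32-6.1.7601'));
console.log(describeRelease('darwin-15.6.0'));
```

Note that a release value alone does not distinguish desktop from server editions that share a kernel version, and it says nothing about hardware model or 32-bit versus 64-bit, so those details from the list above would not be captured by these keys.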

After looking through this more I'm going to pare down what we track to simply be this:

    "platform": "linux",
    "platformRelease": "linux-4.4.0-134-generic",

    // only present when running on linux
    "distro": "Ubuntu Linux",
    "distroRelease": "Ubuntu Linux-12.04"

@joshdover joshdover added the review label and removed the WIP label Oct 4, 2018
@joshdover joshdover requested review from tsullivan and pickypg October 4, 2018 20:51
@elasticmachine
Contributor

💚 Build Succeeded

-  fetch: () => {
-    return buffer.flush();
+  fetch: async () => {
+    return await buffer.flush();
Member

If buffer.flush() returns a promise, then isn't this fundamentally the same as the previous code? A returned promise will have to be awaited by the caller either way, right?

Contributor Author

Spencer got me in the habit of awaiting to make it clearer what is going on and to prevent bugs when you want to catch exceptions that an async function might throw. If you added a try/catch around this without awaiting, you would never catch anything.
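The difference only shows up once a try/catch is involved, which the following sketch demonstrates. The `buffer` object here is a hypothetical stand-in whose `flush()` always rejects; it is not the collector's real buffer.

```javascript
// Hypothetical stand-in for buffer.flush() that always rejects.
const buffer = {
  flush: async () => {
    throw new Error('flush failed');
  },
};

async function fetchWithoutAwait() {
  try {
    // The promise is returned as-is; its rejection surfaces after the
    // try block has already exited, so the catch below never runs.
    return buffer.flush();
  } catch (err) {
    return 'caught';
  }
}

async function fetchWithAwait() {
  try {
    // Awaiting inside the try block lets the catch see the rejection.
    return await buffer.flush();
  } catch (err) {
    return 'caught';
  }
}

async function main() {
  // Without await, the rejection escapes to the caller.
  const withoutAwait = await fetchWithoutAwait().catch(() => 'escaped');
  const withAwait = await fetchWithAwait();
  return { withoutAwait, withAwait };
}

main().then((result) => console.log(result));
// prints { withoutAwait: 'escaped', withAwait: 'caught' }
```

Without a surrounding try/catch, both forms behave the same for the caller, which is the reviewer's point; the `return await` form is a defensive habit for the day a try/catch gets added.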


if (os) {
incrementByKey(cluster.os.platform, os.platform);
incrementByKey(cluster.os.distro, os.distro);
Member

Have you tested this when the function is processing Logstash and Beats info? LMK if you have questions about how to do that

Contributor Author

@joshdover joshdover left a comment

Tested with Beats and Logstash data and things look good. The os key does not appear on those parts of the telemetry payload and nothing breaks 😄

@elasticmachine
Contributor

💔 Build Failed

@elasticmachine
Contributor

💔 Build Failed

@joshdover
Contributor Author

retest

@elasticmachine
Contributor

💔 Build Failed

@joshdover
Contributor Author

retest

Member

@tsullivan tsullivan left a comment

LGTM pending green build!

@elasticmachine
Contributor

💔 Build Failed

@elasticmachine
Contributor

💚 Build Succeeded

@joshdover joshdover merged commit 3fc8b35 into elastic:master Oct 20, 2018
joshdover added a commit to joshdover/kibana that referenced this pull request Oct 20, 2018
* Add server OS data to monitoring collector and telemetry

* Fixup naming

* Fix functional tests
joshdover added a commit that referenced this pull request Oct 20, 2018
* Add server OS data to monitoring collector and telemetry

* Fixup naming

* Fix functional tests