Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Usage Collection] Add description field to schema #89685

Closed
afharo opened this issue Jan 29, 2021 · 15 comments · Fixed by #92701
Closed

[Usage Collection] Add description field to schema #89685

afharo opened this issue Jan 29, 2021 · 15 comments · Fixed by #92701
Assignees
Labels
Feature:Telemetry Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc

Comments

@afharo
Copy link
Member

afharo commented Jan 29, 2021

As we increase the number of usage collectors in the platform and the fields they report, it's becoming harder to understand what's the purpose of each collected field.

We could enable support to a description field in the schema to let developers add some documentation to them.

Ideally, every field should come with a description, but we can start off by making it optional and transitioning to required once all the plugins have documented their schemas.

The scope of this issue might include documenting the Platform collectors as they can help to show other teams some best practices.

@afharo afharo added Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Feature:Telemetry labels Jan 29, 2021
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-core (Team:Core)

@afharo
Copy link
Member Author

afharo commented Jan 29, 2021

cc @alexfrancoeur & @thesmallestduck

@Bamieh
Copy link
Member

Bamieh commented Jan 29, 2021

It would be awesome if we start working on this feature in 7.13.

@alexfrancoeur
Copy link

+1, we're having some early discussions around introducing descriptions and a view in Kibana that describes fields. Assuming this gets in, it could also be picked up to provide context across Kibana (tooltips, etc.). Ideally, Elastisearch provides the default descriptions but I don't believe we have an issue opened there yet. If this were to get picked up, I could see some fun automation from @mindbat and the team on updating index patterns with new descriptions for fields in use #89726

@Bamieh
Copy link
Member

Bamieh commented Feb 2, 2021

It would be great to have those description available in tooltips etc in kibana on our clusters. But i think we'd have great value in having the source of those descriptions living in kibana itself.

Mainly because devs will be the ones populating those descriptions while they are adding metrics instead of retroactively chasing people to provide field descriptions after the work has been done. Similar to the field mappings mouse-chase we had previously.

We can also add validation and checks via our telemetry schema checker.

We had a previous effort to describe the fields we have in an excel sheet and that initiative almost failed due to the lack of activity.

@alexfrancoeur
Copy link

++ I wholeheartedly agree that we want these descriptions living in Kibana itself in the schemas defined. I was thinking more along the lines from an analyst perspective if we ever committed to building this feature in Kibana. As @thesmallestduck and I go through this auditing exercise, I think it's evident that we need to push ownership of these descriptions out. I was simply sharing that in the future, we may be able to leverage these descriptions in product and not necessarily in an internal data dictionary.

We had a previous effort to describe the fields we have in an excel sheet and that initiative almost failed due to the lack of activity.

That effort is ongoing 😄

@joshdover
Copy link
Contributor

@afharo Should we also be coordinating with the Elasticsearch team here so that we could get descriptions for telemetry fields that are populated from ES?

@afharo
Copy link
Member Author

afharo commented Feb 9, 2021

@joshdover I think so, yes! While working on #90273 I noticed that most of the fields coming from _cluster/stats are documented. But we'll need their input on anything coming from GET _xpack/usage.

I was waiting for that PR to be merged to kick off the conversations re the descriptions. Just in case it was pushed back in the end 🙃

@TinaHeiligers
Copy link
Contributor

TinaHeiligers commented Feb 18, 2021

As a first step towards supporting the data dictionary effort, we'll use this issue to add support for a field level description.

To keep the scope manageable and kick start the effort of backfilling descriptions for existing fields, this issue will be limited to adding support for optional descriptions at the field level and provide an example of how to add a description.

We foresee a future need to decorate fields with more meta-like information and to reduce the potential noise schema-level changes will have on all the plugins/services that report usage data, we propose the following format:

const arbitraryCollectorSchema = {
  foo: {
    bar_counts_7_days: {
      type: 'long',
      _meta: {
        description: 'Total number of times bar was called over the last 7 days',
      }
    },
    bar_baz: {
      type: 'keyword',
      _meta: {
        description: 'bar baz given, indicates xyz',
      },
    },
  },
};

We're using a _meta property to indicate that the data gives semantic information about the field, as this structure appears to be gaining popularity. See Semantic data for fields and Add _meta information to fields where this applied.

I envisage follow up issues to:

  • backfill the descriptions that we've already captured
  • [Meta] issue for teams to add descriptions to their collector fields with a deadline for when the description will change from being optional to mandatory.

@alexfrancoeur @thesmallestduck Do we plan on enforcing the field level descriptions within the 7.x timeframe and use 8.0 as a target to introduce an internal breaking change?

@TinaHeiligers
Copy link
Contributor

TinaHeiligers commented Feb 19, 2021

I've added support for _meta: { description: 'some descriptive text goes here }and now need to know what we want to do about the_meta` property with schema extraction (telemetry tools).

As an initial test, I ran the telemetry_check, hoping it would fail but it worked fine!

However, as is, the _meta property is extracted and added to a collector's mapping. As an example, I've added descriptions to the localization usage collector:

export function registerLocalizationUsageCollector(
  usageCollection: UsageCollectionSetup,
  i18n: I18nServiceSetup
) {
  const collector = usageCollection.makeUsageCollector<UsageStats>({
    type: 'localization',
    isReady: () => true,
    fetch: createCollectorFetch(i18n),
    schema: {
      locale: {
        type: 'keyword',
        _meta: {
          description: 'The default locale set on the Kibana system',
        },
      },
      integrities: {
        DYNAMIC_KEY: {
          type: 'text',
          _meta: {
            description:
              'Translation file hash. If the hash is different it indicates that a custom translation file is used',
          },
        },
      },
      labelsCount: {
        type: 'long',
        _meta: {
          description: 'The number of translated labels',
        },
      },
    },
  });

  usageCollection.registerCollector(collector);
}

and after running the telemetry tools to check and 'fix' the file, we get:

    "localization": {
      "properties": {
        "locale": {
          "type": "keyword",
          "_meta": {
            "description": "The default locale set on the Kibana system"
          }
        },
        "integrities": {
          "properties": {
            "DYNAMIC_KEY": {
              "type": "text",
              "_meta": {
                "description": "Translation file hash. If the hash is different it indicates that a custom translation file is used"
              }
            }
          }
        },
        "labelsCount": {
          "type": "long",
          "_meta": {
            "description": "The number of translated labels"
          }
        }
      }
    },

AFAIK, we're not using the mapping files to automatically update mappings, so is it a problem that these are added?

If it isn't a problem, then it would be great to have all the descriptions embedded in the .json files for us to parse out at some later point.

If it is a problem and we don't want to extract the _meta: { description: '....' } fields into the mappings files, I'm not sure exactly where in the telemetry tools we need to add the logic to exclude it.

@Bamieh @afharo it's been a while since I've looked at that code and need some help figuring out where the best place to start will be.
Thanks!

@Bamieh
Copy link
Member

Bamieh commented Feb 22, 2021

@TinaHeiligers I believe we do want to keep the _meta fields in the mapping json files. Having them there will allow us to use these fields. The json files are the only 'contract' we provide about the collectors.

As far as the telemetry parser to update the mappings: it should not care about the meta object since it specifically chooses the fields it uses.

@TinaHeiligers
Copy link
Contributor

@Bamieh I like the idea of keeping the _meta: { description: 'Some descriptive text about this field} in the mapping.json files otherwise they'll still be scattered across the individual collector schemas. We could think of adding an enhancement to the telemetry_tool to only extract the descriptions and the fields themselves (including the collectors that are reporting those fields) at a later stage when we start thinking about automation.

@TinaHeiligers
Copy link
Contributor

As far as the telemetry parser to update the mappings: it should not care about the meta object since it specifically chooses the fields it uses.

@Bamieh Do you mind explaining this a bit more please?

  • I don't quite know what you mean by "should not care about the meta object". Does this mean we should just make sure it parses the _meta object correctly?
  • "...it specifically chooses the fields it uses" Does that mean we need to make sure we don't or do include the descriptions?

@Bamieh
Copy link
Member

Bamieh commented Feb 23, 2021

We could think of adding an enhancement to the telemetry_tool to only extract the descriptions and the fields themselves (including the collectors that are reporting those fields) at a later stage when we start thinking about automation.

Since the mapping json files contain all the collectors data: We'd play around with the automation tools responsible for extracting those files rather than tinkering with the kibana telemetry_tool.

As far as the telemetry parser to update the mappings: it should not care about the meta object since it specifically chooses the fields it uses.
@Bamieh Do you mind explaining this a bit more please?

  • I don't quite know what you mean by "should not care about the meta object". Does this mean we should just make sure it parses the _meta object correctly?
  • "...it specifically chooses the fields it uses" Does that mean we need to make sure we don't or do include the descriptions?

The telemetry automation script to update the mappings (both in telemetry-next and legacy telemetry) choose the type fields and items fields to grab the correct ES mapping of the field. Adding _meta should not affect any of those automation scripts we have.

For now the automation scripts will not look at the _meta fields. we can ask infra to create a new script to parse the _meta_.description fields and create the appropriate actions.

TLDR; adding the _meta field to the json files we provide in kibana should not break any automation script implementations form the telemetry cluster side. To parse the descriptions and use them the best action is to create new automation scripts to do so.

@afharo
Copy link
Member Author

afharo commented Feb 25, 2021

Since we clearly defined the scope of this issue to add support for the new field and to describe the Platform collectors as an example and proof that it works, I've created this meta issue to track all the other follow-ups that may come in future releases: #92781

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:Telemetry Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants