Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mysql docs #313

Merged
merged 4 commits into from
Mar 31, 2020
Merged

Add mysql docs #313

merged 4 commits into from
Mar 31, 2020

Conversation

mtojek
Copy link
Contributor

@mtojek mtojek commented Mar 31, 2020

Changes:

Resolves: #278

@mtojek mtojek requested a review from ruflin March 31, 2020 11:14
@mtojek mtojek self-assigned this Mar 31, 2020

| Field | Description | Type |
|---|---|---|
| nginx.access.agent | Alias for field "user_agent.original" | alias |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about all the ECS fields that are used as part of the access dataset? They should also be documented here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue created for this.

Copy link
Contributor

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing missing here is the updating the fields.yml files. The problem becomes obvious with the nginx access logs. Historically there was just one massive fields.yml for Filebeat where all the fields were shipped together and we had a tree of definitions: Global, ECS, Module, Fileset. Now that we have a template per Dataset, all the fields used must be defined on the Dataset level. This has the advantage that the template becomes much more compact but brings the challenge, there is no easy way to tell which fields from Global, ECS etc. are used in the dataset. We can't just take all of ECS as this is too many fields and not all are used.

Having it documented separately also solves an other issue: It allows us to document how an ECS fields is exactly used in a dataset. What does source.ip exactly mean in the context of nginx.access logs. So far, we only had a generic doc but now we can fill in the details.

@@ -8,7 +8,13 @@ streams:
- default:
- root:secret@tcp(127.0.0.1:3306)/
name: hosts
- default: secret
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are missing here all the additional fields which are need to make the UI nicer. See https://github.com/elastic/package-registry/blob/master/dev/packages/example/nginx-1.2.0/dataset/access/manifest.yml#L17 for an example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is mostly manual work here. Once I fix it, it would be harder to automatically import beats. Should all options have these properties?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think most fields will need it, at least part of the options. And yes, this will probably rely on manual work as it is not something we have today.

@@ -4,5 +4,17 @@ release: beta
type: metrics
streams:
- input: mysql/metrics
vars:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same like below for the UI part

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Responded above

}
```

{{fields "stubstatus"}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to see such a feature. I wonder if this should become part of the overall build step instead of the import part. I'm good having it here at the moment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep it here at the moment, later on we can think about refactoring this part.

An example event for nginx looks as following:

```$json
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did you take this even from? Existing from Beat? This will probably look slightly different when shipped through the agent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will refresh the samples once I run the agent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

| Field | Description | Type |
|---|---|---|
| nginx.access.agent | Alias for field "user_agent.original" | alias |
| nginx.access.body_sent.bytes | Alias for field "http.response.body.bytes" | alias |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These alias fields are not part of the even anymore. These were only used for migration from 6.x to 7.x and are not needed anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the offline discussion - I will skip all the alias fields with the migration: true property.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Contributor

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving the PR as I think this is already a big step forward and we can further iterate on it.

@ruflin
Copy link
Contributor

ruflin commented Mar 31, 2020

@mtojek An other thing that just came to my mind is that as part of the migration we should also start thinking about optimising the dashboards. For example the query for event.dataset should become a query for stream.dataset which is a constant keyword and much more efficient. The gent should always add these fields (@michalpristas will know more if this already happens).

I don't think we need to do all these optimisations directly during the migration but could be done later on. But we should start a checklist with these already now so we don't forget.

@mtojek
Copy link
Contributor Author

mtojek commented Mar 31, 2020

@mtojek An other thing that just came to my mind is that as part of the migration we should also start thinking about optimising the dashboards. For example the query for event.dataset should become a query for stream.dataset which is a constant keyword and much more efficient. The gent should always add these fields (@michalpristas will know more if this already happens).

Ok, but this is a new requirement that I wasn't aware before. Probably will have to adjust the dashboard generating process. I will create an issue for this.

EDIT:
Issue: #315

@mtojek
Copy link
Contributor Author

mtojek commented Mar 31, 2020

One thing missing here is the updating the fields.yml files. The problem becomes obvious with the nginx access logs. Historically there was just one massive fields.yml for Filebeat where all the fields were shipped together and we had a tree of definitions: Global, ECS, Module, Fileset. Now that we have a template per Dataset, all the fields used must be defined on the Dataset level. This has the advantage that the template becomes much more compact but brings the challenge, there is no easy way to tell which fields from Global, ECS etc. are used in the dataset. We can't just take all of ECS as this is too many fields and not all are used.

Having it documented separately also solves an other issue: It allows us to document how an ECS fields is exactly used in a dataset. What does source.ip exactly mean in the context of nginx.access logs. So far, we only had a generic doc but now we can fill in the details.

#316

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Template functions inside README file
2 participants