Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep different registry entry per container stream #7281

Merged
merged 7 commits into from
Jun 12, 2018

Conversation

exekias
Copy link
Contributor

@exekias exekias commented Jun 6, 2018

This PR adds a metadata map to prospectors state. Adding metadata to states allows having a different state per unique metadata for the same file.

This change is used by the docker prospector to have different states per stream, as some users configure a container prospector per stream (stdout & stderr).

Probably fixes #7045

@exekias exekias added enhancement in progress Pull request is currently in progress. Filebeat Filebeat labels Jun 6, 2018
@exekias exekias added bug needs_backport PR is waiting to be backported to other branches. v6.3.0 and removed enhancement labels Jun 6, 2018
@elastic elastic deleted a comment from houndci-bot Jun 6, 2018
@exekias
Copy link
Contributor Author

exekias commented Jun 6, 2018

Hound wanted me to rename Id field to ID, but that would spawn a broader change, I don't think it's worth it, as this field wasn't added on this PR.

@exekias exekias added review and removed in progress Pull request is currently in progress. labels Jun 7, 2018
Timestamp time.Time `json:"timestamp"`
TTL time.Duration `json:"ttl"`
Type string `json:"type"`
Id string `json:"-"` // local unique id to make comparison more efficient

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

struct field Id should be ID

@exekias exekias changed the title Filebeat registry meta Keep different registry entry per container stream Jun 7, 2018
@exekias exekias force-pushed the filebeat-registry-meta branch 4 times, most recently from 36773a7 to 6d23c58 Compare June 7, 2018 11:13
}
}

// ID returns a unique id for the state as a string
func (s *State) ID() string {
// Generate id on first request. This is needed as id is not set when converting back from json
if s.Id == "" {
s.Id = s.FileStateOS.String()
h, _ := hashstructure.Hash(s.Meta, nil)
s.Id = s.FileStateOS.String() + fmt.Sprintf("-%d", h)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ouch... we optimized ID generation by removing Sprintf and reduce allocations. The ID generation used to be one of the worst offenders in CPU profiles. This change makes it worse again. How about:

if s.Meta == nil {
  s.Id = s.FileStateOS.String()
} else {
  hashValue, _ := hashstructure.Hash(s.Meta, nil)
  var hashBuf [17]byte
  hash, _ := strconv.AppendUint(h[:0], hashValue, 16)
  hash = append(hash, '-')

  fileID := s.FileStateOS.String()  

  var b strings.Builder
  b.Grow(len(hash) + len(fileID))
  b.Write(hash)
  b.WriteString(fileID)

  s.Id = b.String()
}

This still does execute an extra alloc, but only in the docker case and should be less expensive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds like a good idea, pushing this change

@@ -49,5 +56,5 @@ func (s *State) IsEqual(c *State) bool {

// IsEmpty returns true if the state is empty
func (s *State) IsEmpty() bool {
return *s == State{}
return reflect.DeepEqual(*s, State{})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need DeepEqual?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about tmp := *s; tmp.Meta = nil; return tmp == State{} && len(s.Meta) == 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiler still says it cannot compare maps, are your concerns about performance? I could do manual comparison for all fields, or use 2 structs like:

type State struct {
   Common
   Meta        map[string]string `json:"meta"`
}

then:

if len(s.Meta) > 0 { return false }
return *s.Common == Common{}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a little performance, but I'd also prefer to keep out reflect from most packages.

I wonder if checking for FileStateOS or the timestamp only would be enough. This method is used to determine if we have a valid/usable state or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have enough experience with all the possible situations with the registry so I'll trust you here on what to do 😇

Copy link
Contributor Author

@exekias exekias Jun 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated this to use s.FileStateOS == file.StateOS{} && s.Timestamp.IsZero() ..., also pinging @ruflin for awareness

@exekias exekias force-pushed the filebeat-registry-meta branch 3 times, most recently from e23da83 to 3c9e268 Compare June 7, 2018 14:42
@exekias
Copy link
Contributor Author

exekias commented Jun 8, 2018

This should be ready for another review, thanks for all the help @urso!

@exekias exekias force-pushed the filebeat-registry-meta branch 2 times, most recently from 447d970 to 966866d Compare June 8, 2018 09:35
@exekias
Copy link
Contributor Author

exekias commented Jun 8, 2018

jenkins, retest this please

Carlos Pérez-Aradros Herce added 3 commits June 8, 2018 13:18
This allows us to tag a source file with different metadata, to keep
independent read streams.
Configs using multiple streams won't share state, which was causing
issues.
@exekias exekias force-pushed the filebeat-registry-meta branch from 966866d to f66d075 Compare June 8, 2018 11:18
@exekias
Copy link
Contributor Author

exekias commented Jun 8, 2018

jenkins, retest this please

@urso urso requested a review from ruflin June 8, 2018 15:22

self.render_config_template(
type='docker',
input_raw='''
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw. check out the textwrap package. This allows you to indent multiline strings within the code more properly.

@urso
Copy link

urso commented Jun 8, 2018

LGTM and filebeat CI is green. Waiting for review from @ruflin

Copy link
Contributor

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a minor question.

The change LGTM. The part I didn't fully understand yet is why in same log file data from stderr and stdout end up. Is that if 2 different services are sending data out of a single container?

@@ -49,5 +72,8 @@ func (s *State) IsEqual(c *State) bool {

// IsEmpty returns true if the state is empty
func (s *State) IsEmpty() bool {
return *s == State{}
return s.FileStateOS == file.StateOS{} &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you share some details here on why not only add s.Meta but also the other chnages?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I added a map to the State struct, a simple comparison is no longer possible (Go doesn't implement map compare), I had to update the code to do an individual comparison on the struct fields

@exekias
Copy link
Contributor Author

exekias commented Jun 12, 2018

Docker json-file format stores both stderr and stdout in the same file. They use a JSON object per line, with the following schema:

{"log":"example line\n","stream":"stdout","time":"2018-04-13T08:38:08.395077018Z"}
{"log":"example line\n","stream":"stderr","time":"2018-04-13T08:38:08.395077018Z"}

Our docker input can parse that and demultiplex the 2 different streams, as some times you want to do different processing on them (for instance, pass them to 2 different filesets).

Some containers use this to send different log per stream, for instance nginx: https://github.com/nginxinc/docker-nginx/blob/f603bb3632ea6df0bc9da2179c18eb322c286298/mainline/stretch/Dockerfile#L92-L93

@ruflin ruflin merged commit d37c114 into elastic:master Jun 12, 2018
exekias added a commit to exekias/beats that referenced this pull request Jun 12, 2018
This PR adds a metadata map to prospectors state. Adding metadata to states allows having a different state per unique metadata for the same file.

This change is used by the `docker` prospector to have different states per stream, as some users configure a container prospector per stream (`stdout` & `stderr`).

Probably fixes elastic#7045

(cherry picked from commit d37c114)
@exekias exekias added v6.3.0 and removed needs_backport PR is waiting to be backported to other branches. labels Jun 12, 2018
urso pushed a commit that referenced this pull request Jun 14, 2018
… stream (#7300)

Cherry-pick of PR #7281 to 6.3 branch. Original message: 

This PR adds a metadata map to prospectors state. Adding metadata to states allows having a different state per unique metadata for the same file.

This change is used by the `docker` prospector to have different states per stream, as some users configure a container prospector per stream (`stdout` & `stderr`).

Probably fixes #7045
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
…ntainer stream (elastic#7300)

Cherry-pick of PR elastic#7281 to 6.3 branch. Original message: 

This PR adds a metadata map to prospectors state. Adding metadata to states allows having a different state per unique metadata for the same file.

This change is used by the `docker` prospector to have different states per stream, as some users configure a container prospector per stream (`stdout` & `stderr`).

Probably fixes elastic#7045
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

filebeat docker input stores wrong offset
4 participants