Add initial support for configurable file identity tracking #18748
Pinging @elastic/integrations-services (Team:Services)
7396cf7 to 717253d
jenkins run tests
filebeat/input/file/comparator.go (outdated)
	// IsSameFile determines if two states belong to the same file.
	IsSameFile(*State, *State) bool
	// IsEmptyState checks if state information is initialized.
	IsEmptyState(*State) bool
Checking all places where `IsEmptyState` is called, I think we can remove it. The IsEmpty predicate is used to check whether `States` holds an entry matching the ID. This could be solved by adapting the interface of `FindPrevious`.
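One way to read this suggestion, as a rough sketch only: `FindPrevious` could report whether a match exists instead of returning a possibly empty state that callers then test. The `State` and `States` types below are simplified, hypothetical stand-ins for Filebeat's types, and the ID strings are made up for illustration.

```go
package main

import "fmt"

// State is a simplified, hypothetical stand-in for Filebeat's file.State.
type State struct {
	ID     string
	Source string
	Offset int64
}

// States is a simplified collection of states keyed by ID.
type States struct {
	byID map[string]State
}

// NewStates returns an empty States collection.
func NewStates() *States {
	return &States{byID: make(map[string]State)}
}

// Update inserts or overwrites the state stored under st.ID.
func (s *States) Update(st State) {
	s.byID[st.ID] = st
}

// FindPrevious returns the stored state for id and reports whether one
// exists, so callers need no separate "is empty" predicate.
func (s *States) FindPrevious(id string) (State, bool) {
	st, ok := s.byID[id]
	return st, ok
}

func main() {
	states := NewStates()
	states.Update(State{ID: "native::42-2049", Source: "/var/log/app.log", Offset: 128})

	if prev, ok := states.FindPrevious("native::42-2049"); ok {
		fmt.Println("resuming at offset", prev.Offset)
	}
	if _, ok := states.FindPrevious("native::7-2049"); !ok {
		fmt.Println("no previous state, treating file as new")
	}
}
```

With a boolean result, the "does an entry exist" question is answered by the lookup itself rather than by inspecting the returned value.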
e3673a9 to 7193221
filebeat/registrar/registrar.go (outdated)
	if !exists {
		idx[id] = state
		idx[state.Id] = state
If the `IdentifierName` is changed, we also change the ID. Do we need to compensate for this in the registrar? E.g. by having a map of maps like `map[identifier]map[id]state`?
What could happen if we don't adjust for it? Maybe we end up with duplicate entries for files because the ID was changed after the identifier type was changed?
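To make the map-of-maps idea concrete, here is a small sketch. It is not the registrar's actual data structure; the `State` fields, the `index` type, and the ID strings are assumptions chosen for illustration.

```go
package main

import "fmt"

// State is a simplified, hypothetical stand-in for Filebeat's file.State.
type State struct {
	ID             string
	IdentifierName string
	Source         string
	Offset         int64
}

// index groups states by identifier type first, then by state ID.
type index map[string]map[string]State

// put stores st under its identifier type and ID.
func (idx index) put(st State) {
	byID, ok := idx[st.IdentifierName]
	if !ok {
		byID = make(map[string]State)
		idx[st.IdentifierName] = byID
	}
	byID[st.ID] = st
}

// get looks up a state by identifier type and ID.
func (idx index) get(identifier, id string) (State, bool) {
	// Reading from a missing (nil) inner map is safe and yields ok == false.
	st, ok := idx[identifier][id]
	return st, ok
}

func main() {
	idx := index{}
	// The same file, tracked before and after the identifier type changed.
	idx.put(State{ID: "native::42-2049", IdentifierName: "native", Source: "/var/log/app.log", Offset: 128})
	idx.put(State{ID: "path::/var/log/app.log", IdentifierName: "path", Source: "/var/log/app.log"})

	for name, byID := range idx {
		fmt.Printf("%s: %d entries\n", name, len(byID))
	}
}
```

Grouping by identifier type keeps entries created under different strategies apart, which makes leftover entries from a previous strategy easy to find. It does not by itself decide which of the two entries for the same file should win.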
I am removing the outdated state in the `log` input: https://github.com/elastic/beats/pull/18748/files#diff-8f0e85354c10fc4b55e2f144249abc02R185 Do you think we need more cleaning up besides this?
I wonder if the removal can lead to problems. By removing the entry, we rely on another state update event. Otherwise the entry is gone for good and a Filebeat restart will force recollection.
Even with a state update event, the remove+reinsert operation is not atomic. The state update event can even be blocked for an undetermined amount of time (or even get lost if the input is stopped via autodiscovery), which leaves us with a file without state in the registry. => recollection
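The concern could be addressed by moving an entry to its new ID in a single step. The sketch below is only an illustration of that idea with an in-memory map and a mutex; `Registry`, `Put`, `Get`, and `Rekey` are hypothetical names, and the real registrar also has to persist its state to disk, which is not modeled here.

```go
package main

import (
	"fmt"
	"sync"
)

// State is a simplified, hypothetical stand-in for Filebeat's file.State.
type State struct {
	ID     string
	Source string
	Offset int64
}

// Registry is a hypothetical in-memory store guarded by a mutex.
type Registry struct {
	mu     sync.Mutex
	states map[string]State
}

// NewRegistry returns an empty Registry.
func NewRegistry() *Registry {
	return &Registry{states: make(map[string]State)}
}

// Put stores st under st.ID.
func (r *Registry) Put(st State) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.states[st.ID] = st
}

// Get returns the state stored under id, if any.
func (r *Registry) Get(id string) (State, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	st, ok := r.states[id]
	return st, ok
}

// Rekey moves the state from oldID to newID inside one critical section,
// so no reader can observe the file without a state in between.
func (r *Registry) Rekey(oldID, newID string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	st, ok := r.states[oldID]
	if !ok {
		return false
	}
	st.ID = newID
	r.states[newID] = st
	if oldID != newID {
		delete(r.states, oldID)
	}
	return true
}

func main() {
	reg := NewRegistry()
	reg.Put(State{ID: "native::42-2049", Source: "/var/log/app.log", Offset: 128})

	if reg.Rekey("native::42-2049", "path::/var/log/app.log") {
		st, _ := reg.Get("path::/var/log/app.log")
		fmt.Println("offset kept after rekey:", st.Offset)
	}
}
```

Because the offset travels with the entry, a restart between the old and new ID never finds the file without state, which is the recollection case described above.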
ebf0a1e to c2ef667
94015cd to ce2da44
bd46142 to 11fef2c
@jsoref Thanks for the review! I've updated the docs with your suggestions.
I ran the tests locally on Windows 10 and Linux with success. Last commit ID I tested: 75b7725
Filebeat tests on Windows with commit a9b61f2 passed for me locally.
…18748)

This PR adds a new option to the `log` input of Filebeat named `file_identity`. The option lets users configure file identity for state tracking.

1. `native` (default): Filebeat identifies files based on their inode and device id.
2. `path`: Files are considered different if they have different paths.
3. `inode_marker`: A special marker file and the inode is used to tell apart files. It is not supported on Windows.

State IDs previously were not saved to the registry file. Now, these are persisted on the disk.

I introduced a new interface: `file.StateIdentifier`. The responsibility of `StateIdentifier` is to generate an identifier for a `file.State` based on the configuration. If someone wants to implement their own `StateIdentifier` method, all they need is to create a struct which satisfies this interface.

```golang
// StateIdentifier generates an ID for a State.
type StateIdentifier interface {
	// GenerateID generates and returns the ID of the state
	GenerateID(State) (stateId, identifierType string)
}
```

As every state has an ID, Filebeat just compares the IDs of the two states to decide if they belong to the same file or not. The scope of the PR does not include strategies which include fingerprinting the contents of the file.

(cherry picked from commit 8ff6894)
@dedemorton Could you please review the documentation of this feature again?
What does this PR do?
This PR adds a new option to the `log` input of Filebeat named `file_identity`. The option lets users configure file identity for state tracking.

Available strategies
1. `native` (default): Filebeat identifies files based on their inode and device id.
2. `path`: Files are considered different if they have different paths.
3. `inode_marker`: A special marker file and the inode is used to tell apart files. It is not supported on Windows.

State IDs previously were not saved to the registry file. Now these are persisted on the disk.

Architecture
I introduced a new interface: `file.StateIdentifier`. The responsibility of `StateIdentifier` is to generate an identifier for a `file.State` based on the configuration. If someone wants to implement their own `StateIdentifier` method, all they need is to create a struct which satisfies this interface.

As every state has an ID, Filebeat just compares the IDs of the two states to decide if they belong to the same file or not.

The scope of the PR does not include strategies which include fingerprinting the contents of the file.
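
As an illustration of the architecture described above, the following sketch shows what a custom identifier satisfying the interface might look like. The interface shape follows the PR description; the reduced `State` struct, the `pathIdentifier` type, the `sameFile` helper, and the `path::` ID format are assumptions made for this example and may differ from the merged code.

```go
package main

import "fmt"

// State is reduced to the one field this sketch needs; the real
// file.State carries more information.
type State struct {
	Source string // path of the file
}

// StateIdentifier generates an ID for a State (shape taken from the PR description).
type StateIdentifier interface {
	// GenerateID generates and returns the ID of the state.
	GenerateID(State) (stateID, identifierType string)
}

// pathName is an assumed identifier type name used in this sketch.
const pathName = "path"

// pathIdentifier is a hypothetical implementation that derives the ID
// from the file path only.
type pathIdentifier struct{}

// GenerateID returns an ID built from the path, plus the identifier type.
func (pathIdentifier) GenerateID(s State) (string, string) {
	return pathName + "::" + s.Source, pathName
}

// sameFile compares two states by their generated IDs.
func sameFile(id StateIdentifier, a, b State) bool {
	idA, _ := id.GenerateID(a)
	idB, _ := id.GenerateID(b)
	return idA == idB
}

func main() {
	var identifier StateIdentifier = pathIdentifier{}

	a := State{Source: "/var/log/app.log"}
	b := State{Source: "/var/log/app.log"}
	c := State{Source: "/var/log/app.log.1"}

	fmt.Println(sameFile(identifier, a, b))
	fmt.Println(sameFile(identifier, a, c))
}
```

Since identity reduces to comparing two strings, swapping the strategy only means swapping the `StateIdentifier` value; the comparison code stays the same.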
Checklist
`CHANGELOG.next.asciidoc` or `CHANGELOG-developer.next.asciidoc`.
How to test locally
Test cases
`file_identity.path: ~` to the input configuration