[Filebeat] Error with registry and truncated file #35571
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
In TestFilestreamTruncateBlockedOutput we see occasional failures where the expected offset never matches the offset in the registry. After changing the test a little, I can see that the registry still has the offset of the original write; it has neither the offset after the second write nor the one after the truncate. The file does have a size of 0 on disk, so the truncate function seems to have worked. This looks like a real bug in the registry. It can be easily reproduced on arm64 Linux under UTM on OS X with:
On OS X itself it rarely happens.
I found something: there are two possible code paths that "detect" a file truncation, depending on whether the harvester is running or not. Here is the output of the test on a successful and a failed run:
The test is run with the following script: run_test.sh
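#!/bin/bash
# Re-run the test until it fails: the exit status of `go test` (the last
# command in the condition list) decides whether the loop continues.
while
  echo "=================================================="
  go test -count 1 -run TestFilestreamTruncateBlockedOutput -tags=integration
do true ; done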
In the first case the truncation is detected by the This is not the code path that is reading the file. The second case (the one that leads to the test failure) is the "reading the file" code path, when It seems that by the time the prospector runs again, the file already contains more data, so it does not perceive it as a truncation. The exact commit I used with extra debug messages: belimawr@342f93a
I believe I understand the root of the problem: the file watcher relies on the modification time to identify a truncate/write event (beats/filebeat/input/filestream/fswatch.go, lines 157 to 171 in 4af3e60).
However, if the file is written to, read by Filebeat, and then truncated, all within the same second (which is more than enough time to read a few log lines in a test), the file watcher will not detect it. The registry is therefore never updated, and the test fails with a timeout while waiting to read the
This definitely seems like a real bug then, one that has been there for quite a while, judging by the last time this code block was touched.
💔 Build Failed
Build stats
Start Time: 2023-05-24T19:59:22.849+0000
Duration: 132 min 20 sec
Test stats 🧪
Steps errors
filebeat-goIntegTest - mage goIntegTest
mage goIntegTest
x-pack/metricbeat-goIntegTest - mage goIntegTest
mage goIntegTest
Dump mage variables
mage dumpVariables
Error signal
Error "hudson.AbortException: script returned exit code 1"