-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: sincedb_write issue on Windows machines #283
Conversation
Thanks Rishie, interesting issue - however the PR is not a good solution (ideally we want to fix the underlying issue). Believe this code (only used on Windows): https://github.com/logstash-plugins/logstash-input-file/pull/183/files#diff-30ed4826858d9eb6361d311895e1024281a256c05ae31c25592820758d86e074R210 might lead to issues (and potentially exposes a bug in the underlying JRuby library). But I do not think we need the low-level Could you try running with the following patch: diff --git lib/filewatch/sincedb_collection.rb lib/filewatch/sincedb_collection.rb
index b7dc19b..62881b7 100644
--- lib/filewatch/sincedb_collection.rb
+++ lib/filewatch/sincedb_collection.rb
@@ -232,7 +232,7 @@ module FileWatch
# @return expired keys
def non_atomic_write(time)
- IO.open(IO.sysopen(@full_path, "w+")) do |io|
+ IO.open(@full_path, "w+") do |io|
@serializer.serialize(@sincedb, io, time.to_f)
end
end
and report back whether you've seen the issue ... Also if you could pls clarify how often the issue surfaces and what kind of Windows + FS storage are you using. |
Hi! Of course! I don't have that much time anymore because of the baby but I tried your change in our test server and it crashed with "Error: no implicit conversion of String into Integer". I guess open takes an Integer? I have never done any Ruby programming so I'm pretty lost here. Right now I would say it happens a few times a month. But when we had a lot of inputs with empty paths (path => []) it happened everyday. We found out that was not how you're supposed to do it and have removed the inputs with empty paths since then but it still happens every now and then. One obvious problem with having several inputs with path => [] is that they all get the same sincedb filename generated which I guess causes the issue to occur more often, I have no idea. So in the test server I have artificially added a lot of empty inputs with empty paths to try and test this issue. We have different versions of Windows Server running so I have seen it happen on Windows Server 2012 R2 and Windows Server 2016 Standard. We're slightly above my pay grade here but they're using some sort of virtualized drives with NTFS. |
Hey, congrats on the 👶 ! Regarding the patch, I am sorry (got interrupted in the middle of my response) the correct one should have been: diff --git lib/filewatch/sincedb_collection.rb lib/filewatch/sincedb_collection.rb
index b7dc19b..d053b27 100644
--- lib/filewatch/sincedb_collection.rb
+++ lib/filewatch/sincedb_collection.rb
@@ -232,7 +232,7 @@ module FileWatch
# @return expired keys
def non_atomic_write(time)
- IO.open(IO.sysopen(@full_path, "w+")) do |io|
+ File.open(@full_path, "w+") do |io|
@serializer.serialize(@sincedb, io, time.to_f)
end
end |
Thank you! No I agree it would be better to find the root cause. Trying the new patch now in our test server, no immediate crashes. Will run it for a few days to see the effect. As a reference point it happened 3 times between midnight and noon today (12 hours) with the old code in the modified test server where it happens more often. I guess we are kind of rare because we're running on Windows servers and we are using start_position => beginning. If we use start_position => end, which is default, then of course there's no problem when the plugin restarts. But we chose beginning because if for example Logstash is down, or just stopped, for 5 hours for some reason then we would lose 5 hours of logs, at least that was our reasoning, maybe it's incorrect. |
On our Windows servers we occasionally get exceptions with classes IOError and Java::JavaLang:RuntimeException when writing to sincedb file which crashes the plugin and somehow messes with the sincedb file which results in old log files being reprocessed causing duplicates. This change eliminates those exceptions.
The exceptions haven't occurred for over 24 hours! Looks very promising. I changed the pull request to your change instead and will update all our servers with the patch. I'll report back on Monday (if I don't forget) with results. |
Great, thanks for the update. Let's wait a bit than and we shall include the updated PR in a next patch release. |
Looked through the logs and found no issues on any of our servers! The best proof is that it hasn't happened for 5 days on the test server where it used to happen several times a day with the old code. 👍 |
That's great, thanks for confirming. |
We have experienced the same issue in our Windows Server environment since January 2021, after an upgrade of logstash-input-file plugin (bin/logstash-plugin update logstash-input-file) it looks to fix an issue. Previously pipelines crashed after few (1-2) hours of running with error message "The handle is invalid". Right now logstash is stable for over 20 hours. Thanks! I noticed this issue first after an upgrading from LS 7.8.2 to 7.10.2 |
On our Windows servers we for some reason, occasionally, get exceptions with classes IOError and Java::JavaLang:RuntimeException when writing to sincedb file which crashes the plugin and somehow messes with the sincedb file which results in old log files being reprocessed causing duplicates. If we rescue the errors then everything is fine which is the reason for this change.
This is an example of the IOError we get:
This is an example of the Java::JavaLang:RuntimeException we get:
With the change we get:
and