Sane flush #91

nukemberg · 2012-01-27T08:09:23Z

Added a flush cycle at a configurable interval and a cycle that closes inactive files every 10 seconds.

Logstash is blocking, so flushing many files together in the event handler thread may be a problem for heavy setups that write a lot of data to many files. I didn't think this edge case was worth complicating the code, so I didn't handle it.
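
For readers skimming the thread, here is a rough sketch of the two cycles described above. It is not the code from this patch: `FlushingFileSet`, `opener`, `clock`, and `open_paths` are hypothetical names, and the rule "a file is inactive if it was not written since the previous sweep" is an assumption made for illustration.

```ruby
require "stringio"

# Hypothetical sketch, not the actual file output plugin.
class FlushingFileSet
  CLOSE_INTERVAL = 10 # seconds between inactive-file sweeps, as in the description

  # flush_interval: seconds between flushes (the configurable part)
  # opener: callable that returns an IO-like object for a path (assumed helper)
  # clock: callable returning the current time; injectable for testing
  def initialize(flush_interval, opener, clock = -> { Time.now })
    @flush_interval = flush_interval
    @opener = opener
    @clock = clock
    @files = {}      # path => IO-like object
    @last_write = {} # path => time of the most recent write
    @last_flush = @last_sweep = @clock.call
  end

  # Write without flushing; flushing happens only when the interval elapses.
  def write(path, data)
    io = (@files[path] ||= @opener.call(path))
    io.write(data)
    @last_write[path] = @clock.call
    tick
  end

  # Runs in the caller's thread, so a large file set can still block briefly.
  def tick
    now = @clock.call
    if now - @last_flush >= @flush_interval
      @files.each_value(&:flush)
      @last_flush = now
    end
    return unless now - @last_sweep >= CLOSE_INTERVAL

    # Assumption: inactive means "no write since the previous sweep".
    inactive = @last_write.select { |_, t| t < @last_sweep }.keys
    inactive.each do |path|
      @files.delete(path).close
      @last_write.delete(path)
    end
    @last_sweep = now
  end

  def open_paths
    @files.keys
  end
end
```

Injecting the clock keeps the sketch testable without sleeping; a real plugin would more likely just call `Time.now`.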

Conflicts: lib/logstash/outputs/amqp.rb

jordansissel · 2012-01-27T23:27:09Z

lib/logstash/outputs/file.rb

+    if gzip
+      @files[path] = Zlib::GzipWriter.new(@files[path])
+    end
+    class << @files[path]


Eek, monkeypatching. Bad mojo. I'd much prefer that the state be held in the plugin instead of in a monkeypatched part of the File or GzipWriter.

If you're going to wrap things and track state like activity, it might be simpler to write your own IO-wrapper class:

    class MyFile
      def initialize(io)
        @io = io
      end

      def write(*args)
        @last_write = Time.now
        @io.write(*args)
      end
      ...
    end

And delegate flush/etc to @io, but add activity checks and stuff
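
A slightly fuller sketch of that wrapper idea, using Ruby's standard `Forwardable` module for the delegation. The class name `ActivityTrackingIO` and the `active?` helper are made up for illustration and are not what the patch ended up using.

```ruby
require "forwardable"
require "stringio"

# Hypothetical IO wrapper: state lives in the wrapper, not in a patched File.
class ActivityTrackingIO
  extend Forwardable

  # Pass flush and close straight through to the wrapped IO.
  def_delegators :@io, :flush, :close

  attr_reader :last_write

  def initialize(io)
    @io = io
    @last_write = nil
  end

  # Record the time of the write, then hand the data to the wrapped IO.
  def write(*args)
    @last_write = Time.now
    @io.write(*args)
  end

  # True if something was written less than `seconds` before `now`.
  def active?(seconds, now = Time.now)
    !@last_write.nil? && (now - @last_write) < seconds
  end
end
```

Because the wrapper only needs `write`, `flush`, and `close` from the wrapped object, it should work equally with a `File` or a `Zlib::GzipWriter`.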

jordansissel · 2012-01-27T23:41:29Z

overall this looks pretty good, I added some in-line comments on specific issues.

Thanks for your work! :)

nukemberg · 2012-01-28T07:47:20Z

Erk!!! My mailbox feels like a messenger!

Thanks for the review. The reason for the flushes is C habits (although
I would have synced, not flushed, before close). As for io.flush, this is
because the docs and reality don't agree:

$ ruby test_flush.rb
size before write: 0
size before flush: 39194
size after flush 47264
size after io flush 47264
size after close 47274

$ jruby test_flush.rb
size before write: 0
size before flush: 35380
size after flush 35380
size after io flush 38984
size after close 47209

My guess is that Java is doing some dark IO magic somewhere...
As for the monkey patching, I felt it was cleaner not to create a new
object for the state; it takes more memory and the code looks dirtier.
Monkey patching is mostly a problem when you are replacing existing methods.
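
The original test_flush.rb is not included in this thread. A script along the following lines could produce measurements of the same shape as the flush numbers above; it is a guess at the approach, and `measure_flush` and its parameters are hypothetical names.

```ruby
require "zlib"
require "tempfile"

# Hypothetical reconstruction, NOT the original test_flush.rb.
# Records the on-disk size of a gzip file at each stage of writing.
def measure_flush(path, lines = 10_000)
  sizes = {}
  file = File.open(path, "wb")
  gz = Zlib::GzipWriter.new(file)
  sizes[:before_write] = File.size(path)

  # Semi-random payload so the compressed stream is not trivially small.
  lines.times { |i| gz.write("line #{i} #{rand(1_000_000)}\n") }
  sizes[:before_flush] = File.size(path)

  gz.flush   # flush the compressor
  sizes[:after_flush] = File.size(path)

  file.flush # flush the underlying IO explicitly
  sizes[:after_io_flush] = File.size(path)

  gz.close   # writes the gzip trailer and closes the underlying file
  sizes[:after_close] = File.size(path)
  sizes
end
```

Running it under both interpreters and comparing `:after_flush` with `:after_io_flush` would show whether the gzip-level flush reaches the disk on its own.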


jordansissel · 2012-01-28T10:09:28Z

Righto then. Can you add a big comment explaining what exactly you're monkeypatching and why, then for every method you invoke later (fd.active, etc) put a comment there saying explicitly that you are invoking a monkeypatched method - this'll help future debugging efforts :)

nukemberg · 2012-01-30T11:44:55Z

I removed the monkey patching as per your suggestions.

Add flush intervals and gzip support.

jordansissel · 2012-01-30T17:24:32Z

Thanks a bunch for this :)


* add failing tests for Event.new with fields that look like field references

* fix: correctly handle FieldReference-special characters in field names.

  Keys passed to most methods of `ConvertedMap`, which is based on `IdentityHashMap`, depend on identity and not equivalence, and therefore rely on the keys being _interned_ strings. In order to avoid hitting the JVM's global String intern pool (which can have performance problems), operations to normalize a string to its interned counterpart have traditionally relied on the behaviour of `FieldReference#from` returning a likely-cached `FieldReference` that had an interned `key` and an empty `path`.

  This is problematic on two points.

  First, when `ConvertedMap` was given data with keys that _were_ valid string field references representing a nested field (such as `[host][geo][location]`), the implementation of `ConvertedMap#put` effectively silently discarded the path components because it assumed them to be empty, and only the key was kept (`location`).

  Second, when `ConvertedMap` was given a map whose keys contained what the field reference parser considered special characters but _were NOT_ valid field references, the resulting `FieldReference.IllegalSyntaxException` caused the operation to abort.

  Instead of using the `FieldReference` cache, which sits on top of objects whose `key` and `path`-components are known to have been interned, we introduce an internment helper on our `ConvertedMap` that is also backed by the global string intern pool, and ensure that our field references are primed through this pool.

  In addition to fixing the `ConvertedMap#newFromMap` functionality, this has three net effects:

  - Our ConvertedMap operations still use strings from the global intern pool
  - We have a new, smaller cache of individual field names, improving lookup performance
  - Our FieldReference cache no longer is flooded with fragments and therefore is more likely to remain performant

  NOTE: this does NOT create isolated intern pools, as doing so would require a careful audit of the possible code-paths to `ConvertedMap#putInterned`. The new cache is limited to 10k strings, and when more are used only the FIRST 10k strings will be primed into the cache, leaving the remainder to always hit the global String intern pool.

  NOTE: by fixing this bug, we allow events to be created whose fields _CANNOT_ be referenced with the existing FieldReference implementation.

  Resolves: #13606
  Resolves: #11608

* field_reference: support escape sequences

  Adds a `config.field_reference.escape_style` option and a companion command-line flag `--field-reference-escape-style` allowing a user to opt into one of two proposed escape-sequence implementations for field reference parsing:

  - `PERCENT`: URI-style `%`+`HH` hexadecimal encoding of UTF-8 bytes
  - `AMPERSAND`: HTML-style `&#`+`DD`+`;` encoding of decimal Unicode code-points

  The default is `NONE`, which does _not_ process escape sequences. With this setting a user effectively cannot reference a field whose name contains FieldReference-reserved characters.

  | ESCAPE STYLE | `[`     | `]`     |
  | ------------ | ------- | ------- |
  | `NONE`       | _N/A_   | _N/A_   |
  | `PERCENT`    | `%5B`   | `%5D`   |
  | `AMPERSAND`  | `&#91;` | `&#93;` |

* fixup: no need to double-escape HTML-ish escape sequences in docs

* Apply suggestions from code review

  Co-authored-by: Karol Bucek <[email protected]>

* field-reference: load escape style in runner
* docs: sentences over semicolons
* field-reference: faster shortcut for PERCENT escape mode
* field-reference: escape mode control downcase
* field_reference: more s/experimental/technical preview/
* field_reference: still more s/experimental/technical preview/

Co-authored-by: Karol Bucek <[email protected]>
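
To illustrate the two escape styles described above, here is a small Ruby sketch. It is not Logstash's parser: `escape_field_name` and `unescape_field_name` are hypothetical helpers, and for simplicity only `[` and `]` are escaped (a literal `%` or `&` in a name is left alone, which a real implementation would need to consider).

```ruby
# Illustrative only; function names and the symbol-based style argument are assumptions.
def escape_field_name(name, style)
  case style
  when :none
    name
  when :percent
    # URI-style: each UTF-8 byte of the reserved character becomes %HH.
    name.gsub(/[\[\]]/) { |c| c.bytes.map { |b| format("%%%02X", b) }.join }
  when :ampersand
    # HTML-style: the decimal Unicode code point becomes &#DD;.
    name.gsub(/[\[\]]/) { |c| format("&#%d;", c.ord) }
  else
    raise ArgumentError, "unknown escape style: #{style}"
  end
end

def unescape_field_name(escaped, style)
  case style
  when :none
    escaped
  when :percent
    # Decode runs of %HH as UTF-8 bytes.
    escaped.gsub(/((?:%\h\h)+)/) do
      [Regexp.last_match(1).delete("%")].pack("H*").force_encoding("UTF-8")
    end
  when :ampersand
    # Decode &#DD; as a Unicode code point.
    escaped.gsub(/&#(\d+);/) { [Regexp.last_match(1).to_i].pack("U") }
  else
    raise ArgumentError, "unknown escape style: #{style}"
  end
end
```

With these helpers, a field literally named `a[b]` would be written as `a%5Bb%5D` under the percent style or `a&#91;b&#93;` under the ampersand style, matching the table above.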

Avishai Ish-Shalom added 19 commits November 30, 2011 23:37

queue_name wasn't defined anywhere and had to die

e5ade17

Merge branch 'master' of https://github.com/logstash/logstash

64b54d4

Merge branch 'master' of https://github.com/logstash/logstash

01b945e

Merge branch 'master' of https://github.com/logstash/logstash

aa08aa5

Merge branch 'master' of https://github.com/logstash/logstash

380d0c2

Merge branch 'master' of https://github.com/logstash/logstash

1a6b3c3

Conflicts: lib/logstash/outputs/amqp.rb

Merge branch 'master' of https://github.com/logstash/logstash

538796c

Merge branch 'master' of https://github.com/logstash/logstash

33ece8c

Merge branch 'master' of https://github.com/logstash/logstash

4e44a89

Merge branch 'master' of https://github.com/logstash/logstash

1a303a8

Merge branch 'master' of https://github.com/logstash/logstash

1a5532e

Merge branch 'master' of https://github.com/logstash/logstash

cbf37c2

Fixed insane flushing on every event and added closing of unused files

d8b2687

Fixed typo

371630f

Explicit return value

43a32fc

use instance variable to avoid scoping issues

8edca1f

Java FileWriter doesn't have closed? method. fixed logging levels

d868623

Added gzip support

d7e85b0

Improved flushing for gz files

7b2ef9c

jordansissel reviewed Jan 27, 2012

Code style cleanup

4eb89ef

jordansissel added a commit that referenced this pull request Jan 30, 2012

Merge pull request #91 from avishai-ish-shalom/sane_flush

f8efb31

Add flush intervals and gzip support.

jordansissel merged commit f8efb31 into elastic:master Jan 30, 2012
