
Early flow packet loss when using input-tcp with codec-netflow #106

Closed
hammerstefan opened this issue Nov 6, 2017 · 1 comment

Comments

@hammerstefan

Hi, I am trying to use Logstash with the input-tcp and codec-netflow plugins to receive IPFIX (NetFlow v10) from a device.

The device generating the IPFIX data has the following behavior:

  • The first time it needs to send a data record, it opens a TCP connection to the IPFIX collector (Logstash).
  • Once the TCP connection is established, it sends all template records in the first IPFIX message.
  • Immediately afterwards, it sends the data record as the second IPFIX message.
  • It keeps the TCP connection open.
  • It uses the open TCP connection to send future data records without resending the templates.

In my testing I have observed that any data record Logstash receives within roughly the first 50–200 ms after the template record gets silently dropped (it never shows up in any log). This means Logstash is losing important data from the device, which transmits at long intervals, so losing the first record is a huge detriment.

I have simulated the IPFIX stream to help troubleshoot:
Example packet capture: https://drive.google.com/file/d/0B3VctabAy1c9bDBUNHJNdnNZeXM/view?usp=sharing
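
The simulation just opens one TCP connection and writes the template message and the data message back to back. Roughly, it looks like the sketch below (illustrative only; ipfix_template.bin and ipfix_data.bin stand in for the raw IPFIX messages from the capture, and 4739 is the Logstash port from the config further down):

require 'socket'

# Raw IPFIX messages extracted from the capture (placeholder file names).
template_msg = File.binread('ipfix_template.bin')
data_msg     = File.binread('ipfix_data.bin')

socket = TCPSocket.new('127.0.0.1', 4739)
socket.write(template_msg)  # first IPFIX message: all template records
socket.write(data_msg)      # second IPFIX message: the data record, sent immediately
socket.flush
sleep 60                    # keep the connection open, like the device does
socket.close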

Stdout rubydebug:

{
      "@version" => "1",
          "host" => "172.21.0.1",
       "netflow" => {
               "octetTotalCount" => 3,
                       "version" => 10,
        "observationTimeSeconds" => 1509458051
    },
    "@timestamp" => 2017-10-31T13:54:11.000Z,
          "port" => 35416
}
{
      "@version" => "1",
          "host" => "172.21.0.1",
       "netflow" => {
               "octetTotalCount" => 4,
                       "version" => 10,
        "observationTimeSeconds" => 1509458052
    },
    "@timestamp" => 2017-10-31T13:54:12.000Z,
          "port" => 35416
}

In the pcap, frame 4 is the template record, and frames 6, 8, 10, and 12 are the data records. As the rubydebug stdout shows, only frames 10 and 12 make it through; frames 6 and 8 are dropped.

Relevant information:

Logstash version:

bash-4.2$ logstash --version
logstash 5.6.3

Plugin version:

bash-4.2$ logstash-plugin list --verbose | grep "input-tcp\|codec-netflow"
logstash-codec-netflow (3.7.0)
logstash-input-tcp (4.2.4)

Running docker image docker.elastic.co/logstash/logstash:5.6.3
Host OS: Ubuntu 16.04

logstash.conf

bash-4.2$ cat /usr/local/logstash/logstash.conf                                           
input {
    tcp {
        port  => 4739
        codec => netflow
    }
}

output {
    file { 
        path => "/tmp/ipfix_events-%{+YYYY-MM-dd}.log" 
    }
    stdout { codec => rubydebug  }
}

logstash.yml

bash-4.2$ cat /usr/share/logstash/config/logstash.yml 
http.host: "0.0.0.0"
path.config: /usr/share/logstash/pipeline
xpack.monitoring.enabled: false

Lastly, --log.level=debug crashes Logstash, so I cannot post the debug log output.

[2017-10-31T14:21:43,889][FATAL][logstash.runner          ] An unexpected error occurred! {:error=>#<NoMethodError: undefined method `to_hash' for []:Array>, :backtrace=>["(eval):22:in `filter_func'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:398:in `filter_batch'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:379:in `worker_loop'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:342:in `start_workers'"]}

Please let me know if there is any other data that might be useful, or anything I should try.

Thanks,
Stefan

@jorritfolmer
Contributor

Thanks for taking the time to draft such a detailed issue report! Like!

I think this issue can be attributed to the fact that we don't implement a "MAY" requirement from the IPFIX RFC (see Chapter 8, Template Management; overall IPFIX RFC compliance is tracked in #83):

"...the Collecting Process MAY buffer Data Records for which it has no Templates..."

We currently don't implement a number of MUST and SHOULD requirements either, so this one won't get much attention for the foreseeable future.
You're of course very welcome to contribute code and tests to resolve this issue!
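
For anyone who wants to pick this up, the idea would be to hold on to data records whose template hasn't been seen yet and decode them once it arrives. A very rough sketch follows; the class and method names are made up for illustration and are not the codec's actual internals:

class TemplateBuffer
  def initialize
    @templates = {}                              # template_id => parsed template
    @pending   = Hash.new { |h, k| h[k] = [] }   # template_id => raw data records
  end

  # Called whenever a template record is decoded.
  def on_template(template_id, template, &emit)
    @templates[template_id] = template
    # Replay any data records buffered while this template was still unknown.
    @pending.delete(template_id) { [] }.each { |raw| emit.call(decode(template, raw)) }
  end

  # Called for each incoming data record.
  def on_data(template_id, raw, &emit)
    if (template = @templates[template_id])
      emit.call(decode(template, raw))
    else
      @pending[template_id] << raw   # the "MAY buffer" behaviour from the RFC
    end
  end

  private

  def decode(template, raw)
    # Actual decoding against the template would happen here.
    { template: template, raw: raw }
  end
end

A real implementation would also need to cap or expire the pending buffer so records for template IDs that never arrive can't grow it without bound.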
