Combine architecture

Kyle Maxwell edited this page Jun 25, 2014 · 3 revisions

Concept

Combine (emphasis on the first syllable, as in a combine harvester) is a tool to collect and process threat data from various feeds before outputting it in a form usable by SIEM or other analysis tools.

Processes

  1. Reaper gathers the threat data directly from feeds or other data sources.
  2. Thresher normalizes it into a simple data model.
  3. Winnower optionally performs basic validation on the data, such as removing RFC 1918 (private) addresses and other reserved ranges.
  4. Baler uses output plugins to transform the data from the normalized model into various output formats. These should likely include CybOX, CSV, JSON, and CIM. The first pass, however, will only use the CSV schema defined for the DEFCON presentation.
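
As a rough illustration of the output-plugin idea behind Baler, the sketch below registers one function per output format and dispatches on the requested name. Everything in it is an assumption made for illustration: the `Indicator` record, its field names, and the `bale` helper are hypothetical, and the CSV columns are placeholders rather than the schema defined for the DEFCON presentation.

```python
import csv
import io
import json
from dataclasses import dataclass, asdict, fields


@dataclass
class Indicator:
    # Hypothetical normalized record; the real field set is whatever the
    # DEFCON presentation schema defines, which is not reproduced here.
    entity: str   # the observed address or name
    kind: str     # e.g. "IPv4"
    source: str   # feed the record came from
    date: str     # date the feed was pulled


def to_csv(records):
    """Render records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([f.name for f in fields(Indicator)])
    for r in records:
        writer.writerow([r.entity, r.kind, r.source, r.date])
    return buf.getvalue()


def to_json(records):
    """Render records as a JSON array of objects."""
    return json.dumps([asdict(r) for r in records])


# One entry per output format; CybOX or CIM writers would be added here.
OUTPUT_PLUGINS = {"csv": to_csv, "json": to_json}


def bale(records, fmt="csv"):
    """Look up the plugin for the requested format and apply it."""
    return OUTPUT_PLUGINS[fmt](records)
```

Calling `bale(records, "json")` on the same list would yield the JSON form, so adding a format means adding one function and one dictionary entry.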

Flow

  1. Reaper pulls URLs from a list, grabs the listed documents, and places them into a queue for processing.
  2. Thresher processes each document into a normalized data model (see the schema from the DEFCON presentation). It will need a number of plugins or maps that describe how to interpret each kind of feed.
  3. If enabled, Winnower checks each address to verify that it is not reserved. Open question: do we need any other validation steps here?
  4. Baler will yield data based on specific parameters (e.g. time frame or autonomous system) in a requested format. It should also provide the final pipe into the MLSec processing infrastructure.
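
The middle of this flow can be sketched with the standard library alone. The example below is a minimal sketch under stated assumptions, not Combine's actual code: `thresh`, `is_reserved`, and `winnow` are hypothetical names, the feed is assumed to be plain text with one address per line and `#` comments, and entries that are not IP addresses are passed through unchanged because the design above only describes checking addresses.

```python
import ipaddress


def thresh(document, source):
    """Parse a plain-text feed (one address per line) into simple records."""
    records = []
    for line in document.splitlines():
        # Drop trailing comments and surrounding whitespace.
        entry = line.split("#", 1)[0].strip()
        if entry:
            records.append({"entity": entry, "source": source})
    return records


def is_reserved(address):
    """Return True for private, loopback, link-local, and similar ranges."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        # Assumption: non-IP entries (e.g. domain names) are not judged here.
        return False
    return (
        ip.is_private        # includes the RFC 1918 ranges
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )


def winnow(records, enabled=True):
    """Filter out reserved addresses; a no-op when validation is disabled."""
    if not enabled:
        return list(records)
    return [r for r in records if not is_reserved(r["entity"])]
```

Note that Python's `is_private` covers more than RFC 1918 (for instance, the documentation ranges), so a stricter or looser policy would need an explicit list of networks instead.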