Merge pull request #106 from mitre/feature-celery
Update MultiScanner documentation
Showing 84 changed files with 1,309 additions and 541 deletions.
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line.
SPHINXOPTS    =
SPHINXBUILD   = sphinx-build
SPHINXPROJ    = MultiScanner
SOURCEDIR     = .
BUILDDIR      = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
@@ -0,0 +1,13 @@
/* override table width restrictions */
@media screen and (min-width: 767px) {

   .wy-table-responsive table td {
      /* !important prevents the common CSS stylesheets from overriding
         this as on RTD they are loaded after this stylesheet */
      white-space: normal !important;
   }

   .wy-table-responsive {
      overflow: visible !important;
   }
}
This file was deleted.

@@ -0,0 +1,104 @@
Architecture
============

High-level Architecture
-----------------------
There are seven primary components of the MultiScanner architecture, as described below and illustrated in the associated diagram.

.. figure:: _static/img/arch1.png
   :align: center
   :scale: 45 %
   :alt: MultiScanner Architecture

   MultiScanner Architecture

**Web Frontend**

The web application runs on `Flask <http://flask.pocoo.org/>`_, uses `Bootstrap <https://getbootstrap.com/>`_ and `jQuery <https://jquery.com/>`_, and is served via Apache. It is essentially an aesthetic wrapper around the REST API; all data and services it provides are also available by querying the REST API.
**REST API**

The REST API is also powered by Flask and served via Apache. It has an underlying PostgreSQL database to facilitate task tracking. Additionally, it acts as a gateway to the backend Elasticsearch document store: searches entered into the web UI are routed through the REST API and passed to the Elasticsearch cluster. This abstracts the complexity of querying Elasticsearch and gives the user a simple web interface to work with.
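As a concrete illustration of this abstraction, the REST API layer might translate a user's free-text search into an Elasticsearch request body on the user's behalf. This is a hedged sketch: the function name, index name, and query options below are assumptions for illustration, not MultiScanner's actual code.

```python
import json

# Hypothetical index name, for illustration only.
ES_INDEX = "multiscanner_reports"

def build_search_body(user_query, size=25):
    """Wrap a raw search string in a query_string query so users
    never have to write Elasticsearch DSL themselves."""
    return {
        "query": {
            "query_string": {
                "query": user_query,
                "default_operator": "AND",
            }
        },
        "size": size,
    }

# The web UI passes the search box contents straight through.
body = build_search_body('filetype:"PE32" AND packer:upx')
print(json.dumps(body, indent=2))
```

The `query_string` query type is what gives users Lucene-style syntax (field names, AND/OR) without exposing the rest of the Elasticsearch DSL.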
**Task Queue**

We use Celery as our distributed task queue.
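Celery tasks are ordinary Python functions that producers enqueue through a message broker and workers pull and execute. Since the actual broker setup is deployment-specific, the pattern can be sketched broker-free with a stdlib queue; the task name and behavior below are illustrative stand-ins, not MultiScanner's real task code.

```python
import queue
import threading

# Stand-in for Celery's broker: in production this queue lives in a
# message broker (e.g. RabbitMQ or Redis), not in-process.
task_queue = queue.Queue()
results = {}

def multiscanner_celery(file_list):
    """Hypothetical worker task: 'scan' a batch of samples and
    return one result per file (real scanning elided)."""
    return {f: "scanned" for f in file_list}

def worker():
    while True:
        batch = task_queue.get()
        if batch is None:          # sentinel: shut the worker down
            break
        results.update(multiscanner_celery(batch))

t = threading.Thread(target=worker)
t.start()
task_queue.put(["sample_a", "sample_b"])  # enqueue, like .delay() in Celery
task_queue.put(None)
t.join()
print(results)  # {'sample_a': 'scanned', 'sample_b': 'scanned'}
```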
**Task Tracking**

PostgreSQL is our task management database. It is here that we keep track of scan times, samples, and the status of tasks (pending, complete, failed).
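The shape of such a task table can be sketched as follows. This uses stdlib sqlite3 purely for a self-contained example; the column names are assumptions for illustration, not MultiScanner's actual PostgreSQL schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Illustrative task-tracking table: one row per submitted sample.
conn.execute("""
    CREATE TABLE task (
        task_id       INTEGER PRIMARY KEY,
        sample_sha256 TEXT NOT NULL,
        task_status   TEXT NOT NULL
            CHECK (task_status IN ('Pending', 'Complete', 'Failed')),
        timestamp     TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")
# The API layer records a task as Pending on submission...
conn.execute("INSERT INTO task (sample_sha256, task_status) VALUES (?, ?)",
             ("deadbeef", "Pending"))
# ...and a worker flips the status once scanning finishes.
conn.execute("UPDATE task SET task_status = 'Complete' WHERE task_id = 1")
status = conn.execute(
    "SELECT task_status FROM task WHERE task_id = 1").fetchone()[0]
print(status)  # Complete
```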
**Distributed File System**

GlusterFS is our distributed file system. Each component that needs access to the raw samples mounts the share via FUSE. We selected GlusterFS because it is more performant in our use case -- storing a large number of small samples -- than a technology like HDFS would be.
**Worker Nodes**

The worker nodes are Celery clients running the MultiScanner Python application. Additionally, we implemented some batching within Celery to improve the performance of our worker nodes (which operate better at scale).

A worker node will wait until there are 100 samples in its queue or 60 seconds have passed (whichever happens first) before kicking off its scan (these values are configurable). All worker nodes have the GlusterFS share mounted, which gives them access to the samples for scanning. In our setup, we co-locate the worker nodes with the GlusterFS nodes in order to reduce the network load of workers pulling samples from GlusterFS.
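The "100 samples or 60 seconds, whichever comes first" batching policy can be sketched as a small predicate. This is an illustration of the policy only, not Celery code; the function and constant names are hypothetical.

```python
import time

BATCH_SIZE = 100   # configurable: samples per scan batch
BATCH_WAIT = 60.0  # configurable: max seconds to wait for a full batch

def batch_ready(queued_samples, batch_started, now=None,
                size=BATCH_SIZE, wait=BATCH_WAIT):
    """A worker kicks off a scan once it has `size` samples queued OR
    `wait` seconds have elapsed since the batch was opened --
    whichever happens first."""
    now = time.monotonic() if now is None else now
    return len(queued_samples) >= size or (now - batch_started) >= wait

print(batch_ready(range(100), batch_started=0, now=1))  # True: batch full
print(batch_ready(range(3), batch_started=0, now=61))   # True: timed out
print(batch_ready(range(3), batch_started=0, now=5))    # False: keep waiting
```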
**Report Storage**

We use Elasticsearch to store the results of our file scans. This is where the true power of the system lies. Elasticsearch allows for performant, full-text searching across all our reports and modules, which gives fast access to interesting details from your malware analysis tools, enables pivoting between samples, and supports powerful analytics on report output.
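For instance, pivoting between samples might combine a full-text match with an aggregation that buckets the hits by tool. The request body below is a hedged sketch: the field name under `aggs` is an assumption about the report schema, not something guaranteed by MultiScanner.

```python
import json

# Find every report mentioning a detection name, then bucket the
# matches by AV engine (field name is hypothetical).
pivot_query = {
    "query": {"query_string": {"query": "Trojan.GenericKD"}},
    "aggs": {
        "by_av_engine": {
            "terms": {"field": "scan_results.av_name.keyword"}
        }
    },
    "size": 0,  # we only want the buckets, not the documents
}
print(json.dumps(pivot_query, indent=2))
```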
.. _complete-workflow:

Complete Workflow
-----------------
Each step of the MultiScanner workflow is described below the diagram.

.. figure:: _static/img/arch2.png
   :align: center
   :scale: 50 %
   :alt: MultiScanner Workflow

   MultiScanner Workflow
1. The user submits a sample file through the Web UI (or REST API).

2. The Web UI (or REST API):

   a. Stores the file in the distributed file system (GlusterFS)
   b. Places the task on the task queue (Celery)
   c. Adds an entry to the task management database (PostgreSQL)

3. A worker node:

   a. Pulls the task from the Celery task queue
   b. Retrieves the corresponding sample file from GlusterFS via its SHA256 value
   c. Analyzes the file
   d. Generates a JSON blob and indexes it into Elasticsearch
   e. Updates the task management database with the task status ("complete")

4. The Web UI (or REST API):

   a. Gets the report ID associated with the Task ID
   b. Pulls the analysis report from the Elasticsearch datastore
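The steps above can be sketched end to end with in-memory stand-ins for GlusterFS, Celery, PostgreSQL, and Elasticsearch. Every name here is illustrative, not MultiScanner's actual API.

```python
import hashlib
import json

# In-memory stand-ins for the four backing services.
file_store, task_queue, task_db, es_store = {}, [], {}, {}

def submit(sample_bytes):
    """Step 2: store the sample, enqueue a task, record it as Pending."""
    sha256 = hashlib.sha256(sample_bytes).hexdigest()
    file_store[sha256] = sample_bytes                 # GlusterFS stand-in
    task_id = len(task_db) + 1
    task_queue.append((task_id, sha256))              # Celery stand-in
    task_db[task_id] = {"sha256": sha256, "status": "Pending"}
    return task_id

def worker_step():
    """Step 3: pull a task, fetch the sample by SHA256, 'analyze' it,
    index the report, and mark the task complete."""
    task_id, sha256 = task_queue.pop(0)
    sample = file_store[sha256]
    report = {"sha256": sha256, "size": len(sample)}  # analysis elided
    es_store[task_id] = json.dumps(report)            # Elasticsearch stand-in
    task_db[task_id]["status"] = "Complete"

def get_report(task_id):
    """Step 4: look up the report for a task."""
    return json.loads(es_store[task_id])

tid = submit(b"MZ\x90\x00")
worker_step()
print(task_db[tid]["status"], get_report(tid)["size"])
```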
Analysis
--------
Analysis tools are integrated into MultiScanner via modules running in the MultiScanner framework. Tools can be custom-built Python scripts, web APIs, or software applications running on different machines. Categories of existing modules include AV scanning, sandbox detonation, metadata extraction, and signature scanning. Modules can be enabled or disabled via a configuration file. Details are provided in the :ref:`analysis-modules` section.
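An INI-style configuration with one section per module is a common shape for this kind of toggle, and can be read with stdlib ``configparser``. The section and option names below are assumptions for illustration, not MultiScanner's actual configuration keys.

```python
import configparser

# Hypothetical module configuration: one section per analysis module.
SAMPLE_CONFIG = """
[YaraScan]
ENABLED = True

[AVGScan]
ENABLED = False
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE_CONFIG)

# The framework would only load modules whose section enables them.
enabled = [s for s in config.sections()
           if config.getboolean(s, "ENABLED", fallback=False)]
print(enabled)  # ['YaraScan']
```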
Analytics
---------
Enabling analytics and advanced queries is the primary advantage of running several tools against a sample, extracting as much information as possible, and storing the output in a common datastore. For example, the following types of analytics and queries are possible:

* clustering similar samples
* identifying outlier samples
* selecting samples for deep-dive analysis
* identifying gaps in the current toolset
* running machine learning analytics on tool outputs
Reporting
---------
Analysis data captured or generated by MultiScanner is accessible in three ways:

* MultiScanner Web User Interface -- Content in the Elasticsearch database is viewable through the Web UI. See the :ref:`web-ui` section for details.

* MultiScanner Reports -- MultiScanner reports reflect the content of the MultiScanner database and are provided in raw JSON and PDF formats. These reports capture all content associated with a sample.

* STIX-based Reports -- STIX-based reports *will soon be* available in multiple formats: JSON, PDF, HTML, and text.