Added malware feed client
dhondta committed Oct 28, 2024
1 parent 9278f5b commit a73e45e
Showing 8 changed files with 136 additions and 76 deletions.
20 changes: 14 additions & 6 deletions README.md
@@ -4,13 +4,21 @@

[![PyPi](https://img.shields.io/pypi/v/malsearch.svg)](https://pypi.python.org/pypi/malsearch/)
[![Read The Docs](https://readthedocs.org/projects/python-malsearch/badge/?version=latest)](https://python-malsearch.readthedocs.io/en/latest/?badge=latest)
[![Build Status](https://github.com/dhondta/python-malsearch/actions/workflows/python-package.yml/badge.svg)](https://github.com/dhondta/python-malsearch/actions/workflows/python-package.yml)
[![Coverage Status](https://raw.githubusercontent.com/dhondta/python-malsearch/main/docs/coverage.svg)](#)
[![Build Status](https://github.com/packing-box/python-malsearch/actions/workflows/python-package.yml/badge.svg)](https://github.com/packing-box/python-malsearch/actions/workflows/python-package.yml)
[![Coverage Status](https://raw.githubusercontent.com/packing-box/python-malsearch/main/docs/coverage.svg)](#)
[![Python Versions](https://img.shields.io/pypi/pyversions/malsearch.svg)](https://pypi.python.org/pypi/malsearch/)
[![Known Vulnerabilities](https://snyk.io/test/github/dhondta/python-malsearch/badge.svg?targetFile=requirements.txt)](https://snyk.io/test/github/dhondta/python-malsearch?targetFile=requirements.txt)
[![Known Vulnerabilities](https://snyk.io/test/github/packing-box/python-malsearch/badge.svg?targetFile=requirements.txt)](https://snyk.io/test/github/packing-box/python-malsearch?targetFile=requirements.txt)
[![License](https://img.shields.io/pypi/l/malsearch.svg)](https://pypi.python.org/pypi/malsearch/)

This library communicates with APIs of multiple malware databases to collect malware samples.
This library communicates with the APIs of the following malware databases to collect malware samples:

- [Maldatabase](https://maldatabase.com/api-doc.html)
- [Malpedia](https://malpedia.caad.fkie.fraunhofer.de/usage/api)
- [MalShare](https://malshare.com/doc.php)
- [Malware Bazaar](https://bazaar.abuse.ch/api)
- [Triage](https://tria.ge/docs)
- [VirusShare](https://virusshare.com/apiv2_reference)
- [VirusTotal](https://docs.virustotal.com/reference/overview)

```sh
pip install malsearch
@@ -23,8 +31,8 @@ TODO

## :clap: Supporters

[![Stargazers repo roster for @dhondta/python-malsearch](https://reporoster.com/stars/dark/dhondta/python-malsearch)](https://github.com/dhondta/python-malsearch/stargazers)
[![Stargazers repo roster for @packing-box/python-malsearch](https://reporoster.com/stars/dark/packing-box/python-malsearch)](https://github.com/packing-box/python-malsearch/stargazers)

[![Forkers repo roster for @dhondta/python-malsearch](https://reporoster.com/forks/dark/dhondta/python-malsearch)](https://github.com/dhondta/python-malsearch/network/members)
[![Forkers repo roster for @packing-box/python-malsearch](https://reporoster.com/forks/dark/packing-box/python-malsearch)](https://github.com/packing-box/python-malsearch/network/members)

<p align="center"><a href="#"><img src="https://img.shields.io/badge/Back%20to%20top--lightgrey?style=social" alt="Back to top" height="20"/></a></p>
14 changes: 5 additions & 9 deletions docs/pages/index.md
@@ -3,11 +3,12 @@
MalSearch is a library for collecting malware samples from multiple malware databases using their APIs. It relies on:

- [Maldatabase](https://maldatabase.com/api-doc.html)
- [MalShare](https://www.malshare.com)
- [Malpedia](https://malpedia.caad.fkie.fraunhofer.de/usage/api)
- [MalShare](https://malshare.com/doc.php)
- [Malware Bazaar](https://bazaar.abuse.ch/api)
- [Triage]()
- [VirusShare]()
- [VirusTotal](https://docs.virustotal.com/reference/getting-started)
- [Triage](https://tria.ge/docs)
- [VirusShare](https://virusshare.com/apiv2_reference)
- [VirusTotal](https://docs.virustotal.com/reference/overview)


## Setup
@@ -18,8 +19,3 @@ This library is available on [PyPi](https://pypi.python.org/pypi/malsearch/) and
pip install malsearch
```

or

```sh
pip3 install malsearch
```
2 changes: 1 addition & 1 deletion src/malsearch/VERSION.txt
@@ -1 +1 @@
0.1.0
0.2.0
144 changes: 88 additions & 56 deletions src/malsearch/__init__.py
@@ -1,11 +1,63 @@
# -*- coding: UTF-8 -*-
import logging
from os import cpu_count

from .clients import *
from .clients import __all__ as _clients


__all__ = ["download_sample", "download_samples"] + _clients
__all__ = ["download_sample", "download_samples", "get_samples_feed"] + _clients

_CLIENTS_MAP = {n.lower(): globals()[n] for n in _clients}
_MAX_WORKERS = 3 * (cpu_count() or 1)  # cpu_count() may return None

logger = logging.getLogger("malsearch")


def _check_conf(method):
    def _wrapper(f):
        import datetime as dt  # needed for the "Disabled" timestamp check below
        from functools import wraps
        @wraps(f)
        def _subwrapper(*args, config=None, **kwargs):
            if config is None:
                logger.error("no configuration file provided")
                logger.info(f"you can create one at {config} manually (INI format with section 'API keys')")
            else:
                if isinstance(config, str):
                    config = _valid_conf(config)
                clients = []
                for n in config['API keys']:
                    if not hasattr(_CLIENTS_MAP[n], method):
                        continue
                    if n in (kwargs.get('skip') or []):
                        logger.debug(f"{n} skipped")
                        continue
                    if config.has_section("Disabled"):
                        t = config['Disabled'].get(n)
                        if t is not None:
                            try:
                                if dt.datetime.strptime(t, "%d/%m/%Y %H:%M:%S") < dt.datetime.now():
                                    from contextlib import nullcontext
                                    with kwargs.get('lock') or nullcontext():
                                        config['Disabled'].pop(n)
                                        # distinct handle name so the wrapped function f is not shadowed
                                        with open(config.path, 'w') as fh:
                                            config.write(fh)
                                else:
                                    logger.warning(f"{n} is disabled until {t}")
                                    continue
                            except ValueError:
                                logger.warning(f"{n} is disabled")
                                continue
                    cls = _CLIENTS_MAP[n]
                    if cls.__base__.__name__ == "API":
                        kwargs['api_key'] = config['API keys'].get(n)
                    clients.append(cls(config=config, **kwargs))
                if len(clients) == 0:
                    logger.warning("no download client available/enabled")
                logger.debug(f"clients: {', '.join(c.name for c in clients)}")
                return f(*args, clients=clients, config=config, **kwargs)
        return _subwrapper
    return _wrapper
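
The `Disabled` handling inside `_check_conf` can be looked at in isolation. The following is a minimal sketch, assuming an INI file with an `API keys` section and an optional `Disabled` section whose values use the `%d/%m/%Y %H:%M:%S` format seen in the code. The helper name `is_disabled` and the sample keys are hypothetical and not part of malsearch; writing the updated configuration back to disk is left out.

```python
import configparser
import datetime as dt

TS_FORMAT = "%d/%m/%Y %H:%M:%S"  # same format string as in _check_conf


def is_disabled(config, name, now=None):
    """Return True if `name` is still disabled; drop the entry once its timestamp has passed."""
    if not config.has_section("Disabled"):
        return False
    value = config["Disabled"].get(name)
    if value is None:
        return False
    try:
        until = dt.datetime.strptime(value, TS_FORMAT)
    except ValueError:
        return True  # unparsable value: treat the client as disabled
    if until < (now or dt.datetime.now()):
        config["Disabled"].pop(name)  # expired entry: re-enable the client
        return False
    return True


# Hypothetical configuration content, for illustration only.
config = configparser.ConfigParser()
config.read_string("""
[API keys]
malshare = dummy-key
virustotal = dummy-key

[Disabled]
malshare = 01/01/2000 00:00:00
virustotal = never
""")

for client_name in config["API keys"]:
    print(client_name, "disabled" if is_disabled(config, client_name) else "enabled")
```

Treating an unparsable value as "disabled" mirrors the `except ValueError` branch of the decorator, which logs a warning and skips the client.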


def _valid_conf(path):
@@ -23,68 +75,48 @@ def _valid_conf(path):
return conf


@_check_conf("get_file_by_hash")
def download_sample(hash, clients=(), config=None, **kwargs):
import logging
logger = logging.getLogger("malsearch")
if config is None:
logger.error("no configuration file provided")
logger.info(f"you can create one at {config} manually (INI format with section 'API keys')")
else:
import datetime as dt
from os.path import exists, join
p = join(kwargs.get('output_dir', "."), hash)
if exists(p) and not kwargs.get('overwrite'):
logger.info(f"'{p}' already exists")
return
if isinstance(config, str):
config = _valid_conf(config)
clients = []
for n in config['API keys']:
if n in (kwargs.get('skip') or []):
logger.debug(f"{n} skipped")
continue
if config.has_section("Disabled"):
t = config['Disabled'].get(n)
if t is not None:
try:
if dt.datetime.strptime(t, "%d/%m/%Y %H:%M:%S") < dt.datetime.now():
from contextlib import nullcontext
with kwargs.get('lock') or nullcontext():
config['Disabled'].pop(n)
with open(config.path, 'w') as f:
config.write(f)
else:
logger.warning(f"{n} is disabled until {t}")
continue
except ValueError:
logger.warning(f"{n} is disabled")
continue
clients.append(n)
if len(clients) == 0:
logger.warning("no download client available/enabled")
logger.debug(f"clients: {', '.join(clients)}")
for n in clients:
logger.debug(f"trying {n}...")
cls = _CLIENTS_MAP[n]
if cls.__base__.__name__ == "API":
kwargs['api_key'] = config['API keys'].get(n)
client = cls(config=config, **kwargs)
try:
client.get_file_by_hash(hash)
if hasattr(client, "content") and client.content is not None and len(client.content) > 0:
logger.debug("found sample !")
return
except ValueError as e:
logger.debug(e)
except Exception as e:
logger.exception(e)
    import datetime as dt
    from os.path import exists, join
    p = join(kwargs.get('output_dir', "."), hash)
    if exists(p) and not kwargs.get('overwrite'):
        logger.info(f"'{p}' already exists")
        return
    for client in clients:
        logger.debug(f"trying {client.name}...")
        try:
            client.get_file_by_hash(hash)
            if hasattr(client, "content") and client.content is not None and len(client.content) > 0:
                logger.debug("found sample !")
                return
        except AttributeError:
            continue  # not a client for downloading samples (e.g. Maldatabase)
        except ValueError as e:
            logger.debug(e)
        except Exception as e:
            logger.exception(e)
    logger.warning(f"could not find the sample with hash {hash}")


def download_samples(*hashes, max_workers=5, **kwargs):
def download_samples(*hashes, max_workers=_MAX_WORKERS, **kwargs):
    from concurrent.futures import ThreadPoolExecutor as Pool
    from threading import Lock
    kwargs['lock'] = Lock()
    with Pool(max_workers=max_workers) as executor:
        for h in hashes:
            executor.submit(download_sample, h.lower(), **kwargs)
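
The fan-out in `download_samples` follows a common pattern: one task per hash on a thread pool, with a single `Lock` shared by all tasks. A minimal sketch of that pattern is given here, with a hypothetical `fetch` function standing in for `download_sample` and a plain list standing in for the shared configuration file; neither name comes from malsearch.

```python
from concurrent.futures import ThreadPoolExecutor
from os import cpu_count
from threading import Lock

MAX_WORKERS = 3 * (cpu_count() or 1)  # cpu_count() can return None


def fetch(h, results, lock):
    """Hypothetical worker: records the hash while holding the shared lock."""
    with lock:
        results.append(h)


def fetch_all(*hashes, max_workers=MAX_WORKERS):
    results, lock = [], Lock()
    # Leaving the "with" block waits for all submitted tasks to finish.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        for h in hashes:
            executor.submit(fetch, h.lower(), results, lock)
    return sorted(results)


print(fetch_all("AB", "cd"))
```

One point worth keeping in mind with `executor.submit`: an exception raised inside a task is stored on the returned future and only surfaces when `result()` is called on it, so tasks that are submitted and forgotten fail silently unless they log their own errors.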


@_check_conf("get_malware_feed")
def get_samples_feed(config=None, clients=(), **kwargs):
    count = 0
    for client in clients:
        logger.debug(f"trying {client.name}...")
        try:
            for h in client.get_malware_feed():
                yield h
                count += 1
        except Exception as e:
            logger.exception(e)
    logger.info(f"got {count} hashes")
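
`_check_conf` hands the instantiated clients to the wrapped function as the `clients` keyword argument. A minimal standalone sketch of this keyword-injection pattern follows; `inject_clients`, `FakeClient` and `feed` are hypothetical names used for illustration only. It shows that the wrapped function receives the list only when it names a `clients` parameter (otherwise the value lands in `**kwargs`), and that decorating a generator function this way keeps it lazy.

```python
from functools import wraps


class FakeClient:
    """Hypothetical stand-in for an API client exposing get_malware_feed()."""

    def __init__(self, name, hashes):
        self.name = name
        self._hashes = hashes

    def get_malware_feed(self):
        yield from self._hashes


def inject_clients(method, available):
    """Decorator factory: pass the clients implementing `method` as the `clients` keyword."""
    def _wrapper(func):
        @wraps(func)
        def _subwrapper(*args, **kwargs):
            clients = [c for c in available if hasattr(c, method)]
            return func(*args, clients=clients, **kwargs)
        return _subwrapper
    return _wrapper


# The bare object() has no get_malware_feed method and is filtered out.
AVAILABLE = [FakeClient("a", ["h1", "h2"]), FakeClient("b", ["h3"]), object()]


@inject_clients("get_malware_feed", AVAILABLE)
def feed(clients=(), **kwargs):
    for client in clients:
        yield from client.get_malware_feed()


print(list(feed()))
```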
10 changes: 8 additions & 2 deletions src/malsearch/__main__.py
@@ -32,11 +32,14 @@ def _setup(parser):

def main():
from os import makedirs
from .__init__ import _valid_conf, download_samples
from .__init__ import _valid_conf, download_samples, get_samples_feed
from .clients.__common__ import _valid_hash
parser = _parser("MalSearch", "This tool is aimed to search for malware samples across some public databases", [])
parser = _parser("MalSearch", "This tool is aimed to search for malware samples across some public databases",
["2037f9b7dd268eef7d2e950b27c6cf80e3ba692d262c785ab67b04dc71c99bf9",
"-f hashes.txt -o samples --disable-cache"])
parser.add_argument("sample_hash", type=_valid_hash, nargs="*", help="input hash")
parser.add_argument("-f", "--from-file", help="get hashes from the target file (newline-separated list)")
parser.add_argument("-m", "--from-malware-feed", action="store_true", help="get hashes from malware feeds")
opt = parser.add_argument_group("optional arguments")
opt.add_argument("-c", "--config", default="~/.malsearch.conf", type=_valid_conf, help="INI configuration file")
opt.add_argument("-o", "--output-dir", default=".", help="output directory for downloaded samples")
@@ -52,6 +55,9 @@ def main():
        with open(args.from_file) as f:
            for h in f.readlines():
                args.sample_hash.append(_valid_hash(h.strip()))
    if args.from_malware_feed:
        # the feed helper needs the parsed configuration to instantiate its clients
        for h in get_samples_feed(config=args.config):
            args.sample_hash.append(_valid_hash(h.strip()))
    makedirs(args.output_dir, exist_ok=True)
    if len(args.sample_hash) > 0:
        download_samples(*args.sample_hash, **vars(args))
3 changes: 2 additions & 1 deletion src/malsearch/clients/__init__.py
@@ -1,4 +1,5 @@
# -*- coding: UTF-8 -*-
from .maldatabase import Maldatabase
from .malpedia import Malpedia
from .malshare import MalShare
from .malwarebazaar import MalwareBazaar
@@ -7,5 +8,5 @@
from .virustotal import VirusTotal


__all__ = ["Malpedia", "MalShare", "MalwareBazaar", "Triage", "VirusShare", "VirusTotal"]
__all__ = ["Maldatabase", "Malpedia", "MalShare", "MalwareBazaar", "Triage", "VirusShare", "VirusTotal"]

17 changes: 17 additions & 0 deletions src/malsearch/clients/maldatabase.py
@@ -0,0 +1,17 @@
# -*- coding: UTF-8 -*-
from .__common__ import API


__all__ = ["Maldatabase"]


class Maldatabase(API):
    doc = "https://maldatabase.com/api-doc.html"
    url = "https://api.maldatabase.com/download"
    _api_key_header = "Authorization"

    def get_malware_feed(self, hashtype="sha256"):
        # available output hash types: md5, sha1, sha256
        self._get("", headers={'Accept-Encoding': "gzip, deflate"})
        for data in self.json:
            yield data[hashtype]
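
The hash extraction in `get_malware_feed` can be sketched without any network access. The example assumes, based only on the loop above, that the feed is a JSON array of objects each carrying `md5`, `sha1` and `sha256` keys. The function name `iter_feed_hashes` and the payload values are made up for illustration and do not come from the Maldatabase API.

```python
import json

HASH_TYPES = ("md5", "sha1", "sha256")  # output types listed in the client's comment


def iter_feed_hashes(payload, hashtype="sha256"):
    """Yield one hash of the requested type per feed entry."""
    if hashtype not in HASH_TYPES:
        raise ValueError(f"unsupported hash type: {hashtype}")
    for entry in json.loads(payload):
        yield entry[hashtype]


# Made-up payload; the values are placeholders, not real sample hashes.
SAMPLE = json.dumps([
    {"md5": "a" * 32, "sha1": "a" * 40, "sha256": "a" * 64},
    {"md5": "b" * 32, "sha1": "b" * 40, "sha256": "b" * 64},
])

for h in iter_feed_hashes(SAMPLE, "md5"):
    print(h)
```

Because the function is a generator, the hash type check only runs once iteration starts, which matches the lazy behaviour of the client method.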
2 changes: 1 addition & 1 deletion src/malsearch/clients/virustotal.py
@@ -9,7 +9,7 @@ class VirusTotal(API):
doc = "https://docs.virustotal.com/reference/overview"
url = "https://www.virustotal.com/api/v3"
_api_key_header = "X-Apikey"

@hashtype("md5", "sha1", "sha256")
def get_file_by_hash(self, hash):
if self._unpacked: