chore: improve the malicious metadata check (#797)
This PR refactors and improves the detect_malicious_metadata_check:

* Moves the check under src/macaron/slsa_analyzer/checks/
* Refactors the check implementation to avoid storing the metadata in the PyPIRegistry object, using the AssetLocator representation instead.
* Uses the DB JSON type to store the serialized metadata instead of dumping it as a string value.
* Adds a new unit test for the check and improves other relevant tests.
* Adds the check to the Django integration test case and its dependencies.
* Ensures that the source code retrieved through the PyPIRegistry API matches the version in the artifact PURL.
* Removes heuristics that produced too many false positives (FPs).

Signed-off-by: behnazh-w <[email protected]>
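The version-matching bullet above amounts to reading the version-specific PyPI JSON and taking the source distribution listed there only when that payload describes the PURL's version. The sketch below is not Macaron's code: the helpers `parse_pypi_purl` and `select_sdist_url`, the simplified PURL parsing (no qualifiers, subpaths or percent-decoding) and the sample payload are all hypothetical. Only the `info.version`, `urls[].packagetype` and `urls[].url` fields come from the PyPI JSON API.

```python
from __future__ import annotations

from typing import Any


def parse_pypi_purl(purl: str) -> tuple[str, str]:
    """Split a simple ``pkg:pypi/<name>@<version>`` PURL (hypothetical helper).

    Qualifiers, subpaths and percent-encoding are deliberately not handled.
    """
    prefix = "pkg:pypi/"
    if not purl.startswith(prefix) or "@" not in purl:
        raise ValueError(f"Expected pkg:pypi/<name>@<version>, got {purl!r}")
    name, version = purl[len(prefix):].split("@", 1)
    return name, version


def select_sdist_url(package_json: dict[str, Any], purl: str) -> str | None:
    """Return the sdist URL only if the payload describes the PURL's version."""
    _, version = parse_pypi_purl(purl)
    # Refuse to use source code from a different release than the one analyzed.
    if package_json.get("info", {}).get("version") != version:
        return None
    for file_info in package_json.get("urls", []):
        if file_info.get("packagetype") == "sdist":
            return file_info.get("url")
    return None


# Hypothetical, abbreviated payload shaped like https://pypi.org/pypi/<name>/<version>/json.
sample = {
    "info": {"name": "example-pkg", "version": "1.0.0"},
    "urls": [
        {"packagetype": "bdist_wheel", "url": "https://files.example/example_pkg-1.0.0-py3-none-any.whl"},
        {"packagetype": "sdist", "url": "https://files.example/example_pkg-1.0.0.tar.gz"},
    ],
}

print(select_sdist_url(sample, "pkg:pypi/example-pkg@1.0.0"))  # https://files.example/example_pkg-1.0.0.tar.gz
print(select_sdist_url(sample, "pkg:pypi/example-pkg@2.0.0"))  # None
```

Returning `None` on a version mismatch, rather than falling back to whatever release the payload describes, is what keeps the analyzed source tied to the artifact being checked.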
behnazh-w authored Jul 30, 2024
1 parent dfdc62e commit 8f2e757
Showing 44 changed files with 2,334 additions and 607 deletions.
@@ -17,10 +17,10 @@ macaron.database.database\_manager module
   :undoc-members:
   :show-inheritance:

macaron.database.rfc3339\_datetime module
macaron.database.db\_custom\_types module
-----------------------------------------

.. automodule:: macaron.database.rfc3339_datetime
.. automodule:: macaron.database.db_custom_types
   :members:
   :undoc-members:
   :show-inheritance:
@@ -0,0 +1,58 @@
macaron.malware\_analyzer.pypi\_heuristics.metadata package
===========================================================

.. automodule:: macaron.malware_analyzer.pypi_heuristics.metadata
   :members:
   :undoc-members:
   :show-inheritance:

Submodules
----------

macaron.malware\_analyzer.pypi\_heuristics.metadata.closer\_release\_join\_date module
--------------------------------------------------------------------------------------

.. automodule:: macaron.malware_analyzer.pypi_heuristics.metadata.closer_release_join_date
   :members:
   :undoc-members:
   :show-inheritance:

macaron.malware\_analyzer.pypi\_heuristics.metadata.empty\_project\_link module
-------------------------------------------------------------------------------

.. automodule:: macaron.malware_analyzer.pypi_heuristics.metadata.empty_project_link
   :members:
   :undoc-members:
   :show-inheritance:

macaron.malware\_analyzer.pypi\_heuristics.metadata.high\_release\_frequency module
-----------------------------------------------------------------------------------

.. automodule:: macaron.malware_analyzer.pypi_heuristics.metadata.high_release_frequency
   :members:
   :undoc-members:
   :show-inheritance:

macaron.malware\_analyzer.pypi\_heuristics.metadata.one\_release module
-----------------------------------------------------------------------

.. automodule:: macaron.malware_analyzer.pypi_heuristics.metadata.one_release
   :members:
   :undoc-members:
   :show-inheritance:

macaron.malware\_analyzer.pypi\_heuristics.metadata.unchanged\_release module
-----------------------------------------------------------------------------

.. automodule:: macaron.malware_analyzer.pypi_heuristics.metadata.unchanged_release
   :members:
   :undoc-members:
   :show-inheritance:

macaron.malware\_analyzer.pypi\_heuristics.metadata.unreachable\_project\_links module
--------------------------------------------------------------------------------------

.. automodule:: macaron.malware_analyzer.pypi_heuristics.metadata.unreachable_project_links
   :members:
   :undoc-members:
   :show-inheritance:
@@ -0,0 +1,35 @@
macaron.malware\_analyzer.pypi\_heuristics package
==================================================

.. automodule:: macaron.malware_analyzer.pypi_heuristics
   :members:
   :undoc-members:
   :show-inheritance:

Subpackages
-----------

.. toctree::
   :maxdepth: 1

   macaron.malware_analyzer.pypi_heuristics.metadata
   macaron.malware_analyzer.pypi_heuristics.sourcecode

Submodules
----------

macaron.malware\_analyzer.pypi\_heuristics.base\_analyzer module
----------------------------------------------------------------

.. automodule:: macaron.malware_analyzer.pypi_heuristics.base_analyzer
   :members:
   :undoc-members:
   :show-inheritance:

macaron.malware\_analyzer.pypi\_heuristics.heuristics module
------------------------------------------------------------

.. automodule:: macaron.malware_analyzer.pypi_heuristics.heuristics
   :members:
   :undoc-members:
   :show-inheritance:
@@ -0,0 +1,18 @@
macaron.malware\_analyzer.pypi\_heuristics.sourcecode package
=============================================================

.. automodule:: macaron.malware_analyzer.pypi_heuristics.sourcecode
   :members:
   :undoc-members:
   :show-inheritance:

Submodules
----------

macaron.malware\_analyzer.pypi\_heuristics.sourcecode.suspicious\_setup module
------------------------------------------------------------------------------

.. automodule:: macaron.malware_analyzer.pypi_heuristics.sourcecode.suspicious_setup
   :members:
   :undoc-members:
   :show-inheritance:
@@ -0,0 +1,26 @@
macaron.malware\_analyzer package
=================================

.. automodule:: macaron.malware_analyzer
   :members:
   :undoc-members:
   :show-inheritance:

Subpackages
-----------

.. toctree::
   :maxdepth: 1

   macaron.malware_analyzer.pypi_heuristics

Submodules
----------

macaron.malware\_analyzer.datetime\_parser module
-------------------------------------------------

.. automodule:: macaron.malware_analyzer.datetime_parser
   :members:
   :undoc-members:
   :show-inheritance:
1 change: 1 addition & 0 deletions docs/source/pages/developers_guide/apidoc/macaron.rst
@@ -16,6 +16,7 @@ Subpackages
   macaron.config
   macaron.database
   macaron.dependency_analyzer
   macaron.malware_analyzer
   macaron.output_reporter
   macaron.parsers
   macaron.policy_engine
@@ -49,6 +49,14 @@ macaron.slsa\_analyzer.checks.check\_result module
   :undoc-members:
   :show-inheritance:

macaron.slsa\_analyzer.checks.detect\_malicious\_metadata\_check module
-----------------------------------------------------------------------

.. automodule:: macaron.slsa_analyzer.checks.detect_malicious_metadata_check
   :members:
   :undoc-members:
   :show-inheritance:

macaron.slsa\_analyzer.checks.infer\_artifact\_pipeline\_check module
---------------------------------------------------------------------

@@ -40,3 +40,11 @@ macaron.slsa\_analyzer.package\_registry.package\_registry module
   :members:
   :undoc-members:
   :show-inheritance:

macaron.slsa\_analyzer.package\_registry.pypi\_registry module
--------------------------------------------------------------

.. automodule:: macaron.slsa_analyzer.package_registry.pypi_registry
   :members:
   :undoc-members:
   :show-inheritance:
1 change: 1 addition & 0 deletions pyproject.toml
@@ -95,6 +95,7 @@ test = [
    "pytest-custom_exit_code >=0.3.0,<1.0.0",
    "pytest-cov >=5.0.0,<6.0.0",
    "pytest-env >=1.0.0,<2.0.0",
    "pytest_httpserver >=1.0.10,<2.0.0",
    "syrupy >=4.0.0,<5.0.0",
]

4 changes: 2 additions & 2 deletions src/macaron/code_analyzer/call_graph.py
@@ -24,8 +24,8 @@ class BaseNode(Generic[Node]):
    def __init__(self, caller: Node | None = None, node_id: str | None = None) -> None:
        """Initialize instance.
        Parameter
        ---------
        Parameters
        ----------
        caller: Node | None
            The caller node.
        node_id: str | None
5 changes: 4 additions & 1 deletion src/macaron/config/defaults.ini
@@ -519,7 +519,10 @@ request_timeout = 20

[package_registry.pypi]
request_timeout = 20
hostname = pypi.org
registry_url_netloc = pypi.org
registry_url_scheme = https
fileserver_url_netloc = files.pythonhosted.org
fileserver_url_scheme = https

# Configuration options for selecting the checks to run.
# Both the exclude and include are defined as list of strings:
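The single `hostname` key is split into separate scheme and netloc keys for the registry and for the file server. A minimal standard-library sketch of how such keys could be combined into a URL: the section text is copied from the hunk above, while the helper `pypi_json_url` is hypothetical and not Macaron's API. The `/pypi/<project>/<version>/json` path is PyPI's public JSON API endpoint.

```python
import configparser
from urllib.parse import urlunsplit

# Section copied from the defaults.ini hunk above.
DEFAULTS = """
[package_registry.pypi]
request_timeout = 20
registry_url_netloc = pypi.org
registry_url_scheme = https
fileserver_url_netloc = files.pythonhosted.org
fileserver_url_scheme = https
"""


def pypi_json_url(section: configparser.SectionProxy, name: str, version: str) -> str:
    """Build the version-specific JSON API URL from the scheme/netloc keys (hypothetical helper)."""
    scheme = section.get("registry_url_scheme")
    netloc = section.get("registry_url_netloc")
    return urlunsplit((scheme, netloc, f"/pypi/{name}/{version}/json", "", ""))


config = configparser.ConfigParser()
config.read_string(DEFAULTS)
pypi = config["package_registry.pypi"]

print(pypi_json_url(pypi, "django", "5.0.6"))  # https://pypi.org/pypi/django/5.0.6/json
print(pypi.getint("request_timeout"))  # 20
```

Keeping the scheme and netloc as separate keys means each can be overridden independently, for example to point the client at a plain-HTTP local server.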
@@ -1,12 +1,12 @@
# Copyright (c) 2023 - 2023, Oracle and/or its affiliates. All rights reserved.
# Copyright (c) 2023 - 2024, Oracle and/or its affiliates. All rights reserved.
# Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.

"""This module implements SQLAlchemy type for converting date format to RFC3339 string representation."""

import datetime
from typing import Any

from sqlalchemy import String, TypeDecorator
from sqlalchemy import JSON, String, TypeDecorator


class RFC3339DateTime(TypeDecorator): # pylint: disable=W0223
@@ -60,3 +60,35 @@ def process_result_value(self, value: None | str, dialect: Any) -> None | dateti
        if result.tzinfo:
            return result
        return result.astimezone(RFC3339DateTime._host_tzinfo)


class DBJsonDict(TypeDecorator): # pylint: disable=W0223
    """SQLAlchemy column type to serialize dictionaries."""

    # It is stored in the database as a json value.
    impl = JSON

    # To prevent Sphinx from rendering the docstrings for `cache_ok`, make this docstring private.
    #: :meta private:
    cache_ok = True

    def process_bind_param(self, value: None | dict, dialect: Any) -> None | dict:
        """Process when storing a dict object to the SQLite db.
        value: None | dict
            The value being stored
        """
        if not isinstance(value, dict):
            raise TypeError("DBJsonDict type expects a dict.")

        return value

    def process_result_value(self, value: None | dict, dialect: Any) -> None | dict:
        """Process when loading a dict object from the SQLite db.
        value: None | dict
            The value being loaded
        """
        if not isinstance(value, dict):
            raise TypeError("DBJsonDict type expects a dict.")
        return value
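`DBJsonDict` delegates storage to SQLAlchemy's `JSON` type, so a dict is serialized to JSON by the database layer rather than dumped into a string column by hand. The sketch below reproduces the same idea with the standard-library `sqlite3` adapter/converter hooks instead of SQLAlchemy, an intentional substitution to keep the example dependency-free. The table name, column name and `roundtrip` helper are hypothetical.

```python
import json
import sqlite3
from typing import Any


def _to_json(value: dict) -> str:
    # Adapter: sqlite3 calls this for every dict bound as a query parameter.
    return json.dumps(value)


def _from_json(raw: bytes) -> dict:
    # Converter: called for columns declared as JSON when PARSE_DECLTYPES is on.
    value = json.loads(raw)
    if not isinstance(value, dict):
        raise TypeError("JSON column expects a dict.")
    return value


sqlite3.register_adapter(dict, _to_json)
sqlite3.register_converter("JSON", _from_json)


def roundtrip(detail: Any) -> dict:
    """Store ``detail`` in a hypothetical table and load it back as a dict."""
    # Mirror DBJsonDict.process_bind_param: reject anything that is not a dict.
    if not isinstance(detail, dict):
        raise TypeError("JSON column expects a dict.")
    conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
    try:
        conn.execute("CREATE TABLE heuristic_facts (id INTEGER PRIMARY KEY, detail JSON)")
        conn.execute("INSERT INTO heuristic_facts (detail) VALUES (?)", (detail,))
        (loaded,) = conn.execute("SELECT detail FROM heuristic_facts").fetchone()
        return loaded
    finally:
        conn.close()


print(roundtrip({"releases_count": 1, "maintainers": ["alice"]}))  # {'releases_count': 1, 'maintainers': ['alice']}
```

As with `DBJsonDict`, the caller hands over and gets back a plain dict; serialization stays inside the column type.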
2 changes: 1 addition & 1 deletion src/macaron/database/table_definitions.py
@@ -34,7 +34,7 @@

from macaron.artifact.maven import MavenSubjectPURLMatcher
from macaron.database.database_manager import ORMBase
from macaron.database.rfc3339_datetime import RFC3339DateTime
from macaron.database.db_custom_types import RFC3339DateTime
from macaron.errors import InvalidPURLError
from macaron.slsa_analyzer.provenance.intoto import InTotoPayload, ProvenanceSubjectPURLMatcher
from macaron.slsa_analyzer.slsa_req import ReqName
2 changes: 0 additions & 2 deletions src/macaron/malware_analyzer/checks/__init__.py

This file was deleted.

13 changes: 8 additions & 5 deletions src/macaron/malware_analyzer/datetime_parser.py
@@ -12,14 +12,17 @@
def parse_datetime(datetime_str: str, datetime_format: str = "%Y-%m-%dT%H:%M:%S") -> datetime | None:
    """Parse a datetime string and handle errors.
    Args
    ----
    datetime_str (str): The datetime string to parse.
    datetime_format (str): The format to use for parsing the datetime string.
    Parameters
    ----------
    datetime_str: str:
        The datetime string to parse.
    datetime_format str:
        The format to use for parsing the datetime string.
    Returns
    -------
    datetime: The parsed datetime object, or None if parsing failed.
    datetime | None
        The parsed datetime object, or None if parsing failed.
    """
    try:
        return datetime.strptime(datetime_str, datetime_format)
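The hunk above ends inside the `try` block. A self-contained sketch of the function's behavior, assuming the elided `except` clause catches `ValueError` (the exception `datetime.strptime` raises when the string does not match the format) and returns `None`:

```python
from __future__ import annotations

from datetime import datetime


def parse_datetime(datetime_str: str, datetime_format: str = "%Y-%m-%dT%H:%M:%S") -> datetime | None:
    """Parse ``datetime_str``; return None instead of raising on a format mismatch."""
    try:
        return datetime.strptime(datetime_str, datetime_format)
    except ValueError:  # Assumption: the elided handler catches ValueError.
        return None


print(parse_datetime("2024-07-30T10:15:00"))  # 2024-07-30 10:15:00
# Fractional seconds and a "Z" suffix do not match the default format.
print(parse_datetime("2024-07-30T10:15:00.123456Z"))  # None
print(parse_datetime("2024-07-30T10:15:00.123456Z", "%Y-%m-%dT%H:%M:%S.%fZ"))  # 2024-07-30 10:15:00.123456
```

Because a mismatch yields `None` rather than an exception, callers must check the result before comparing dates.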
18 changes: 12 additions & 6 deletions src/macaron/malware_analyzer/pypi_heuristics/base_analyzer.py
@@ -3,13 +3,15 @@

"""Define and initialize the base analyzer."""

import abc
from abc import abstractmethod

from macaron.json_tools import JsonType
from macaron.malware_analyzer.pypi_heuristics.heuristics import HeuristicResult, Heuristics
from macaron.slsa_analyzer.package_registry.pypi_registry import PyPIRegistry
from macaron.slsa_analyzer.package_registry.pypi_registry import PyPIPackageJsonAsset


class BaseHeuristicAnalyzer:
class BaseHeuristicAnalyzer(abc.ABC):
    """The base analyzer initialization."""

    def __init__(
@@ -25,13 +27,17 @@ def __init__(
        )

    @abstractmethod
    def analyze(self, api_client: PyPIRegistry) -> tuple[HeuristicResult, dict]:
    def analyze(self, pypi_package_json: PyPIPackageJsonAsset) -> tuple[HeuristicResult, dict[str, JsonType]]:
        """
        Implement the base analyze method for seven analyzers.
        Parameters
        ----------
        pypi_package_json: PyPIPackageJsonAsset
            The PyPI package JSON asset object.
        Returns
        -------
        tuple[HeuristicResult, int | dict]: Contain the heuristic result and the metadata of the package.
            E.g. (1) The release frequency (2) {"maintainers_join_date": datetime}
        tuple[HeuristicResult, dict[str, JsonType]]:
            The result and related information collected during the analysis.
        """
        raise NotImplementedError
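Deriving `BaseHeuristicAnalyzer` from `abc.ABC` is what makes `@abstractmethod` effective: the decorator is only enforced when the class's metaclass is `ABCMeta`, so a plain base class could be instantiated without implementing `analyze`. The sketch below shows the new contract with a hypothetical one-release analyzer. `HeuristicResult`, `PackageJsonStub` and `OneReleaseSketch` are simplified stand-ins, not Macaron's classes. Only the `releases` field comes from the PyPI JSON API, and treating a single release as `FAIL` is an assumption made for illustration.

```python
from __future__ import annotations

import abc
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class HeuristicResult(Enum):
    """Simplified stand-in for Macaron's HeuristicResult (values assumed)."""

    PASS = "PASS"
    FAIL = "FAIL"
    SKIP = "SKIP"


@dataclass
class PackageJsonStub:
    """Hypothetical stand-in for PyPIPackageJsonAsset: only holds the JSON payload."""

    package_json: dict[str, Any] = field(default_factory=dict)


class AnalyzerBase(abc.ABC):
    """Mirrors the new abstract contract of BaseHeuristicAnalyzer.analyze."""

    @abc.abstractmethod
    def analyze(self, pypi_package_json: PackageJsonStub) -> tuple[HeuristicResult, dict[str, Any]]:
        raise NotImplementedError


class OneReleaseSketch(AnalyzerBase):
    """Hypothetical analyzer: flags packages that have published exactly one release."""

    def analyze(self, pypi_package_json: PackageJsonStub) -> tuple[HeuristicResult, dict[str, Any]]:
        releases = pypi_package_json.package_json.get("releases")
        if not isinstance(releases, dict):
            # Nothing to judge without release data.
            return HeuristicResult.SKIP, {}
        count = len(releases)
        result = HeuristicResult.FAIL if count == 1 else HeuristicResult.PASS
        return result, {"releases_count": count}


try:
    AnalyzerBase()  # Rejected: the abstract method is not implemented.
except TypeError as err:
    print("TypeError:", err)

asset = PackageJsonStub({"releases": {"0.1.0": []}})
print(OneReleaseSketch().analyze(asset))  # (<HeuristicResult.FAIL: 'FAIL'>, {'releases_count': 1})
```

Passing the package JSON object into `analyze`, instead of a registry client, keeps each analyzer a pure function of already-fetched data, which is also what makes it easy to exercise with a stub like this.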