Add initial hook-based check execution mechanism #7160

Merged: 13 commits merged into pypi:malware-detection on Jan 6, 2020

Conversation

xmunoz (Contributor) commented Dec 26, 2019

Progress on #7093

xmunoz requested review from ewdurbin and woodruffw on December 26, 2019 18:21

ewdurbin (Member) commented Dec 26, 2019

One option rather than instantiating checks directly from the upload method would be to add a db listener:

diff --git a/warehouse/malware/__init__.py b/warehouse/malware/__init__.py
index 0d989ac0..b4b4973b 100644
--- a/warehouse/malware/__init__.py
+++ b/warehouse/malware/__init__.py
@@ -10,8 +10,32 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
+from warehouse import db
 from warehouse.malware.interfaces import MalwareCheckService
 from warehouse.malware.services import database_malware_check_factory
+from warehouse.packaging.models import File, Project, Release
+
+
+@db.listens_for(db.Session, "after_flush")
+def determine_malware_checks(config, session, flush_context):
+
+    malware_checks = session.info.setdefault("warehouse.malware.checks", set())
+
+    for obj in session.new:
+        if obj.__class__ in (File,):
+            malware_checks.update([f"example_file_check:{obj.id}"])
+        if obj.__class__ in (Release,):
+            malware_checks.update([f"example_release_check:{obj.id}"])
+        if obj.__class__ in (Project,):
+            malware_checks.update([f"example_project_check:{obj.id}"])
+
+
+@db.listens_for(db.Session, "after_commit")
+def queue_malware_checks(config, session):
+
+    malware_checks = session.info.setdefault("warehouse.malware.checks", set())
+
+    print(malware_checks)
+    # enqueue tasks
 
 
 def includeme(config):

This is similar to how our cache purging works. The example above is a bit rough, but implementing it in this way would allow for instantiating the checks from the event and even instantiating different checks based on what changed.
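The two-phase idea in the diff above (collect check names while flushing, enqueue them only after the commit succeeds) can be sketched without any dependencies. This is a minimal simulation under stated assumptions: `FakeSession`, `listens_for`, `LISTENERS`, `ENQUEUED`, and `run_demo` are hypothetical stand-ins invented for illustration and are not Warehouse's or SQLAlchemy's API; only the event names, the `session.info` key, and the check-name format are taken from the diff.

```python
from collections import defaultdict

# Hypothetical stand-ins for illustration; not Warehouse or SQLAlchemy APIs.
LISTENERS = defaultdict(list)  # event name -> registered callbacks
ENQUEUED = []                  # stands in for a task queue
CHECKS_KEY = "warehouse.malware.checks"


def listens_for(event_name):
    """Register the decorated function as a listener for event_name."""
    def decorator(fn):
        LISTENERS[event_name].append(fn)
        return fn
    return decorator


class File:
    id = None


class Release:
    id = None


class Project:
    id = None


# Which example check applies to which kind of new object.
CHECK_NAMES = {
    File: "example_file_check",
    Release: "example_release_check",
    Project: "example_project_check",
}


class FakeSession:
    """Tiny imitation of an ORM session with flush and commit events."""

    def __init__(self):
        self.new = []    # objects added since the last flush
        self.info = {}   # per-session scratch space
        self._next_id = 1

    def add(self, obj):
        self.new.append(obj)

    def flush(self):
        # Assign ids first, so listeners can see them on the new objects.
        for obj in self.new:
            obj.id = self._next_id
            self._next_id += 1
        for listener in LISTENERS["after_flush"]:
            listener(self)
        self.new = []

    def commit(self):
        self.flush()
        for listener in LISTENERS["after_commit"]:
            listener(self)


@listens_for("after_flush")
def determine_malware_checks(session):
    # Accumulate check names; a set removes duplicates across flushes.
    pending = session.info.setdefault(CHECKS_KEY, set())
    for obj in session.new:
        name = CHECK_NAMES.get(type(obj))
        if name is not None:
            pending.add(f"{name}:{obj.id}")


@listens_for("after_commit")
def queue_malware_checks(session):
    # Drain the pending set only once the transaction has committed.
    pending = session.info.pop(CHECKS_KEY, set())
    ENQUEUED.extend(sorted(pending))


def run_demo():
    ENQUEUED.clear()
    session = FakeSession()
    session.add(Release())
    session.add(File())
    session.flush()                  # collected, but nothing enqueued yet
    before_commit = list(ENQUEUED)
    session.commit()
    return before_commit, list(ENQUEUED)
```

Calling `run_demo()` shows that a flush alone enqueues nothing; the check names only reach the queue once `commit()` has run, which is the property that makes this approach attractive for work that should not start for rolled-back uploads.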

Resolved review threads on dev/environment, warehouse/malware/__init__.py, warehouse/malware/tasks.py, and warehouse/malware/utils.py.

ewdurbin (Member) commented Jan 6, 2020

I'm not sure that any of the above concerns are blockers to merge here.

ewdurbin merged commit 09cc157 into pypi:malware-detection on Jan 6, 2020
ewdurbin added a commit that referenced this pull request Jan 6, 2020
* Add initial hook-based check execution mechanism

* scratch/poc

* Add initial hook-based check execution mechanism

* Use sqlalchemy event hooks for malware checks

* Fix unit tests

* Add enum for MalwareCheckObjectType

* Add unit tests for init.

* Add tests for tasks, services, and utils.

Also, some small bugfixes in MalwareCheckFactory and the
get_enabled_checks method.

* Fix spurious task test.

* Add missing drop enum to downgrade function.

* Added TODO to dev/environment

* Be more explicit in check lookup

Co-authored-by: Ernest W. Durbin III
ewdurbin added a commit that referenced this pull request Jan 7, 2020
ewdurbin added a commit that referenced this pull request Jan 7, 2020
ewdurbin added a commit that referenced this pull request Jan 8, 2020
ewdurbin added a commit that referenced this pull request Jan 10, 2020
ewdurbin added a commit that referenced this pull request Jan 13, 2020
ewdurbin added a commit that referenced this pull request Jan 13, 2020
ewdurbin added a commit that referenced this pull request Jan 16, 2020
ewdurbin added a commit that referenced this pull request Jan 17, 2020
ewdurbin added a commit that referenced this pull request Jan 27, 2020
woodruffw pushed a commit to trail-of-forks/warehouse that referenced this pull request Feb 7, 2020
ewdurbin added a commit that referenced this pull request Feb 11, 2020
ewdurbin added a commit that referenced this pull request Feb 18, 2020
ewdurbin added a commit that referenced this pull request Feb 18, 2020
* Add new models for malware detection. (#7118)

* Add new models for malware detection.

Fixes #7090 and #7092.

* Code review changes.

- FK on release_file.id field instead of md5
- Change message type from String to Text
- Change Enum class in model to singular form

* Add admin interface to view and enable checks (#7134)

* Add admin interface to view and enable checks

- Implement list, detail and change_state views (#7133)
- Add unit tests for check admin view

* Add comprehensive test coverage for check admin

* Add initial hook-based check execution mechanism (#7160)

* Add initial hook-based check execution mechanism

* scratch/poc

* Add initial hook-based check execution mechanism

* Use sqlalchemy event hooks for malware checks

* Fix unit tests

* Add enum for MalwareCheckObjectType

* Add unit tests for init.

* Add tests for tasks, services, and utils.

Also, some small bugfixes in MalwareCheckFactory and the
get_enabled_checks method.

* Fix spurious task test.

* Add missing drop enum to downgrade function.

* Added TODO to dev/environment

* Be more explicit in check lookup

Co-authored-by: Ernest W. Durbin III

* Add malware check syncing mechanism (#7190)

* Add malware check syncing mechanism

* Code review changes.

* Refactor MalwareCheckBase. Fixes #7091. (#7196)

* Refactor MalwareCheckBase. Fixes #7091.

Add Foreign Keys in MalwareVerdicts for other types of objects
(Releases, Projects).

* Change verdict dict to kwargs.

* Add wipe-out functionality (#7202)

* Add wipe-out functionality

Related: #7133

* Call list explicitly

* Add rudimentary verdicts view. Progress on #6062. (#7207)

* Add rudimentary verdicts view. Progress on #6062.

Also, add some better testing logic for wiped_out condition.

* Code review changes.

- Conditionally show fields that are populated
- JSON pretty formatting

* Fix unit test bug.

- Use `get` instead of `filter` to look up verdict by pkey.

* simplify unit tests for verdicts view

* introduce malware queue (#7227)

* introduce malware queue

* correct syntax, apparently list of tuples documented doesn't work.

* Add backfill functionality to check admin #7094 (#7232)

* Add backfill functionality to check admin #7094

- Add backfill task
- Change lookup of checks to check_name instead of id
- Load checks that are also in "evaluation" state

* Add unit tests for backfill.

- Log number of runs executed by backfill
- Perform basic validation on sample_rate input
- Clean up other testing logic.

* Remove superfluous 'all()'

* Code review changes.

- Set backfill size to a fixed number, not configurable via web UI.
- Backfill task enqueues run_check tasks
- Only retry if `check.run` fails, not if loading the check fails.
- Use exponential backoff for retries.

* Update warehouse/admin/templates/admin/malware/checks/detail.html

Co-Authored-By: Ernest W. Durbin III

Co-authored-by: Ernest W. Durbin III

* Refactor testing logic #7098 (#7257)

- Add `schedule` field to MalwareCheck model #7096
- Move ExampleCheck into tests/common/ to remove test dependency from
prod code
- Rename functions and classes to differentiate between "hooked" and
"scheduled" checks

* Event-based Malware check (#7249)

* requirements: Introduce yara

* [WIP] malware/check: SetupPatternCheck

In progress.

Introduces SetupPatternCheck, an implementation of an event-based
check that scans the `setup.py`s of release files for suspicious
patterns.

* malware/checks: Give MalwareCheckBase.run/scan args, kwargs

* malware: Add check preparation

Fiddle with the check/run signature a bit more.

* malware/checks: Unpack file path correctly

* docker-compose: Override FILES_BACKEND for worker

The worker needs to be able to see the "files" virtual host
during development so that malware checks can fetch their underlying
release files.

* [WIP] malware/checks: setup.py extraction

* malware/checks: setup_patterns: Fix enum, seek

* malware/checks: setup_patterns: Apply YARA rules

Each rule match becomes a verdict.

* malware/checks: setup_patterns: Prefer get over filter

* warehouse/{admin,malware}: Consistent enum names

Also enforce uniqueness for enum values.

* warehouse/{admin,malware}: More enum changes

* tests: Update admin, malware tests

* tests: Fix enum, more test fixes

* tests: Add prepare tests

* malware/changes: base: Unpack id correctly

* tests: Begin adding SetupPatternCheck tests

* malware/checks: setup_patterns: Fix enum

* tests: More SetupPatternCheck tests

* warehouse/malware: setup_patterns: Fix enums

* tests: More SetupPatternCheck tests

* tests: Add license header

* malware/checks: setup_patterns: Add TODO

* tests: More SetupPatternCheck tests

* tests: More SetupPatternCheck tests

* tests: Complete extraction tests for SetupPatternCheck

* tests: Fix test

* malware/checks: Add docstring for prepare

* malware/checks: blacken

* malware/checks: Document, expand YARA rules

* tests, warehouse: Restructure utilities

* malware: Order some enums, reduce SetupPatternCheck verdicts

* malware/models: Add missing __lt__

* malware/checks: Always embed the model object in the prepared arguments

Use it instead of performing a DB request in the check itself.

* malware/checks: Avoid raw bytes

* malware/changes: Remove unused import

* tests: Fixup malware tests

* warehouse/malware: blacken

* tests: Fill in malware coverage

* tests, warehouse: Add a benign verdict for SetupPatternCheck

* tests: blacken

* Implement scheduled checks #7093 (#7271)

* Implement scheduled checks #7093

- Rename `run_backfill` to `run_evaluation` in admin malware view
- Modify `run` and `scan` method signatures to accept `**kwargs`
- Extend `run_check` to accommodate scheduled check functionality

* Reduce unit test flakiness

* Code review changes.

Also replace `check.hooked_object` with `check.hooked_object.value` in
check detail template.

* tests, warehouse: enum fixes

* Fix lint error

Co-authored-by: William Woodruff

* Add verdicts view filtering capabilities #6062. (#7322)

* Add verdicts view filtering capabilities #6062.

* Code review changes.

- Refactor tests to be parametrized.
- Pass `_query` to `route_path` in template.
- Remove `is None` from filter query, it adds nothing.

* Add verdict administrator review. Fixes #6062. (#7339)

* Add verdict administrator review. Fixes #6062.

- Add new `admin.verdicts.review` endpoint
- Change layout of verdict list and detail view and add forms
- Change sort order of the MalwareChecks, and update the tests

* Code review changes.

- Rename MalwareVerdict field `administrator_verdict` to `reviewer_verdict`.
- Change verdict review permission from `admin` to `moderator`.

* Misc cleanup and TODOs on malware checks. (#7355)

* Misc cleanup and TODOs on malware checks.

- Change backfill function to invoke `IMalwareCheckService` interface
- Add support for `kwargs` to `IMalwareCheckService` interface
- Rename variable from reserved word `file` to `release_file`
- Add `FatalCheckException` for non-retryable exceptions
- Replace `MALWARE_CHECK_BACKEND` in dev/environment

* Make `IMalwareService` the entrypoint for `run_check`

- Add `run_scheduled_check` task that invokes this interface.
- Remove useless utility method
- Move `FatalCheckException` into warehouse/malware/errors.py.

* malware/checks: PackageTurnover skeleton (#7321)

* malware/checks: PackageTurnover skeleton

* malware/checks: PackageTurnover: Add NOTE

* malware/checks: PackageTurnoverCheck: more work

* tests: blacken

* malware/checks: More PackageTurnoverCheck work

* malware/checks: Blacken

* malware/checks: Blacken

* package_turnover: Promote from indeterminate to threat

* tests: Begin adding package_turnover tests

* tests: Add remaining package_turnover tests

* tests: Drop unused imports

* warehouse: Drop (ww) from NOTE

* checks/package_turnover: Drop NOTE

Co-authored-by: Cristina
Co-authored-by: William Woodruff