Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add audit functionality #113

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

add audit functionality #113

wants to merge 1 commit into from

Conversation

blancoj
Copy link
Contributor

@blancoj blancoj commented Mar 12, 2021

No description provided.

Copy link
Member

@botimer botimer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! One specific comment inline that would drive a little cleanup elsewhere. We'll need to fix up whitespace, indentation, but functionally, I think this is very close.

start_audit(user: User.system_user, audit: audit, packages: packages)
end

def self.start_audit (user, audit, packages)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not pass the packages here. We're implying that it should be an audit of everything, so if we do the lookup of Package.stored here and update the audit with the count before it is saved, we can drop the parameter and duplicate lookups.

packages.each do |package|
AuditFixityCheckJob.perform_later(package: package, user: current_user, audit: audit)
end
AuditManager.start_audit (user: current_user, audit: audit, packages: packages)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually need to squish the parentheses and method name together... It's a weird Ruby parsing thing.

Copy link
Member

@botimer botimer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super close now! Final comments inline.


packages = Package.stored
audit = Audit.new(user: current_user, packages: packages.count)
audit = Audit.new(user: current_user, packages: Package.stored.count)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we're not saving in this method anymore, we don't have to set the package count. Just give it what we know here, but the audit method cannot know -- the user.

@@ -0,0 +1,14 @@
class AuditManager
def self.system_audit
audit = Audit.new(user: User.system_user, packages: Package.stored.count)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as in the controller, just set the user on the audit. Let start_audit query and set the count.


def self.start_audit (user, audit)
audit.save
packages = Package.stored
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this before the save, and set the count on the audit object.

Copy link
Member

@botimer botimer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further changes requested. Seeing it fresh made me reconsider whether to have a standalone class or a job subclass -- see inline for detailed thoughts.

Comment on lines 21 to 24
packages = Package.stored
audit = Audit.new(user: current_user, packages: packages.count)
audit = Audit.new(user: current_user)

resource_policy.new(current_user, audit).authorize! :save?
audit.save

packages.each do |package|
AuditFixityCheckJob.perform_later(package: package, user: current_user, audit: audit)
end
AuditManager.start_audit(user: current_user, audit: audit)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the total logic is brief and each line is simple, I would remove the blank lines between them and put one before the response. Then the business part is grouped together, and the web part finishes the method alone.

Comment on lines 1 to 5
require "audit_manager"

# To run it
# bin/rails runner app/jobs/do_audit.rb
AuditManager.system_audit()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is unnecessary because rails runner works directly with ruby code. In this case:

bin/rails runner "AuditManager.system_audit"

This is why we chose to make it a class method rather than an instance method -- to keep it concise.

https://guides.rubyonrails.org/command_line.html#bin-rails-runner

@@ -0,0 +1,15 @@
class AuditManager
Copy link
Member

@botimer botimer Mar 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I suggested making a standalone class, I may have been unnecessarily idealistic. These methods can go directly on AuditFixityCheckJob if we choose. There is a trade-off, here...

The ActiveJob subclasses behave somewhere between classes and instances in a way that is not obvious. When you call the perform_later method on the class, it enqueues a job (serializing all of the parameters, because the workers are in a separate Ruby process). When that job is processed, an instance of your job class is created with no parameters, and the perform method is called with the deserialized versions of the parameters you passed to perform_later. (phew!)

You can add other instance or class methods to the ActiveJob subclasses. I initially suggested a separate class to avoid any confusion about the perform behavior and other entrypoints to the class. However, having a class in the jobs directory that doesn't behave like the others is its own source of confusion. We could put this class somewhere else, but here, we're taking advantage of the conventional Rails directory structure and auto-loading.

Reconsidering, I think we should make this something more like:

class RepositoryAuditJob < ApplicationJob
  def perform(user: User.system_user, audit: Audit.new(user: user))
    # ...
  end
end

Note that the default value for audit uses the user parameter. This is totally fine in Ruby. The invocation from the controller would be similar to what it is now:

  RepositoryAuditJob.perform_later(user: current_user, audit: audit)

And the "runner" invocation would take advantage of the default parameters:

bin/rails runner RepositoryAuditJob.perform_later

I will note, this is still a little awkward in the design because we expect a user, and an audit instance that knows the user... but it follows job conventions more closely, and is amenable to refactoring. The two-stage policy check in the controller is the issue there, and should be cleaned up such that the authorization does not depend on on Audit instance. Then, the Audit can be created wholly within in this job. That can happen now, or in another step.

Comment on lines 9 to 10
# Sys Admin username (used for system audit)
admin_username: (sysadmin)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it some more, this username should not be configurable, and it's not really the "administrator". This is meant to represent the system itself.

@@ -12,6 +12,10 @@ class User < ApplicationRecord
# Assign an API key
after_initialize :add_key, on: :create

def self.system_user
new(username: Chipmunk.config["admin_username"], email: Chipmunk.config["admin_useremail"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a literal (system) is better for the username here. It's already a bit of a concession that we have a fake user and it must have a username/email, rather than a proper notion of the system taking action. I cannot imagine a scenario where that username should be configured. It is "the system", not an unnamed administrator.

The email key here is mismatched with the config file. I think admin_email like it is in the config file is fine as it stands. In the grand scheme, I'd prefer to formalize a notification scheme and recipients, but this has to do for now.

@@ -0,0 +1,15 @@
class AuditManager
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is now left over. It should go away.

resource_policy.new(current_user, audit).authorize! :save?
audit.save
RepositoryAuditJob.perform_later(user: current_user, audit: audit)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was a little worried about this... https://travis-ci.org/github/mlibrary/chipmunk/jobs/763053000#L1255

Because of the assumptions built into ActiveJob, it tries to serialize any ActiveRecord model instances. It needs them to be saved to do so. I think there are two solutions, here:

  1. Remove the audit parameter from this call, so it is created in the job.
  2. Switch to perform_now.

I think what we really want is the first option. The fact that we do authorization on the audit instance is strange and redundant, anyway. If the rules would differ at the instance, we could enforce that policy within the job. What we're seeing here is one of the fundamental conceptual issues with just how "active" the model classes are in typical Rails patterns, and how the REST orientation cutting through the entire application can cause some real awkwardness.

Basically, new and create have the same rule for the Audit type, so we should take out the instance and resource policy check entirely. This becomes a very simple, two-line method.

@@ -0,0 +1,10 @@
class RepositoryAuditJob < ApplicationJob
def perform(user: User.system_user, audit: Audit.new(user: user))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we don't need to pass the audit, we can make it in the body of the method now.

I think the serialization would also choke on the system user with perform_later, so we will have to try RepositoryAuditJob.perform_now to be sure rails runner can run the job with a transient instance.

@blancoj blancoj force-pushed the issue-audit branch 2 times, most recently from 17baed9 to 25a51ba Compare March 16, 2021 14:45
Copy link
Member

@botimer botimer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something weird happened in the CI run, but it did reveal a bug. We removed the audit instance entirely from the controller method, which we then try to use:

https://travis-ci.org/github/mlibrary/chipmunk/jobs/763122354#L1510

A suggestion of something to try inline.

@@ -17,16 +17,8 @@ def show

def create
collection_policy.new(current_user).authorize! :new?
RepositoryAuditJob.perform_later(user: current_user)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Last, chance, ActiveJob...

If we're trying to preserve the ability to start an audit through the API, and the RESTful nature, we need the job to create the audit, enqueue the package jobs, and return the audit instance, so we can redirect to it with a 201.

If we can't get it to cooperate by switching this to a perform_now and giving us a real return value, it's back to a standalone class.

app/jobs/repository_audit_job.rb Show resolved Hide resolved
@blancoj blancoj force-pushed the issue-audit branch 4 times, most recently from d8fa586 to 64a114c Compare March 17, 2021 20:09
packages = Package.stored
audit = Audit.create(user: user, packages: packages.count)
packages.each do |package|
AuditFixityCheckJob.perform_now(package: package, user: user, audit: audit)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops! We can't perform_now on the individual items. We just want to initiate the audit, scheduling the individual packages checks (peform_later), and then return the audit.

Copy link
Member

@botimer botimer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the controller test passing, I believe that perform_now is returning the value from perform in process. I think this is ready to go. Nice work sticking with it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants