
22 unit tests #27

Merged: 8 commits merged into main on Sep 5, 2024
Conversation

pat-alt (Member) commented Sep 2, 2024:

  • applies formatting
  • adds a format check to CI
  • adds a general attack method and an in-place version attack! that can be dispatched over different attack types (currently these are functions, which we probably want to change; see the sketch after this list)
  • no longer relies on the custom cross-entropy loss function, which involved a bit of hard-coding
  • removes some hard-coded rules from PGD, FGSM and AutoPGD, in particular the assumptions about the input size
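As a rough sketch, the attack/attack! pair might look something like this (signatures inferred from the call attack(attack_type, input, label, _model, loss) further down in this PR; illustrative only, not the exact package code):

    # Hypothetical sketch: the in-place version mutates x, the out-of-place one copies first.
    function attack!(attack_type, x, y, model, loss; kwargs...)
        # attack_type is currently a function such as FGSM or PGD
        x .= attack_type(model, x, y; loss = loss, kwargs...)
        return x
    end

    attack(attack_type, x, y, model, loss; kwargs...) =
        attack!(attack_type, copy(x), y, model, loss; kwargs...)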

@pat-alt pat-alt linked an issue Sep 2, 2024 that may be closed by this pull request

codecov bot commented Sep 2, 2024

Codecov Report

Attention: Patch coverage is 51.56250% with 31 lines in your changes missing coverage. Please review.

Project coverage is 41.32%. Comparing base (da6c85e) to head (47c72da).
Report is 9 commits behind head on main.

Files with missing lines               Patch %   Lines
src/training/adversarial_training.jl     0.00%   16 Missing ⚠️
src/attacks/square/square_attack.jl      0.00%    8 Missing ⚠️
src/attacks/square/utils.jl              0.00%    4 Missing ⚠️
src/attacks/common_utils.jl              0.00%    2 Missing ⚠️
src/attacks/autopgd/utils.jl            50.00%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##            main      #27       +/-   ##
==========================================
+ Coverage   0.41%   41.32%   +40.90%     
==========================================
  Files          9       10        +1     
  Lines        242      242               
==========================================
+ Hits           1      100       +99     
+ Misses       241      142       -99     

☔ View full report in Codecov by Sentry.

@pat-alt pat-alt requested a review from rithik83 September 2, 2024 14:14
pat-alt (Member Author):

I think this needs some work, so for now I've left this mostly unchanged.

rithik83 (Collaborator):

Agreed. I will have to come back to this soon.

pat-alt (Member Author):

I don't think we want to rely on this function. As far as I understand, we would generally use the same loss function for the attack as for backprop?

rithik83 (Collaborator):

You're right. That function is kind of 'legacy' from back when I ran AutoPGD on a batch rather than on individual inputs.

There I needed to save the CE loss for every datapoint, so I wrote it so that a single call would compute the CE for everything in the batch (calling logitcrossentropy directly aggregates it all, as is done for batch gradient descent, I guess).

Passing on the same backprop loss function now is good, but maybe we leave this in for now (if we want to revert to generating adversarial examples for a batch for parallelism, it could be useful) - what do you think?

pat-alt (Member Author):

Yeah, happy to keep it in the repo for now, but I would try to avoid relying on it. I think we should be able to implement parallelism through broadcasting, relying solely on what Flux offers in terms of loss functions.

rithik83 (Collaborator):

Should be good then! Basically, all I want to avoid here is aggregating the CE losses for all examples in a batch. Every example should have its own CE loss (with AutoPGD we use each example's CE loss, or targeted DLR loss, to decide whether to halve the step size η).

pat-alt (Member Author):

I think what you have in mind can be achieved by just using the agg keyword in the standard Flux.Losses functions:

julia> Flux.Losses.logitcrossentropy(yhat[:,1:3],ytrain[:,1:3], agg=(x -> x))
1×3 Matrix{Float32}:
 0.523598  0.242503  0.454158
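(Passing agg = identity does the same thing without the anonymous function.)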

pat-alt (Member Author):

Here is where I've added the general function(s) for applying an attack, @rithik83. Instead of dispatching on functions, I think we probably want to dispatch on concrete types.
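For instance, a rough sketch of what type-based dispatch could look like (all names, field names and keyword arguments here are hypothetical, not the package's actual API):

    abstract type AbstractAttack end

    Base.@kwdef struct FGSMAttack <: AbstractAttack
        ϵ::Float64 = 0.3
    end

    Base.@kwdef struct PGDAttack <: AbstractAttack
        ϵ::Float64 = 0.3
        η::Float64 = 0.01    # step size
        iterations::Int = 40
    end

    # Each attack gets its own method instead of being passed around as a function:
    attack(a::FGSMAttack, x, y, model, loss) = FGSM(model, x, y; loss = loss, ϵ = a.ϵ)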

rithik83 (Collaborator):

Is it possible to also add AutoPGD and Square to the available attacks? As far as I know, the implementations work (or at least they produce perturbations that make sense), and they take the same main positional inputs (model, x, y before the ';'), with the rest in kwargs, of course.

pat-alt (Member Author):

Yeah, I'll have a go at this now.

@@ -7,11 +7,19 @@ using Flux: onehotbatch, onecold

# White-box Fast Gradient Sign Method (FGSM) attack by Goodfellow et al. (arxiv.org/abs/1412.6572)
# Code adapted from (github.com/jaypmorgan/Adversarial.jl)
function FGSM(model, x, y; loss=cross_entropy_loss, ϵ = 0.3, min_label=0, max_label=9, clamp_range = (0, 1))
w, h, c = size(x)
pat-alt (Member Author):

I have generalized this so that these attacks can also be used with non-image data (see the test file).
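For reference, a shape-agnostic FGSM step might look roughly like this (a sketch assuming loss(model, x, y) returns a scalar; not the exact code in this PR):

    using Flux

    function fgsm_sketch(model, x, y; loss, ϵ = 0.3, clamp_range = (0, 1))
        ∇x = gradient(x -> loss(model, x, y), x)[1]         # gradient w.r.t. the input
        return clamp.(x .+ ϵ .* sign.(∇x), clamp_range...)  # no assumption on size(x)
    end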

ϵ = 0.3,
clamp_range = (0, 1),
)
x_adv = reshape(x, size(x)..., 1) # why is this necessary @rithik83?
pat-alt (Member Author):

Why do we need this reshape?

rithik83 (Collaborator):

The model I used for testing accepted inputs of shape (w, h, c, batch_size), while this function (and all the attacks) takes a single image/datapoint of shape (w, h, c). So to make it compatible with the model for testing and iterating, I reshaped.

How do you suggest we do it? Passing inputs as a batch is probably better long-term (speedups and so on), but then we may have to alter the attack algorithms to deal with it. Or, for now, pass an input as (w, h, c, 1) itself so it doesn't have to keep being reshaped.

pat-alt (Member Author):

I think we probably want to use broadcasting for that instead.
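For example, something along these lines (a sketch assuming x_batch has shape (w, h, c, n) and ys is a vector of per-example labels; FGSM as defined in this PR):

    # Apply the single-input attack to each slice of the batch, then reassemble:
    adv_slices = [FGSM(model, x, y; loss = loss_fn)
                  for (x, y) in zip(eachslice(x_batch; dims = 4), ys)]
    x_adv_batch = cat(adv_slices...; dims = 4)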

y;
loss = cross_entropy_loss,
ϵ = 0.3,
clamp_range = (0, 1),
pat-alt (Member Author):

This clamp range seems specific to images (which might be okay, considering that's the main domain people in this field seem to work on).

rithik83 (Collaborator):

Yeah, it was like that to be able to test on MNIST. Of course, if we are working with a different kind of data, we can just pass a different clamp range, which will be used instead, and that should be fine.

pat-alt (Member Author):

Haven't really touched this, just formatting.

pat-alt (Member Author):

I would say it's fine to have this here for prototyping, but I would actually prefer to focus on delivering functionality that can be easily integrated with conventional Flux workflows. Happy to discuss!

rithik83 (Collaborator):

Do you mean the adversarial-training mechanism? Yeah, we can discuss this one; I'm not sure I follow, particularly with respect to conventional Flux workflows, as I am not too familiar with them.

pat-alt (Member Author):

What I mean is that we don't necessarily need to ship a function for training where we either impose too many restrictions (e.g. in the current version you train on both clean and attacked samples) or have to add a whole lot of arguments to maintain flexibility.

Flux itself has a train! convenience method for plain-vanilla training, but the docs mostly go through complete examples with custom training loops, because most of the time the plain-vanilla approach will not be flexible enough for users.

So one idea could be to overload the train! method for adversarial training, or to not ship a training method at all and instead demonstrate through examples how adversarial training can be achieved (e.g. the simple toy example for testing below).
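For concreteness, a toy custom-loop version of adversarial training could look like this (a sketch with a placeholder model; train_loader stands in for any iterator of (input, label) batches, and the exact loss signature the attack expects may differ):

    using Flux

    model = Chain(Dense(2 => 16, relu), Dense(16 => 2))
    opt_state = Flux.setup(Adam(), model)
    loss(m, x, y) = Flux.logitcrossentropy(m(x), y)

    for (x, y) in train_loader
        x_adv = FGSM(model, x, y; loss = loss)        # perturb the inputs first
        grads = Flux.gradient(m -> loss(m, x_adv, y), model)
        Flux.update!(opt_state, model, grads[1])      # train on attacked samples only
    end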

rithik83 (Collaborator):

I think we should ship functionality for adversarial training, as it would make the package more well-rounded in terms of being a 'one-stop shop' for all things adversarial ML in Julia.

I will definitely have a look at it. What I have in mind is to allow only PGD/FGSM training for now (pretty much all adversarial training does this, or uses variants of FGSM training to bridge the gap - note that PGD training is a lot better but also more expensive) and to train on attacked samples only, which is the correct way to do it.

What we can do is have a 'baseline' implementation of AT adhering to the above: we let the user specify PGD or FGSM and set the iterations, step size and epsilon. We can also have a demonstration, as you said, to show how custom variants of AT can be achieved (say, if someone wants to train on both clean and attacked samples, or re-train on attacked samples later, etc.).

rithik83 (Collaborator):

On second thought, you are right: from my own experience, AT can be done in different ways (for example my approach above, gradually increasing the attack strength with epochs, training on clean + perturbed samples, etc.), and we can let the user decide. Maybe add a few scripts as you mentioned.


# Attack the input:
println("Attacking input: $input")
input = attack(attack_type, input, label, _model, loss)
pat-alt (Member Author):

@rithik83 note how I pass on the existing loss function that's also used for backprop.

rithik83 (Collaborator):

Yeah, I see - this is a lot nicer.

I actually used to have it this way too, but switched over to the custom one when I batched inputs. But indeed, as you said, we can rely on the Flux implementations when we get back to that.

pat-alt (Member Author) left a comment:

I've left some comments here. Hopefully that gives you some sense of what type of changes I think are necessary, but perhaps it's also a good idea to jump on a call.

I'll keep this here as is for you to review before making more changes to the package.

rithik83 (Collaborator):

The Manifest file is usually git-ignored, right?

pat-alt (Member Author):

True, definitely for packages, although this one is for the test/ environment. But according to this, it's probably better to add it to the .gitignore.
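(Concretely, that would presumably mean adding test/Manifest.toml to the repository's .gitignore.)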

Random.seed!(0)
w, h, c = size(x)
-n_features = c*h*w
+n_features = c * h * w

# Initialization (stripes of +/-ϵ)
init_δ = rand(w, 1, c) .|> x -> x < 0.5 ? -ϵ : ϵ
pat-alt (Member Author):

This is very specific to images and I'm not sure how to address that, so I won't use it for testing (it cannot be easily generalized).

rithik83 (Collaborator):

Yeah, Square's implementation seems tailored to images. I followed the original paper in my implementation.

rithik83 (Collaborator) left a comment:

Looks good to me - a lot of necessary changes.

@rithik83 rithik83 merged commit a02d3e9 into main Sep 5, 2024
6 of 7 checks passed