
Finalizing Paper Story #5

Open
akondrahman opened this issue Nov 11, 2020 · 20 comments

akondrahman commented Nov 11, 2020

Selling Point (Option-1)

Creating this issue so that discussion on definitions does not get lost. Here is how I am defining forensic anti-patterns:

Forensic anti-patterns for machine learning are the absence of coding patterns in source code that are necessary to capture unexpected behaviors within a machine learning project.

Note to self:

Counter-argument: forensic anti-patterns are hard to detect, e.g., we can never conclusively say something is missing or not logged. If "developers do not log X" is the focus of the paper, then the paper may get rejected.

akondrahman self-assigned this Nov 11, 2020

akondrahman commented Nov 11, 2020

Selling Point (Option-2)

Suggesting another selling point:

Possible Title: Tell Me What: Towards Security-focused Logging for Machine Learning Development
Possible RQs:

RQ1: What security-related events can be logged for machine learning development? 
RQ2: How frequently do security-related events appear in machine learning development? How frequently are security-related events logged in machine learning implementations? 
RQ3: How do practitioners perceive the identified security-related events for machine learning? 

akondrahman changed the title from "Definition of forensic anti-patterns for machine learning" to "Finalizing Paper Story" on Nov 13, 2020

akondrahman commented Nov 13, 2020

Selling Point (Option-3)

Possible RQs:

RQ1: What categories of security-relevant code snippets can be logged for machine learning development?   
RQ2: How frequently do security-relevant code snippets appear in machine learning development? How frequently are security-relevant code snippets logged in machine learning development?
RQ3: How do practitioners perceive the identified security-relevant code snippets for machine learning development? 

The problem with security-relevant code snippets is that they can also include insecure coding snippets, which we are not detecting.

@akondrahman

Selling point 4 (building on Option-2)

King has identified mandatory log events ... can we build on top of King to find security log events?
King says: "A mandatory log event is an action that must be logged in order to hold the software user accountable for performing the action."
We will say: "A security log event is an action expressed by source code elements that should be logged to perform post-mortem analysis of security attacks in machine learning."
We will identify security log events for ML by:

  1. Manually inspecting each Python file
  2. Identifying source code elements
    a. that perform any of the following actions identified as mandatory by King: create, read, update, delete, print failure; and
    b. that can be used to conduct a security attack as reported by prior work

Another option is to say "adversarial log event" instead of "security log event".
If we want to tone it down, we can say "likely adversarial log event" or "candidate security log event" instead of "security log event".
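
A minimal sketch of what one such security log event could look like in Python (the dataset path, function name, and logger setup are hypothetical, not taken from any studied project):

```python
import logging

import pandas as pd

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("forensics")  # hypothetical logger name

# A King-style "read" action: loading training data from disk. Without
# the log statement below, a post-mortem analysis cannot establish
# which dataset fed the training run.
def load_training_data(path):
    data = pd.read_csv(path)  # the source code element performing the read
    logger.info("Loaded training data from %s (%d rows)", path, len(data))
    return data

# Usage (hypothetical path):
# train = load_training_data("data/train.csv")
```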


akondrahman commented Nov 18, 2020

Useful definitions from Chuvakin's book:

  • An event is a single occurrence within an environment, usually involving an attempted state change.
  • An event field describes one characteristic of an event.
  • An event record is a collection of event fields.
  • A log is a collection of event records.
  • Logging is the act of collecting event records into logs.
  • An alert or alarm is an action taken in response to an event, usually intended to get the attention of someone or something.
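
To see how these definitions nest, a rough illustrative sketch in Python (the class and alias names are ours, not Chuvakin's):

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

# An event field describes one characteristic of an event,
# e.g., ("action", "read") or ("path", "train.csv").
EventField = Tuple[str, Any]

@dataclass
class EventRecord:
    """An event record is a collection of event fields."""
    fields: List[EventField] = field(default_factory=list)

@dataclass
class Log:
    """A log is a collection of event records."""
    records: List[EventRecord] = field(default_factory=list)

    def append(self, record: EventRecord) -> None:
        # Logging is the act of collecting event records into logs.
        self.records.append(record)
```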

@akondrahman

Use page 235 of Chuvakin's book to motivate the paper better.


akondrahman commented Nov 18, 2020

Maybe it will not be wise to submit bug reports ... it is possible that a lot of people will say no. Better to do a survey.
Use page 2 of Security Engineering for Machine Learning as motivation.

@akondrahman

In the discussion section, we need to say why an automated log assistant was not built and how it can be built in future work ... groundwork, perceptions, etc.


akondrahman commented Nov 20, 2020

Selling point 5

Forensic events: a forensic event in machine learning is an action expressed by source code elements that should be logged to perform post-mortem analysis of security attacks in machine learning.


akondrahman commented Nov 20, 2020

Selling point 6

Forensic-likely coding patterns is one term we can use. Validating it would require submitting bug reports, which will not give us a good response rate. We can instead frame it as categories of forensic-likely coding patterns and see if developers agree with them.

Example forensic-likely coding patterns are load and read methods used to read datasets for training (see the sketch below).

Definition: forensic-likely coding patterns are recurring coding patterns that express a mandatory log event needed to perform post-mortem analysis of security attacks.
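
A hedged sketch of the load/read pattern in Python (file name, function name, and logger are hypothetical):

```python
import logging
import pickle

logger = logging.getLogger("forensics")  # hypothetical logger name

def load_dataset(path):
    # The forensic-likely coding pattern: a read method (pickle.load)
    # used to read a dataset for training. The call is what we detect;
    # whether a log statement accompanies it is what we check.
    with open(path, "rb") as f:
        dataset = pickle.load(f)
    # Forensically useful: record the dataset's origin so a post-mortem
    # analysis of a suspected poisoning attack can trace it.
    logger.info("Training dataset loaded from %s", path)
    return dataset
```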

Category names:

  • Poison forensics
  • Perturbation forensics


akondrahman commented Nov 21, 2020

Selling point 7 (credit to @effat)

Limit scope by focusing on adversarial machine learning, i.e., what to log to diagnose adversarial attacks on machine learning ... need to define:

  • adversarial machine learning
  • what is an attack in adversarial ML
  • a simple example attack
  • attack types from Papernot et al.

Follow the train of thought: initially it was not clear how this differs from King; then give the definition of adversarial ML, then what an attack is in the context of adversarial ML, then example attacks, then how different actions map to attacks, with interesting names like "reinforcement learning environment".


akondrahman commented Nov 22, 2020

Selling Point 7 (Contd.)

What categories map to what attacks:
  1. Load training data can facilitate data poisoning attacks [https://ieeexplore.ieee.org/document/8406613 <SURVEY_PAPER>]
  2. Load pre-trained model can facilitate model poisoning attacks [https://arxiv.org/pdf/1911.12562.pdf (Finding-11) <SURVEY_PAPER>]
  3. Download data from remote source can facilitate attacks due to malformed input [https://ieeexplore.ieee.org/document/8424643][https://arxiv.org/pdf/2007.10760.pdf <SURVEY_PAPER>]
  4. Load classification labels from file can facilitate label perturbation attack [https://ieeexplore.ieee.org/document/8406613 <SURVEY_PAPER>]
  5. Load pipeline configuration can facilitate physical domain attacks [https://ieeexplore.ieee.org/document/8406613 <SURVEY_PAPER>]
  6. Update in a reinforcement learning environment can facilitate strategically timed attacks [https://www.ijcai.org/Proceedings/2017/525], neural network policy attacks [https://research.google/pubs/pub46154/], and enchanting attacks [https://arxiv.org/pdf/1801.00553.pdf <SURVEY_PAPER>]
  7. Reading model results can be used to detect model stealing attacks [https://ieeexplore.ieee.org/document/8979377][https://arxiv.org/pdf/1911.12562.pdf <SURVEY_PAPER>]

Policy attacks need policy detection ... a policy is a set of steps and values ... see: https://stackoverflow.com/questions/46260775/what-is-a-policy-in-reinforcement-learning
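
To make category 7 concrete, a sketch of what logging a model-result read could look like (`model` is assumed to be a scikit-learn-style estimator; the function and `client_id` are hypothetical):

```python
import logging

logger = logging.getLogger("forensics")  # hypothetical logger name

def predict(model, features, client_id):
    # Category 7: reading model results. Recording who queried the model,
    # and how often, leaves a trail that a post-mortem analysis can mine
    # for model stealing signals (e.g., high-volume querying).
    result = model.predict([features])
    logger.info("Prediction served to client=%s n_features=%d",
                client_id, len(features))
    return result
```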


akondrahman commented Nov 22, 2020

Selling Point 7 (Contd.)

Names:
  • Forensic coding patterns (First choice)
  • Forensic-likely coding patterns
  • Candidate forensic coding patterns

@akondrahman

Selling Point 7 (Contd.)

Possible Category Names (Version-1):
  1. Poisoned data forensics
  2. Model forensics
  3. Download forensics
  4. Classification label tracing
  5. Configuration forensics
  6. Policy forensics in reinforcement learning
  7. Prediction result tracking

@akondrahman

@fbhuiyan42 ... I hope you are following this thread. This is where you discuss and ask questions.

@akondrahman

Selling Point 7 (Contd.)

Possible Category Names (Version-2):
  1. Poisonous training data
  2. Model poisoning
  3. Remote downloads
  4. Classification label perturbations
  5. Pipeline forensics
  6. Policy forensics in reinforcement learning
  7. Prediction result tracking

@akondrahman

Selling Point 7 (Contd.)

Possible Category Names (Version-3, to accommodate supervised learning):
  1. Poisonous training data
  2. Model poisoning
  3. Remote downloads
  4. Classification label perturbations
  5. Pipeline forensics
  6. Prediction result tracking

@fbhuiyan42

Are we planning to present the paper only for supervised projects? I thought we were presenting all types of projects, with the category "Policy forensics in reinforcement learning" being applicable only to reinforcement learning.

@akondrahman

@fbhuiyan42

This will depend on how clear your project classification is: we will do analysis on projects that are clearly labeled as supervised, unsupervised, or reinforcement. As far as I can remember, you were confidently able to classify supervised learning projects. Correct me if I am wrong.

@fbhuiyan42

I am confident about the reinforcement projects as well. But in that case, yes, I agree: without the unsupervised projects, it is better not to report the RL projects either.

@akondrahman

Yes. We need to tell a consistent story. That is why we will skip reinforcement-related findings for this project.
We will save the reinforcement results for a short paper or something after this one has a home.
