Add functionality to compute membership inference risk for each individual point #146
Conversation
Looks good overall. I will probably need to make some formatting changes for consistency (like the printing format). Hopefully this will be merged by the end of the week.
Hi @lwsong, I took another look and left some minor comments. Also, can you add the required types in the function signatures (and change regular lists to np.ndarrays)?
Hi @CdavM, I just fixed all the issues you mentioned. Please take a look, thanks!
Hi @lwsong! Thanks for making these changes. This looks great! I can't merge the PR: the button is grayed out and there's a warning saying "This branch has conflicts that must be resolved". Can you resolve the conflicts? Also, a meeting would be great. I worry that finding a good time during the holiday season will be hard, but I'll set something up in early January. The meeting isn't blocking this PR anyway.
Hi @CdavM , just resolved the conflicts. Let me know if there is an issue. |
Thanks for contributing, it looks great!
I left comments. Sorry if there are quite a few of them; it's very common in software development to have many comments during code review. Of course, I might have missed something or written something incorrect in my comments, in which case please respond (for context, code review comments are usually meant as discussions).
@@ -172,6 +174,85 @@ def run_attacks(attack_input: AttackInputData,
      privacy_report_metadata=privacy_report_metadata)


def _compute_privacy_risk_score(attack_input: AttackInputData,
                                num_bins: int = 15) -> SingleRiskScoreResult:
  """compute each individual point's likelihood of being a member (https://arxiv.org/abs/2003.10595)
Could you please elaborate more in the comments on what this score means?
Added more explanation in the comments
    summary = []
    for single_result in self.risk_score_results:
      single_summary = single_result.collect_results()
      for line in single_summary:
Nit: use summary.extend(single_summary); there's no need for the inner for loop.
already updated the code!
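A minimal sketch of the suggested change, using hypothetical stand-ins for SingleRiskScoreResult objects whose collect_results() returns a list of report lines:

```python
class _FakeResult:
  """Hypothetical stand-in for SingleRiskScoreResult in this sketch."""

  def __init__(self, lines):
    self._lines = lines

  def collect_results(self):
    return self._lines


risk_score_results = [_FakeResult(['line a', 'line b']), _FakeResult(['line c'])]

summary = []
for single_result in risk_score_results:
  # extend() appends every element of the returned list, so no inner loop is needed.
  summary.extend(single_result.collect_results())

print(summary)  # ['line a', 'line b', 'line c']
```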
  min_log_value = np.amin(np.concatenate((train_log_values, test_log_values)))
  max_log_value = np.amax(np.concatenate((train_log_values, test_log_values)))
  bins_hist = np.linspace(min_log_value, max_log_value, num_bins+1)
It looks like replacing np.linspace with np.logspace would let you build the histograms directly on train_values/test_values (i.e. without taking logs first). Is that replacement correct?
If yes, could you please update the code? (It's always better to have simpler code for understanding and maintenance.)
Changed the code to np.logspace!
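A minimal sketch of that binning with hypothetical loss values standing in for the actual attack inputs; np.logspace takes the base-10 exponents of the endpoints and spaces the bin edges evenly in log10 space, so it matches np.linspace applied to the base-10 log values:

```python
import numpy as np

# Hypothetical loss values standing in for the train/test attack inputs.
train_values = np.array([0.01, 0.1, 0.5, 1.0, 2.0])
test_values = np.array([0.05, 0.3, 1.5, 3.0])
num_bins = 15

all_values = np.concatenate((train_values, test_values))
# Bin edges built directly from the raw values, evenly spaced in log10 space.
bins_hist = np.logspace(np.log10(all_values.min()),
                        np.log10(all_values.max()),
                        num_bins + 1)
train_hist, _ = np.histogram(train_values, bins=bins_hist)
train_hist = train_hist / len(train_values)
```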
  train_hist, _ = np.histogram(train_log_values, bins=bins_hist)
  train_hist = train_hist/(len(train_log_values)+0.0)
  train_hist_indices = np.fmin(np.digitize(train_log_values, bins=bins_hist),num_bins)-1
Why -1? Doesn't np.digitize return 0-based indices?
bins_hist has num_bins+1 elements, and np.digitize returns values from 1 to num_bins+1 (https://stackoverflow.com/questions/40880624/binning-in-numpy)
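A small demo of that behaviour with made-up values: in-range values map to 1..num_bins, the rightmost edge maps to num_bins+1, and np.fmin(..., num_bins) - 1 converts everything to 0-based histogram indices.

```python
import numpy as np

num_bins = 4
values = np.array([0.0, 0.3, 0.99, 1.0])         # 1.0 sits exactly on the last edge
bins_hist = np.linspace(0.0, 1.0, num_bins + 1)  # num_bins + 1 edges

raw = np.digitize(values, bins=bins_hist)   # -> [1, 2, 4, 5]
idx = np.fmin(raw, num_bins) - 1            # -> [0, 1, 3, 3], 0-based bin indices
print(raw, idx)
```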
      recall_list.append(true_positive_normalized)
    return np.array(meaningful_threshold_list), np.array(precision_list), np.array(recall_list)


  def collect_results(self, threshold_list=np.array([1,0.9,0.8,0.7,0.6,0.5])):
Using mutable types as default argument values is not allowed by the Google Python Style Guide:
https://google.github.io/styleguide/pyguide.html#212-default-argument-values
Please remove the default argument here. The reason is that it's error-prone (more details in the style guide).
removed the default arguments
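A sketch of the usual way to keep an optional argument without a mutable default (the PR may simply have made the argument required instead); the function body here is a placeholder:

```python
from typing import Optional

import numpy as np


def collect_results(threshold_list: Optional[np.ndarray] = None):
  """Hypothetical sketch of the None-default pattern from the style guide."""
  # Build the default inside the function so no mutable object is shared
  # across calls (the error-prone behaviour the style guide warns about).
  if threshold_list is None:
    threshold_list = np.array([1, 0.9, 0.8, 0.7, 0.6, 0.5])
  return [f'threshold {t}' for t in threshold_list]
```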
"source": [ | ||
"<table class=\"tfo-notebook-buttons\" align=\"left\">\n", | ||
" <td>\n", | ||
" <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/membership_inference_attack/codelabs/codelab.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n", |
Minor: This link and the link below seem to point to an incorrect colab.
"\n", | ||
"This part shows how to use the privacy risk score.\n", | ||
"\n", | ||
"For each data slice, we compute privacy risk scores for both training and test data. We then set a threshold on risk scores (an input is inferred as a member if and only if its risk score is higher than the threshold) and compute the attack precision and recall values" |
Would it be possible to add some interpretation of the obtained results? Some questions we could highlight:
- How do the precision/recall figures compare to the membership inference attack? Since the MIA results are right above, it might be helpful to compare the two methods.
- Are there any samples that have high risk scores? Is there anything special about those samples?
- What is the distribution of privacy risk scores? We could probably plot a simple histogram.
For the precision-recall figures, the results will be pretty similar to the threshold attacks based on prediction loss or entropy, since the privacy risk score is computed from the distributions of prediction loss or entropy over training and test data.
The importance of privacy risk score analysis is that we actually compute a risk value for each sample, so that we know which samples have high risk. The precision-recall metric is just one way to present the results.
Indeed, some samples have high risk scores. In the codelab running example, class 3 and class 4 have certain training samples with a risk score of 1. It is a very interesting future direction to explore why those samples have high risk.
I agree that we can plot a histogram of the privacy risk scores.
I suggest a meeting after the holiday season to thoroughly discuss the best way to present the privacy risk score results.
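A minimal sketch of the thresholding described in the codelab text above, with made-up risk scores (the variable names here are illustrative, not the PR's API): an input is predicted to be a member iff its risk score exceeds the threshold, and precision/recall are computed over the combined train/test pool.

```python
import numpy as np

# Made-up privacy risk scores for training (members) and test (non-members).
train_scores = np.array([0.95, 0.8, 0.6, 0.4])
test_scores = np.array([0.7, 0.3, 0.2, 0.1])
threshold = 0.5

# Predicted members: every point whose risk score is above the threshold.
true_positives = (train_scores > threshold).sum()
false_positives = (test_scores > threshold).sum()

precision = true_positives / (true_positives + false_positives)
recall = true_positives / len(train_scores)
print(f'precision={precision:.2f}, recall={recall:.2f}')
```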
Setting up a meeting sounds good, thank you. Some comments (not blocking for this PR; we can address them after the meeting):
> the results will be pretty similar to the threshold attacks based on prediction loss or entropy, since the privacy risk score is computed from the distributions of prediction loss or entropy over training and test data.
This sounds reasonable. Can we compare these numbers in the codelab, as a sanity check?
> It is a very interesting future direction to explore why those samples have high risk.
As a low-hanging fruit, could we plot a few pictures with low and high risk? That might already be very insightful. We're looking for some way to show how this metric works in practice.
> The importance of privacy risk score analysis is that we actually compute a risk value for each sample, so that we know which samples have high risk.
This makes me realize that some of the trained attacks, and the threshold attack, also give some kind of score, which we threshold to classify examples into training / non-training. For the threshold attack, this score is simply the loss. For the neural network classifiers, it is the output of the softmax layer.
Conceptually your score is very similar, except that you provide a different empirical measure.
We discussed this internally, and we think it might make sense to consolidate this with the other membership attacks instead of having it as a separate codepath. This might make the implementation cleaner and simpler.
We're happy to do this refactoring ourselves; we just wanted to let you know about it.
I added the AUC and advantage values to the new codelab file. As expected, the results are similar to the threshold attacks.
I also plotted several figures with high and low risk for each class label in the codelab file; please take a look.
I agree that neural network attack classifiers or threshold attacks can also give some kind of score.
The advantage of our proposed privacy risk score metric is that it closely captures the real likelihood of being in the training set. For example, if we take all training and test samples with a privacy risk score around (let's say) 0.8, then indeed around 80% of those samples are from the training set, whereas the neural network classifier's output usually does not capture the real likelihood of being a member well. You can find more details in our USENIX paper (Figures 3 and 11 in https://arxiv.org/pdf/2003.10595.pdf).
Sure, I can see the benefits of consolidating the privacy risk score part with the other attacks.
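A small sketch of that calibration check with made-up scores (not the PR's data or API): among samples whose risk score is near 0.8, the fraction that are actually training members should be close to 0.8 for a well-calibrated score.

```python
import numpy as np

# Made-up risk scores; 1 marks a training (member) sample, 0 a test sample.
scores = np.array([0.82, 0.79, 0.81, 0.78, 0.80, 0.83, 0.77, 0.81, 0.79, 0.80])
is_member = np.array([1, 1, 1, 1, 0, 1, 0, 1, 1, 1])

# Select samples whose score falls in a small window around 0.8 and check
# which fraction of them are actually members.
near_08 = np.abs(scores - 0.8) <= 0.05
member_fraction = is_member[near_08].mean()
print(member_fraction)  # close to 0.8 for a well-calibrated score
```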
Thanks a lot! Added a small question on the colab. It would be helpful to add an interpretation/discussion of the results.
Hi @dvadym @sushkoy, I updated the code following your comments. For the codelab file, I feel it is better to have a meeting to thoroughly discuss the best way to present the privacy risk score results before I change the code further, so I have left the file as it is for now. Let me know if you have more comments or questions, thanks!
Thanks for addressing the comments!
We've discussed the metric name internally. "Risk" is a very generic term, and for library users (who might be unfamiliar with the details) it might create a wrong impression; e.g. when the training data are public, there is no risk in exposing any information about them. It's better for the name to represent more precisely what the metric means.
Would it be possible to use a name that reflects that, e.g. train_probability (just as an example)? WDYT?
It's fine to keep references to the privacy risk score in comments, since that's how it's named in the paper.
Thanks!
Following our recent paper https://arxiv.org/abs/2003.10595, I implemented code to compute the privacy risk score for each individual sample, which represents its likelihood of being a member. The main function is defined as "_compute_privacy_risk_score" in "membership_inference_attack.py". It computes risk scores for all training and test points, which are passed to the "SingleRiskScoreResult" class in "data_structures.py". I also added "codelab_privacy_risk_score.ipynb" to demonstrate how to run the code, along with test cases for the privacy risk scores.