How to control the privacy budget #2

Open
TeDiou opened this issue Dec 24, 2023 · 5 comments

Comments

@TeDiou

TeDiou commented Dec 24, 2023

When we set `private = True`, your source code only calculates the privacy budget. How can we control the privacy budget? By adding an `if` statement?

@zhao-zilong
Contributor

Hi @TeDiou

Setting `private = True` enables training with DP. The privacy budget is calculated in the code block starting here:

In this line of code:

rdp = compute_rdp(self.micro_batch_size / train_data.shape[0], self.sigma, steps, lmbds)

You can see that four quantities influence the privacy budget when computing RDP: the micro-batch size and the dataset size (which together set the sampling rate), sigma, and the number of training steps.
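For reference, assuming the repository's `compute_rdp` follows the standard RDP accountant for the sampled Gaussian mechanism (the integer-order formula used by accountants such as TensorFlow Privacy), and assuming `lmbds` holds integer Rényi orders α, the four quantities enter roughly as follows:

```latex
q = \frac{\texttt{micro\_batch\_size}}{N}, \qquad N = \texttt{train\_data.shape[0]}

\mathrm{RDP}(\alpha) = \frac{\texttt{steps}}{\alpha - 1}
  \log \sum_{k=0}^{\alpha} \binom{\alpha}{k} (1-q)^{\alpha-k} q^{k}
  \exp\!\left(\frac{k^{2}-k}{2\sigma^{2}}\right), \qquad \alpha \in \{2, 3, \dots\}
```

So a larger sampling rate q or more steps raises the RDP (and therefore epsilon), while a larger sigma lowers it.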

Then, in the following line:

epsilon, _, _ = get_privacy_spent(lmbds, rdp, target_delta=1e-5)

Epsilon is the privacy budget. You can add an `if` check at the beginning of the training loop so that training only continues while epsilon is below a chosen threshold.
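A minimal sketch of that `if` check in plain Python, so it runs without the repository. It assumes the accountant behaves like the standard sampled-Gaussian RDP accountant: `rdp_sampled_gaussian` and `epsilon_from_rdp` are hypothetical stand-ins for `compute_rdp` and `get_privacy_spent`, `train_with_budget` and `target_epsilon` are illustrative names, and the conversion is the classic RDP-to-(ε, δ) bound, so numbers may differ slightly from the repository's `get_privacy_spent`. The sampling rate `q` plays the role of `self.micro_batch_size / train_data.shape[0]`.

```python
import math


def rdp_sampled_gaussian(q, sigma, steps, orders):
    """Hypothetical stand-in for compute_rdp: RDP of the sampled Gaussian
    mechanism at integer orders, composed over `steps` training steps."""
    if not 0.0 < q < 1.0:
        raise ValueError("this sketch assumes a sampling rate 0 < q < 1")
    rdp = []
    for alpha in orders:
        # Log of each binomial term; combined with log-sum-exp to avoid overflow.
        log_terms = [
            math.lgamma(alpha + 1) - math.lgamma(k + 1) - math.lgamma(alpha - k + 1)
            + (alpha - k) * math.log1p(-q) + k * math.log(q)
            + (k * k - k) / (2.0 * sigma ** 2)
            for k in range(alpha + 1)
        ]
        top = max(log_terms)
        log_a = top + math.log(sum(math.exp(t - top) for t in log_terms))
        # Composition: RDP adds up linearly over training steps.
        rdp.append(steps * log_a / (alpha - 1))
    return rdp


def epsilon_from_rdp(orders, rdp, delta):
    """Classic RDP -> (epsilon, delta) conversion, minimised over the orders."""
    return min(r + math.log(1.0 / delta) / (a - 1) for a, r in zip(orders, rdp))


def train_with_budget(q, sigma, target_epsilon, delta=1e-5, max_steps=100_000):
    """Run (placeholder) DP training steps while epsilon stays within budget."""
    orders = list(range(2, 65))  # assumed integer Renyi orders; the repo's lmbds may differ
    per_step = rdp_sampled_gaussian(q, sigma, 1, orders)
    steps, epsilon = 0, 0.0
    while steps < max_steps:
        # The `if` at the start of the loop: would one more step exceed the budget?
        next_eps = epsilon_from_rdp(orders, [(steps + 1) * r for r in per_step], delta)
        if next_eps > target_epsilon:
            break
        # ... one DP training step (clip gradients, add noise, update) would go here ...
        steps, epsilon = steps + 1, next_eps
    return steps, epsilon
```

For example, `train_with_budget(q=64 / 10_000, sigma=1.1, target_epsilon=1.0)` returns how many steps fit under ε = 1 at δ = 1e-5. Because RDP composes linearly over steps, the sketch computes the per-step RDP once and scales it, instead of re-running the accountant on every iteration.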

Hope that solves your question.

@TeDiou
Author

TeDiou commented Dec 26, 2023

Thanks for your answer!

@TeDiou
Author

TeDiou commented Dec 27, 2023

Sorry to bother you, but why is the dp-synthesizer.sample method different from ctabganplus.sample? The two models differ only in the privacy module, yet in ctabganplusdp the generation step requires multiple loops.

@zhao-zilong
Contributor

Hi @TeDiou
Yes, we need a loop to generate enough synthetic data. The reason is that we implemented a filter that removes invalid generated samples, so we have to sample more than the required number of rows. Check this issue answer:
Team-TUD/CTAB-GAN-Plus#7 (comment)
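A minimal sketch of the generate-filter-repeat pattern described above, not the repository's actual sampling code: `sample_valid`, `toy_generator`, `is_non_negative`, `batch_size` and `max_rounds` are hypothetical names, and the validity rule is a toy one (keep non-negative values).

```python
import random


def sample_valid(generate_batch, is_valid, n_required, batch_size=500, max_rounds=1000):
    """Keep generating and filtering until `n_required` valid rows are collected."""
    rows = []
    for _ in range(max_rounds):
        # Each round draws a fresh batch; the filter may discard part of it.
        candidates = generate_batch(batch_size)
        rows.extend(row for row in candidates if is_valid(row))
        if len(rows) >= n_required:
            return rows[:n_required]  # trim the surplus from the last round
    raise RuntimeError("too many samples were filtered out; increase max_rounds")


# Hypothetical stand-ins for the trained generator and the validity filter.
def toy_generator(n):
    return [random.gauss(0.0, 1.0) for _ in range(n)]


def is_non_negative(value):
    return value >= 0.0


random.seed(0)
synthetic = sample_valid(toy_generator, is_non_negative, n_required=1000)
```

The `max_rounds` cap is a design choice for the sketch: if the filter rejects almost everything, the loop fails loudly instead of running forever.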

@TeDiou
Author

TeDiou commented Dec 28, 2023

I got that. Thanks a lot!