Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(telemetry): anonymise properly model id #1654

Merged
merged 5 commits into from
Sep 13, 2024

Conversation

axel7083
Copy link
Contributor

@axel7083 axel7083 commented Sep 3, 2024

What does this PR do?

Prevent leakage of user information in the telemetry

Screenshot / video of UI

N/A

What issues does this PR fix or reference?

Fixes https://github.com/containers/podman-desktop-internal/issues/325

How to test this PR?

  • unit tests has been provided

@axel7083 axel7083 requested review from benoitf, jeffmaury and a team as code owners September 3, 2024 13:25
Copy link
Collaborator

@benoitf benoitf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not sending the hash of the model id ?

@axel7083
Copy link
Contributor Author

axel7083 commented Sep 4, 2024

why not sending the hash of the model id ?

Imported models do not have a hash, we would have to compute it on import, that feasible, but would require some work on the import side, and could lead to potential issues: what if the user import manually by editing the file, do we compute at startup ?

@benoitf
Copy link
Collaborator

benoitf commented Sep 4, 2024

@axel7083 hash of the id

@axel7083
Copy link
Contributor Author

axel7083 commented Sep 4, 2024

@axel7083 hash of the id

I mean, are we interested in this information for imported user ? It would be different for each imported path, I feel like having a fixed value for imported models (<imported>) would ease the filtering on the telemetry, as we don't have tons of hashes without meaning and can simply filter out this information.

@axel7083 axel7083 requested a review from benoitf September 5, 2024 08:56
@jeffmaury
Copy link
Contributor

@axel7083 hash of the id

I mean, are we interested in this information for imported user ? It would be different for each imported path, I feel like having a fixed value for imported models (<imported>) would ease the filtering on the telemetry, as we don't have tons of hashes without meaning and can simply filter out this information.

Yes but having a distinct value may help to detect patterns in telemetry

@axel7083 axel7083 force-pushed the fix/anonymise-model-logger branch from 3a39231 to 3b81fee Compare September 10, 2024 09:07
Copy link
Collaborator

@benoitf benoitf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we store the hash in the model object so we only compute it once ?

@axel7083
Copy link
Contributor Author

can we store the hash in the model object so we only compute it once ?

Here I replicated what we do for the PullImage telemetry on Podman Desktop, which is hashing the id. We have the sha256 for the model we put in the catalog, but not the one the user imported.

There a several method to import a model, through the UI or through the user-catalog editing. Therefore computing the hash of the model object (not the id) would require deeper change.

IMO we should not hash and use the imported models for telemetry, as we could deduce easily the models that users are using, which seems to be personal/private information ?

@benoitf
Copy link
Collaborator

benoitf commented Sep 10, 2024

personal/private information is if you're able to identify the user (like if the name of the model is the name of the user)

if you're able to see that people are using models being available on huggingface (through the hash), it's not because you can detect that someone is using that model that you know who is the person.

@axel7083
Copy link
Contributor Author

if you're able to see that people are using models being available on huggingface (through the hash), it's not because you can detect that someone is using that model that you know who is the person.

Okey, yeah I agree.

But hashing the model would need a deeper change here, which would need it own issue, next sprint.

@axel7083 axel7083 requested a review from benoitf September 11, 2024 08:22
Copy link
Contributor

@gastoner gastoner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Great work!

@axel7083 axel7083 force-pushed the fix/anonymise-model-logger branch from 172b3ab to 2531b76 Compare September 13, 2024 09:27
@axel7083 axel7083 enabled auto-merge (squash) September 13, 2024 09:28
@axel7083 axel7083 merged commit 6a229f4 into containers:main Sep 13, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants