Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking should be opt-in and not use the SECRET_KEY #609

Closed
tfeldmann opened this issue Apr 23, 2024 · 14 comments
Closed

Tracking should be opt-in and not use the SECRET_KEY #609

tfeldmann opened this issue Apr 23, 2024 · 14 comments

Comments

@tfeldmann
Copy link

tfeldmann commented Apr 23, 2024

As the title says.

@chrisclark
Copy link
Collaborator

Thanks - what's the problematic vector re: the secret key? Totally open to changing it - just trying to understand.

@chrisclark
Copy link
Collaborator

Regarding opt-in vs. opt out -- reasonable people can disagree, but I think collecting anonymized telemetry is a normal practice, and it's well documented and straightforward to opt-out. With that said I definitely don't want to collect anything anyone might view as sensitive. The telemetry will help determine what functionality is in use, what can be pruned, added, etc in future releases.

I'm definitely sensitive to the latter part of this issue if there's any chance of identification leakage, etc, I will address promptly. Please let me know.

@tfeldmann
Copy link
Author

tfeldmann commented Apr 24, 2024

Some thoughts on this:

Something about enabling telemetry by default in a point release feels off to me.
We are using this library on a small internal web app. Dependabot might update this library and then our security guy would start investigating outgoing connections if I'm not careful. Maybe it's just me, but I don't think libraries should phone home.

Regarding the secret key: I'm not a security professional. On first sight it seems like the key might be exposed in tracebacks, in case .encode raises an exception. Also the migration timestamp is very much brute-forceable. I think SECRET_KEY should not be touched by any third party library at all.

@chrisclark
Copy link
Collaborator

chrisclark commented Apr 24, 2024

First - thanks for the civil reply and well-articulated concerns.

When I was creating this, I was looking at how DBT and Streamlit (two very large open source projects) handle telemetry:

And went for a similar approach. The goal is simply to figure out e.g. do I need to keep supporting the lightweight charting functionality? Does anyone use it? Should it be built-up or deleted?

Understood on the feedback regarding the point release. And like I said - I think reasonable people can disagree here.

Regarding the SECRET_KEY - ok, I'm convinced. While I don't see a clear problem, I think there is enough risk that I will address the concerns and push a new release. It's not obvious to me what to use instead (e.g. how to avoid referencing it at all), but I will address the possibility of brute-forcing, and the stacktrace issue. Open to other ideas.

@chrisclark
Copy link
Collaborator

Here's an improved identifier function that I'll push out shortly in a 4.1.1 release.

from django.contrib.auth.hashers import make_password

def _instance_identifier():

    try:
        migration = MigrationRecorder.Migration.objects.all().order_by("applied").first()
        ts = migration.applied.timestamp()
        ts_bytes = str(ts).encode("utf-8")
        s_bytes = settings.SECRET_KEY.encode("utf-8")
        return make_password(ts_bytes + s_bytes)
    except Exception:
        return "unknown"

This yields a few improvements:

  1. Uses Django's built-in hashing which meets OWASP standards and is by default based on PBKDF2 (making brute-forcing a nonstarter). Also future-proofs it as Django keeps up with those standards in each release.
  2. Swallows the exception. No risk of exposing the key in a stack trace.

I would like to get rid of touching the SECRET_KEY altogether as your suggest - but don't have an immediate solution. I'll push this out in the mean time as it is a meaningful improvement, and keep thinking about a better way to handle it.

Feedback welcome.

@tfeldmann
Copy link
Author

tfeldmann commented Apr 24, 2024

What about saving a random TRACKING_UUID in a well known location (FS or DB)?

@tfeldmann
Copy link
Author

Appreciate the fast corrections. I'm still having privacy and security concerns because as a user I cannot be sure you're not tracking the origin of the request to check whether I'm behind on django security updates. Also I cannot verify the security of your tracking setup and hosting.

Please have a look at Russ Cox' thoughts about transparent telemetry

@chrisclark
Copy link
Collaborator

Yep, ok. I'll figure out a way to do that. I think you are right that the app should not touch SECRET_KEY. I hate to add a DB table to store a single value, but perhaps I can overload something extant.

There are a lot of big ideas in that post, many of which are simply not practical for this project to implement (and frankly, I don't see adopted elsewhere). Nonetheless the point stands and I'd say:

  1. Absolutely feel free to disable it! If you think the notice about the telemetry could be more prominent, happy to take that feedback as well.
  2. I'll consider publishing the tracker. Obviously (without more effort) it's not provable that the code I'll publish is identical to what's running in the collector endpoint, I think it's nonetheless a big step up compared to "doing nothing". And there isn't anything secret in there; just dumps the stats to a dyanmodb table.
  3. I like the idea of throwing away a random % of the stats -- and even outside of privacy concerns, this is a litttleee more popular than I realized and the metrics are a littttttleeee chattier than I realized and I'm going to need to turn the volume down regardless!

I'll make every effort to get a point release out this week that avoids SECRET_KEY entirely.

@tfeldmann
Copy link
Author

Thanks, appreciate it. I totally understand where you are coming from. Thanks for developing this!

@tfeldmann
Copy link
Author

Also future-proofs it as Django keeps up with those standards in each release.

Won’t this also change the tracking ID? This generates meta data of how fast updates are applied for each IP.

@chrisclark
Copy link
Collaborator

Yeah it turns out that won't work at all for a whole variety of reasons, hah! So I'm going to add a table, I think, to hold settings and store a UUID in there. With some additional functionality I have planned, it'll come in handy anyway.

@chrisclark
Copy link
Collaborator

Check this out:

https://github.com/chrisclark/django-sql-explorer/blob/16aaa994383ab23833acb2c7c5272f9de0851f34/explorer/tracker.py#L22

Different approach, no SECRET_KEY usage. I also added this:
https://github.com/chrisclark/django-sql-explorer/blob/16aaa994383ab23833acb2c7c5272f9de0851f34/explorer/tracker.py#L87

Which rate-limits stat collection per instance, and further anonymizes the tracking. I'll get this done in a 4.1.1 release shortly.

@tfeldmann
Copy link
Author

tfeldmann commented Apr 25, 2024

Check this out:

https://github.com/chrisclark/django-sql-explorer/blob/16aaa994383ab23833acb2c7c5272f9de0851f34/explorer/tracker.py#L22

Looks hacky and the randomness mostly comes from the migration timestamp?

The goal is simply to figure out e.g. do I need to keep supporting the lightweight charting functionality? Does anyone use it? Should it be built-up or deleted?

I'm still opposed to any tracking at all and in case of the charting functionality the tracking data wouldn't even tell the truth. I'd really like to use this feature but gave up because the documentation lacks details about the expected format:

The tabs show the respective charts if the query result table adheres to a format which the chart widget can read. Otherwise a message explaining the required format together with an example query is displayed.

I was too lazy to find out. That doesn't mean I wouldn't use this.

@chrisclark chrisclark mentioned this issue Apr 25, 2024
Merged
@chrisclark
Copy link
Collaborator

Yeah I also don't understand the chart stuff -- it was a contribution that I probably should have looked harder at (hah!).

Anyway -- all of this is now addressed in #611. It uses a persistent value (UUID persistent to the DB), adds a bunch of additional anonymization, and generally is a better approach. Even though we don't see 100% eye-to-eye on all of it, I really appreciate the push to improve, and the project is better as a result. Thank you!

I'm going to get a beta out today, and hope to release it as 4.2 over the weekend. It has a migration (and a few other fixes/improvements) hence the minor version bump (vs. a patch).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants