Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log session ID instead of client IP during captcha flow #17956

Open
wants to merge 6 commits into
base: develop
Choose a base branch
from

Conversation

louisgregg
Copy link

@louisgregg louisgregg commented Nov 23, 2024

Logging the IP address of a client during the captcha registration flow is a privacy issue. Client IPs are stored in the database along with timestamp and UserAgent, so it seems redundant to log them here, where it would may be easier for an attacker to extract them, depending on how the Postgres DB is protected.

Therefore, I have modified the code to log the user session instead, which isn't PII at least. I'm not entirely sure how sensitive these session IDs are in the context of this application, or in the captcha flow. Does logging the session ID pose a security risk? Please let me know - I am happy to modify the PR and tests to log neither session ID nor IP address instead.
Thank you

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct
    (run the linters)

@louisgregg louisgregg requested a review from a team as a code owner November 23, 2024 15:12
@CLAassistant
Copy link

CLAassistant commented Nov 23, 2024

CLA assistant check
All committers have signed the CLA.

@louisgregg
Copy link
Author

How can I effectively test against the trial-olddeps setup in your pipeline? This build step is failing
https://github.com/element-hq/synapse/actions/runs/11988497284/job/33428556861
I believe this is because I am developing against Twisted 24.7.0 and the trial-olddeps step uses Twisted 18.9.0. I am going to try to replicate the build steps on my machine by running the prepare_old_deps.sh script. Otherwise, do you have any developer documentation on how to get this environment set up correctly?
Thanks!


logger.info(
"Submitting recaptcha response %s with remoteip %s", user_response, clientip
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can certainly see that logging the client's IP in this line is redundant, because that should already be getting logged for the request's log line that Synapse logs for every request.

I could get behind just removing the client IP from this log line and even downgrading it to debug instead of info, since it seems a bit needless to just log the input from the client.
I don't personally see much benefit in logging the session for this flow on its own, so we could leave that off too.

However, I feel like I should mention the Synapse logs still contain more PII — User IDs, IP addresses on the request lines, etc — that I don't feel this PR really makes much difference on that particular front?

Copy link
Author

@louisgregg louisgregg Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could get behind just removing the client IP from this log line and even downgrading it to debug instead of info, since it seems a bit needless to just log the input from the client.
I don't personally see much benefit in logging the session for this flow on its own, so we could leave that off too.

OK sure. I will modify my PR to avoid logging the session ID and change info to debug.

However, I feel like I should mention the Synapse logs still contain more PII — User IDs, IP addresses on the request lines, etc — that I don't feel this PR really makes much difference on that particular front?

Point taken about PII logging in other locations. To focus on client IPs specifically, I searched the codebase with the following regex and couldn't find locations where IPs are logged:
logger\.\w*\(\n*.*(?:ip|IP|Ip)
Are you aware of specific locations where the client IP ends up in the logs, either explicitly or as part of a larger data structure?

My overall goal in this PR is to minimize or eliminate logging of client IPs across the codebase.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made the changes you suggested. Can you approve another run of the test suite please? Thanks

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The failing test, "Register with a recaptcha" in sytest, should now pass. Sytest, Complement and trial test suites are passing on my machine. Please approve another run of the test pipeline when you get a chance. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants