Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse json validation #16923

Merged
merged 16 commits into from
Apr 18, 2024

Conversation

TrevisGordan
Copy link
Contributor

This pull request introduces a new servlet function named parse_json, aimed at simplifying and standardizing the process of parsing JSON objects from query parameters. This function complements our existing utility functions such as parse_integer and parse_string, offering a unified approach to parameter parsing across our application.

The necessity for parse_json arises from the requirement to handle JSON formatted data in query parameters (in filterparameters, on room endpoints), which, lacked a dedicated parsing mechanism. By incorporating parse_json, we streamline the parsing process and also enforce stricter validation of input data, ensuring that only valid JSON objects are processed, preventing internal server errors. This is achieved through the inclusion of an INVALID_PARAM error response, which is triggered when the parsed data fails to meet the JSON object criteria. The error message explicitly states that the parameter "must be a valid JSON object," thereby providing clear feedback for troubleshooting.

Key Features:

Implementation Snippet:

Below is an example snippet demonstrating the usage of parse_json to parse a JSON object from the filter query parameter:

filter_json = parse_json(request, "filter", encoding="utf-8")

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct
    (run the linters)

@TrevisGordan TrevisGordan requested a review from a team as a code owner February 15, 2024 09:06
@clokep
Copy link
Contributor

clokep commented Feb 16, 2024

I think the original long term plan here was to replace the JSON parsing with Pydantic (see #13147), but this doesn't seem to preclude that at all.

@TrevisGordan
Copy link
Contributor Author

Yes, I've also seen that thanks. Exactly. Coming from FastAPI, I'm a big fan of Pydantic and would advocate for its use in model validation. This, though, is meant for a step before—checking if the input is actually JSON, particularly in the case of query parameters. From what I've seen, this is mostly used for the filtering functionality. I even already thought about it as a point for consideration on "where to go from here" for the next steps. I found a filter validation function (jsonschema) def check_valid_filter(self, user_filter_json: JsonDict) -> None: but not utilized on the endpoints—to further ensure that in the case of valid JSON, it is a valid filter. This could be uniformly done with Pydantic.

@clokep
Copy link
Contributor

clokep commented Feb 19, 2024

Makes sense! I don't think there are other spots that JSON is passed as a query parameter, it is a rather odd choice.

Copy link
Member

@anoadragon453 anoadragon453 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks great, thank you for putting this together! I have a smattering of wording changes, and a request for tests.

Comment on lines 1 to 3
Adds parse_json servlet function for standardized JSON parsing from query parameters, ensuring enhanced data validation and error handling.
Introduces INVALID_PARAM error response for invalid JSON objects, improving parameter validation feedback.
Adds validation check to prevent 500 internal server error on invalid Json Filter request.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally we try to keep changelog entries short, and acknowledge that the audience is system administrators. Such an audience won't care to know the details of the implementation, but rather than user-facing impact. My suggestion would be:

Suggested change
Adds parse_json servlet function for standardized JSON parsing from query parameters, ensuring enhanced data validation and error handling.
Introduces INVALID_PARAM error response for invalid JSON objects, improving parameter validation feedback.
Adds validation check to prevent 500 internal server error on invalid Json Filter request.
Return `400 M_INVALID_PARAM` upon receiving invalid JSON in query parameters across various client and admin endpoints, rather than an internal server error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the Insights! - I'll keep that in mind. 

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you happy to accept my suggestion here? I prefer the suggested version of the changelog for the reasons in my initial comment.

synapse/http/servlet.py Outdated Show resolved Hide resolved
synapse/http/servlet.py Outdated Show resolved Hide resolved
synapse/http/servlet.py Outdated Show resolved Hide resolved
synapse/http/servlet.py Outdated Show resolved Hide resolved
synapse/http/servlet.py Outdated Show resolved Hide resolved
synapse/http/servlet.py Show resolved Hide resolved
synapse/http/servlet.py Show resolved Hide resolved
Copy link
Member

@anoadragon453 anoadragon453 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise!

synapse/http/servlet.py Show resolved Hide resolved
# Does not test the validity of the filter, only the json validation.

# Check Get with valid json filter parameter, expect 200.
_valid_filter_str = '{"types": ["m.room.message"]}'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally you should only precede a variable name with an underscore in Python if you'd like to label the output of a function, but not actually use it. Here we are using _valid_filter_str, so it should not have a leading underscore.

Could you remove it, along with _invalid_filter_str and from other tests please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh took me a bit! But now I think I know what you mean.
You are referring to the use of a single underscore, _, as a throwaway variable, commonly used for temporary or insignificant values, as in:

for _ in range(32):
    print('Hello, World.')

this woulde be correct. But in this case, following PEP 8, _single_leading_underscore signals internal use, here serving as a minor internal string helper. It marks variables that are temporary or specific to this test's context, distinguishing between main test logic and setup details. However, I'm also more than happy to adjust this for you too! ;) Always wanted to cite a PEP tho. 😉

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for citing the PEP! It's interesting to read where this convention came from.

In Synapse, we certainly do use leading underscores for internal function/method names and private class variables (self._internal_var). This is to signal to code external to classes that they shouldn't try to access this field (it is internal).

However, we don't use this convention for local variable names - so at least for this codebase, I would remove the leading underscore.

Copy link
Member

@anoadragon453 anoadragon453 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huge apologies for taking so long to get back to this - I've only now been getting back to reviewing Synapse PRs. I have only a couple small things below, but otherwise this looks good to go.

Thanks for your patience.

synapse/http/servlet.py Outdated Show resolved Hide resolved
# Does not test the validity of the filter, only the json validation.

# Check Get with valid json filter parameter, expect 200.
_valid_filter_str = '{"types": ["m.room.message"]}'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for citing the PEP! It's interesting to read where this convention came from.

In Synapse, we certainly do use leading underscores for internal function/method names and private class variables (self._internal_var). This is to signal to code external to classes that they shouldn't try to access this field (it is internal).

However, we don't use this convention for local variable names - so at least for this codebase, I would remove the leading underscore.

Comment on lines 1 to 3
Adds parse_json servlet function for standardized JSON parsing from query parameters, ensuring enhanced data validation and error handling.
Introduces INVALID_PARAM error response for invalid JSON objects, improving parameter validation feedback.
Adds validation check to prevent 500 internal server error on invalid Json Filter request.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you happy to accept my suggestion here? I prefer the suggested version of the changelog for the reasons in my initial comment.

@TrevisGordan
Copy link
Contributor Author

@anoadragon453 Ah no worries! Thanks for coming back to it.
I made the requested changes. :)
I updated the new test cases accordingly for "NOT_JSON"

I re-ran the linters.
I re-ran the tests (passed)
I also ran systests (passed)

Copy link
Member

@anoadragon453 anoadragon453 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you very much!

@TrevisGordan
Copy link
Contributor Author

✅ 🎉 - @anoadragon453 Please don't forget to merge before we lose sync to upstream. :)

@anoadragon453 anoadragon453 merged commit 1d47532 into element-hq:develop Apr 18, 2024
38 checks passed
@anoadragon453
Copy link
Member

Oops, apparently forgot to press the button. Thanks for the poke!

hughns pushed a commit to hughns/synapse that referenced this pull request Apr 25, 2024
yingziwu added a commit to yingziwu/synapse that referenced this pull request May 3, 2024
No significant changes since 1.106.0rc1.

- Send an email if the address is already bound to an user account. ([\#16819](element-hq/synapse#16819))
- Implement the rendezvous mechanism described by [MSC4108](matrix-org/matrix-spec-proposals#4108). ([\#17056](element-hq/synapse#17056))
- Support delegating the rendezvous mechanism described [MSC4108](matrix-org/matrix-spec-proposals#4108) to an external implementation. ([\#17086](element-hq/synapse#17086))

- Add validation to ensure that the `limit` parameter on `/publicRooms` is non-negative. ([\#16920](element-hq/synapse#16920))
- Return `400 M_NOT_JSON` upon receiving invalid JSON in query parameters across various client and admin endpoints, rather than an internal server error. ([\#16923](element-hq/synapse#16923))
- Make the CSAPI endpoint `/keys/device_signing/upload` idempotent. ([\#16943](element-hq/synapse#16943))
- Redact membership events if the user requested erasure upon deactivating. ([\#17076](element-hq/synapse#17076))

- Add a prompt in the contributing guide to manually configure icu4c. ([\#17069](element-hq/synapse#17069))
- Clarify what part of message retention is still experimental. ([\#17099](element-hq/synapse#17099))

- Use new receipts column to optimise receipt and push action SQL queries. Contributed by Nick @ Beeper (@Fizzadar). ([\#17032](element-hq/synapse#17032), [\#17096](element-hq/synapse#17096))
- Fix mypy with latest Twisted release. ([\#17036](element-hq/synapse#17036))
- Bump minimum supported Rust version to 1.66.0. ([\#17079](element-hq/synapse#17079))
- Add helpers to transform Twisted requests to Rust http Requests/Responses. ([\#17081](element-hq/synapse#17081))
- Fix type annotation for `visited_chains` after `mypy` upgrade. ([\#17125](element-hq/synapse#17125))

* Bump anyhow from 1.0.81 to 1.0.82. ([\#17095](element-hq/synapse#17095))
* Bump peaceiris/actions-gh-pages from 3.9.3 to 4.0.0. ([\#17087](element-hq/synapse#17087))
* Bump peaceiris/actions-mdbook from 1.2.0 to 2.0.0. ([\#17089](element-hq/synapse#17089))
* Bump pyasn1-modules from 0.3.0 to 0.4.0. ([\#17093](element-hq/synapse#17093))
* Bump pygithub from 2.2.0 to 2.3.0. ([\#17092](element-hq/synapse#17092))
* Bump ruff from 0.3.5 to 0.3.7. ([\#17094](element-hq/synapse#17094))
* Bump sigstore/cosign-installer from 3.4.0 to 3.5.0. ([\#17088](element-hq/synapse#17088))
* Bump twine from 4.0.2 to 5.0.0. ([\#17091](element-hq/synapse#17091))
* Bump types-pillow from 10.2.0.20240406 to 10.2.0.20240415. ([\#17090](element-hq/synapse#17090))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JSON Filter Parameter Validation Error on Room Event Endpoint
3 participants