Parse json validation #16923
Adds parsing of JSON parameters from the request query string (used for filtering).
Validating the JSON query parameter should fix element-hq#16922.
I think the original long-term plan here was to replace the JSON parsing with Pydantic (see #13147), but this doesn't seem to preclude that at all.
Yes, I've seen that too, thanks. Exactly. Coming from FastAPI, I'm a big fan of Pydantic and would advocate for its use in model validation. This change, though, is meant for a step before that: checking whether the input is actually JSON, particularly in the case of query parameters. From what I've seen, this is mostly used for the filtering functionality. I had already noted it as a point to consider under "where to go from here" for the next steps. I found a filter validation function based on jsonschema, `def check_valid_filter(self, user_filter_json: JsonDict) -> None:`, but it is not used on these endpoints to further ensure that valid JSON is also a valid filter. That could be done uniformly with Pydantic.
Makes sense! I don't think there are other spots where JSON is passed as a query parameter; it is a rather odd choice.
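To make the two stages discussed in this thread concrete, here is a minimal standard-library sketch. It is not Synapse code: `FilterError`, `load_json_object` and `check_filter_shape` are hypothetical names, and the hand-written shape check only stands in for what jsonschema or Pydantic would do. It checks only the `types` field, since that is the only filter field shown in this PR's tests.

```python
import json
from typing import Any, Dict


class FilterError(ValueError):
    """Hypothetical error type for this sketch; not a Synapse class."""


def load_json_object(raw: str) -> Dict[str, Any]:
    """Stage 1: confirm the raw query-string value is a JSON object."""
    try:
        value = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise FilterError("not valid JSON") from exc
    if not isinstance(value, dict):
        raise FilterError("JSON value is not an object")
    return value


def check_filter_shape(user_filter: Dict[str, Any]) -> None:
    """Stage 2: confirm the object looks like a filter.

    Only `types` is checked here; a real schema would cover every field.
    """
    types = user_filter.get("types")
    if types is None:
        return
    if not isinstance(types, list) or not all(isinstance(t, str) for t in types):
        raise FilterError("'types' must be a list of strings")
```

Keeping the stages separate means a malformed query string is rejected before any schema logic runs, and the schema stage can later be swapped for Pydantic without touching the parsing step.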
Overall looks great, thank you for putting this together! I have a smattering of wording changes, and a request for tests.
changelog.d/16923.bugfix
Outdated
Adds parse_json servlet function for standardized JSON parsing from query parameters, ensuring enhanced data validation and error handling.
Introduces INVALID_PARAM error response for invalid JSON objects, improving parameter validation feedback.
Adds validation check to prevent 500 internal server error on invalid Json Filter request.
Generally we try to keep changelog entries short, and acknowledge that the audience is system administrators. Such an audience won't care to know the details of the implementation, but rather the user-facing impact. My suggestion would be:
Return `400 M_INVALID_PARAM` upon receiving invalid JSON in query parameters across various client and admin endpoints, rather than an internal server error.
Thanks for the insights! I'll keep that in mind.
Are you happy to accept my suggestion here? I prefer the suggested version of the changelog for the reasons in my initial comment.
LGTM otherwise!
tests/rest/admin/test_room.py
Outdated
# Does not test the validity of the filter, only the json validation.

# Check Get with valid json filter parameter, expect 200.
_valid_filter_str = '{"types": ["m.room.message"]}'
Generally you should only precede a variable name with an underscore in Python if you'd like to label the output of a function, but not actually use it. Here we are using `_valid_filter_str`, so it should not have a leading underscore.
Could you remove it here, from `_invalid_filter_str`, and from the other tests please?
Ahh, took me a bit! But now I think I know what you mean.
You are referring to the use of a single underscore, `_`, as a throwaway variable, commonly used for temporary or insignificant values, as in:
for _ in range(32):
    print('Hello, World.')
There, this would be correct. But in this case, following PEP 8, `_single_leading_underscore` signals internal use, here serving as a minor internal string helper. It marks variables that are temporary or specific to this test's context, distinguishing between main test logic and setup details. However, I'm also more than happy to adjust this for you too! ;) Always wanted to cite a PEP tho. 😉
Thanks for citing the PEP! It's interesting to read where this convention came from.
In Synapse, we certainly do use leading underscores for internal function/method names and private class variables (`self._internal_var`). This is to signal to code external to classes that they shouldn't try to access this field (it is internal).
However, we don't use this convention for local variable names - so at least for this codebase, I would remove the leading underscore.
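The three uses of the underscore discussed in this thread can be put side by side in a short sketch. The class `RoomCounter`, the function `build_filter_strings` and the value of `invalid_filter_str` are made up for illustration and do not come from the Synapse codebase; only the valid filter string is taken from the test under review.

```python
from typing import List, Tuple


class RoomCounter:
    """Hypothetical class used only to illustrate the naming conventions."""

    def __init__(self) -> None:
        # Leading underscore on an attribute: internal state that code
        # outside the class should not access directly.
        self._rooms_seen = 0

    def record(self, room_ids: List[str]) -> int:
        # A bare `_`: a throwaway loop variable whose value is never read.
        for _ in room_ids:
            self._rooms_seen += 1
        return self._rooms_seen


def build_filter_strings() -> Tuple[str, str]:
    # Plain local names: no leading underscore, even for small helpers
    # that only matter inside this function.
    valid_filter_str = '{"types": ["m.room.message"]}'
    invalid_filter_str = "}}}{"  # made-up example of malformed JSON
    return valid_filter_str, invalid_filter_str
```

Local variables are already invisible outside their function, so a leading underscore adds no information there, which is the reasoning behind the reviewer's request.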
Huge apologies for taking so long to get back to this - I've only now been getting back to reviewing Synapse PRs. I have only a couple small things below, but otherwise this looks good to go.
Thanks for your patience.
@anoadragon453 Ah no worries! Thanks for coming back to it. I re-ran the linters.
LGTM, thank you very much!
✅ 🎉 - @anoadragon453 Please don't forget to merge before we lose sync with upstream. :)
Oops, apparently forgot to press the button. Thanks for the poke!
Co-authored-by: Andrew Morgan <[email protected]>
No significant changes since 1.106.0rc1.

- Send an email if the address is already bound to an user account. ([\#16819](element-hq/synapse#16819))
- Implement the rendezvous mechanism described by [MSC4108](matrix-org/matrix-spec-proposals#4108). ([\#17056](element-hq/synapse#17056))
- Support delegating the rendezvous mechanism described [MSC4108](matrix-org/matrix-spec-proposals#4108) to an external implementation. ([\#17086](element-hq/synapse#17086))
- Add validation to ensure that the `limit` parameter on `/publicRooms` is non-negative. ([\#16920](element-hq/synapse#16920))
- Return `400 M_NOT_JSON` upon receiving invalid JSON in query parameters across various client and admin endpoints, rather than an internal server error. ([\#16923](element-hq/synapse#16923))
- Make the CSAPI endpoint `/keys/device_signing/upload` idempotent. ([\#16943](element-hq/synapse#16943))
- Redact membership events if the user requested erasure upon deactivating. ([\#17076](element-hq/synapse#17076))
- Add a prompt in the contributing guide to manually configure icu4c. ([\#17069](element-hq/synapse#17069))
- Clarify what part of message retention is still experimental. ([\#17099](element-hq/synapse#17099))
- Use new receipts column to optimise receipt and push action SQL queries. Contributed by Nick @ Beeper (@Fizzadar). ([\#17032](element-hq/synapse#17032), [\#17096](element-hq/synapse#17096))
- Fix mypy with latest Twisted release. ([\#17036](element-hq/synapse#17036))
- Bump minimum supported Rust version to 1.66.0. ([\#17079](element-hq/synapse#17079))
- Add helpers to transform Twisted requests to Rust http Requests/Responses. ([\#17081](element-hq/synapse#17081))
- Fix type annotation for `visited_chains` after `mypy` upgrade. ([\#17125](element-hq/synapse#17125))

* Bump anyhow from 1.0.81 to 1.0.82. ([\#17095](element-hq/synapse#17095))
* Bump peaceiris/actions-gh-pages from 3.9.3 to 4.0.0. ([\#17087](element-hq/synapse#17087))
* Bump peaceiris/actions-mdbook from 1.2.0 to 2.0.0. ([\#17089](element-hq/synapse#17089))
* Bump pyasn1-modules from 0.3.0 to 0.4.0. ([\#17093](element-hq/synapse#17093))
* Bump pygithub from 2.2.0 to 2.3.0. ([\#17092](element-hq/synapse#17092))
* Bump ruff from 0.3.5 to 0.3.7. ([\#17094](element-hq/synapse#17094))
* Bump sigstore/cosign-installer from 3.4.0 to 3.5.0. ([\#17088](element-hq/synapse#17088))
* Bump twine from 4.0.2 to 5.0.0. ([\#17091](element-hq/synapse#17091))
* Bump types-pillow from 10.2.0.20240406 to 10.2.0.20240415. ([\#17090](element-hq/synapse#17090))
This pull request introduces a new servlet function named `parse_json`, aimed at simplifying and standardizing the process of parsing JSON objects from query parameters. This function complements our existing utility functions such as `parse_integer` and `parse_string`, offering a unified approach to parameter parsing across our application.

The necessity for `parse_json` arises from the requirement to handle JSON formatted data in query parameters (in `filter` parameters, on room endpoints), which lacked a dedicated parsing mechanism. By incorporating `parse_json`, we streamline the parsing process and also enforce stricter validation of input data, ensuring that only valid JSON objects are processed, preventing internal server errors. This is achieved through the inclusion of an `INVALID_PARAM` error response, which is triggered when the parsed data fails to meet the JSON object criteria. The error message explicitly states that the parameter "must be a valid JSON object," thereby providing clear feedback for troubleshooting.

Key Features:

Implementation Snippet:
Below is an example snippet demonstrating the usage of `parse_json` to parse a JSON object from the `filter` query parameter:
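As a rough illustration of the behaviour described for `parse_json`, the sketch below parses a JSON object out of a query-argument mapping and raises a 400-style error otherwise. It is an assumption-laden stand-in, not the function added by this PR: `InvalidParamError`, `parse_json_from_args`, the `required` flag and the byte-keyed `args` mapping are all hypothetical, and the real function's signature may differ. The `M_INVALID_PARAM` code and the "must be a valid JSON object" wording are taken from the discussion above.

```python
import json
from typing import Any, Dict, List, Mapping, Optional


class InvalidParamError(Exception):
    """Hypothetical stand-in for a 400 M_INVALID_PARAM error response."""

    def __init__(self, msg: str) -> None:
        super().__init__(msg)
        self.code = 400
        self.errcode = "M_INVALID_PARAM"


def parse_json_from_args(
    args: Mapping[bytes, List[bytes]], name: str, required: bool = False
) -> Optional[Dict[str, Any]]:
    """Return the JSON object stored under `name`, or None if it is absent.

    `args` is assumed to map byte-string parameter names to lists of
    byte-string values, one entry per occurrence in the query string.
    """
    values = args.get(name.encode("ascii"))
    if not values:
        if required:
            raise InvalidParamError(f"Missing query parameter {name!r}")
        return None

    try:
        parsed = json.loads(values[0].decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise InvalidParamError(
            f"Query parameter {name!r} must be a valid JSON object"
        ) from exc

    # Valid JSON that is not an object (e.g. a list or number) is rejected too.
    if not isinstance(parsed, dict):
        raise InvalidParamError(
            f"Query parameter {name!r} must be a valid JSON object"
        )
    return parsed
```

A handler would call something like `parse_json_from_args(request_args, "filter")` and let the error propagate to whatever layer turns it into an HTTP 400, so malformed input never reaches the filtering code and cannot surface as an internal server error.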