POST request handling and indexing improvements #636

Merged
merged 4 commits into from
Apr 28, 2021

ikreymer
Member

Description

This PR updates the indexing of POST and other non-GET requests to include the request body in the URL as a query string.
This improves replay fidelity in cases where request-to-response matching cannot rely on the URL alone and must also compare (parts of) the request body. The changes include:

  • POST application/json requests are parsed and primitive JSON values are added to the query string. Duplicate JSON keys are also added with a special suffix, e.g. {"a": "b", "c": {"a": "b"}} will be converted to the query string a=b&a.2_=b (the suffix is hopefully one that is not commonly used).
  • POST text/plain data is first parsed as JSON if possible, and otherwise handled as binary
  • PUT requests are treated the same as POST
  • For all non-GET requests, __wb_method=<method> is also added, so POST requests will have __wb_method=POST in the query
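
The conversion described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function names `json_to_query` and `append_post_query` are hypothetical, the way non-string primitives are stringified (via `json.dumps`) is an assumption, and so is the position of `__wb_method` relative to the body parameters.

```python
import json
from urllib.parse import urlencode


def json_to_query(body):
    """Flatten primitive JSON values into a query string (hypothetical helper)."""
    pairs = []
    seen = {}  # how many times each key name has been emitted so far

    def unique_key(name):
        # First occurrence keeps the plain name; later ones get a ".N_" suffix.
        count = seen.get(name, 0) + 1
        seen[name] = count
        return name if count == 1 else "{0}.{1}_".format(name, count)

    def walk(value, name=""):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, key)
        elif isinstance(value, list):
            # Assumption: list items reuse the name of the enclosing key.
            for child in value:
                walk(child, name)
        elif name:
            # Assumption: non-string primitives use their JSON text form.
            text = value if isinstance(value, str) else json.dumps(value)
            pairs.append((unique_key(name), text))

    walk(json.loads(body))
    return urlencode(pairs)


def append_post_query(url, method, body_query):
    """Append __wb_method and the body-derived query to a URL (hypothetical helper)."""
    method = method.upper()
    if method == "GET":
        return url
    extra = "__wb_method=" + method
    if body_query:
        extra += "&" + body_query
    separator = "&" if "?" in url else "?"
    return url + separator + extra


query = json_to_query('{"a": "b", "c": {"a": "b"}}')
print(query)  # a=b&a.2_=b
print(append_post_query("https://example.com/api", "POST", query))
```

With a scheme like this, two POST requests to the same endpoint with different JSON bodies produce different lookup URLs, so they can be told apart in the index.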

This indexing approach is compatible with cdxj-indexer and with the replay used in wabac.js/ReplayWeb.page.

This is a breaking change for existing collections that already have POST requests. These will need to be re-indexed to get accurate POST request replay.

Motivation and Context

Many sites require accurate replay of POST and PUT requests in order to match requests to responses and replay pages accurately.

Screenshots (if appropriate):

Types of changes

  • Replay fix (fixes a replay specific issue)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added or updated tests to cover my changes.
  • All new and existing tests passed.

- parse json primitives for post query
- standardize post append indexing
- include '__wb_method' in urlkey
- add 'requestBody' and 'method' to cdxj
fuzzy rules: add rule for yt
update to latest wombat
- support unique dupe params for json-to-query conversion
- update tests for test_inputreq,
- update post-test.cdxj
tox: run full test suite!
ci: disable appveyor
…POST data to avoid hung request in live mode. instead, truncate final query string
@ikreymer ikreymer changed the title Post append improvements POST request handling and indexing improvements Apr 28, 2021
@ikreymer ikreymer merged commit 626da99 into master Apr 28, 2021
@ikreymer ikreymer deleted the post-append-improvements branch April 28, 2021 17:46