Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle Content-Type multipart/form-data without boundary #599

Merged
merged 2 commits into from
Dec 17, 2020

Conversation

ldko
Copy link
Contributor

@ldko ldko commented Dec 4, 2020

Description

Catches ValueError on a multipart/form-data mime when "boundary" is missing, and handles it in the same way as an earlier condition for 'application/x-www-form-urlencoded' erroring on UnicodeDecodeError. When I try to run the tests with python setup.py test, test_proxy.py tests fail, but this seems unrelated and is the same when I run them on the master branch.

Motivation and Context

Fix for #598 to handle a ValueError on a multipart/form-data mime when "boundary" is missing.

Screenshots (if appropriate):

Types of changes

  • Replay fix (fixes a replay specific issue)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added or updated tests to cover my changes.
  • All new and existing tests passed.

@ikreymer
Copy link
Member

ikreymer commented Dec 5, 2020

Out of curiosity, what version of python are you running with?
I have not been able to repro this error with the supplied WARC in Python 3.6+ in either Ubuntu or OSX, wonder if it only happens in an older python? But, good to have just in case anyway.

@ldko
Copy link
Contributor Author

ldko commented Dec 5, 2020

Hrrrm...I have gotten the error on Ubuntu 18.04.5 running in a Python 3.7.6 env and a 3.6.9 env when trying to index this warc and another (autoindex or using wb-manager index). Running python setup.py install on that machine in Python 3.4 or 3.5, the install fails with trying to install the markupsafe depdendency, but I haven't looked into that.

I have also tried this on another machine running Ubuntu 20.04 in Python 3.8.5 and see the same issue:

$ wb-manager init test
$ wb-manager add test UNT-20201119192756453-00043-oirxa9z4.warc.gz 
2020-12-05 10:49:21,350: [INFO]: Copied /home/gitty/pywb/UNT-20201119192756453-00043-oirxa9z4.warc.gz to /home/gitty/pywb/collections/test/archive
Error: Invalid boundary in multipart form: b''

@ikreymer
Copy link
Member

ikreymer commented Dec 5, 2020

Ah, thanks for the steps. Heh, my mistake, i was testing it with just cdx-indexer which of course did not have the write settings by default!

@ikreymer
Copy link
Member

ikreymer commented Dec 5, 2020

Could you add a quick test record with an invalid boundary?
Probably simplest would be to add to: https://github.com/webrecorder/pywb/blob/master/pywb/indexer/test/test_indexing.py similar to the other examples there.

@ldko
Copy link
Contributor Author

ldko commented Dec 17, 2020

@ikreymer I have added a few tests: one for a regular well-formed multipart/form-data type that passes in master and this branch and a couple tests where the boundary part is missing which fail in master and pass on this branch. Let me know if you want to see any changes. Thanks!

@codecov
Copy link

codecov bot commented Dec 17, 2020

Codecov Report

Merging #599 (a321486) into master (7b51101) will decrease coverage by 0.51%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #599      +/-   ##
==========================================
- Coverage   87.69%   87.18%   -0.52%     
==========================================
  Files          64       64              
  Lines        8096     8099       +3     
  Branches     1445     1445              
==========================================
- Hits         7100     7061      -39     
- Misses        640      669      +29     
- Partials      356      369      +13     
Impacted Files Coverage Δ
pywb/warcserver/inputrequest.py 90.30% <100.00%> (+4.81%) ⬆️
pywb/utils/geventserver.py 68.42% <0.00%> (-7.90%) ⬇️
pywb/rewrite/rewriteinputreq.py 71.87% <0.00%> (-6.25%) ⬇️
pywb/apps/frontendapp.py 82.85% <0.00%> (-5.15%) ⬇️
pywb/apps/rewriterapp.py 83.22% <0.00%> (-4.68%) ⬇️
pywb/utils/loaders.py 91.24% <0.00%> (-1.46%) ⬇️
pywb/recorder/multifilewarcwriter.py 76.83% <0.00%> (-1.13%) ⬇️
pywb/rewrite/url_rewriter.py 87.00% <0.00%> (-1.00%) ⬇️
pywb/warcserver/index/aggregator.py 91.47% <0.00%> (+1.55%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7b51101...a321486. Read the comment docs.

@ikreymer ikreymer merged commit b66608c into webrecorder:master Dec 17, 2020
@ikreymer
Copy link
Member

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants