Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enforced data size in channels 2.1.7 prevent large file uploads #1240

Closed
agateblue opened this issue Feb 6, 2019 · 29 comments
Closed

Enforced data size in channels 2.1.7 prevent large file uploads #1240

agateblue opened this issue Feb 6, 2019 · 29 comments

Comments

@agateblue
Copy link

Channels was released a few days ago (kudos and thank you for the maintainance effort!) and we noticed a breaking change in our application.

Large file uploads that were working before are now failing, and understand it's related to this specific commit a1ecd5e by @Zarathustra2

I'm not discussing the rationale behind that commit, because enforcing the max size looks indeed better than before from a security/stability perspective. However, it is a breaking change for our app (and possibly others), because file uploads that were working before are now failing.

Since we were not setting DATA_UPLOAD_MAX_MEMORY_SIZE in our settings file, the default value (2.5Mio) applies, which is considerably lower than our average uploaded file size.

In theory, we could stick on channels==2.1.7 and set DATA_UPLOAD_MAX_MEMORY_SIZE to match our requirements, but AFAIU, it would apply to the whole payload, and not only file size. And we'd like to allow large files, but not excessively large POST data.

Also, as stated in Django's documentation about DATA_UPLOAD_MAX_MEMORY_SIZE:

The check is done when accessing request.body or request.POST and is calculated against the total request size excluding any file upload data.`

Based on that, I do think there is a bug in the way the check was implemented in channels, because it does not exclude file data.

I can try working on a PR if you agree with my suggested changes:

  • Document the breaking change in the changelog, so people know what to do when upgrading
  • Remove uploaded files when computed payload size

Let me know your thoughts and I'll start working on a fix :)

@carltongibson
Copy link
Member

Hi @EliotBerriot. OK, yes, from the description is sounds as if something is amiss. It sounds as if the change (plus your requirements) has unveiled a latent issue.

First step: can you put together a reproduce, in a test case or sample project so we can see exactly what's going on?

@Zarathustra2
Copy link
Contributor

Hi,

@carltongibson @EliotBerriot first of all, sorry that my commit has brought you trouble.

Unfortunately, my django knowledge is not good enough to fix it by myself but I would be happy to help you since I created the bug.

I think this is also related to #1171. So in a possible solution we could also move the check of DATA_UPLOAD_MAX_MEMORY_SIZE to the read method and provide lazy evalution as it is originaly implemented in Django.

@carltongibson
Copy link
Member

Yep. That makes sense. @Zarathustra2 this isn’t you fault. 🙂 The more developed handling just isn’t there yet. Very happy to see input. (That test case would still be where I’d start...)

@agateblue
Copy link
Author

@carltongibson @EliotBerriot first of all, sorry that my commit has brought you trouble.

Please don't be, things break and it's nobody's fault :)

The more developed handling just isn’t there yet. Very happy to see input. (That test case would still be where I’d start...)

I'm not sure if I'll find my way around the fix, but I feel confident enough to write a test case, and I'll be working on that this afternoon!

@carltongibson
Copy link
Member

Awesome. Thanks @EliotBerriot! (It’s much easier from there, since there’s something to play with.)

agateblue pushed a commit to agateblue/channels that referenced this issue Feb 7, 2019
agateblue pushed a commit to agateblue/channels that referenced this issue Feb 7, 2019
agateblue pushed a commit to agateblue/channels that referenced this issue Feb 7, 2019
@jpic
Copy link
Contributor

jpic commented Feb 7, 2019

@EliotBerriot did you also try the following suggestion, otherwise can you provide any refutation perhaps ?

we could also move the check of DATA_UPLOAD_MAX_MEMORY_SIZE to the read method and provide lazy evalution as it is originaly implemented in Django.

@agateblue
Copy link
Author

@jpic I didn't, the concrete steps to do that were not really clear for me. Also, I wanted to avoid touching to code I did not understand (this is my first contact with the channels codebase).

Since the read() method is currently a member of django's own http.HttpRequest class, it would also mean either reimplementing it or overriding it and I'm not sure how it would fix the problem. The current implementation fails fast, which is probably better, the real issue IMHO is that it is not compatible with what Django actually does in regard to file handling.

@jpic
Copy link
Contributor

jpic commented Feb 7, 2019

the concrete steps to do that were not really clear for me

Same here, and here's what I figured while searching, you're gonna love this, or i'm completely lost 😂

Turns out that:

  • your tests are valid,
  • they also pass without the check raise in the constructor introduced by a1ecd5e,
  • because the test does not catch the raise that comes from the constructor,
  • see for yourself, after removing the check channels have in the constructor, and removing the pytest.raises:
================================== FAILURES ===================================
_____ RequestTests.test_size_check_ignore_files_but_honor_other_post_data ______

self = <tests.test_http.RequestTests testMethod=test_size_check_ignore_files_but_honor_other_post_data>

    def test_size_check_ignore_files_but_honor_other_post_data(self):
        ....
        with override_settings(DATA_UPLOAD_MAX_MEMORY_SIZE=1):
>           request.POST

tests/test_http.py:233: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
channels/http.py:139: in _get_post
    self._load_post_and_files()
.tox/py37-dj21/lib/python3.7/site-packages/django/http/request.py:310: in _load_post_and_files
    self._post, self._files = self.parse_file_upload(self.META, data)
.tox/py37-dj21/lib/python3.7/site-packages/django/http/request.py:269: in parse_file_upload
    return parser.parse()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <django.http.multipartparser.MultiPartParser object at 0x7fd0e1a38358>

    def parse(self):
         ...
    
                    # Avoid reading more than DATA_UPLOAD_MAX_MEMORY_SIZE.
                    if settings.DATA_UPLOAD_MAX_MEMORY_SIZE is not None:
                        read_size = settings.DATA_UPLOAD_MAX_MEMORY_SIZE - num_bytes_read
    
                    # This is a post field, we can just set it in the post
                    if transfer_encoding == 'base64':
                        raw_data = field_stream.read(size=read_size)
                        num_bytes_read += len(raw_data)
                        try:
                            data = base64.b64decode(raw_data)
                        except binascii.Error:
                            data = raw_data
                    else:
                        data = field_stream.read(size=read_size)
                        num_bytes_read += len(data)
    
                    # Add two here to make the check consistent with the
                    # x-www-form-urlencoded check that includes '&='.
                    num_bytes_read += len(field_name) + 2
                    if (settings.DATA_UPLOAD_MAX_MEMORY_SIZE is not None and
                            num_bytes_read > settings.DATA_UPLOAD_MAX_MEMORY_SIZE):
>                       raise RequestDataTooBig('Request body exceeded settings.DATA_UPLOAD_MAX_MEMORY_SIZE.')
E                       django.core.exceptions.RequestDataTooBig: Request body exceeded settings.DATA_UPLOAD_MAX_MEMORY_SIZE.

@EliotBerriot, both your tests still pass after completely reverting a1ecd5e, see for yourself in https://github.com/jpic/channels/tree/testrevert

My recommendation would be to keep your security tests which we always need to have, but revert a1ecd5e at all.

@carltongibson
Copy link
Member

carltongibson commented Feb 7, 2019

Hey @EliotBerriot and @jpic. Thanks for your efforts here. Super. We need to be 100% clear here so all for the good!

This line here contrasts markedly with Django's:

channels/channels/http.py

Lines 131 to 132 in 8499a42

self._body = body
assert isinstance(self._body, bytes), "Body is not bytes"

body is bytes. So have we already got it in memory, and do we need to bail on that to avoid potential memory exhaustion? (Current status: I need to look.)

@jpic
Copy link
Contributor

jpic commented Feb 7, 2019

I might not be able to provide further relevant clues without actually reading the WSGI and ASGI specs themselves. Meanwhile, there's something that I find interesting in the django code you have pointed out.


            # Limit the maximum request data size that will be handled in-memory.
            if (settings.DATA_UPLOAD_MAX_MEMORY_SIZE is not None and
                    int(self.META.get('CONTENT_LENGTH') or 0) > settings.DATA_UPLOAD_MAX_MEMORY_SIZE):
raise RequestDataTooBig('Request body exceeded settings.DATA_UPLOAD_MAX_MEMORY_SIZE.')

Does that mean that this check is completely "declarative" ? I mean, what happens if I add a long shellcode in the User-Agent header value along with a small Content-Length header ? Will that really prevent the User-Agent header value from being read in memory at all ? If so, how ? Thanks in advance for baring with me.

@carltongibson
Copy link
Member

No problem. 🙂. The read() call is key. In WSGI land we get given a file like object. Which we can then pull into memory. (How the server in front of us handled that is not our problem.)

So yes, we have to look at what we're getting here, and work out where best to place our limits.

@carltongibson
Copy link
Member

@jpic You knew this would be fun right! 😀

@jpic
Copy link
Contributor

jpic commented Feb 7, 2019

Maybe we should open a new issue about this, because Eliot's patch itself works as intended:

  • new tests seem good to have,
  • patch in runtime code does fix a regression,

Wether or not we want to reconsider a1ecd5e & connected code, should not block Eliot's PR because contains a bugfix that looks critical ?

After all, the patch fixes a regression and at the same time adds extra tests to prove that it's still fine, and secures channels in case Django changes the contract.

@jpic
Copy link
Contributor

jpic commented Feb 7, 2019

@carltongibson you bet 😂

About a1ecd5e, I can understand that it's trying to reproduce the Django code you have pointed out, which legitimates it after all. But it's not clear to me how boundary content is excluded from the check in Django, since Content-Length should include the sum of boundary data length.

Is it because they use self._body or just self instead of self.body in the case of multipart/form-data, which completely bypasses the check in the self.body property ?

It seems the code from Django in the body property (pointed out by @carltongibson above) is not applicable in the case of multipart/form-data requests. In that case, it will actually avoid calling self.body, call self.parse_file_upload which in turn calls MultiPartParser.parse, which does the check that Eliot's tests prove still working / covering channels code itself.

So, it turns out I understand the justification for the parent change a1ecd5e as well as Eliot's change as-is now.

Thanks y'all for sharing some of your insight ;)

@carltongibson
Copy link
Member

So, just an update here.

I am looking into this. In particular, how we're loading the request body prior to instantiating the AsgiRequest:

channels/channels/http.py

Lines 208 to 214 in a660d81

# See if the message has body, and if it's the end, launch into
# handling (and a synchronous subthread)
if "body" in message:
body += message["body"]
if not message.get("more_body", False):
await self.handle(body)
return

I want to look into whether we can wrap that somehow in a file-like, which would then allow us to leverage Django's established patterns here. (I don't know at all if that'll work yet but...) It would be nice to lazily pull the request body data into memory.

I'd rather Measure twice, cut once here. I'm not convinced that there's anything critical at stake, in the sense that we need to rush a half-thought patch out: adjusting DATA_UPLOAD_MAX_MEMORY_SIZE is a reasonable approach in the meantime, and I'm presuming there's always a reverse proxy in play (nginx etc) that can (also) impose sensible limits.

@jpic
Copy link
Contributor

jpic commented Feb 12, 2019

@carltongibson security became a non issue for me when:

  • django also bypasses this kind of security for this kind of requests
  • we're always supposed to have at least nginx if not a waf in front on django/channels
  • everytime i make a site that's serious about upload it turns out i have to change the default nginx settings for that purpose

So at the end of the day the pull request has no impact on actual security, and still fixes a regression that's pretty critical when supporting file uploads.

@anx-ckreuzberger
Copy link

Hi,

I've also hit this issue with Django 1.11, Django Rest Framework 3.9, Python 3.5 and Channels 2.1.7 in production (as in: DEBUG = False and using daphne with nginx) - I believe this issue doesn't come up when you're using a different run method for python (e.g., mod passenger).

I've downgraded to 2.1.6 for now, but I would love to see the check for DATA_UPLOAD_MAX_MEMORY_SIZE implemented properly, as in: https://docs.djangoproject.com/en/2.1/ref/settings/#data-upload-max-memory-size

The check is done when accessing request.body or request.POST and is calculated against the total request size excluding any file upload data.

I have my own File Upload Handlers which handle (large) file uploads - see https://docs.djangoproject.com/en/2.1/topics/http/file-uploads/#upload-handlers .

@carltongibson
Copy link
Member

HI @anx-ckreuzberger. #1251 is in progress and will resolve this. Just in need of a small window to finish it off. I'll then roll a new version.

@carltongibson
Copy link
Member

Hi both. In the meantime have you considered running a WSGI server to handle to Django HTTP requests, leaving ASGI just for sockets etc? This would get you Django's battle-hardened handling for e.g. file uploads without leaving you blocked by here.

@jpic
Copy link
Contributor

jpic commented Mar 12, 2019

I wouldn't worry about it, it's not hard to temporarily deploy a fork with the patches we want anyway ;)

However, i'm wondering if #1251 is ready-enough to ask that @anx-ckreuzberger and @EliotBerriot try out with their project and report results ?

@carltongibson
Copy link
Member

Hey @jpic. Not quite. I need to rip-out the body property and adjust POST and FILES slightly. Once those are done, yes it should be about right. (I think it would work™ now but...)

Then I need to tidy-up and rename stuff and then we'll head to a release.

@anx-ckreuzberger
Copy link

In the meantime have you considered running a WSGI server to handle to Django HTTP requests, leaving ASGI just for sockets etc?
That is absolutely acceptable, but not my (our?) desired solution (think about a docker container or systemd service running daphne, now you need two docker containers or systemd services, one for daphne and one for gunicorn/uwsgi/...).

Downgrading to 2.1.6 is "a better fix" for me for now ;)

Long term, I also believe that ASGI is better than WSGI, but that's just my opinion.

@carltongibson
Copy link
Member

Hi folks: if you can give #1251 a run against your app and report any issues that would be super.

@agateblue
Copy link
Author

Thank you @carltongibson, I'll try that in the upcoming days and let you know if I see anything unusual :)

@agateblue
Copy link
Author

@carltongibson I've tested your branch locally on my project, and everything seems to be working so far:

  1. I don't see any regression with core features (e.g our API answers as usual)
  2. Our file upload issue is fixed (I can successfully upload files bigger than 2.5mio)
  3. Our test suite runs without issue (I'm not sure if it's relevant because we don't have E2E tests, so we may not be calling Channels logic at all)

Let me know if you need me to check anything else in particular!

@carltongibson
Copy link
Member

Thanks for taking the time to give it a run @EliotBerriot!

@pythonBerg
Copy link

Hello. I am experiencing this same issue in 2.1.7 and see all the discussion on this thread and #1251. Can someone help me understand the bottom line. Is there a branch or code patch out there? Are there just a few lines I can edit myself. Thanks all...

@carltongibson
Copy link
Member

Hi @pythonBerg. This is just waiting for a bit of bandwidth over the summer to finish off.

The PR in #1251 works. You can install that and give it a run and report back there. That would be super.

As it stands I’m not ready to just push it out to folks as maybe there’s hidden issues. It’s delicate. I want to test it first.

So, in its place I want to add a SpooledTemporaryFile version, that is much more clearly safe, and then offer the experimental version as an option for the brave to try. Once we’ve had a chance to see it in action, it can become the default. (In theory it’s the way to go...)

The hold up is just person-power. A set of releases here for Channels, Daphne and Channels redis is my goal for the summer. Any input from fellow humans here or elsewhere is greatly appreciated.

@carltongibson
Copy link
Member

Fixed in #1352. Will be out later today.

ritlew added a commit to ritlew/django-hls-video that referenced this issue Dec 27, 2019
Update channels as a bug was preventing development that got fixed in
2.3.0 (django/channels#1240)
ddabble added a commit to MAKENTNU/web that referenced this issue Oct 24, 2020
The version constraint on `channels` is due to a bug that was fixed in 2.3.0 (see django/channels#1240).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants