Use PycURL for fetching event streams #381
Conversation
Can one of the admins verify this patch?
I have also been experiencing this issue. While the upgrades to HAProxy (#374) have resolved some of the 100% CPU issues, I still see giant CPU spikes from Python itself during these calls.
We're definitely going to need to support auth. It's probably worth dropping the other modes (polling & callback) as well. It's probably okay to use PycURL for SSE, and requests for everything else. In which case, we can reuse the requests-based auth stuff, and just pass the token in the header when it's present.
@brndnmtthws awesome, thanks for the quick feedback! As long as the main thread of work here (switching to PycURL to work around issues in core Python urllib) is acceptable to everyone, I can take some more time to get auth working again. I'll also remove the marked-as-deprecated old event subscription code to make this all a bit cleaner. If I make a best-effort pass at restoring auth functionality, can I get someone with a DC/OS setup to test it? Your approach seems perfectly sane: continue to use requests to grab the token, but ensure the right token headers end up on the cURL-based request. Here's the current situation, if it helps to show some real metrics on what this is trying to solve. The red background indicates deploy events in our environment; you can see these cause large CPU usage spikes from the marathon-lb Docker process.
I'd be happy to test it and work with you on the PR. The auth code is here: https://github.com/mesosphere/marathon-lb/blob/master/common.py#L53-L90 It fetches a token periodically (approx. every hour), and sets the Authorization header on outgoing requests.
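For illustration, the combined approach discussed above might look roughly like the sketch below: fetch the session token with requests (as common.py does), then hand it to the PycURL handle as a plain header. The helper names, URLs, and the shape of the login response are assumptions made for the sketch, not code from this PR.

import pycurl
import requests


def fetch_dcos_token(login_url, login_payload):
    # Placeholder for the common.py logic: POST the service-account
    # credentials and cache the returned session token (refreshed
    # roughly every hour). The 'token' field name is an assumption
    # about the login response shape.
    resp = requests.post(login_url, json=login_payload)
    resp.raise_for_status()
    return resp.json()['token']


def attach_auth_header(curl, token):
    # Attach the token to the PycURL handle used for the event stream.
    # The 'token=' Authorization format follows DC/OS convention, but
    # should be verified against the target cluster.
    headers = ['Accept: text/event-stream']
    if token:
        headers.append('Authorization: token=' + token)
    curl.setopt(pycurl.HTTPHEADER, headers)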
Force-pushed from a628b63 to e7f0666
OK, @brndnmtthws - this PR has been updated to the best of my abilities. Changes since last time:
Testing against a non-DC/OS stack, this seems to be working fine for us. Our JSON blobs have grown to 8 MB at this point, and this implementation handles them much more smoothly than the native Python/requests one did.
Ping? Thoughts here? If marathon 1.4.x is going to make this problem go away with smaller event payloads, this can likely get dropped, but we're still experiencing this problem all the time with the current version due to huge payloads.
Force-pushed from 1bda6a2 to 00a58cb
I apologize for taking so long to review this. I've got a lot of other stuff going on 🙂 I'll dig into it today, and test it w/ auth.
  payload = {
      'uid': self.uid,
      # This is the expiry of the auth request params
-     'exp': int(time.time()) + 60,
+     'exp': now + 60,
  }
  token = jwt.encode(payload, self.private_key, 'RS256')
Avoid this kind of side effect. The method refresh_auth_header should always
return the content of self.auth_header. The __call__ should be defined as:

def __call__(self, auth_request):
    auth_request.headers['Authorization'] = self.refresh_auth_header()
    return auth_request
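Expanded into a full requests auth hook, the reviewer's suggestion might look like the sketch below. Only __call__ and refresh_auth_header come from the comment above; the field names, the expiry handling, and the omitted token exchange are illustrative assumptions rather than the actual marathon-lb code.

import time

import jwt
import requests


class DCOSAuth(requests.auth.AuthBase):
    def __init__(self, uid, private_key):
        self.uid = uid
        self.private_key = private_key
        self.auth_header = None
        self.expiry = 0

    def refresh_auth_header(self):
        # Rebuild the header only when it is missing or stale, but
        # always return the current value of self.auth_header.
        now = int(time.time())
        if not self.auth_header or now >= self.expiry:
            payload = {
                'uid': self.uid,
                # This is the expiry of the auth request params
                'exp': now + 60,
            }
            token = jwt.encode(payload, self.private_key, 'RS256')
            if isinstance(token, bytes):  # PyJWT 1.x returns bytes
                token = token.decode('ascii')
            # In marathon-lb this service-login JWT is then exchanged
            # for a session token (the hourly refresh mentioned above);
            # that exchange is omitted here for brevity.
            self.auth_header = 'token=' + token
            self.expiry = now + 3600
        return self.auth_header

    def __call__(self, auth_request):
        auth_request.headers['Authorization'] = self.refresh_auth_header()
        return auth_request

With this shape the object can simply be passed as auth=DCOSAuth(uid, key) to requests, and every outgoing request gets a current header without relying on hidden side effects.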
Thanks for taking a look and merging!
Earlier issues #35 and #114 made things better in this department. However,
when running marathon-lb with a large (400+ applications) marathon instance,
there are still problems.
These problems can be traced back to Python itself, unfortunately:
http://stackoverflow.com/questions/21797753/efficiently-reading-lines-from-compressed-chunked-http-stream-as-they-arrive
Python requests uses urllib3 (and ultimately the standard-library HTTP client)
under the covers, and there are inherent performance issues when 7.5 MB of JSON
comes back on a single line, as we're seeing when certain events are emitted.
These events are deployment_info and deployment_success at a minimum; there may
be more.
By switching to PycURL, as noted in the Stack Overflow post, we bypass this
whole issue. We use an HTTP library that handles this particular edge case
well, reducing CPU usage dramatically when a large event comes in. It also
handles gzip compression, which means any 7.5 MB JSON dumps should shrink
significantly.
One unsolved problem remains here: the addition of DC/OS authentication support
in #285 is extremely tightly coupled to internal implementation details of the
Python requests module. This simply won't work with this code, and I have zero
ability to fix or test it as we don't use DC/OS.
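As a reference for the pattern described above, here is a minimal sketch of consuming the Marathon event stream with PycURL: libcurl delivers raw (optionally gzip-decoded) chunks to a write callback, and Python only splits on newlines instead of re-scanning a multi-megabyte line byte by byte. The class and function names are illustrative, not the exact marathon-lb implementation.

import pycurl


class LineBuffer(object):
    # Accumulate raw bytes from libcurl and emit complete lines.
    def __init__(self, on_line):
        self.buffer = b''
        self.on_line = on_line

    def write(self, chunk):
        # libcurl hands over arbitrarily sized byte chunks; a huge
        # single-line JSON event arrives in a few large pieces.
        self.buffer += chunk
        while b'\n' in self.buffer:
            line, self.buffer = self.buffer.split(b'\n', 1)
            self.on_line(line.rstrip(b'\r').decode('utf-8'))


def stream_events(url, on_line, auth_header=None):
    reader = LineBuffer(on_line)
    curl = pycurl.Curl()
    curl.setopt(pycurl.URL, url)
    headers = ['Accept: text/event-stream']
    if auth_header:
        headers.append('Authorization: ' + auth_header)
    curl.setopt(pycurl.HTTPHEADER, headers)
    # Request gzip and let libcurl decode it transparently, shrinking
    # the large deployment_info / deployment_success payloads.
    curl.setopt(pycurl.ENCODING, 'gzip')
    curl.setopt(pycurl.WRITEFUNCTION, reader.write)
    # Blocks until the server closes the stream or an error occurs.
    curl.perform()
    curl.close()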