Use PycURL for fetching event streams #381
Conversation
Can one of the admins verify this patch?
I have also been experiencing this issue. While the upgrades to HAProxy (#374) have resolved some of the 100% CPU issues, I still see giant CPU spikes from Python itself during these calls.
We're definitely going to need to support auth. It's probably worth dropping the other modes (polling & callback) as well. It's probably okay to use PycURL for SSE, and requests for everything else. In which case, we can reuse the requests-based auth stuff, and just pass the token in the header when it's present.
@brndnmtthws awesome, thanks for the quick feedback! As long as the main thread of work here (switching to PycURL to work around issues in core Python urllib) is acceptable to everyone, I can take some more time to get auth working again. I'll also remove the marked-as-deprecated old event subscription code to make this all a bit cleaner. If I make a best-effort pass at restoring auth functionality, can I get someone with a DC/OS setup to test it? Your approach seems perfectly sane: continue to use requests to grab the token, but ensure the right token headers end up on the cURL-based request. Here's the current situation, if it helps to show some real metrics on what this is trying to solve. The red background indicates deploy events in our environment; you can see these cause large CPU usage spikes from the marathon-lb Docker process.
I'd be happy to test it and work with you on the PR. The auth code is here: https://github.com/mesosphere/marathon-lb/blob/master/common.py#L53-L90 It fetches a token periodically (approx. every hour), and sets the Authorization header on outgoing requests.
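For illustration, the combined approach discussed above might look roughly like the sketch below: fetch the session token with requests (as common.py does), then hand it to the PycURL handle as a plain header. The helper names, URLs, and the shape of the login response are assumptions made for the sketch, not code from this PR.

import pycurl
import requests


def fetch_dcos_token(login_url, login_payload):
    # Placeholder for the common.py logic: POST the service-account
    # credentials and cache the returned session token (refreshed
    # roughly every hour). The 'token' field name is an assumption
    # about the login response shape.
    resp = requests.post(login_url, json=login_payload)
    resp.raise_for_status()
    return resp.json()['token']


def attach_auth_header(curl, token):
    # Attach the token to the PycURL handle used for the event stream.
    # The 'token=' Authorization format follows DC/OS convention, but
    # should be verified against the target cluster.
    headers = ['Accept: text/event-stream']
    if token:
        headers.append('Authorization: token=' + token)
    curl.setopt(pycurl.HTTPHEADER, headers)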
Force-pushed from a628b63 to e7f0666
OK, @brndnmtthws - this PR has been updated to the best of my abilities. Changes since last time:
Testing against a non-DC/OS stack, this seems to be working fine for us. Our JSON blobs have grown to 8 MB at this point, and this implementation handles them much more smoothly than the native Python/requests one did.
Ping? Thoughts here? If marathon 1.4.x is going to make this problem go away with smaller event payloads, this can likely get dropped, but we're still experiencing this problem all the time with the current version due to huge payloads.
Force-pushed from 1bda6a2 to 00a58cb
I apologize for taking so long to review this. I've got a lot of other stuff going on 🙂 I'll dig into it today, and test it w/ auth.
  payload = {
      'uid': self.uid,
      # This is the expiry of the auth request params
-     'exp': int(time.time()) + 60,
+     'exp': now + 60,
  }
  token = jwt.encode(payload, self.private_key, 'RS256')
Avoid this kind of side effect. The method refresh_auth_header should always
return the content of self.auth_header. The __call__ should be defined as:

def __call__(self, auth_request):
    auth_request.headers['Authorization'] = self.refresh_auth_header()
    return auth_request
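Expanded into a full requests auth hook, the reviewer's suggestion might look like the sketch below. Only __call__ and refresh_auth_header come from the comment above; the field names, the expiry handling, and the omitted token exchange are illustrative assumptions rather than the actual marathon-lb code.

import time

import jwt
import requests


class DCOSAuth(requests.auth.AuthBase):
    def __init__(self, uid, private_key):
        self.uid = uid
        self.private_key = private_key
        self.auth_header = None
        self.expiry = 0

    def refresh_auth_header(self):
        # Rebuild the header only when it is missing or stale, but
        # always return the current value of self.auth_header.
        now = int(time.time())
        if not self.auth_header or now >= self.expiry:
            payload = {
                'uid': self.uid,
                # This is the expiry of the auth request params
                'exp': now + 60,
            }
            token = jwt.encode(payload, self.private_key, 'RS256')
            if isinstance(token, bytes):  # PyJWT 1.x returns bytes
                token = token.decode('ascii')
            # In marathon-lb this service-login JWT is then exchanged
            # for a session token (the hourly refresh mentioned above);
            # that exchange is omitted here for brevity.
            self.auth_header = 'token=' + token
            self.expiry = now + 3600
        return self.auth_header

    def __call__(self, auth_request):
        auth_request.headers['Authorization'] = self.refresh_auth_header()
        return auth_request

With this shape the object can simply be passed as auth=DCOSAuth(uid, key) to requests, and every outgoing request gets a current header without relying on hidden side effects.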
Thanks for taking a look and merging!
Earlier issues #35 and #114 made things better in this department. However,
when running marathon-lb with a large (400+ applications) marathon instance,
there are still problems.
These problems can be traced back to Python itself, unfortunately:
http://stackoverflow.com/questions/21797753/efficiently-reading-lines-from-compressed-chunked-http-stream-as-they-arrive
Python requests uses urllib3 (and ultimately the standard-library HTTP client)
under the covers, and there are inherent performance issues when 7.5 MB of JSON
comes back on a single line, as we're seeing when certain events are emitted.
These events are deployment_info and deployment_success at a minimum; there may
be more.
By switching to PycURL, as noted in the Stack Overflow post, we bypass this
whole issue. We use an HTTP library that handles this particular edge case
well, reducing CPU usage dramatically when a large event comes in. It also
handles gzip compression, which means any 7.5 MB JSON dumps should shrink
significantly.
One unsolved problem remains here: the addition of DC/OS authentication support
in #285 is extremely tightly coupled to internal implementation details of the
Python requests module. This simply won't work with this code, and I have zero
ability to fix or test it as we don't use DC/OS.
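As a reference for the pattern described above, here is a minimal sketch of consuming the Marathon event stream with PycURL: libcurl delivers raw (optionally gzip-decoded) chunks to a write callback, and Python only splits on newlines instead of re-scanning a multi-megabyte line byte by byte. The class and function names are illustrative, not the exact marathon-lb implementation.

import pycurl


class LineBuffer(object):
    # Accumulate raw bytes from libcurl and emit complete lines.
    def __init__(self, on_line):
        self.buffer = b''
        self.on_line = on_line

    def write(self, chunk):
        # libcurl hands over arbitrarily sized byte chunks; a huge
        # single-line JSON event arrives in a few large pieces.
        self.buffer += chunk
        while b'\n' in self.buffer:
            line, self.buffer = self.buffer.split(b'\n', 1)
            self.on_line(line.rstrip(b'\r').decode('utf-8'))


def stream_events(url, on_line, auth_header=None):
    reader = LineBuffer(on_line)
    curl = pycurl.Curl()
    curl.setopt(pycurl.URL, url)
    headers = ['Accept: text/event-stream']
    if auth_header:
        headers.append('Authorization: ' + auth_header)
    curl.setopt(pycurl.HTTPHEADER, headers)
    # Request gzip and let libcurl decode it transparently, shrinking
    # the large deployment_info / deployment_success payloads.
    curl.setopt(pycurl.ENCODING, 'gzip')
    curl.setopt(pycurl.WRITEFUNCTION, reader.write)
    # Blocks until the server closes the stream or an error occurs.
    curl.perform()
    curl.close()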