This repository has been archived by the owner on Dec 4, 2024. It is now read-only.

Use PycURL for fetching event streams #381

Merged: brndnmtthws merged 3 commits into d2iq-archive:master from the pycurl-for-event-stream branch on Jan 23, 2017

Conversation

toofishes

Earlier issues #35 and #114 made things better in this department. However,
when running marathon-lb with a large (400+ applications) marathon instance,
there are still problems.

These problems can be traced back to Python itself, unfortunately:
http://stackoverflow.com/questions/21797753/efficiently-reading-lines-from-compressed-chunked-http-stream-as-they-arrive

Python requests uses urllib under the covers, and there are implicit issues
when 7.5 MB of JSON comes back on a single line, as we're seeing when certain
events are emitted. These events are deployment_info and deployment_success at
a minimum; there may be more.

By switching to PycURL, as noted in the Stack Overflow post, we bypass this
whole issue. We use an HTTP library that handles this particular edge case
well, reducing CPU usage dramatically when a large event comes in. It also
handles gzip compression, which means any 7.5 MB JSON dumps should shrink
significantly.
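
A minimal sketch of the approach described above, not the code from this PR: libcurl delivers the response body in chunks through a write callback, and a small buffer reassembles complete lines so that one very large line is joined only once. The names `LineBuffer`, `stream_lines`, `on_line`, and `extra_headers` are hypothetical, and the URL is whatever event stream endpoint you point it at.

```python
class LineBuffer:
    """Collects raw byte chunks and hands back complete lines."""

    def __init__(self):
        self._chunks = []

    def feed(self, chunk):
        """Accept one chunk of bytes; return the lines it completed."""
        if b"\n" not in chunk:
            # No line ending yet: remember the chunk and join later.
            self._chunks.append(chunk)
            return []
        head, *rest = chunk.split(b"\n")
        first = b"".join(self._chunks) + head
        # The last piece is the start of the next line (possibly empty).
        self._chunks = [rest.pop()]
        return [line.rstrip(b"\r") for line in [first] + rest]


def stream_lines(url, on_line, extra_headers=()):
    """Stream `url` with PycURL, calling on_line(str) per complete line."""
    import pycurl  # imported lazily so LineBuffer works without PycURL

    buf = LineBuffer()

    def on_chunk(chunk):
        for line in buf.feed(chunk):
            on_line(line.decode("utf-8"))
        # Returning None tells libcurl the whole chunk was consumed.

    curl = pycurl.Curl()
    curl.setopt(pycurl.URL, url)
    curl.setopt(pycurl.HTTPHEADER,
                ["Accept: text/event-stream", *extra_headers])
    # Ask for gzip; libcurl decompresses before calling on_chunk.
    curl.setopt(pycurl.ACCEPT_ENCODING, "gzip")
    curl.setopt(pycurl.WRITEFUNCTION, on_chunk)
    try:
        curl.perform()
    finally:
        curl.close()
```

With this shape, the per-chunk cost stays a list append until a newline arrives, which is the property that matters when a multi-megabyte event comes back on a single line.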

One unsolved problem remains here: the addition of DC/OS authentication support
in #285 is extremely tightly coupled to internal implementation details of the
python requests module. This simply won't work with this code, and I have zero
ability to fix or test it as we don't use DC/OS.

@mesosphere-ci

Can one of the admins verify this patch?

@tylermarshall
Contributor

I have also been experiencing this issue. While the upgrades to HAProxy (#374) have resolved some of the 100% CPU issues, I still see giant CPU spikes in Python itself from these calls.

@brndnmtthws
Contributor

We're definitely going to need to support auth. It's probably worth dropping the other modes (polling & callback) as well.

It's probably okay to use PycURL for SSE, and requests for everything else. In which case, we can reuse the requests-based auth stuff, and just pass the token in the header when it's present.

@toofishes
Author

@brndnmtthws awesome, thanks for the quick feedback! As long as the main thread of work here (switching to PycURL to work around issues in core python urllib) is acceptable to everyone, I can take some more time to get auth working again. I'll also remove the marked-as-deprecated old event subscription code to make this all a bit cleaner.

If I make a best-effort pass at restoring auth functionality, can I get someone with a DC/OS setup to test it? Your approach seems perfectly sane: continue to use requests to grab the token, but ensure the right token headers end up on the cURL-based request.

Here's the current situation, if it helps to show some real metrics on what this is trying to solve. The red background indicates deploy events in our environment; you can see these cause large CPU usage spikes by the marathon-lb docker process:

[Image: marathon-lb-cpu-load]

@brndnmtthws
Contributor

I'd be happy to test it and work with you on the PR. The auth code is here: https://github.com/mesosphere/marathon-lb/blob/master/common.py#L53-L90

It fetches a token periodically (approx. every hour), and sets the Authorization header with that token.
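
One way the requests-based auth could be reused for the cURL request, sketched under assumptions rather than taken from the linked code: a requests auth object is a callable that receives a request and sets entries in its `headers` mapping, so it can be applied to a stand-in object and the result converted to the `Name: value` strings that PycURL's `HTTPHEADER` option expects. `_HeaderCarrier` and `curl_headers_from_auth` are hypothetical names.

```python
class _HeaderCarrier:
    """Stand-in for a request: only carries a headers dict."""

    def __init__(self):
        self.headers = {}


def curl_headers_from_auth(auth=None):
    """Run a requests-style auth callable and return cURL header strings."""
    carrier = _HeaderCarrier()
    if auth is not None:
        # requests auth objects mutate request.headers when called.
        auth(carrier)
    return ["{}: {}".format(name, value)
            for name, value in carrier.headers.items()]
```

The returned list can be passed as-is to `setopt(pycurl.HTTPHEADER, ...)`; when no auth is configured the list is empty and the request goes out unauthenticated.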

@toofishes toofishes force-pushed the pycurl-for-event-stream branch from a628b63 to e7f0666 Compare December 28, 2016 22:37
@toofishes
Author

OK, @brndnmtthws - this PR has been updated to the best of my abilities. Changes since last time:

  • DC/OS JWT-based auth should now be supported. However, I can't test it against an actual implementation, because I don't have one. Testing help would be welcome!
  • The old event/listening mode has been dropped, and all code removed that was associated with it.
  • I kept 'poll' around because it wasn't much of a burden.

Testing against a non-DC/OS stack, this seems to be working fine for us. Our JSON blobs have grown to 8 MB at this point, and this implementation runs much more smoothly than the native Python/requests one did.

@toofishes
Author

Ping? Thoughts here?

If marathon 1.4.x is going to make this problem go away with smaller event payloads, this can likely get dropped, but we're still experiencing this problem all the time with the current version due to huge payloads.

Dan McGee added 3 commits January 20, 2017 12:06
@toofishes toofishes force-pushed the pycurl-for-event-stream branch from 1bda6a2 to 00a58cb Compare January 20, 2017 18:08
@brndnmtthws
Contributor

I apologize for taking so long to review this. I've got a lot of other stuff going on 🙂

I'll dig into it today, and test it w/ auth.

  payload = {
      'uid': self.uid,
      # This is the expiry of the auth request params
-     'exp': int(time.time()) + 60,
+     'exp': now + 60,
  }
  token = jwt.encode(payload, self.private_key, 'RS256')
Contributor

Avoid this kind of side effect. The method refresh_auth_header should always return the content of self.auth_header.

The __call__ should be defined as:

def __call__(self, auth_request):
    auth_request.headers['Authorization'] = self.refresh_auth_header()
    return auth_request
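
To illustrate the suggestion, here is a self-contained sketch of an auth class in which `refresh_auth_header` always returns the current header and `__call__` only consumes that return value. It is not the project's implementation: the class name `RefreshingAuth`, the injected `fetch_token` and `clock` parameters, the one-hour `lifetime`, and the `token=` prefix are all assumptions made for the example.

```python
import time


class RefreshingAuth:
    """requests-style auth callable with a periodically refreshed header."""

    def __init__(self, fetch_token, lifetime=3600, clock=time.time):
        self._fetch_token = fetch_token  # hypothetical: returns a token str
        self._lifetime = lifetime        # seconds before a refetch
        self._clock = clock              # injectable for testing
        self.auth_header = None
        self._expires_at = 0.0

    def refresh_auth_header(self):
        """Refetch the token if needed; always return self.auth_header."""
        now = self._clock()
        if self.auth_header is None or now >= self._expires_at:
            self.auth_header = "token=" + self._fetch_token()
            self._expires_at = now + self._lifetime
        return self.auth_header

    def __call__(self, auth_request):
        auth_request.headers['Authorization'] = self.refresh_auth_header()
        return auth_request
```

Because the method returns the header in every code path, callers never depend on whether a refresh happened to run, which is the side effect the comment asks to avoid.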

@brndnmtthws brndnmtthws merged commit 00a58cb into d2iq-archive:master Jan 23, 2017
@toofishes
Author

Thanks for taking a look and merging!

@toofishes toofishes deleted the pycurl-for-event-stream branch June 15, 2017 14:06
5 participants