Memory/CPU blows up on heavy load when using driver from after commit Fix TestWriteCoalescing (#1188) #1223

Closed
rlk833 opened this issue Oct 19, 2018 · 3 comments

Comments

@rlk833
Contributor

rlk833 commented Oct 19, 2018

Please answer these questions before submitting your issue. Thanks!

What version of Cassandra are you using?

Compose ScyllaDB 2.0.3 and 1.7.5

What version of Gocql are you using?

This occurred on some commit after 5a139e8

What did you do?

Our app is always under heavy load; we often make thousands of DB calls per minute. Up to and including the above commit we never had a problem. It would hum along with a steady working set of memory and CPU for weeks at a time.

We last installed our app on Sept 20, and it used the above commit. We installed an update to our app last Saturday (Oct 13), which used the latest from master at the time. We then had the problem. We reinstalled our updated app but locked it back down to the above commit of gocql, and the performance went right back to what we had before: great performance.

What would happen is that, usually within an hour, the app would suddenly start spiking in memory and CPU usage. Within 10 minutes it would reach the maximum allowed by our Kubernetes system, which in this case was 16 cores of solid CPU usage and 16 GB of memory. Kubernetes would then kill and restart the app. Things would be fine for a while, and then the cycle would start all over again.

Our app normally hums along at a steady 2 CPU cores and 500 MB of memory. And it is doing that again now that we have locked it down to the above commit.

Our customers were up and down because of this for four days until we found the problem. Since it only happens under heavy load, we have been unable to test which commit after #1188 actually caused the problem.

Thanks.

@Zariel
Contributor

Zariel commented Oct 19, 2018

What version was deployed that caused the issue?

@rlk833
Contributor Author

rlk833 commented Oct 20, 2018

I really don't know. It took us five days before we even determined that the driver was causing the problem. But it would have been whatever was in the master branch on Sat, Oct 13; the build/deploy always takes what is in master. So the deployed version would have had to be the commit on Oct 12, for issue #1215. But the cause could have been any commit between that one and #1188. We didn't build with any in between, so we don't know the exact build that caused the problem.

@Zariel
Contributor

Zariel commented Oct 25, 2018

Going to assume this was due to #1139. If this is still an issue, please provide a commit that causes it, a memory profile, or a reproducer, along with the specific versions.

Zariel closed this as completed Oct 25, 2018