
Reduce heap and GC overhead #84

Open
deadok22 opened this issue Jul 9, 2019 · 2 comments


deadok22 commented Jul 9, 2019

Reduce heap and GC overhead by rendering packets on the caller threads and queuing byte arrays or byte buffers.

The messages are currently rendered to instances of java.lang.String and are queued up in StatsDSender. Rendering the packets on caller threads and queuing byte[] or ByteBuffer instances will:

  • Yield a 2x reduction in heap volume occupied by potentially not-so-short-lived objects
  • Reduce the GC load - smaller message objects are cheaper to copy between generations
  • Reduce the amount of work done on the StatsDSender thread and increase the throughput

We could also skip the intermediate StringBuilder representation and render straight to ByteBuffer.
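To make that last suggestion concrete, here is a hypothetical sketch (names and signatures are illustrative, not the actual java-dogstatsd-client API) of rendering a StatsD gauge line straight into a caller-supplied ByteBuffer, with no intermediate String or StringBuilder for the packet itself:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: render "name:value|g|#tag1,tag2" directly into a
// ByteBuffer. Double.toString and getBytes still allocate small temporaries;
// a fully allocation-free version would format the digits into the buffer
// by hand, but the packet itself never exists as a String.
public final class DirectRenderer {
    public static void renderGauge(ByteBuffer buf, String aspect, double value, String... tags) {
        buf.put(aspect.getBytes(StandardCharsets.US_ASCII));
        buf.put((byte) ':');
        buf.put(Double.toString(value).getBytes(StandardCharsets.US_ASCII));
        buf.put((byte) '|').put((byte) 'g');
        if (tags.length > 0) {
            buf.put((byte) '|').put((byte) '#');
            for (int i = 0; i < tags.length; i++) {
                if (i > 0) buf.put((byte) ',');  // tags are comma-separated
                buf.put(tags[i].getBytes(StandardCharsets.US_ASCII));
            }
        }
    }
}
```

The caller (or a sender thread) can then `flip()` the buffer and hand it to a `DatagramChannel`, so the only queued object is the buffer itself.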

truthbk (Member) commented May 5, 2020

We have merged and released #105 to reduce the number of allocations; we now rely on StringBuilders differently to achieve this.

We can probably do even better, and your suggestion is still very much valid, but for now this was a low-impact alternative for our customers that works without any code changes.

We have some more work in the pipeline, so hopefully we can reduce GC load even further. Thank you for your insight @deadok22 🙇

deadok22 (Author) commented May 6, 2020

Hi @truthbk! Thanks for following up on this.

I took a glance at #105 and, in some respects, I fail to see how that change is a performance improvement. Let me elaborate:

  • The messages are now relayed from the caller threads to the StatsDSender thread using ArrayBlockingQueue<Message> rather than ArrayBlockingQueue<String>.
    • The String and String[] objects used to represent tags are now retained until the message is sent. In the previous version only 2 potentially long-lived objects were used to represent a message (the String and its char[])
    • The Message object representation comes with an additional object allocation. It is also retained until the message is sent.
  • The messages themselves are now formatted on the StatsDSender thread. The old code performed formatting on the caller threads, so throughput may have decreased as a result of this change. (See the third suggestion in this issue; the opposite was done in #105.)

It seems the new version may incur considerably more load on the GC than the old one in some scenarios.
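To make the retained-object comparison concrete, here is a hypothetical sketch of the two queue element shapes (the Message fields are illustrative, not the actual #105 class):

```java
import java.util.concurrent.ArrayBlockingQueue;

// Hypothetical sketch contrasting the two queueing strategies discussed above.
public final class QueueShapes {
    // Old model: the caller renders eagerly, so the queue holds one String
    // per message (two potentially long-lived objects: the String and its char[]).
    static final ArrayBlockingQueue<String> renderedLines = new ArrayBlockingQueue<>(1024);

    // New model: rendering is deferred, so the Message wrapper, the aspect
    // String, the tags String[] and each tag String are all retained until
    // the consumer side renders the line.
    static final class Message {
        final String aspect;
        final double value;
        final String[] tags;
        Message(String aspect, double value, String... tags) {
            this.aspect = aspect;
            this.value = value;
            this.tags = tags;
        }
    }
    static final ArrayBlockingQueue<Message> deferredMessages = new ArrayBlockingQueue<>(1024);
}
```

Under the old model the per-message retained set is constant-sized; under the new one it grows with the number of tags.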

I just realized that what I referred to as the StatsDSender thread is now a different thing (apologies, I had an older version of the code in my head): a set of threads responsible for rendering the messages into ByteBuffers. So a message's lifecycle looks like this:

  • A caller thread creates a Message object and puts it on a queue
  • A processor thread takes a Message object from that queue, renders it into a ByteBuffer, and puts the buffer on another queue
  • The sender thread takes the ByteBuffer and sends it over to the agent
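The three stages above can be sketched as follows (a hypothetical model, not the actual client internals; names like `submit` and `processOne` are illustrative). Each hand-off is a synchronization point on a bounded queue, and each stage adds an allocation: the Message in stage 1, the ByteBuffer in stage 2.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ArrayBlockingQueue;

// Hypothetical sketch of the caller -> processor -> sender pipeline.
public final class Pipeline {
    static final class Message {
        final String aspect;
        final double value;
        Message(String aspect, double value) { this.aspect = aspect; this.value = value; }
    }

    final ArrayBlockingQueue<Message> messageQueue = new ArrayBlockingQueue<>(1024);
    final ArrayBlockingQueue<ByteBuffer> bufferQueue = new ArrayBlockingQueue<>(1024);

    // Stage 1: a caller thread enqueues a Message (dropped if the queue is full).
    boolean submit(String aspect, double value) {
        return messageQueue.offer(new Message(aspect, value));
    }

    // Stage 2: a processor thread renders one Message into a ByteBuffer.
    boolean processOne() {
        Message m = messageQueue.poll();
        if (m == null) return false;
        byte[] line = (m.aspect + ":" + m.value + "|g").getBytes(StandardCharsets.US_ASCII);
        return bufferQueue.offer(ByteBuffer.wrap(line));
    }

    // Stage 3: the sender thread would drain bufferQueue into the socket.
    ByteBuffer takeRendered() {
        return bufferQueue.poll();
    }
}
```

Crossing two queues instead of one doubles the contention points a message passes through, which is the synchronization overhead the next paragraph raises.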

Although more work is now parallelized and some work is offloaded from the caller threads, I'm concerned about the extra synchronization, allocation, and heap overhead the new processing model incurs.

What benchmarks were used for evaluating the new processing model and the changes in #105?
