
regression in consumer throughput when upgrading 1.10 to 1.16 #1101

Closed
Dieterbe opened this issue May 8, 2018 · 6 comments
Labels
consumer stale Issues and pull requests without any recent activity

Comments

@Dieterbe
Contributor

Dieterbe commented May 8, 2018

when doing this sarama upgrade:

-  revision = "bd61cae2be85fa6ff40eb23dcdd24567967ac2ae"
-  version = "v1.10.1"
+  revision = "f7be6aa2bc7b2e38edf816b08b582782194a1c02"
+  version = "v1.16.0"

and go version go1.10.1 linux/amd64, kafka 0.10.0.1

egrep -v '^$|^#' server.properties
broker.id=0
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/tmp/kafka-logs
num.partitions=8
num.recovery.threads.per.data.dir=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=localhost:2181
zookeeper.connection.timeout.ms=6000

and these sarama settings

# The number of metrics to buffer in internal and external channels -> config.ChannelBufferSize
channel-buffer-size = 1000
# The minimum number of message bytes to fetch in a request -> config.Consumer.Fetch.Min
consumer-fetch-min = 1
# The default number of message bytes to fetch in a request -> config.Consumer.Fetch.Default
consumer-fetch-default = 32768
# The maximum amount of time the broker will wait for Consumer.Fetch.Min bytes to become available before it returns fewer -> config.Consumer.MaxWaitTime
consumer-max-wait-time = 1s
# The maximum amount of time the consumer expects a message takes to process -> config.Consumer.MaxProcessingTime
consumer-max-processing-time = 1s
# How many outstanding requests a connection is allowed to have before sending on it blocks -> config.Net.MaxOpenRequests
net-max-open-requests = 100

config.Version = sarama.V0_10_0_0

snappy compression
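
In sarama code, the settings above correspond roughly to the following (a sketch, not metrictank code; the import path and field names are taken from the sarama API of that era, and snappy only matters on the consumer side as the compression the topic data was produced with):

```go
package config

import (
	"time"

	"github.com/Shopify/sarama" // sarama's import path at the time of this issue
)

// newConsumerConfig mirrors the settings listed above onto a sarama.Config.
func newConsumerConfig() *sarama.Config {
	cfg := sarama.NewConfig()
	cfg.ChannelBufferSize = 1000                 // channel-buffer-size
	cfg.Consumer.Fetch.Min = 1                   // consumer-fetch-min
	cfg.Consumer.Fetch.Default = 32768           // consumer-fetch-default
	cfg.Consumer.MaxWaitTime = time.Second       // consumer-max-wait-time
	cfg.Consumer.MaxProcessingTime = time.Second // consumer-max-processing-time
	cfg.Net.MaxOpenRequests = 100                // net-max-open-requests
	cfg.Version = sarama.V0_10_0_0               // match the 0.10.0.1 broker
	return cfg
}
```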

we notice a decrease in consumption throughput of about 10% (from a 600 kHz to a 540 kHz message rate)
see https://snapshot.raintank.io/dashboard/snapshot/uo11mnUs6T9hlqvUQ0VlxkA71SVlTG1e?orgId=2
notice in particular the upper-left chart, which becomes clearer when you zoom in like so: https://snapshot.raintank.io/dashboard/snapshot/uo11mnUs6T9hlqvUQ0VlxkA71SVlTG1e?panelId=9&fullscreen&orgId=2

a secondary concern is that memory usage and the allocation rate have increased as well (see the 3rd row),
but the main concern is the throughput.

@Dieterbe Dieterbe changed the title regression in consumer throughput regression in consumer throughput when upgrading 1.10 to 1.16 May 8, 2018
@Dieterbe
Contributor Author

Dieterbe commented May 8, 2018

a bit more info here: grafana/metrictank#906
I will see if I can find anything in heap and cpu profiles.
I also want to redo the experiment with some versions in between, to see how recent a version I can get to without hitting the regression.

@eapache
Contributor

eapache commented May 9, 2018

Odds are this is due to either #933, #1028, or the whole abstraction layer that was added to support Kafka 1.0's new record batch format. Whether the regression occurs in 1.13, 1.14, or 1.16 should tell us.

@Dieterbe
Contributor Author

Dieterbe commented May 10, 2018

https://snapshot.raintank.io/dashboard/snapshot/0nTKuMieusLixFu5rBTm8ZGrMG6fPG0o?orgId=2
this shows 7 runs in this order:
mt-sarama-1.10.1
mt-sarama-1.11.0
mt-sarama-1.12.0
mt-sarama-1.13.0
mt-sarama-1.14.0
mt-sarama-1.15.0
mt-sarama-1.16.0

upper left chart is the most interesting one (consumption rate)
the results are a bit noisy, however; I think I will redo them with more data to read out of kafka and less other stuff going on on my system.
I'm also thinking of disabling the core functionality of our consumers, so that we only stress-test sarama and don't exercise the rest of our software.

@Dieterbe
Contributor Author

Dieterbe commented May 15, 2018

another run of the same versions in the same order as before,
now with cpu frequency scaling disabled and nohz_full enabled on all cores, and with most business logic disabled, to get clearer results for the performance of sarama itself.
https://snapshot.raintank.io/dashboard/snapshot/3okjgue0NHhParxUc6ySgYyp1yOuh5de?orgId=2
https://snapshot.raintank.io/dashboard/snapshot/dQY7m1Ei9kjF5kk0mo6OvumDvUzX2WDj?orgId=2
what stands out to me here is that the 4th run (1.13.0) performed best in throughput, and did so with fairly low cpu usage compared to the other versions.
it also seems that in this run the biggest regression was from 1.15 to 1.16.

I plan to do another run to confirm.
EDIT: new run with more data, confirms the same:
https://snapshot.raintank.io/dashboard/snapshot/FFpvRGqn67gVXmDjflCt8VA038h3xMX3?orgId=2
https://snapshot.raintank.io/dashboard/snapshot/BldJGIpnXuKnv9rKnMonULxTJx2F9DaG?orgId=2

@eapache
Contributor

eapache commented Jun 14, 2018

Hmm, if you revert #1028 (and the follow-up eae9146) does that solve the majority of the regression?

@ghost

ghost commented Feb 21, 2020

Thank you for taking the time to raise this issue. However, it has not had any activity on it in the past 90 days and will be closed in 30 days if no updates occur.
Please check if the master branch has already resolved the issue since it was raised. If you believe the issue is still valid and you would like input from the maintainers then please comment to ask for it to be reviewed.

@ghost ghost added the stale Issues and pull requests without any recent activity label Feb 21, 2020
@ghost ghost closed this as completed Mar 23, 2020
This issue was closed.