This repository has been archived by the owner on Dec 16, 2021. It is now read-only.

DrKafkaStart failed, NullPointerException getProcessingStartOffsets #87

Open
yhzhao opened this issue Jan 17, 2019 · 16 comments

@yhzhao

yhzhao commented Jan 17, 2019

java -server -cp lib/:doctorkafka-0.2.4.3.jar com.pinterest.doctorkafka.DoctorKafkaMain server dropwizard.yml
ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2
INFO [2019-01-17 21:15:44,236] io.dropwizard.server.DefaultServerFactory: Registering jersey handler with root path prefix: /
INFO [2019-01-17 21:15:44,239] io.dropwizard.server.DefaultServerFactory: Registering admin handler with root path prefix: /
INFO [2019-01-17 21:15:44,239] io.dropwizard.assets.AssetsBundle: Registering AssetBundle with name: assets for path /

INFO [2019-01-17 21:15:44,416] com.pinterest.doctorkafka.util.ZookeeperClient: Initialize curator with zkurl:df_zookeeper_dfops_743_1_development_datafabric_kafka_aws_us_ea.aws.athenahealth.com:2181/
INFO [2019-01-17 21:15:44,487] org.apache.curator.utils.Compatibility: Running in ZooKeeper 3.4.x compatibility mode
INFO [2019-01-17 21:15:44,513] org.apache.curator.framework.imps.CuratorFrameworkImpl: Starting
INFO [2019-01-17 21:15:44,524] org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT
INFO [2019-01-17 21:15:44,524] org.apache.zookeeper.ZooKeeper: Client environment:host.name=ip-10-129-102-236.us-east-1.aws.athenahealth.com
INFO [2019-01-17 21:15:44,524] org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.8.0_191
INFO [2019-01-17 21:15:44,524] org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
INFO [2019-01-17 21:15:44,524] org.apache.zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64/jre
INFO [2019-01-17 21:15:44,524] org.apache.zookeeper.ZooKeeper: Client environment:java.class.path=lib/kafkastats-0.2.4.3.jar:lib/commons-cli-1.3.1.jar:lib/avro-1.8.2.jar:lib/jackson-core-asl-1.9.13.jar:lib/jackson-mapper-asl-1.9.13.jar:lib/paranamer-2.7.jar:lib/snappy-java-1.1.1.3.jar:lib/commons-compress-1.8.1.jar:lib/xz-1.5.jar:lib/slf4j-api-1.7.25.jar:lib/kafka_2.12-1.1.0.jar:lib/kafka-clients-1.1.0.jar:lib/lz4-java-1.4.jar:lib/jackson-databind-2.9.8.jar:lib/jackson-annotations-2.9.8.jar:lib/jackson-core-2.9.8.jar:lib/jopt-simple-5.0.4.jar:lib/metrics-core-2.2.0.jar:lib/scala-library-2.12.7.jar:lib/scala-reflect-2.12.4.jar:lib/scala-logging_2.12-3.7.2.jar:lib/zkclient-0.10.jar:lib/zookeeper-3.4.10.jar:lib/log4j-1.2.16.jar:lib/jline-0.9.94.jar:lib/junit-3.8.1.jar:lib/netty-3.10.5.Final.jar:lib/log4j-core-2.11.1.jar:lib/log4j-api-2.11.1.jar:lib/ostrich_2.12-9.27.0.jar:lib/util-core_2.12-6.43.0.jar:lib/util-function_2.12-6.43.0.jar:lib/scala-parser-combinators_2.12-1.0.4.jar:lib/util-eval_2.12-6.43.0.jar:lib/scala-compiler-2.12.1.jar:lib/scala-xml_2.12-1.0.6.jar:lib/util-logging_2.12-6.43.0.jar:lib/util-app_2.12-6.43.0.jar:lib/util-registry_2.12-6.43.0.jar:lib/util-stats_2.12-6.43.0.jar:lib/util-lint_2.12-6.43.0.jar:lib/caffeine-2.3.4.jar:lib/jsr305-1.3.9.jar:lib/util-jvm_2.12-6.43.0.jar:lib/jackson-module-scala_2.12-2.9.7.jar:lib/jackson-module-paranamer-2.9.7.jar:lib/guava-23.0.jar:lib/error_prone_annotations-2.0.18.jar:lib/j2objc-annotations-1.1.jar:lib/animal-sniffer-annotations-1.14.jar:lib/commons-lang3-3.6.jar:lib/commons-configuration2-2.4.jar:lib/commons-text-1.6.jar:lib/commons-logging-1.2.jar:lib/commons-math3-3.6.1.jar:lib/commons-beanutils-1.9.3.jar:lib/commons-collections-3.2.2.jar:lib/commons-validator-1.6.jar:lib/commons-digester-1.8.1.jar:lib/curator-framework-4.0.1.jar:lib/curator-client-4.0.1.jar:lib/metrics-core-3.2.3.jar:lib/gson-2.8.2.jar:lib/javax.annotation-api-1.3.2.jar:lib/jaxws-api-2.3.1.jar:lib/jaxb-api-2.3.1.jar:lib/javax.activation-api-1.2.0.jar:lib/javax.xml.soap-api-1.4.0.jar:lib/dropwizard-assets-1.3.8.jar:lib/dropwizard-core-1.3.8.jar:lib/dropwizard-util-1.3.8.jar:lib/joda-time-2.9.9.jar:lib/dropwizard-jackson-1.3.8.jar:lib/jackson-datatype-guava-2.9.6.jar:lib/jackson-datatype-jsr310-2.9.6.jar:lib/jackson-datatype-jdk8-2.9.6.jar:lib/jackson-module-parameter-names-2.9.6.jar:lib/jackson-module-afterburner-2.9.6.jar:lib/jackson-datatype-joda-2.9.6.jar:lib/dropwizard-validation-1.3.8.jar:lib/hibernate-validator-5.4.2.Final.jar:lib/validation-api-1.1.0.Final.jar:lib/jboss-logging-3.3.0.Final.jar:lib/classmate-1.3.1.jar:lib/javax.el-3.0.0.jar:lib/javassist-3.22.0-GA.jar:lib/dropwizard-configuration-1.3.8.jar:lib/jackson-dataformat-yaml-2.9.6.jar:lib/snakeyaml-1.18.jar:lib/dropwizard-logging-1.3.8.jar:lib/metrics-logback-4.0.2.jar:lib/jul-to-slf4j-1.7.25.jar:lib/logback-core-1.2.3.jar:lib/logback-classic-1.2.3.jar:lib/jcl-over-slf4j-1.7.25.jar:lib/jetty-util-9.4.14.v20181114.jar:lib/dropwizard-metrics-1.3.8.jar:lib/dropwizard-lifecycle-1.3.8.jar:lib/javax.servlet-api-3.1.0.jar:lib/jetty-server-9.4.14.v20181114.jar:lib/jetty-http-9.4.14.v20181114.jar:lib/jetty-io-9.4.14.v20181114.jar:lib/dropwizard-jersey-1.3.8.jar:lib/jersey-server-2.25.1.jar:lib/jersey-common-2.25.1.jar:lib/javax.ws.rs-api-2.0.1.jar:lib/jersey-guava-2.25.1.jar:lib/hk2-api-2.5.0-b32.jar:lib/hk2-utils-2.5.0-b32.jar:lib/aopalliance-repackaged-2.5.0-b32.jar:lib/javax.inject-2.5.0-b32.jar:lib/hk2-locator-2.5.0-b32.jar:lib/osgi-resource-locator-1.0.1.jar:lib/jersey-client-2.25.1.jar:lib/jersey-media-ja
xb-2.25.1.jar:lib/jersey-metainf-services-2.25.1.jar:lib/jersey-bean-validation-2.25.1.jar:lib/metrics-jersey2-4.0.2.jar:lib/metrics-annotation-4.0.2.jar:lib/jackson-jaxrs-json-provider-2.9.6.jar:lib/jackson-jaxrs-base-2.9.6.jar:lib/jackson-module-jaxb-annotations-2.9.6.jar:lib/jersey-container-servlet-2.25.1.jar:lib/jersey-container-servlet-core-2.25.1.jar:lib/jetty-webapp-9.4.14.v20181114.jar:lib/jetty-xml-9.4.14.v20181114.jar:lib/jetty-servlet-9.4.14.v20181114.jar:lib/jetty-security-9.4.14.v20181114.jar:lib/jetty-continuation-9.4.14.v20181114.jar:lib/dropwizard-servlets-1.3.8.jar:lib/dropwizard-jetty-1.3.8.jar:lib/metrics-jetty9-4.0.2.jar:lib/jetty-servlets-9.4.14.v20181114.jar:lib/metrics-jvm-4.0.2.jar:lib/metrics-jmx-4.0.2.jar:lib/metrics-servlets-4.0.2.jar:lib/metrics-healthchecks-4.0.2.jar:lib/metrics-json-4.0.2.jar:lib/profiler-1.0.2.jar:lib/dropwizard-request-logging-1.3.8.jar:lib/logback-access-1.2.3.jar:lib/argparse4j-0.8.1.jar:lib/jetty-setuid-java-1.0.3.jar:lib/dropwizard-auth-1.3.8.jar:doctorkafka-0.2.4.3.jar
INFO [2019-01-17 21:15:44,524] org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
INFO [2019-01-17 21:15:44,524] org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
INFO [2019-01-17 21:15:44,524] org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=
INFO [2019-01-17 21:15:44,524] org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
INFO [2019-01-17 21:15:44,524] org.apache.zookeeper.ZooKeeper: Client environment:os.arch=amd64
INFO [2019-01-17 21:15:44,524] org.apache.zookeeper.ZooKeeper: Client environment:os.version=3.10.0-862.3.2.el7.x86_64
INFO [2019-01-17 21:15:44,524] org.apache.zookeeper.ZooKeeper: Client environment:user.name=yuzhao
INFO [2019-01-17 21:15:44,524] org.apache.zookeeper.ZooKeeper: Client environment:user.home=/home/yuzhao
INFO [2019-01-17 21:15:44,524] org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/home/yuzhao
INFO [2019-01-17 21:15:44,525] org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=df_zookeeper_dfops_743_1_development_datafabric_kafka_aws_us_ea.aws.athenahealth.com:2181/ sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@77681ce4
INFO [2019-01-17 21:15:44,619] org.apache.zookeeper.ClientCnxn: Opening socket connection to server 10.129.107.71/10.129.107.71:2181. Will not attempt to authenticate using SASL (unknown error)
INFO [2019-01-17 21:15:44,627] org.apache.curator.framework.imps.CuratorFrameworkImpl: Default schema
INFO [2019-01-17 21:15:44,636] org.apache.zookeeper.ClientCnxn: Socket connection established to 10.129.107.71/10.129.107.71:2181, initiating session
INFO [2019-01-17 21:15:44,648] io.dropwizard.server.ServerFactory: Starting DoctorKafkaMain
INFO [2019-01-17 21:15:44,650] org.apache.zookeeper.ClientCnxn: Session establishment complete on server 10.129.107.71/10.129.107.71:2181, sessionid = 0x1684d33f9000020, negotiated timeout = 40000
INFO [2019-01-17 21:15:44,669] org.apache.curator.framework.state.ConnectionStateManager: State change: CONNECTED
INFO [2019-01-17 21:15:44,845] org.eclipse.jetty.setuid.SetUIDListener: Opened application@6f044c58{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
INFO [2019-01-17 21:15:44,847] org.eclipse.jetty.server.Server: jetty-9.4.14.v20181114; built: 2018-11-14T21:20:31.478Z; git: c4550056e785fb5665914545889f21dc136ad9e6; jvm 1.8.0_191-b12
INFO [2019-01-17 21:15:45,021] org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=df_zookeeper_dfops_743_1_development_datafabric_kafka_aws_us_ea.aws.athenahealth.com:2181/zookeeper_datafabric_development_dfops_743_1_kafka_aws_us_east_1_df_zookeeper sessionTimeout=30000 watcher=org.I0Itec.zkclient.ZkClient@277a8c9
INFO [2019-01-17 21:15:45,022] org.I0Itec.zkclient.ZkEventThread: Starting ZkClient event thread.
INFO [2019-01-17 21:15:45,024] org.I0Itec.zkclient.ZkClient: Waiting for keeper state SyncConnected
INFO [2019-01-17 21:15:45,025] org.apache.zookeeper.ClientCnxn: Opening socket connection to server 10.129.107.71/10.129.107.71:2181. Will not attempt to authenticate using SASL (unknown error)
INFO [2019-01-17 21:15:45,027] org.apache.zookeeper.ClientCnxn: Socket connection established to 10.129.107.71/10.129.107.71:2181, initiating session
INFO [2019-01-17 21:15:45,033] org.apache.zookeeper.ClientCnxn: Session establishment complete on server 10.129.107.71/10.129.107.71:2181, sessionid = 0x1684d33f9000021, negotiated timeout = 30000
INFO [2019-01-17 21:15:45,040] org.I0Itec.zkclient.ZkClient: zookeeper state changed (SyncConnected)
INFO [2019-01-17 21:15:45,055] kafka.utils.Log4jControllerRegistration$: Registered kafka:type=kafka.Log4jController MBean
INFO [2019-01-17 21:15:45,419] org.apache.kafka.clients.consumer.ConsumerConfig: ConsumerConfig values:
auto.commit.interval.ms = 5000
auto.offset.reset = latest
bootstrap.servers = [10.129.105.28:9092, 10.129.104.201:9092, 10.129.103.217:9092]
check.crcs = true
client.id =
connections.max.idle.ms = 540000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = doctorkafka
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = true
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
max.partition.fetch.bytes = 4194304
max.poll.interval.ms = 1
max.poll.records = 500
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 305000
retry.backoff.ms = 100
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
session.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer

INFO [2019-01-17 21:15:45,486] org.apache.kafka.common.utils.AppInfoParser: Kafka version : 1.1.0
INFO [2019-01-17 21:15:45,486] org.apache.kafka.common.utils.AppInfoParser: Kafka commitId : fdcf75ea326b8e07
INFO [2019-01-17 21:15:45,625] org.apache.kafka.clients.Metadata: Cluster ID: M4N2fNprQYKCrKhKzAH4zQ
21:15:45.676 [pool-6-thread-1] ERROR com.pinterest.doctorkafka.DoctorKafkaMain - DoctorKafka start failed
java.lang.NullPointerException: null
at com.pinterest.doctorkafka.replicastats.ReplicaStatsManager.getProcessingStartOffsets(ReplicaStatsManager.java:193) ~[doctorkafka-0.2.4.3.jar:?]
at com.pinterest.doctorkafka.replicastats.ReplicaStatsManager.readPastReplicaStats(ReplicaStatsManager.java:216) ~[doctorkafka-0.2.4.3.jar:?]
at com.pinterest.doctorkafka.DoctorKafka.start(DoctorKafka.java:51) ~[doctorkafka-0.2.4.3.jar:?]
at com.pinterest.doctorkafka.DoctorKafkaMain.lambda$run$0(DoctorKafkaMain.java:68) ~[doctorkafka-0.2.4.3.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_191]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
INFO [2019-01-17 21:15:45,782] io.dropwizard.jersey.DropwizardResourceConfig: The following paths were found for the configured resources:

GET     /api/cluster (com.pinterest.doctorkafka.api.ClusterApi)
DELETE  /api/cluster/{clusterName}/admin/maintenance (com.pinterest.doctorkafka.api.MaintenanceApi)
GET     /api/cluster/{clusterName}/admin/maintenance (com.pinterest.doctorkafka.api.MaintenanceApi)
PUT     /api/cluster/{clusterName}/admin/maintenance (com.pinterest.doctorkafka.api.MaintenanceApi)
GET     /api/cluster/{clusterName}/broker (com.pinterest.doctorkafka.api.BrokerApi)

INFO [2019-01-17 21:15:45,784] org.eclipse.jetty.server.handler.ContextHandler: Started i.d.j.MutableServletContextHandler@1961d75a{/,null,AVAILABLE}
INFO [2019-01-17 21:15:45,797] org.eclipse.jetty.server.AbstractConnector: Started application@6f044c58{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
INFO [2019-01-17 21:15:45,797] org.eclipse.jetty.server.Server: Started @3572ms

^C
21:17:16.003 [Thread-2] ERROR com.pinterest.doctorkafka.DoctorKafkaMain - Failure in stopping operator
java.lang.NullPointerException: null
at com.pinterest.doctorkafka.DoctorKafka.stop(DoctorKafka.java:78) ~[doctorkafka-0.2.4.3.jar:?]
at com.pinterest.doctorkafka.DoctorKafkaMain$OperatorCleanupThread.run(DoctorKafkaMain.java:153) [doctorkafka-0.2.4.3.jar:?]
INFO [2019-01-17 21:17:16,006] org.eclipse.jetty.server.AbstractConnector: Stopped application@6f044c58{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
INFO [2019-01-17 21:17:16,012] org.eclipse.jetty.server.handler.ContextHandler: Stopped i.d.j.MutableServletContextHandler@1961d75a{/,null,UNAVAILABLE}

@yhzhao
Author

yhzhao commented Jan 17, 2019

Also, the brokerstats topic has nothing in it. Does that mean the kafkastats processes running on the brokers are not sending anything to the topic?

@yuyang08
Contributor

Can you tell us the message format version that your cluster uses? Based on the exception, doctorkafka failed to find the starting offsets of the kafkastats topic. You can use the Kafka command-line tool kafka-console-consumer.sh to check whether there are messages in the kafkastats topic.

https://github.com/pinterest/doctorkafka/blob/master/drkafka/src/main/java/com/pinterest/doctorkafka/replicastats/ReplicaStatsManager.java#L190

@yhzhao
Author

yhzhao commented Jan 18, 2019

There is no message in that topic when using that command. What is the expected version of Kafka? 2.11-2.1.0 seems to work fine (at least I don't see this problem).

@yhzhao
Author

yhzhao commented Jan 18, 2019

2.11-1.1.0 seems to have a problem.

@yhzhao
Author

yhzhao commented Jan 18, 2019

Actually, the 0.2.4 version of kafkastats does publish messages to the brokerstats topic; 0.2.4.3 does not. Feels like a version mismatch.

@BrianGallew
Contributor

I'm seeing the same issue. I've got a half dozen Kafka clusters with all their brokers running kafkastats-0.2.4.3. I've verified that they are writing data into the brokerstats topic. I've even truncated data to the last hour to ensure that any old stats data (written by an older kafkastats client) has been purged. But I, too, am getting the stack trace in my log:

2019-01-23 20:53:18.892 [pool-6-thread-1] ERROR com.pinterest.doctorkafka.DoctorKafkaMain - DoctorKafka start failed
java.lang.NullPointerException: null
	at com.pinterest.doctorkafka.replicastats.ReplicaStatsManager.getProcessingStartOffsets(ReplicaStatsManager.java:193) ~[doctorkafka-0.2.4.3-jar-with-dependencies.jar:?]
	at com.pinterest.doctorkafka.replicastats.ReplicaStatsManager.readPastReplicaStats(ReplicaStatsManager.java:216) ~[doctorkafka-0.2.4.3-jar-with-dependencies.jar:?]
	at com.pinterest.doctorkafka.DoctorKafka.start(DoctorKafka.java:51) ~[doctorkafka-0.2.4.3-jar-with-dependencies.jar:?]
	at com.pinterest.doctorkafka.DoctorKafkaMain.lambda$run$0(DoctorKafkaMain.java:68) ~[doctorkafka-0.2.4.3-jar-with-dependencies.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_191]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_191]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]

@BrianGallew
Contributor

BrianGallew commented Jan 23, 2019

If I wrap ReplicaStatsManager.java:193 in a try/catch block, DoctorKafka starts up.
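
For reference, a minimal sketch of that kind of guard around the offset-reading loop (the variable and logger names are assumed to mirror the snippet further down this thread, not copied from the actual DoctorKafka source):

    for (Map.Entry<TopicPartition, OffsetAndTimestamp> entry : offsetsForTimes.entrySet()) {
      try {
        // offsetsForTimes() can map a partition to null; calling offset() then throws the NPE
        partitionMap.put(entry.getKey(), entry.getValue().offset());
      } catch (NullPointerException e) {
        LOG.warn("No start offset found for {}, skipping", entry.getKey());
      }
    }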

@yuyang08 yuyang08 self-assigned this Jan 23, 2019
@ambud
Contributor

ambud commented Jan 23, 2019

@yhzhao could you please check the Kafka message format of the broker stats topic? We are expecting at least 0.10.0 for Dr.Kafka 0.2.4.3 and above. Starting with 0.2.4.3, we switched to using the Kafka time index to find past offsets for broker stats, which helps improve Dr.Kafka startup performance. The expected Kafka message format version is 0.10.0 or higher.

From the Javadoc of KafkaConsumer.offsetsForTimes():

If the message format version in a partition is before 0.10.0, i.e. the messages do not have timestamps, null will be returned for that partition.
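
For illustration, a self-contained sketch of that lookup and the null case described above (the bootstrap servers, topic name, and one-hour lookback are placeholders, not DoctorKafka's actual configuration):

    import java.time.Duration;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import java.util.stream.Collectors;

    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.ByteArrayDeserializer;

    public class OffsetsForTimesCheck {
      public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder
        props.put("key.deserializer", ByteArrayDeserializer.class.getName());
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
          String topic = "brokerstats";                      // placeholder topic name
          List<TopicPartition> partitions = consumer.partitionsFor(topic).stream()
              .map(p -> new TopicPartition(p.topic(), p.partition()))
              .collect(Collectors.toList());

          long lookbackTs = System.currentTimeMillis() - Duration.ofHours(1).toMillis();
          Map<TopicPartition, Long> timestamps = new HashMap<>();
          partitions.forEach(tp -> timestamps.put(tp, lookbackTs));

          // A partition maps to null when it has no message with a timestamp at or after
          // lookbackTs, e.g. an empty partition or a pre-0.10.0 message format.
          Map<TopicPartition, OffsetAndTimestamp> offsets = consumer.offsetsForTimes(timestamps);
          offsets.forEach((tp, oat) -> System.out.println(
              tp + " -> " + (oat == null ? "null (no timestamped data)" : oat.offset())));
        }
      }
    }

Any partition that prints null here is in the situation the stack trace above points at.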

@BrianGallew
Contributor

In my case, that has been set to log.message.format.version=0.11.0, so that should be good.

@BrianGallew
Contributor

Also, I just set up a brand new doctorkafka installation, all 0.2.4.3, and am having the same issue here. I suspect that the kafkastats daemons are occasionally sending incomplete data that makes doctorkafka sad.

@yuyang08
Contributor

yuyang08 commented Jan 25, 2019

@BrianGallew Thanks for the info! We will look into the issue and fix it. cc @ambud

@BrianGallew
Contributor

OK, the following diff will reliably 'fix' the problem, but of course it's just a bandage.

    for (Entry<TopicPartition, OffsetAndTimestamp> entry : offsetsForTimes.entrySet()) {
      if (entry.getValue() == null) {
        LOG.info("Faking a zero for {}", entry.getKey());
        partitionMap.put(entry.getKey(), 0L);
      } else {
        partitionMap.put(entry.getKey(), entry.getValue().offset());
      }
    }

I'm not sure, but I think the issue is that partitions which have never been written to end up with no offset data, and doctorkafka really wants it to be there.

@BrianGallew
Contributor

BrianGallew commented Jan 29, 2019

And, yeah, that's not a diff, that's just lines 192-199 of ReplicaStatsManager.java. Here's an actual diff.

diff --git a/drkafka/src/main/java/com/pinterest/doctorkafka/replicastats/ReplicaStatsManager.java b/drkafka/src/main/java/com/pinterest/doctorkafka/replicastats/ReplicaStatsManager.java
index c680125..eee5b47 100644
--- a/drkafka/src/main/java/com/pinterest/doctorkafka/replicastats/ReplicaStatsManager.java
+++ b/drkafka/src/main/java/com/pinterest/doctorkafka/replicastats/ReplicaStatsManager.java
@@ -192,3 +192,8 @@ public class ReplicaStatsManager {
     for (Entry<TopicPartition, OffsetAndTimestamp> entry : offsetsForTimes.entrySet()) {
-      partitionMap.put(entry.getKey(), entry.getValue().offset());
+      if (entry.getValue() == null) {
+       LOG.info("Faking a zero for {}", entry.getKey());
+       partitionMap.put(entry.getKey(), 0L);
+      } else {
+       partitionMap.put(entry.getKey(), entry.getValue().offset());
+      }
     }

@ambud
Contributor

ambud commented Jan 31, 2019

@BrianGallew could you please open a PR with the above diff?

The issue to investigate would be why no stats are being written to certain partitions of the stats topic.

@BrianGallew
Contributor

I can (and will!), but are we sure it doesn't just mask a real problem?

@saherahwal

I am facing the same problem. Is this issue closed?
Should I just mask the problem as in the diff above?
