
KAFKA-18646: Null records in fetch response breaks librdkafka #18726

Merged · 3 commits · Jan 29, 2025

@@ -196,7 +196,8 @@ public static FetchResponseData.PartitionData partitionResponse(int partition, Errors error) {
return new FetchResponseData.PartitionData()
.setPartitionIndex(partition)
.setErrorCode(error.code())
- .setHighWatermark(FetchResponse.INVALID_HIGH_WATERMARK);
+ .setHighWatermark(FetchResponse.INVALID_HIGH_WATERMARK)
+ .setRecords(MemoryRecords.EMPTY);
}

/**
@@ -285,4 +286,4 @@ private static FetchResponseData toMessage(Errors error,
.setSessionId(sessionId)
.setResponses(topicResponseList);
}
}
}
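
For context, a minimal sketch (not part of the patch; it only uses the public client classes) of what the change above guarantees: an error-only partition response built through partitionResponse now carries MemoryRecords.EMPTY rather than null, so code that iterates the records unconditionally, as librdkafka does, never dereferences a null field.

```java
import org.apache.kafka.common.message.FetchResponseData;
import org.apache.kafka.common.protocol.Errors;
import org.apache.kafka.common.record.MemoryRecords;
import org.apache.kafka.common.record.Record;
import org.apache.kafka.common.requests.FetchResponse;

public class EmptyRecordsSketch {
    public static void main(String[] args) {
        // Error-only response built through the helper patched above.
        FetchResponseData.PartitionData partition =
                FetchResponse.partitionResponse(0, Errors.TOPIC_AUTHORIZATION_FAILED);

        // Before this change partition.records() was null; it is now MemoryRecords.EMPTY.
        MemoryRecords records = (MemoryRecords) partition.records();
        for (Record record : records.records()) {
            System.out.println(record); // loop body never runs for an empty set, but there is no NPE
        }
    }
}
```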
@@ -106,7 +106,7 @@
]},
{ "name": "PreferredReadReplica", "type": "int32", "versions": "11+", "default": "-1", "ignorable": false, "entityType": "brokerId",
"about": "The preferred read replica for the consumer to use on its next fetch request."},
{ "name": "Records", "type": "records", "versions": "0+", "nullableVersions": "0+", "about": "The record data."}
Member:

Removing nullableVersions would prevent Kafka from "reading" null records, which could pose a risk. We might therefore consider a strategy where Kafka strictly enforces that null records are never "written" while still retaining the ability to "read" them.

In summary, I favor @mumrah's suggestion of adding a null check. Furthermore, we should include a comment explaining the rationale behind preventing Kafka from writing null records.
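
As an aside, a hypothetical sketch of the null-check idea mentioned here (the helper name and its placement are invented, not from this PR): keep the field readable for compatibility, but fail fast on the write path if anything tries to send null records.

```java
import org.apache.kafka.common.message.FetchResponseData;

final class FetchResponseNullCheckSketch {
    // Hypothetical write-path guard: Kafka must never *write* null records, because librdkafka
    // and similar clients dereference the records field unconditionally when parsing a fetch response.
    static FetchResponseData.PartitionData requireNonNullRecords(FetchResponseData.PartitionData partition) {
        if (partition.records() == null) {
            throw new IllegalStateException(
                "Fetch responses must not carry null records; use MemoryRecords.EMPTY instead");
        }
        return partition;
    }

    private FetchResponseNullCheckSketch() { }
}
```

The merged change instead takes the route visible in the diffs here: the error paths set MemoryRecords.EMPTY explicitly.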

Member Author:

Please elaborate on the risk - what is the exact use case where that would happen? I couldn't come up with one.

@ijuma (Member Author) commented on Jan 29, 2025:

Keep in mind that released versions of Kafka never return null records - if that ever happened, it would have broken librdkafka. Same for any other implementation of the Kafka protocol.

Member:

> Please elaborate on the risk - what is the exact use case where that would happen? I couldn't come up with one.

I believe FetchResponse.json should be considered part of the public interface. Consequently, modifying the "released spec" carries inherent risks. We cannot guarantee that no external implementations adhere to our specification. For instance, other server implementations might return null records, and after this PR, our 4.0 client would no longer be able to read them.

> Keep in mind that released versions of Kafka never return null records

That is true, and it does not violate the spec, right? I mean, "Apache Kafka can never return null even though the spec says it is valid to return null".
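
To make the compatibility concern concrete, here is a rough hand-written sketch of how nullability affects a reader of the records field. It is illustrative only; it is not the code generated from FetchResponse.json, whose wire-format details differ.

```java
import java.nio.ByteBuffer;

import org.apache.kafka.common.message.FetchResponseData;
import org.apache.kafka.common.record.MemoryRecords;

final class RecordsFieldReadSketch {
    // With a nullable schema, a negative length decodes to null records; with a non-nullable
    // schema, the same bytes (e.g. from a third-party broker) become a parse error on the client.
    static FetchResponseData.PartitionData readRecords(ByteBuffer buf, boolean nullableInSchema) {
        FetchResponseData.PartitionData partition = new FetchResponseData.PartitionData();
        int length = buf.getInt();
        if (length < 0) {
            if (nullableInSchema) {
                partition.setRecords(null);          // old spec: null is tolerated on read
            } else {
                throw new RuntimeException("records field was serialized as null"); // new spec
            }
        } else {
            ByteBuffer slice = buf.slice();
            slice.limit(length);
            partition.setRecords(MemoryRecords.readableRecords(slice));
        }
        return partition;
    }

    private RecordsFieldReadSketch() { }
}
```

Whether that read-side strictness matters depends on whether any non-Apache broker actually writes null records, which is the crux of the discussion below.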

@ijuma (Member Author) commented on Jan 29, 2025:

I disagree; the actual spec here is that this field can never be null. As I said, there is no Kafka implementation that doesn't work with librdkafka and related clients - they constitute 30-50% of Kafka clients in the wild.

What happened here is that the spec definition was actually incorrect - it specified the field as nullable even though it couldn't be. And that bug meant we accidentally broke a large part of the ecosystem. I don't see the benefit in not fixing the spec.

Again, let's discuss a concrete use case - you're worried that the Java consumer might break because there is an unknown Kafka broker that implements this differently?

Member:

> Again, let's discuss a concrete use case - you're worried that the Java consumer might break because there is an unknown Kafka broker that implements this differently?

I have no evidence that there is a broker implementation that returns null records. If librdkafka and related clients constitute 30-50% of clients, it is acceptable to fix the specification.

Member:

Since this behavior in librdkafka has existed for a long time, and we have not seen an issue with null records prior to the recent changes, I think it is safe to say we have never returned null records.

I think fixing the protocol spec makes sense.

{ "name": "Records", "type": "records", "versions": "0+", "about": "The record data."}
]}
]},
{ "name": "NodeEndpoints", "type": "[]NodeEndpoint", "versions": "16+", "taggedVersions": "16+", "tag": 0,
@@ -2024,7 +2024,8 @@ private FetchResponse createFetchResponse(int sessionId) {
.setPartitionIndex(1)
.setHighWatermark(1000000)
.setLogStartOffset(0)
- .setAbortedTransactions(abortedTransactions));
+ .setAbortedTransactions(abortedTransactions)
+ .setRecords(MemoryRecords.EMPTY));
return FetchResponse.parse(FetchResponse.of(Errors.NONE, 25, sessionId,
responseData).serialize(FETCH.latestVersion()), FETCH.latestVersion());
}
@@ -2048,7 +2049,8 @@ private FetchResponse createFetchResponse(boolean includeAborted) {
.setPartitionIndex(1)
.setHighWatermark(1000000)
.setLogStartOffset(0)
- .setAbortedTransactions(abortedTransactions));
+ .setAbortedTransactions(abortedTransactions)
+ .setRecords(MemoryRecords.EMPTY));
return FetchResponse.parse(FetchResponse.of(Errors.NONE, 25, INVALID_SESSION_ID,
responseData).serialize(FETCH.latestVersion()), FETCH.latestVersion());
}
@@ -37,7 +37,7 @@ import org.apache.kafka.common.message.JoinGroupRequestData.JoinGroupRequestProtocol
import org.apache.kafka.common.message.LeaveGroupRequestData.MemberIdentity
import org.apache.kafka.common.message.ListOffsetsRequestData.{ListOffsetsPartition, ListOffsetsTopic}
import org.apache.kafka.common.message.OffsetForLeaderEpochRequestData.{OffsetForLeaderPartition, OffsetForLeaderTopic, OffsetForLeaderTopicCollection}
- import org.apache.kafka.common.message.{AddOffsetsToTxnRequestData, AlterPartitionReassignmentsRequestData, AlterReplicaLogDirsRequestData, ConsumerGroupDescribeRequestData, ConsumerGroupHeartbeatRequestData, CreateAclsRequestData, CreatePartitionsRequestData, CreateTopicsRequestData, DeleteAclsRequestData, DeleteGroupsRequestData, DeleteRecordsRequestData, DeleteTopicsRequestData, DescribeClusterRequestData, DescribeConfigsRequestData, DescribeGroupsRequestData, DescribeLogDirsRequestData, DescribeProducersRequestData, DescribeTransactionsRequestData, FindCoordinatorRequestData, HeartbeatRequestData, IncrementalAlterConfigsRequestData, JoinGroupRequestData, ListPartitionReassignmentsRequestData, ListTransactionsRequestData, MetadataRequestData, OffsetCommitRequestData, ProduceRequestData, SyncGroupRequestData, WriteTxnMarkersRequestData}
+ import org.apache.kafka.common.message.{AddOffsetsToTxnRequestData, AlterPartitionReassignmentsRequestData, AlterReplicaLogDirsRequestData, ConsumerGroupDescribeRequestData, ConsumerGroupHeartbeatRequestData, CreateAclsRequestData, CreatePartitionsRequestData, CreateTopicsRequestData, DeleteAclsRequestData, DeleteGroupsRequestData, DeleteRecordsRequestData, DeleteTopicsRequestData, DescribeClusterRequestData, DescribeConfigsRequestData, DescribeGroupsRequestData, DescribeLogDirsRequestData, DescribeProducersRequestData, DescribeTransactionsRequestData, FetchResponseData, FindCoordinatorRequestData, HeartbeatRequestData, IncrementalAlterConfigsRequestData, JoinGroupRequestData, ListPartitionReassignmentsRequestData, ListTransactionsRequestData, MetadataRequestData, OffsetCommitRequestData, ProduceRequestData, SyncGroupRequestData, WriteTxnMarkersRequestData}
import org.apache.kafka.common.protocol.{ApiKeys, Errors}
import org.apache.kafka.common.record.{MemoryRecords, RecordBatch, SimpleRecord}
import org.apache.kafka.common.requests.OffsetFetchResponse.PartitionData
@@ -59,6 +59,7 @@ import java.util.Collections.singletonList
import org.apache.kafka.common.message.MetadataRequestData.MetadataRequestTopic
import org.apache.kafka.common.message.WriteTxnMarkersRequestData.{WritableTxnMarker, WritableTxnMarkerTopic}
import org.apache.kafka.coordinator.group.GroupConfig
+ import org.junit.jupiter.api.Test
import org.junit.jupiter.api.function.Executable

import scala.collection.mutable
@@ -808,6 +809,34 @@ class AuthorizerIntegrationTest extends AbstractAuthorizerIntegrationTest {
sendRequestAndVerifyResponseError(request, resources, isAuthorized = true)
}

+  @Test
+  def testFetchConsumerRequest(): Unit = {
+    createTopicWithBrokerPrincipal(topic)
+
+    val request = createFetchRequest
+    val topicNames = getTopicNames().asJava
+
+    def partitionDatas(response: AbstractResponse): Iterable[FetchResponseData.PartitionData] = {
+      assertTrue(response.isInstanceOf[FetchResponse])
+      response.asInstanceOf[FetchResponse].responseData(topicNames, ApiKeys.FETCH.latestVersion).values().asScala
+    }
+
+    removeAllClientAcls()
+    val resources = Set(topicResource.resourceType, clusterResource.resourceType)
+    val failedResponse = sendRequestAndVerifyResponseError(request, resources, isAuthorized = false)
+    val failedPartitionDatas = partitionDatas(failedResponse)
+    assertEquals(1, failedPartitionDatas.size)
+    // Some clients (like librdkafka) always expect non-null records - even for the cases where an error is returned
+    failedPartitionDatas.foreach(partitionData => assertEquals(MemoryRecords.EMPTY, partitionData.records))
+
+    val readAcls = topicReadAcl(topicResource)
+    addAndVerifyAcls(readAcls, topicResource)
+    val succeededResponse = sendRequestAndVerifyResponseError(request, resources, isAuthorized = true)
+    val succeededPartitionDatas = partitionDatas(succeededResponse)
+    assertEquals(1, succeededPartitionDatas.size)
+    succeededPartitionDatas.foreach(partitionData => assertEquals(MemoryRecords.EMPTY, partitionData.records))
+  }
+
@ParameterizedTest
@ValueSource(strings = Array("kraft"))
def testIncrementalAlterConfigsRequestRequiresClusterPermissionForBrokerLogger(quorum: String): Unit = {
@@ -1802,6 +1802,8 @@ FetchResponseData divergingFetchResponse(
partitionData.divergingEpoch()
.setEpoch(divergingEpoch)
.setEndOffset(divergingEpochEndOffset);

+ partitionData.setRecords(MemoryRecords.EMPTY);
}
);
}
@@ -1830,6 +1832,8 @@ FetchResponseData snapshotFetchResponse(
partitionData.snapshotId()
.setEpoch(snapshotId.epoch())
.setEndOffset(snapshotId.offset());

+ partitionData.setRecords(MemoryRecords.EMPTY);
}
);
}