
Support Kafka flow export in Flow Aggregator #2466

Closed
wants to merge 2 commits

Conversation

srikartati
Member

@srikartati srikartati commented Jul 24, 2021

This commit supports Kafka flow export in the Flow Aggregator
by adding a Kafka producer. Given a Kafka broker, the Kafka producer
converts each aggregated flow record to a Kafka flow message
and sends it to the Kafka broker, which in turn forwards it to
the Kafka consumer, where it is decoded. Our Kafka producer
and consumer depend on the sarama Kafka library.

Our Kafka producer in the Flow Aggregator can support
multiple proto schemas. However, we currently support a single proto
schema, named "AntreaFlowMsg".

We add e2e tests by utilizing the Kafka flow collector provided
in the go-ipfix repo. The Kafka flow collector is a combination of a lean
Kafka broker and a Kafka consumer.

Signed-off-by: Srikar Tati [email protected]
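The pipeline described in the commit message (aggregated record → Kafka flow message → broker → consumer) can be sketched with an in-memory stand-in for the sarama producer. All names here are illustrative, not Antrea's actual API, and JSON replaces the Protobuf encoding to keep the sketch stdlib-only:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FlowMessage is a stand-in for the "AntreaFlowMsg" Protobuf schema;
// field names are illustrative.
type FlowMessage struct {
	SrcIP string
	DstIP string
	Bytes uint64
}

// Sender abstracts the Kafka producer (sarama's SyncProducer in the PR)
// so the conversion step can be shown without a live broker.
type Sender interface {
	Send(topic string, value []byte) error
}

// exportRecord converts an aggregated flow record into a flow message and
// hands it to the producer, mirroring the pipeline in the commit message.
func exportRecord(s Sender, topic string, msg FlowMessage) error {
	// The PR encodes with Protobuf; JSON is used here to stay stdlib-only.
	b, err := json.Marshal(msg)
	if err != nil {
		return err
	}
	return s.Send(topic, b)
}

// memSender is an in-memory fake broker for the sketch.
type memSender struct{ got [][]byte }

func (m *memSender) Send(topic string, value []byte) error {
	m.got = append(m.got, value)
	return nil
}

func main() {
	s := &memSender{}
	_ = exportRecord(s, "AntreaTopic", FlowMessage{SrcIP: "10.0.0.1", DstIP: "10.0.0.2", Bytes: 42})
	fmt.Println(len(s.got))
}
```

In the real code the Sender role is played by sarama's producer and the message is published to the configured kafkaBrokerTopic.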

@srikartati srikartati marked this pull request as draft July 24, 2021 05:06
@lgtm-com

lgtm-com bot commented Jul 24, 2021

This pull request introduces 1 alert when merging 308835b into fb28c8e - view on LGTM.com

new alerts:

  • 1 for Size computation for allocation may overflow

@srikartati srikartati force-pushed the kafka_export branch 2 times, most recently from 13475b3 to 6dcc491 Compare July 26, 2021 21:19
@lgtm-com

lgtm-com bot commented Jul 26, 2021

This pull request introduces 1 alert when merging 6dcc491 into cdc8453 - view on LGTM.com

new alerts:

  • 1 for Size computation for allocation may overflow

@lgtm-com

lgtm-com bot commented Jul 27, 2021

This pull request introduces 1 alert when merging 33988d7 into cdc8453 - view on LGTM.com

new alerts:

  • 1 for Size computation for allocation may overflow

@lgtm-com

lgtm-com bot commented Jul 28, 2021

This pull request introduces 1 alert when merging 93a6667 into 6f33da3 - view on LGTM.com

new alerts:

  • 1 for Size computation for allocation may overflow

@lgtm-com

lgtm-com bot commented Aug 18, 2021

This pull request introduces 1 alert when merging 26e399f into 7f90fc0 - view on LGTM.com

new alerts:

  • 1 for Size computation for allocation may overflow

@srikartati srikartati marked this pull request as ready for review August 18, 2021 19:36
@srikartati
Member Author

/test-all

@srikartati srikartati requested review from antoninbas and zyiou August 18, 2021 19:36
@antrea-io antrea-io deleted a comment from lgtm-com bot Aug 18, 2021
@lgtm-com

lgtm-com bot commented Aug 18, 2021

This pull request introduces 1 alert when merging 0d86a27 into ea26e6a - view on LGTM.com

new alerts:

  • 1 for Size computation for allocation may overflow

@codecov-commenter

codecov-commenter commented Aug 18, 2021

Codecov Report

Merging #2466 (dd94b3d) into main (8d228c1) will decrease coverage by 20.58%.
The diff coverage is 4.76%.


@@             Coverage Diff             @@
##             main    #2466       +/-   ##
===========================================
- Coverage   61.43%   40.84%   -20.59%     
===========================================
  Files         284      158      -126     
  Lines       23470    19557     -3913     
===========================================
- Hits        14418     7989     -6429     
- Misses       7504    10818     +3314     
+ Partials     1548      750      -798     
Flag Coverage Δ
kind-e2e-tests ?
unit-tests 40.84% <4.76%> (-0.05%) ⬇️

Flags with carried-forward coverage won't be shown.

Impacted Files Coverage Δ
pkg/flowaggregator/flowaggregator.go 19.46% <4.76%> (-49.83%) ⬇️
pkg/ovs/openflow/default.go 0.00% <0.00%> (-100.00%) ⬇️
pkg/util/runtime/runtime.go 0.00% <0.00%> (-100.00%) ⬇️
pkg/agent/cniserver/pod_configuration_linux.go 0.00% <0.00%> (-100.00%) ⬇️
pkg/ovs/openflow/logs.go 9.52% <0.00%> (-90.48%) ⬇️
pkg/apis/controlplane/register.go 0.00% <0.00%> (-90.00%) ⬇️
pkg/agent/nodeportlocal/k8s/annotations.go 0.00% <0.00%> (-83.88%) ⬇️
pkg/agent/agent_linux.go 0.00% <0.00%> (-80.00%) ⬇️
pkg/agent/client.go 0.00% <0.00%> (-77.42%) ⬇️
pkg/ovs/ovsconfig/ovs_client_linux.go 0.00% <0.00%> (-76.93%) ⬇️
... and 231 more

@srikartati srikartati requested a review from dreamtalen August 18, 2021 20:30
Comment on lines 170 to 177
# Provide the Kafka broker address to send the aggregated flow records in the
# given protobuf format. Kafka flow export is only enabled if the broker address
# is provided.
#kafkaBrokerAddress: ""
Contributor

Do we support multiple broker addresses here? If so, we can specify the format (e.g. a comma-separated string).

Member Author

Currently, the plan is to support a single broker. In the future, when we have a requirement for multiple brokers, we can look at enhancing this ConfigMap parameter.

# of multiple proto formats, this config parameter could take in multiple values.
#kafkaProtoSchema: "AntreaFlowMsg"

# Provide the topic name on Kafka Broker, where the flow messages are intended
Contributor

Broker -> broker to keep consistent with other usage in description.

@@ -22,7 +22,7 @@ ANTREA_ROOT="$( cd "$( dirname "${BASH_SOURCE[0]}" )/../" && pwd )"
IMAGE_NAME="antrea/codegen:kubernetes-1.21.0"

function docker_run() {
docker pull ${IMAGE_NAME}
#docker pull ${IMAGE_NAME}
Contributor

Commented out by mistake?

Member Author

I am using a local image for codegen with a new version of protoc-gen-go, so I comment this out when running codegen on my local machine. Once the image in the antrea Docker registry is updated (before merging this PR), I will remove this.

@@ -1,29 +1,43 @@
// Copyright 2019 Antrea Authors
Contributor

Are the CNI file changes related to Kafka flow export support for the Flow Aggregator? If not, I feel they should be put in a separate PR.

Member Author

This is because of a version change in protoc-gen-go. All of this is generated code.

Contributor

I agree with @zyiou (left another comment above). The Protobuf version upgrade should be handled in a separate PR, even if the upgrade is only motivated by this change.

pkg/flowaggregator/flowaggregator.go (resolved)
@srikartati
Member Author

Tagging this as v1.3. Will revisit the milestone after one round of reviews if needed.

@lgtm-com

lgtm-com bot commented Aug 23, 2021

This pull request introduces 1 alert when merging 6266f94 into 4aaa90e - view on LGTM.com

new alerts:

  • 1 for Size computation for allocation may overflow

build/yamls/flow-aggregator.yml (outdated, resolved)
Comment on lines +49 to +59
CACertFile = "ca.crt"
TLSCertFile = "tls.crt"
TLSKeyFile = "tls.key"
Contributor

just to clarify, ca.crt is to authenticate the Kafka broker, while tls.crt and tls.key are to authenticate the aggregator to the broker?

maybe this should be covered by comments

Member Author

There are two different options. Added comments.
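To make the split explicit, here is a minimal stdlib sketch (function name and structure hypothetical, not Antrea's actual code) of how the three mounted files map onto a tls.Config: ca.crt becomes the RootCAs pool used to verify the broker, tls.crt/tls.key form the client certificate the aggregator presents to the broker, and kafkaTLSSkipVerify maps to InsecureSkipVerify:

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
)

// newKafkaTLSConfig sketches how the three mounted files map to TLS roles:
// caPEM (ca.crt) verifies the Kafka broker's certificate, while certPEM and
// keyPEM (tls.crt, tls.key) authenticate the aggregator to the broker.
func newKafkaTLSConfig(caPEM, certPEM, keyPEM []byte, skipVerify bool) (*tls.Config, error) {
	cfg := &tls.Config{
		// Mirrors the kafkaTLSSkipVerify option: tolerate self-signed or
		// expired broker certificates when set.
		InsecureSkipVerify: skipVerify,
	}
	if len(caPEM) > 0 {
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(caPEM) {
			return nil, fmt.Errorf("invalid CA certificate")
		}
		cfg.RootCAs = pool // trust anchor for verifying the broker
	}
	if len(certPEM) > 0 && len(keyPEM) > 0 {
		cert, err := tls.X509KeyPair(certPEM, keyPEM)
		if err != nil {
			return nil, err
		}
		cfg.Certificates = []tls.Certificate{cert} // client auth (mTLS)
	}
	return cfg, nil
}

func main() {
	cfg, err := newKafkaTLSConfig(nil, nil, nil, true)
	if err != nil || !cfg.InsecureSkipVerify {
		panic("unexpected config")
	}
	fmt.Println("skip-verify config ok")
}
```

In sarama the resulting tls.Config would be attached to the producer's Net.TLS settings; the sketch stops at building the config itself.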

pkg/agent/cniserver/server_test.go (outdated, resolved)
@@ -1,29 +1,43 @@
// Copyright 2019 Antrea Authors
Contributor

I agree with @zyiou (left another comment above). The Protobuf version upgrade should be handled in a separate PR, even if the upgrade is only motivated by this change.

Comment on lines 420 to 424
klog.Errorf("Error when initializing kafka producer: %v, will retry in %s", err, fa.activeFlowRecordTimeout)
} else {
klog.Info("Initialized kafka producer")
Contributor

please use structured logging for all new log messages

Member Author

done

@antrea-io antrea-io deleted a comment from lgtm-com bot Aug 25, 2021
@antoninbas
Contributor

Switching milestone to v1.4 after talking with Srikar.

@srikartati srikartati force-pushed the kafka_export branch 2 times, most recently from 413db08 to 3ead296 Compare August 31, 2021 00:27
@srikartati srikartati force-pushed the kafka_export branch 2 times, most recently from 8d926fd to b855a0e Compare September 22, 2021 23:06
@srikartati
Member Author

Thanks @antoninbas for the comments. Addressed them. Please take a look at this PR again.

Comment on lines 177 to 196
# Provide the Kafka broker address to send the aggregated flow records in the
# given protobuf format. Kafka flow export is only enabled if the broker address
# is provided.
#kafkaBrokerAddress: ""

# Provide the Kafka proto schema name. Currently, Flow Aggregator supports only
# one Antrea proto format called AntreaFlowMsg. In the future, with support
# of multiple proto formats, this config parameter could take in multiple values.
#kafkaProtoSchema: "AntreaFlowMsg"

# Provide the topic name on Kafka Broker, where the flow messages are intended
# to be received.
#kafkaBrokerTopic: "AntreaTopic"

# Enable TLS communication between the Kafka producer and the Kafka broker.
#kafkaTLSEnable: false

# Enable or disable the TLS certificate verification. This is to support the
# Kafka TLS certs that are either self-signed, mis-configured or expired.
#kafkaTLSSkipVerify: false
Contributor

I don't know if we need to repeat the kafka prefix for all sub-parameters

# KafkaParams contains all configuration parameters required to export flow
# records to a Kafka broker.
kafkaParams:
# Provide the Kafka broker address to send the aggregated flow records in the
Contributor

Provide the Kafka broker address to which the aggregated flow records will be sent, in the given Protobuf format

Comment on lines 182 to 185
# Provide the Kafka proto schema name. Currently, Flow Aggregator supports only
# one Antrea proto format called AntreaFlowMsg. In the future, with support
# of multiple proto formats, this config parameter could take in multiple values.
#kafkaProtoSchema: "AntreaFlowMsg"
Contributor

s/proto/Protobuf

- name: flow-aggregator-kafka-tls
secret:
secretName: flow-aggregator-kafka-tls
defaultMode: 0400
Contributor

I notice we use 256 for the antrea-controller, which is the same value but in decimal.
It would be good to harmonize: 0400 makes more sense to me, so I'm not sure why we are using 256 elsewhere.
This should be in a follow-up PR though.

Comment on lines 132 to 134
"projects.registry.vmware.com/antrea/ipfix-collector:v0.5.7" \
"projects.registry.vmware.com/antrea/wireguard-go:0.0.20210424")
FLOW_VISIBILITY_IMAGES_LIST=("projects.registry.vmware.com/antrea/ipfix-collector:latest" \
Contributor

ipfix-collector is listed twice with different tags?
I think we should always specify a released version. Even if it means more frequent updates, the tests are less likely to break.

Comment on lines +127 to +147
caCertPath = path.Join(certDir, CACertFile)
tlsCertPath = path.Join(certDir, TLSCertFile)
tlsKeyPath = path.Join(certDir, TLSKeyFile)
// The secret may be created after the Pod is created, for example, when cert-manager is used the secret
// is created asynchronously. It waits for a while before it's considered to be failed.
if err = wait.PollImmediate(2*time.Second, certReadyTimeout, func() (bool, error) {
for _, path := range []string{caCertPath, tlsCertPath, tlsKeyPath} {
f, err := os.Open(path)
if err != nil {
klog.Warningf("Couldn't read %s when applying the kafka TLS certificate, retrying", path)
return false, nil
}
f.Close()
}
return true, nil
}); err != nil {
return fmt.Errorf("error reading Kafka TLS CA cert (%s), cert (%s), and key (%s) files present at \"%s\"", CACertFile, TLSCertFile, TLSKeyFile, certDir)
}
}
Contributor

there is no reload mechanism for the certificates? What we do for the antrea-controller's apiserver, I think, is that the certificates are reloaded from file periodically so changes are detected.

Comment on lines +28 to +31
const (
AntreaFlowMsg string = "AntreaFlowMsg"
AntreaTopic string = "AntreaTopic"
)
Contributor

this doesn't make it clear what the constant represents when you use it

if possible we should have something like this

type ProtobufSchema string
const (
    ProtobufSchemaAntreaFlowMsg ProtobufSchema = "AntreaFlowMsg"
)
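A runnable sketch of the reviewer's suggestion, with a hypothetical newEncoder call site (not from the PR) showing how the distinct type documents intent and lets the compiler reject arbitrary strings where a schema is expected:

```go
package main

import "fmt"

// ProtobufSchema gives schema names a distinct type, as suggested above.
type ProtobufSchema string

const (
	ProtobufSchemaAntreaFlowMsg ProtobufSchema = "AntreaFlowMsg"
)

// newEncoder illustrates a call site: the parameter type makes it clear the
// string is a schema name. Hypothetical function, not from the PR.
func newEncoder(schema ProtobufSchema) string {
	return "encoder for " + string(schema)
}

func main() {
	fmt.Println(newEncoder(ProtobufSchemaAntreaFlowMsg))
}
```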

if fa.kafkaProducer.GetSaramaProducer() == nil {
err := fa.kafkaProducer.InitSaramaProducer()
if err != nil {
klog.ErrorS(err, "Error when initializing kafka producer", "retry timeout", fa.activeFlowRecordTimeout)
Contributor

uniformize
s/kafka/Kafka

Comment on lines +81 to +86
klog.Warningf("Do not expect source IP: %v to be filled already", flowMsg.SrcIP)
}
flowMsg.SrcIP = ie.Value.(net.IP).String()
case "destinationIPv4Address", "destinationIPv6Address":
if flowMsg.DstIP != "" {
klog.Warningf("Do not expect destination IP: %v to be filled already", flowMsg.DstIP)
Contributor

as part of the transition to structured logging, I don't think we use klog.Warningf, but just klog.InfoS. @tnqn is that correct?

@@ -0,0 +1,66 @@
// Copyright 2021 VMware, Inc.
Contributor

should the proto file be versioned if you make breaking changes in the future?

// given protobuf format. Kafka flow export is only enabled
// if the broker address is provided.
// Defaults to ""
BrokerAddress string `yaml:"kafkaBrokerAddress,omitempty"`
Contributor

Field names in this file are not consistent with those defined in the conf file. I think that's why the e2e tests do not pass.

Comment on lines +32 to +33
uint64 TimeFlowStartInMilliSecs = 27;
uint32 TimeFlowEndInMilliSecs = 28;
Contributor

Noticed that the data formats are different. We can remove these as they are not used in our record.

@tnqn
Member

tnqn commented Oct 28, 2021

@srikartati Do we still target v1.4 for this PR?

@srikartati srikartati removed this from the Antrea v1.4 release milestone Oct 28, 2021
@srikartati
Member Author

@srikartati Do we still target v1.4 for this PR?

We have been working on a different pipeline, so we halted this PR. Removed the release milestone.

@srikartati
Member Author

Thanks, @zyiou, for agreeing to take over this PR. I gave you write access to my fork.
If that doesn't work, please feel free to open a separate PR.

@srikartati srikartati closed this Jan 20, 2022
5 participants