[receiver/kafkareceiver] add Avro encoding for logs format #21067

Closed
vincentfree opened this issue Apr 19, 2023 · 21 comments

@vincentfree
Contributor

Component(s)

receiver/kafka

Is your feature request related to a problem? Please describe.

Add Avro as an event format for logs. Apache Avro, like Protobuf, is a binary transport format; the added benefit of this enhancement is event schemas.

This supports the ability to add structured logging to the Kafka receiver.

Kafka natively supports the use of schemas with Avro; this enhancement adds the handling and validation of Avro events using those schemas.

Describe the solution you'd like

The kafka receiver should be able to handle the Avro format with schema validation.

The yaml configuration can look like:

kafka/avroLogs:
  topic: avroLogs
  encoding: avro
  brokers:
    - "coffee:123"
    - "foobar:456"
  client_id: otel-collector
  group_id: otel-collector
  avro:
    schema_url: "file:testdata/avro/schema.avro"
    mapping:
      timestamp: timestamp
      properties: resource.attributes.properties
      hostname: resource.attributes.hostname
      count: attributes.count
      message: body
      nestedRecord: attributes.nestedRecord
      levelEnum: severityText
      severity: severityNumber

The added Avro struct is needed to describe the mapping from the structured logging fields to OpenTelemetry attributes and resources.
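
To make the proposed mapping concrete, here is a minimal Go sketch of how such a mapping might be applied to an already decoded Avro event. It is not the code from the PR: `LogRecord` is a simplified, hypothetical stand-in for the collector's log record type, `applyMapping` is an illustrative helper, and only the `body`, `severityText`, `attributes.*` and `resource.attributes.*` targets are handled (timestamp and severity number conversion are left out).

```go
package main

import (
	"fmt"
	"strings"
)

// LogRecord is a simplified, hypothetical stand-in for an OTel log record.
type LogRecord struct {
	Body               any
	SeverityText       string
	Attributes         map[string]any
	ResourceAttributes map[string]any
}

// applyMapping copies fields of a decoded Avro event into a LogRecord.
// mapping goes from source field name to target path, as in the YAML above.
func applyMapping(decoded map[string]any, mapping map[string]string) (LogRecord, error) {
	rec := LogRecord{
		Attributes:         map[string]any{},
		ResourceAttributes: map[string]any{},
	}
	for field, target := range mapping {
		value, ok := decoded[field]
		if !ok {
			continue // field is absent from this event, nothing to map
		}
		switch {
		case target == "body":
			rec.Body = value
		case target == "severityText":
			rec.SeverityText = fmt.Sprint(value)
		case strings.HasPrefix(target, "resource.attributes."):
			key := strings.TrimPrefix(target, "resource.attributes.")
			rec.ResourceAttributes[key] = value
		case strings.HasPrefix(target, "attributes."):
			key := strings.TrimPrefix(target, "attributes.")
			rec.Attributes[key] = value
		default:
			return LogRecord{}, fmt.Errorf("unsupported mapping target %q for field %q", target, field)
		}
	}
	return rec, nil
}

func main() {
	// Stand-in for the output of an Avro decoder (assumed, not shown here).
	decoded := map[string]any{
		"message":   "disk full",
		"hostname":  "web-1",
		"count":     3,
		"levelEnum": "ERROR",
	}
	mapping := map[string]string{
		"message":   "body",
		"hostname":  "resource.attributes.hostname",
		"count":     "attributes.count",
		"levelEnum": "severityText",
	}
	rec, err := applyMapping(decoded, mapping)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", rec)
}
```

Rejecting unknown targets with an error, rather than silently ignoring them, is a design choice of this sketch so that a typo in the mapping surfaces early.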

Describe alternatives you've considered

The current encoding options can be viewed as alternatives, but they do not support structured logging, so the comparison is not like for like.

Additional context

The current encoding options can

@vincentfree vincentfree added the enhancement and needs triage labels Apr 19, 2023
@github-actions
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@vincentfree
Contributor Author

PR: ing-bank#2

@atoulme
Contributor

atoulme commented Apr 19, 2023

Will you also contribute an avro marshaler?

@thmshmm
Contributor

thmshmm commented Apr 19, 2023

Currently, our priority is integrating sources that send logs to Kafka. We plan to add schema registry support for AVRO schemas as well as a JSON encoder.

@atoulme
Contributor

atoulme commented Apr 19, 2023

So I think that you said yes just now. If so, great! As for a schema, absolutely. Please see the file mapping as an example: https://github.com/open-telemetry/opentelemetry-specification/blob/5443e963b6e0d4bbf202896fd165d9349aa4cabb/specification/protocol/file-exporter.md. I recommend you apply your schema to this repository first.

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Jun 19, 2023
@thmshmm
Contributor

thmshmm commented Jun 19, 2023

still relevant, PR updated

@github-actions github-actions bot removed the Stale label Jun 20, 2023
@veedeo

veedeo commented Jul 2, 2023

We would like to have the exact same feature, but for the UDP receiver. Is there any reason it was implemented specifically for Kafka and not as an operator, the same way as json_parser?

@pavolloffay
Member

Is an Avro schema for OTEL defined somewhere? Or can it be derived from the existing OTLP proto?

@thmshmm
Contributor

thmshmm commented Jul 7, 2023

@pavolloffay I can't tell. Our issue is not about a static OTel AVRO schema that can be mapped. What we want to achieve is the ability to map "any" AVRO schema into an OTel log.

@thmshmm
Contributor

thmshmm commented Jul 7, 2023

@veedeo I guess it could be implemented as an operator, but to use it in the Kafka receiver, the receiver would also need to support operators.

@vincentfree
Contributor Author

Is an Avro schema for OTEL defined somewhere? Or can it be derived from the existing OTLP proto?

As @thmshmm said, this feature is about supporting any user-defined incoming structured log format and converting it to the OTLP format in the collector, so that a user can dispatch/export the log to a downstream system.

This means we don't control the incoming AVRO schema, hence we should be able to handle any type of schema.

I'm not sure I would add an official AVRO schema for the log record format. Though that would work, I would rather rely on the default transport layers (HTTP/gRPC) for that. The AVRO flow would then accept any "other" log structure and convert it into the official one.

@github-actions
Contributor

github-actions bot commented Sep 7, 2023

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Sep 7, 2023
@thmshmm
Contributor

thmshmm commented Oct 13, 2023

Hi @MovieStoreGuy, @pavolloffay,
I created a slightly modified version of the AVRO encoder. I removed the mapping completely to make it more generic. It now uses the same behaviour as the JSON encoder: it takes the complete decoded AVRO object and inserts it as the body of the log record. This way, users can freely decide how to further process the body, e.g. map fields.
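
The change in approach can be pictured with a small Go sketch. It is only an illustration under stated assumptions, not the code from the linked commit: `LogRecord`, `recordFromDecoded` and `promoteField` are hypothetical names, and the decoded Avro object is represented as a plain `map[string]any`.

```go
package main

import "fmt"

// LogRecord is a simplified, hypothetical stand-in for an OTel log record.
type LogRecord struct {
	Body       any
	Attributes map[string]any
}

// recordFromDecoded puts the whole decoded Avro object into the body,
// without applying any field mapping in the receiver.
func recordFromDecoded(decoded map[string]any) LogRecord {
	return LogRecord{Body: decoded, Attributes: map[string]any{}}
}

// promoteField copies one body field into the attributes, roughly what a
// later processing step could do. It reports whether the field was found.
func promoteField(rec *LogRecord, field string) bool {
	body, ok := rec.Body.(map[string]any)
	if !ok {
		return false // body is not a map, nothing to promote
	}
	value, found := body[field]
	if !found {
		return false
	}
	rec.Attributes[field] = value
	return true
}

func main() {
	rec := recordFromDecoded(map[string]any{
		"hostname": "web-1",
		"message":  "disk full",
	})
	promoted := promoteField(&rec, "hostname")
	fmt.Println(promoted, rec.Attributes["hostname"])
}
```

Keeping the receiver free of mapping logic moves the decision about which fields become attributes to a separate step that the user configures.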

ing-bank@7114af2

I will do some further testing, but would love to get some early feedback on whether this is more aligned with your expected behaviour of the Kafka receiver.

@github-actions github-actions bot removed the Stale label Oct 14, 2023
@thmshmm
Contributor

thmshmm commented Oct 24, 2023

Works as expected in combination with the transform processor.

@dluongiop

Works as expected in combination with the transform processor.

@thmshmm I'm closely following this development. This feature seems to closely match what I'm looking for. I have an AVRO-encoded log body that I would like to export via the Kafka exporter, but it seems to be altered in the Kafka exporter before being sent to the topic. Would this feature allow the AVRO-encoded log body to be transparently exported if no transform processor is used?

@thmshmm
Contributor

thmshmm commented Nov 13, 2023

Hi @dluongiop, you can use the feature without the transform processor for reading from Kafka. But after deserialization of the Avro binary, which happens already in the receiver, the body will be a map of fields and values. As I understand, you want to keep the Avro data (binary encoded?) as body and send it to Kafka with the exporter again so that you can read the log record body as binary again. This is not possible with the feature.

Basically, that would also mean you need an even more generic Kafka receiver which just reads data, puts it into the body, and outputs it, no matter whether it is Avro or something else. Given that the log definition says the body can be a byte array, that should be possible.

@dluongiop

Basically, that would also mean you need an even more generic Kafka receiver which just reads data, puts it into the body, and outputs it, no matter whether it is Avro or something else. Given that the log definition says the body can be a byte array, that should be possible.

@thmshmm Thank you for clarifying. You are right that I want the byte array to leave the exporter unprocessed and untouched. I thought that when the log body is raw encoded it would be left alone, but that's not the case. The byte stream is altered: extra bytes such as \u0000 and \u0002 are mixed in. I've not tracked down where that's happening. Anyway, my search continues for a receiver/exporter pair that can be used to route data without processing. It's probably not a common usage pattern, but it would be nice to be able to send everything to the collector and have it sorted out to various destinations.

Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Jan 15, 2024
Contributor

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned Mar 15, 2024
@vincentfree
Contributor Author

vincentfree commented Mar 20, 2024

This feature is still quite valuable for us. Is there any way to get some more engagement for this? @pavolloffay or @atoulme, what could we do to get this PR approved? We're currently co-maintaining this in a private repo, but I still think it can be valuable as a format that's officially supported through this receiver.

dmitryax pushed a commit that referenced this issue Mar 29, 2024
**Description:** Add new component `avrologencodingextension` to be able
to transform AVRO messages into log record body.

As requested in #31077, this is a parallel request to support the same
functionality as reusable encoding extension.

**Link to tracking Issue:** #21067

**Testing:** Unit-testing as well as testing code within the
`kafkareceiver` receiver.

**Documentation:** Added README within the component.