
kafka-influxdb-sink cannot handle Avro type array anymore #737

Closed

afausti opened this issue Jan 26, 2021 · 7 comments

afausti commented Jan 26, 2021

Issue Guidelines

What version of the Stream Reactor are you reporting this issue for?

2.1.3

Are you running the correct version of Kafka/Confluent for the Stream reactor release?

Yes, with Confluent Kafka 5.5.2

Have you read the docs?

Yes

What is the expected behaviour?

I would expect kafka-influxdb-sink 2.1.3 to support Avro records of type array (see PR #522). In fact, I contributed that PR in the past, but I didn't include a test for it :(

I can try to fix this with some guidance.

What was observed?

docker-compose exec schema-registry kafka-avro-console-producer --bootstrap-server broker:29092 --topic foo --property value.schema='{"type":"record", "name":"foo", "fields":[{"name":"bar","type":"string"}, {"name":"baz","type":{"type":"array","items":"float"}}]}'

{"bar": "John Doe","baz": [1,2,3]}

What is your Connect cluster configuration (connect-avro-distributed.properties)?

$ docker-compose exec connect cat ./etc/schema-registry/connect-avro-distributed.properties

bootstrap.servers=localhost:9092
group.id=connect-cluster
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-statuses
config.storage.replication.factor=1
offset.storage.replication.factor=1
status.storage.replication.factor=1
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
plugin.path=/usr/share/java,/usr/share/confluent-hub-components

What is your connector properties configuration (my-connector.properties)?

docker-compose run kafkaconnect config influxdb-sink
Creating angelofausti_kafkaconnect_run ... done
{
    "connect.influx.db": "mydb",
    "connect.influx.error.policy": "THROW",
    "connect.influx.kcql": "INSERT INTO foo SELECT * FROM foo WITHTIMESTAMP sys_time()",
    "connect.influx.max.retries": "10",
    "connect.influx.password": "",
    "connect.influx.retry.interval": "60000",
    "connect.influx.timestamp": "sys_time()",
    "connect.influx.url": "http://influxdb:8086",
    "connect.influx.username": "-",
    "connect.progress.enabled": "false",
    "connector.class": "com.datamountaineer.streamreactor.connect.influx.InfluxSinkConnector",
    "name": "influxdb-sink",
    "tasks.max": "1",
    "topics": "foo"
}

Please provide full log files (redact any sensitive information)

connect            | [2021-01-26 23:12:40,756] INFO Empty list of records received. (com.datamountaineer.streamreactor.connect.influx.InfluxSinkTask)
connect            | [2021-01-26 23:12:42,631] ERROR Encountered error Can't select field:'baz' because it leads to value:'[1.0, 2.0, 3.0]' (java.util.ArrayList)is not a valid type for InfluxDb. (com.datamountaineer.streamreactor.connect.influx.writers.InfluxDbWriter)
connect            | java.lang.RuntimeException: Can't select field:'baz' because it leads to value:'[1.0, 2.0, 3.0]' (java.util.ArrayList)is not a valid type for InfluxDb.
connect            |    at com.datamountaineer.streamreactor.connect.influx.converters.InfluxPoint$.writeField(InfluxPoint.scala:91)
connect            |    at com.datamountaineer.streamreactor.connect.influx.converters.InfluxPoint$.$anonfun$addValuesAndTags$6(InfluxPoint.scala:36)
connect            |    at scala.util.Success.flatMap(Try.scala:251)
connect            |    at com.datamountaineer.streamreactor.connect.influx.converters.InfluxPoint$.$anonfun$addValuesAndTags$5(InfluxPoint.scala:35)
connect            |    at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
connect            |    at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
connect            |    at scala.collection.immutable.List.foldLeft(List.scala:89)
connect            |    at com.datamountaineer.streamreactor.connect.influx.converters.InfluxPoint$.addValuesAndTags(InfluxPoint.scala:34)
connect            |    at com.datamountaineer.streamreactor.connect.influx.converters.InfluxPoint$.$anonfun$build$6(InfluxPoint.scala:23)
connect            |    at scala.util.Success.flatMap(Try.scala:251)
connect            |    at com.datamountaineer.streamreactor.connect.influx.converters.InfluxPoint$.build(InfluxPoint.scala:20)
connect            |    at com.datamountaineer.streamreactor.connect.influx.writers.InfluxBatchPointsBuilder.$anonfun$build$4(InfluxBatchPointsBuilder.scala:94)
connect            |    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:273)
connect            |    at scala.collection.Iterator.foreach(Iterator.scala:943)
connect            |    at scala.collection.Iterator.foreach$(Iterator.scala:943)
connect            |    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
connect            |    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
connect            |    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
connect            |    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
connect            |    at scala.collection.TraversableLike.map(TraversableLike.scala:273)
connect            |    at scala.collection.TraversableLike.map$(TraversableLike.scala:266)
connect            |    at scala.collection.AbstractTraversable.map(Traversable.scala:108)
connect            |    at com.datamountaineer.streamreactor.connect.influx.writers.InfluxBatchPointsBuilder.$anonfun$build$3(InfluxBatchPointsBuilder.scala:94)
connect            |    at scala.Option.map(Option.scala:230)
connect            |    at com.datamountaineer.streamreactor.connect.influx.writers.InfluxBatchPointsBuilder.$anonfun$build$2(InfluxBatchPointsBuilder.scala:94)
connect            |    at scala.util.Success.flatMap(Try.scala:251)
connect            |    at com.datamountaineer.streamreactor.connect.influx.writers.InfluxBatchPointsBuilder.$anonfun$build$1(InfluxBatchPointsBuilder.scala:91)
connect            |    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:273)
connect            |    at scala.collection.Iterator.foreach(Iterator.scala:943)
connect            |    at scala.collection.Iterator.foreach$(Iterator.scala:943)
connect            |    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
connect            |    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
connect            |    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
connect            |    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
connect            |    at scala.collection.TraversableLike.map(TraversableLike.scala:273)
connect            |    at scala.collection.TraversableLike.map$(TraversableLike.scala:266)
connect            |    at scala.collection.AbstractTraversable.map(Traversable.scala:108)
connect            |    at com.datamountaineer.streamreactor.connect.influx.writers.InfluxBatchPointsBuilder.build(InfluxBatchPointsBuilder.scala:88)
connect            |    at com.datamountaineer.streamreactor.connect.influx.writers.InfluxDbWriter.write(InfluxDbWriter.scala:45)
connect            |    at com.datamountaineer.streamreactor.connect.influx.InfluxSinkTask.$anonfun$put$2(InfluxSinkTask.scala:77)
connect            |    at com.datamountaineer.streamreactor.connect.influx.InfluxSinkTask.$anonfun$put$2$adapted(InfluxSinkTask.scala:77)
connect            |    at scala.Option.foreach(Option.scala:407)
connect            |    at com.datamountaineer.streamreactor.connect.influx.InfluxSinkTask.put(InfluxSinkTask.scala:77)
connect            |    at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:549)
connect            |    at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:329)
connect            |    at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:232)
connect            |    at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:204)
connect            |    at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:185)
connect            |    at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:235)
connect            |    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
connect            |    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
connect            |    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
connect            |    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
connect            |    at java.lang.Thread.run(Thread.java:748)

stheppi commented Jan 28, 2021

Hi @afausti,
The linked InfluxDB client does not have an API for adding an array as a point field.

This code was not touched (even if you look at the history):

def writeField(builder: Point.Builder)(field: String, v: Any): Try[Point.Builder] = v match {

So I am not sure the InfluxDB sink ever supported inserting an array.

I see that you did add the point handling at a higher level when array support was added, so it's best to inject the handling lower down, in the code I shared. We will add it back.
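
For context, a minimal sketch of what injecting the array handling into writeField could look like. The scalar cases and the error message follow the signature quoted above; the list case, the recursion, and the fieldN naming scheme (baz -> baz0, baz1, ...) are assumptions based on how PR #522 flattened arrays, not the connector's actual implementation.

```scala
import scala.collection.JavaConverters._
import scala.util.{Failure, Try}

import org.influxdb.dto.Point

object ArrayFieldSketch {
  // Sketch only: recurse into java.util.List values and write each element
  // as its own field, suffixing the field name with the element index.
  def writeField(builder: Point.Builder)(field: String, v: Any): Try[Point.Builder] = v match {
    case list: java.util.List[_] =>
      list.asScala.toList.zipWithIndex.foldLeft(Try(builder)) {
        case (acc, (element, index)) =>
          acc.flatMap(b => writeField(b)(s"$field$index", element))
      }
    case value: Long    => Try(builder.addField(field, value))
    case value: Double  => Try(builder.addField(field, value))
    case value: Float   => Try(builder.addField(field, value.toDouble))
    case value: String  => Try(builder.addField(field, value))
    case value: Boolean => Try(builder.addField(field, value))
    case other =>
      Failure(new RuntimeException(
        s"Can't select field:'$field' because it leads to value:'$other' " +
          s"(${other.getClass}) is not a valid type for InfluxDb."))
  }
}
```

With something along these lines, the record from the reproduction above would yield fields bar, baz0, baz1, baz2 on the foo measurement.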


afausti commented Jan 28, 2021

@stheppi nice, thank you! Yes, the solution in that PR was to extract the elements of the array and write them as new fields in InfluxDB. It used to work like this:

docker-compose exec schema-registry kafka-avro-console-producer --bootstrap-server broker:29092 --topic foo --property value.schema='{"type":"record", "name":"foo", "fields":[{"name":"bar","type":"string"}, {"name":"baz","type":{"type":"array","items":"float"}}]}'
{"bar": "John Doe","baz": [1.0,2.0,3.0]}
Ctrl+D

docker-compose exec influxdb influx -database mydb -execute "SELECT * FROM foo"
name: foo
time                bar      baz0 baz1 baz2
----                ---      ---- ---- ----
1611707507555316950 John Doe 1.0  2.0  3.0


afausti commented Feb 13, 2021

Please consider adding a test to ensure that the connector can handle arrays with NaN values. An array like baz=[1.0, NaN, 3.0] should create the field set baz0=1.0, baz2=3.0 in InfluxDB. According to PR #734, influxdb-java is expected to skip fields with NaN values before writing to InfluxDB.
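
For illustration, a rough sketch (hypothetical helper, not the connector's code) of the flattening behaviour such a test would assert, with NaN elements dropped so their indices never become fields:

```scala
// Hypothetical helper for illustration only: flatten an array into indexed
// fields, skipping NaN elements so only the remaining indices are written.
def flattenSkippingNaN(field: String, values: Seq[Any]): Seq[(String, Any)] =
  values.zipWithIndex.collect {
    case (v: Double, i) if !v.isNaN => (s"$field$i", v)
    case (v: Float, i)  if !v.isNaN => (s"$field$i", v)
    case (v, i) if !v.isInstanceOf[Double] && !v.isInstanceOf[Float] => (s"$field$i", v)
  }

// flattenSkippingNaN("baz", Seq(1.0, Double.NaN, 3.0))
// => List((baz0,1.0), (baz2,3.0))
```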


afausti commented May 26, 2021

@stheppi just checking if there are plans to add this feature back. Thank you!


stheppi commented Jun 4, 2021

@afausti yes, we will be working to add it back


afausti commented Jun 8, 2021

I can confirm that the new array handling implementation works fine for me. I built kafka-connect-influxdb from master, followed these steps to produce an Avro-encoded message with an array, and verified that it is flattened in InfluxDB.

This issue can be closed.

@andrewstevenson

@afausti thanks!
