group.min.session.timeout.ms is null using zilla in front of redpanda #581

Closed
vordimous opened this issue Nov 17, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@vordimous
Contributor

Describe the bug
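The group.min.session.timeout.ms config option is null instead of a string value when Zilla decodes the describe response from Redpanda. Parsing the null value as an integer throws a NumberFormatException, which terminates the engine's dispatch agent.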

Stacktrace:

org.agrona.concurrent.AgentTerminationException: java.lang.NumberFormatException: Cannot parse null string
    at io.aklivity.zilla.runtime.engine/io.aklivity.zilla.runtime.engine.internal.registry.DispatchAgent.doWork(DispatchAgent.java:707)
    at org.agrona.core/org.agrona.concurrent.AgentRunner.doDutyCycle(AgentRunner.java:291)
    at org.agrona.core/org.agrona.concurrent.AgentRunner.run(AgentRunner.java:164)
    at java.base/java.lang.Thread.run(Thread.java:1623)
Caused by: java.lang.NumberFormatException: Cannot parse null string
    at java.base/java.lang.Integer.parseInt(Integer.java:627)
    at java.base/java.lang.Integer.parseInt(Integer.java:781)
    at io.aklivity.zilla.runtime.binding.kafka/io.aklivity.zilla.runtime.binding.kafka.internal.stream.KafkaClientGroupFactory$DescribeClient.onDecodeDescribeResponse(KafkaClientGroupFactory.java:3055)
    at io.aklivity.zilla.runtime.binding.kafka/io.aklivity.zilla.runtime.binding.kafka.internal.stream.KafkaClientGroupFactory.decodeDescribeResponse(KafkaClientGroupFactory.java:789)
    at io.aklivity.zilla.runtime.binding.kafka/io.aklivity.zilla.runtime.binding.kafka.internal.stream.KafkaClientGroupFactory$DescribeClient.decodeNetwork(KafkaClientGroupFactory.java:2926)
    at io.aklivity.zilla.runtime.binding.kafka/io.aklivity.zilla.runtime.binding.kafka.internal.stream.KafkaClientGroupFactory$DescribeClient.onNetworkData(KafkaClientGroupFactory.java:2562)
    at io.aklivity.zilla.runtime.binding.kafka/io.aklivity.zilla.runtime.binding.kafka.internal.stream.KafkaClientGroupFactory$DescribeClient.onNetwork(KafkaClientGroupFactory.java:2477)
    at io.aklivity.zilla.runtime.binding.kafka/io.aklivity.zilla.runtime.binding.kafka.internal.stream.KafkaClientConnectionPool.doData(KafkaClientConnectionPool.java:308)
    at io.aklivity.zilla.runtime.binding.kafka/io.aklivity.zilla.runtime.binding.kafka.internal.stream.KafkaClientConnectionPool$KafkaClientStream.doStreamData(KafkaClientConnectionPool.java:841)
    at io.aklivity.zilla.runtime.binding.kafka/io.aklivity.zilla.runtime.binding.kafka.internal.stream.KafkaClientConnectionPool$KafkaClientConnection.onConnectionData(KafkaClientConnectionPool.java:1440)
    at io.aklivity.zilla.runtime.binding.kafka/io.aklivity.zilla.runtime.binding.kafka.internal.stream.KafkaClientConnectionPool$KafkaClientConnection.onConnectionMessage(KafkaClientConnectionPool.java:1363)
    at io.aklivity.zilla.runtime.engine/io.aklivity.zilla.runtime.engine.internal.registry.DispatchAgent.handleReadReply(DispatchAgent.java:1244)
    at io.aklivity.zilla.runtime.engine/io.aklivity.zilla.runtime.engine.internal.registry.DispatchAgent.handleRead(DispatchAgent.java:1045)
    at io.aklivity.zilla.runtime.engine/io.aklivity.zilla.runtime.engine.internal.concurent.ManyToOneRingBuffer.read(ManyToOneRingBuffer.java:181)
    at io.aklivity.zilla.runtime.engine/io.aklivity.zilla.runtime.engine.internal.registry.DispatchAgent.doWork(DispatchAgent.java:701)
    ... 3 more
    Suppressed: java.lang.Exception: [engine/data#3]        [0x03030000000000a4] streams=[consumeAt=0x000125e0 (0x00000000000125e0), produceAt=0x000125e0 (0x00000000000125e0)]
            at io.aklivity.zilla.runtime.engine/io.aklivity.zilla.runtime.engine.internal.registry.DispatchAgent.doWork(DispatchAgent.java:705)
            ... 3 more
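
The root cause in the trace above is Integer.parseInt being called with a null string. As a minimal sketch of a null-tolerant parse (the ConfigValues class, its parseInt helper, and the 6000 ms fallback are hypothetical names and values chosen for illustration, not Zilla's actual code or fix):

```java
import java.util.OptionalInt;

// Hypothetical helper, not part of Zilla: parses a config value that may be null.
final class ConfigValues
{
    private ConfigValues()
    {
    }

    // Returns an empty OptionalInt when the value is null, blank, or not an integer,
    // instead of letting Integer.parseInt throw NumberFormatException.
    static OptionalInt parseInt(String value)
    {
        if (value == null || value.isBlank())
        {
            return OptionalInt.empty();
        }

        try
        {
            return OptionalInt.of(Integer.parseInt(value.trim()));
        }
        catch (NumberFormatException ex)
        {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args)
    {
        // Assumed fallback for illustration only; the caller decides what default is safe.
        int fallbackMs = 6000;

        System.out.println(parseInt(null).orElse(fallbackMs));
        System.out.println(parseInt("30000").orElse(fallbackMs));
    }
}
```

Returning an OptionalInt leaves the choice of default to the caller, so a missing broker value is handled explicitly rather than crashing the worker thread.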

To Reproduce
Steps to reproduce the behavior:

  1. Run the taxi-demo.
  2. Zilla crashes and enters a restart loop.
@vordimous
Contributor Author

Fetching http://127.0.0.1:19644/v1/cluster_config?include_defaults=true shows that the property is defined on the Redpanda side (as group_min_session_timeout_ms, with value 6000):

{
  "cpu_profiler_enabled": false,
  "kafka_memory_batch_size_estimate_for_fetch": 1048576,
  "kafka_memory_share_for_fetch": 0.5,
  "legacy_unsafe_log_warning_interval_sec": 300,
  "kafka_throughput_controlled_api_keys": [
    "produce",
    "fetch"
  ],
  "kafka_quota_balancer_min_shard_throughput_ratio": 0.01,
  "kafka_quota_balancer_node_period_ms": 750,
  "kafka_quota_balancer_window_ms": 5000,
  "controller_log_accummulation_rps_capacity_move_operations": null,
  "controller_log_accummulation_rps_capacity_acls_and_users_operations": null,
  "controller_log_accummulation_rps_capacity_topic_operations": null,
  "rps_limit_topic_operations": 1000,
  "enable_controller_log_rate_limiting": false,
  "metrics_reporter_report_interval": 86400000,
  "enable_metrics_reporter": true,
  "storage_strict_data_init": false,
  "storage_space_alert_free_threshold_percent": 5,
  "health_monitor_max_metadata_age": 10000,
  "health_manager_tick_interval": 180000,
  "internal_topic_replication_factor": 3,
  "leader_balancer_mute_timeout": 300000,
  "leader_balancer_mode": "random_hill_climbing",
  "partition_autobalancing_node_availability_timeout_sec": 900,
  "zstd_decompress_workspace_bytes": 8388608,
  "kafka_qdc_max_depth": 100,
  "kafka_qdc_idle_depth": 10,
  "kafka_qdc_enable": false,
  "kafka_qdc_latency_alpha": 0.002,
  "partition_autobalancing_concurrent_moves": 50,
  "cloud_storage_chunk_prefetch": 0,
  "partition_autobalancing_min_size_threshold": null,
  "cloud_storage_chunk_eviction_strategy": "eager",
  "cloud_storage_disable_chunk_reads": false,
  "cloud_storage_min_chunks_per_segment_threshold": 5,
  "cloud_storage_cache_chunk_size": 16777216,
  "cloud_storage_max_materialized_segments_per_shard": null,
  "cloud_storage_max_partition_readers_per_shard": null,
  "metrics_reporter_tick_interval": 60000,
  "cloud_storage_max_segment_readers_per_shard": null,
  "cloud_storage_cache_max_objects": 100000,
  "cloud_storage_cache_size_percent": 20.0,
  "retention_local_trim_overage_coeff": 2.0,
  "retention_local_target_capacity_percent": 80.0,
  "retention_local_target_capacity_bytes": null,
  "retention_local_target_ms_default": 86400000,
  "retention_local_target_bytes_default": null,
  "cloud_storage_upload_ctrl_min_shares": 100,
  "cloud_storage_upload_ctrl_d_coeff": 0.0,
  "cloud_storage_upload_ctrl_p_coeff": -2.0,
  "cloud_storage_azure_adls_port": null,
  "cloud_storage_azure_shared_key": null,
  "cloud_storage_cache_check_interval": 5000,
  "retention_local_trim_interval": 30000,
  "cloud_storage_azure_container": null,
  "cloud_storage_disable_metadata_consistency_checks": true,
  "cloud_storage_disable_upload_consistency_checks": false,
  "cloud_storage_topic_purge_grace_period_ms": 30000,
  "cloud_storage_materialized_manifest_ttl_ms": 10000,
  "cloud_storage_manifest_cache_size": 1048576,
  "cloud_storage_spillover_manifest_size": null,
  "cloud_storage_credentials_host": null,
  "cloud_storage_backend": "unknown",
  "cloud_storage_segment_size_target": null,
  "cloud_storage_recovery_temporary_retention_bytes_default": 1073741824,
  "enable_rack_awareness": false,
  "cloud_storage_enable_compacted_topic_reupload": true,
  "node_isolation_heartbeat_timeout": 3000,
  "cloud_storage_disable_upload_loop_for_tests": false,
  "cloud_storage_enable_segment_merging": true,
  "storage_min_free_bytes": 10485760,
  "cloud_storage_idle_threshold_rps": 1.0,
  "kafka_throughput_control": [],
  "cloud_storage_idle_timeout_ms": 10000,
  "leader_balancer_transfer_limit_per_shard": 512,
  "cloud_storage_housekeeping_interval_ms": 300000,
  "cloud_storage_metadata_sync_timeout_ms": 10000,
  "cloud_storage_readreplica_manifest_sync_timeout_ms": 30000,
  "cloud_storage_manifest_max_upload_interval_sec": 60,
  "readers_cache_eviction_timeout_ms": 30000,
  "cloud_storage_max_connection_idle_time_ms": 5000,
  "recovery_append_timeout_ms": 5000,
  "cloud_storage_api_endpoint_port": 443,
  "cloud_storage_upload_loop_initial_backoff_ms": 100,
  "cloud_storage_credentials_source": "config_file",
  "cloud_storage_api_endpoint": null,
  "cloud_storage_secret_key": null,
  "kafka_quota_balancer_min_shard_throughput_bps": 256,
  "cloud_storage_enable_remote_write": false,
  "kafka_enable_describe_log_dirs_remote_storage": true,
  "default_window_sec": 1000,
  "kafka_rpc_server_stream_recv_buf": null,
  "kafka_qdc_min_depth": 1,
  "compacted_log_segment_size": 268435456,
  "kafka_client_group_fetch_byte_rate_quota": [],
  "kafka_client_group_byte_rate_quota": [],
  "kafka_connections_max_per_ip": null,
  "transaction_coordinator_partitions": 50,
  "members_backend_retry_ms": 5000,
  "compaction_ctrl_max_shares": 1000,
  "election_timeout_ms": 1500,
  "auto_create_topics_enabled": true,
  "compaction_ctrl_min_shares": 10,
  "rps_limit_acls_and_users_operations": 1000,
  "kafka_noproduce_topics": [
    "__audit"
  ],
  "cloud_storage_disable_tls": false,
  "topic_partitions_reserve_shard0": 2,
  "kafka_batch_max_bytes": 1048576,
  "cloud_storage_hydrated_chunks_per_segment_ratio": 0.7,
  "compaction_ctrl_update_interval_ms": 30000,
  "kafka_request_max_bytes": 104857600,
  "node_management_operation_timeout_ms": 5000,
  "cloud_storage_segment_size_min": null,
  "rpc_server_tcp_send_buf": null,
  "enable_transactions": true,
  "kafka_enable_partition_reassignment": true,
  "raft_heartbeat_disconnect_failures": 3,
  "tx_timeout_delay_ms": 1000,
  "compaction_ctrl_p_coeff": -12.5,
  "kafka_mtls_principal_mapping_rules": null,
  "enable_schema_id_validation": "none",
  "leader_balancer_idle_timeout": 120000,
  "kafka_enable_authorization": null,
  "sasl_kerberos_principal": "redpanda",
  "cloud_storage_azure_storage_account": null,
  "sasl_kerberos_keytab": "/var/lib/redpanda/redpanda.keytab",
  "find_coordinator_timeout_ms": 2000,
  "cloud_storage_enable_remote_read": false,
  "rps_limit_move_operations": 1000,
  "kafka_qdc_window_size_ms": 1500,
  "cloud_storage_upload_loop_max_backoff_ms": 10000,
  "kafka_nodelete_topics": [
    "__audit",
    "__consumer_offsets",
    "_schemas"
  ],
  "node_status_reconnect_max_backoff_ms": 15000,
  "memory_enable_memory_sampling": true,
  "sasl_kerberos_config": "/etc/krb5.conf",
  "id_allocator_log_capacity": 100,
  "controller_log_accummulation_rps_capacity_configuration_operations": null,
  "retention_local_strict": false,
  "tm_sync_timeout_ms": 10000,
  "compaction_ctrl_d_coeff": 0.2,
  "aggregate_metrics": false,
  "storage_ignore_cstore_hints": false,
  "kafka_schema_id_validation_cache_capacity": 128,
  "compaction_ctrl_i_coeff": 0.0,
  "quota_manager_gc_sec": 30000,
  "storage_ignore_timestamps_in_future_sec": null,
  "cpu_profiler_sample_period_ms": 100,
  "group_initial_rebalance_delay": 0,
  "storage_compaction_index_memory": 134217728,
  "disk_reservation_percent": 25.0,
  "storage_max_concurrent_replay": 1024,
  "kafka_rpc_server_tcp_send_buf": null,
  "cloud_storage_upload_ctrl_update_interval_ms": 60000,
  "alter_topic_cfg_timeout_ms": 5000,
  "segment_fallocation_step": 33554432,
  "cloud_storage_graceful_transfer_timeout_ms": 5000,
  "sasl_mechanisms": [
    "SCRAM"
  ],
  "cloud_storage_manifest_upload_timeout_ms": 10000,
  "partition_autobalancing_mode": "node_add",
  "storage_reserve_min_segments": 2,
  "storage_read_readahead_count": 10,
  "cloud_storage_disable_read_replica_loop_for_tests": false,
  "kafka_max_bytes_per_fetch": 67108864,
  "enable_sasl": false,
  "kvstore_max_segment_size": 16777216,
  "cloud_storage_roles_operation_timeout_ms": 30000,
  "retention_bytes": null,
  "release_cache_on_segment_roll": false,
  "memory_abort_on_alloc_failure": true,
  "kvstore_flush_interval": 10,
  "enable_pid_file": true,
  "compaction_ctrl_backlog_size": null,
  "append_chunk_size": 16384,
  "reclaim_batch_cache_min_free": 67108864,
  "reclaim_stable_window": 10000,
  "fetch_session_eviction_timeout_ms": 60000,
  "enable_leader_balancer": true,
  "raft_transfer_leader_recovery_timeout_ms": 10000,
  "reclaim_max_size": 4194304,
  "metrics_reporter_url": "https://m.rp.vectorized.io/v2",
  "reclaim_min_size": 131072,
  "max_kafka_throttle_delay_ms": 30000,
  "raft_smp_max_non_local_requests": null,
  "rps_limit_configuration_operations": 1000,
  "raft_recovery_throttle_disable_dynamic_mode": false,
  "log_segment_ms_max": 31536000000,
  "cloud_storage_max_connections": 20,
  "cloud_storage_trust_file": null,
  "storage_target_replay_bytes": 10737418240,
  "raft_replicate_batch_window_size": 1048576,
  "replicate_append_timeout_ms": 3000,
  "reclaim_growth_window": 3000,
  "cloud_storage_max_segments_pending_deletion_per_partition": 5000,
  "kafka_group_recovery_timeout_ms": 30000,
  "partition_autobalancing_tick_moves_drop_threshold": 0.2,
  "default_topic_partitions": 1,
  "features_auto_enable": true,
  "wait_for_leader_timeout_ms": 5000,
  "max_compacted_log_segment_size": 5368709120,
  "create_topic_timeout_ms": 2000,
  "cloud_storage_upload_ctrl_max_shares": 1000,
  "log_segment_size_max": null,
  "cluster_id": "cf963d5c-a4cd-4c9f-93cd-76fbe3ea2cc5",
  "transaction_coordinator_delete_retention_ms": 604800000,
  "partition_autobalancing_tick_interval_ms": 30000,
  "space_management_max_segment_concurrency": 10,
  "storage_read_buffer_size": 131072,
  "usage_disk_persistance_interval_sec": 300,
  "metadata_status_wait_timeout_ms": 2000,
  "kafka_qdc_max_latency_ms": 80,
  "kafka_qdc_window_count": 12,
  "transaction_coordinator_cleanup_policy": "delete",
  "group_max_session_timeout_ms": 300000,
  "kafka_connections_max": null,
  "cloud_storage_enabled": false,
  "abort_timed_out_transactions_interval_ms": 10000,
  "default_topic_replications": 1,
  "join_retry_timeout_ms": 5000,
  "kafka_connection_rate_limit": null,
  "log_compaction_interval_ms": 10000,
  "cloud_storage_bucket": null,
  "cloud_storage_spillover_manifest_max_segments": null,
  "max_transactions_per_coordinator": 18446744073709551615,
  "max_concurrent_producer_ids": 18446744073709551615,
  "cloud_storage_access_key": null,
  "kafka_tcp_keepalive_probes": 3,
  "partition_autobalancing_max_disk_usage_percent": 80,
  "cloud_storage_cluster_metadata_upload_interval_ms": 60000,
  "transactional_id_expiration_ms": 604800000,
  "legacy_permit_unsafe_log_operation": true,
  "log_segment_size": 134217728,
  "rpc_server_listen_backlog": null,
  "space_management_enable": true,
  "group_topic_partitions": 3,
  "disable_public_metrics": false,
  "use_fetch_scheduler_group": true,
  "raft_replica_max_pending_flush_bytes": 262144,
  "kafka_tcp_keepalive_timeout": 120,
  "fetch_max_bytes": 57671680,
  "usage_window_width_interval_sec": 3600,
  "fetch_reads_debounce_timeout": 10,
  "storage_space_alert_free_threshold_bytes": 0,
  "log_segment_ms": 1209600000,
  "log_compression_type": "producer",
  "log_message_timestamp_type": "CreateTime",
  "log_cleanup_policy": "delete",
  "abort_index_segment_size": 50000,
  "kafka_throughput_limit_node_in_bps": null,
  "tx_log_stats_interval_s": 10,
  "rm_sync_timeout_ms": 10000,
  "controller_backend_housekeeping_interval_ms": 1000,
  "kafka_connection_rate_limit_overrides": [],
  "raft_max_concurrent_append_requests_per_follower": 16,
  "metadata_dissemination_retries": 30,
  "node_status_interval": 100,
  "default_num_windows": 10,
  "target_fetch_quota_byte_rate": null,
  "metadata_dissemination_retry_delay_ms": 320,
  "legacy_group_offset_retention_enabled": false,
  "group_offset_retention_check_ms": 600000,
  "group_offset_retention_sec": 604800,
  "log_segment_size_min": 1,
  "group_new_member_join_timeout": 30000,
  "group_min_session_timeout_ms": 6000,
  "superusers": [],
  "raft_recovery_default_read_size": 524288,
  "raft_flush_timer_interval_ms": 100,
  "kafka_qdc_depth_alpha": 0.8,
  "kafka_admin_topic_api_rate": null,
  "raft_io_timeout_ms": 10000,
  "cloud_storage_segment_max_upload_interval_sec": 3600,
  "metadata_dissemination_interval_ms": 3000,
  "kafka_connections_max_overrides": [],
  "usage_num_windows": 24,
  "raft_timeout_now_timeout_ms": 1000,
  "id_allocator_batch_size": 1000,
  "enable_usage": false,
  "sasl_kerberos_principal_mapping": [
    "DEFAULT"
  ],
  "raft_heartbeat_timeout_ms": 3000,
  "full_raft_configuration_recovery_pattern": [],
  "log_disable_housekeeping_for_tests": false,
  "controller_snapshot_max_age_sec": 60,
  "rps_limit_node_management_operations": 1000,
  "raft_heartbeat_interval_ms": 150,
  "cloud_storage_segment_upload_timeout_ms": 30000,
  "log_segment_ms_min": 60000,
  "admin_api_require_auth": false,
  "topic_fds_per_partition": 5,
  "enable_idempotence": true,
  "disable_batch_cache": false,
  "kafka_throughput_limit_node_out_bps": null,
  "segment_appender_flush_timeout_ms": 1000,
  "topic_memory_per_partition": 1048576,
  "target_quota_byte_rate": 2147483648,
  "raft_max_recovery_memory": null,
  "kafka_qdc_depth_update_ms": 7000,
  "delete_retention_ms": 604800000,
  "space_management_max_log_concurrency": 20,
  "topic_partitions_per_shard": 1000,
  "controller_log_accummulation_rps_capacity_node_management_operations": null,
  "cloud_storage_azure_adls_endpoint": null,
  "disable_metrics": false,
  "kafka_rpc_server_tcp_recv_buf": null,
  "raft_learner_recovery_rate": 104857600,
  "kafka_tcp_keepalive_probe_interval_seconds": 60,
  "transaction_coordinator_log_segment_size": 1073741824,
  "cloud_storage_initial_backoff_ms": 100,
  "rpc_server_tcp_recv_buf": null,
  "cloud_storage_cache_size": 0,
  "log_segment_size_jitter_percent": 5,
  "cloud_storage_region": null
}
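
Note that the issue title uses the Kafka-style dotted name, while the dump above uses an underscored key. As a rough sketch for looking a dotted name up in such a dump (the ConfigNames class and the dots-to-underscores rule are assumptions inferred from this single pair of names, not a documented Redpanda mapping):

```java
import java.util.Map;

// Hypothetical helper for illustration: converts a Kafka-style dotted config name
// to the underscored form seen in the cluster_config dump.
final class ConfigNames
{
    private ConfigNames()
    {
    }

    // Assumed rule: replace every '.' with '_'.
    static String toUnderscored(String dottedName)
    {
        return dottedName.replace('.', '_');
    }

    public static void main(String[] args)
    {
        // Two entries copied from the dump above; the rest are omitted.
        Map<String, Object> clusterConfig = Map.of(
            "group_min_session_timeout_ms", 6000,
            "group_max_session_timeout_ms", 300000);

        String key = toUnderscored("group.min.session.timeout.ms");
        System.out.println(key + " = " + clusterConfig.get(key));
    }
}
```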

@vordimous added the bug (Something isn't working) label Nov 17, 2023