[Bug] Beam errors out when using PortableRunner (Flink Runner) – Cannot run program "docker" #30663

Open
4 of 16 tasks
harrymyburgh opened this issue Mar 18, 2024 · 3 comments

@harrymyburgh

What happened?

I am trying to deploy a Beam job (Python) that runs on a PortableRunner (Flink Runner) in my Kubernetes cluster.
I have not had issues with Beam on the Flink Runner before. However, today I tried to set up Beam as a consumer of Apache Kafka using ReadFromKafka from apache_beam.io.kafka.

My Flink Cluster is managed by the Apache Flink Kubernetes Operator.

My Beam jobs are managed by a Beam Flink job server, which submits Beam jobs to the Flink master. The job server uses the image apache/beam_flink1.16_job_server:2.54.0.

My Flink Task Managers each contain a sidecar for a Beam worker pool, which is spun up using the image apache/beam_python3.11_sdk:2.54.0 and the arg --worker_pool.

When I start my Beam job, I get the following error in the job manager logs:

Caused by: java.io.IOException: Cannot run program "docker": error=2, No such file or directory

These are my Beam pipeline options:

--job_name=beam_example_pipeline
--runner=PortableRunner
--job_endpoint=beam-flink-job-server:8099
--artifact_endpoint=beam-flink-job-server:8098
--environment_type=EXTERNAL
--environment_config=localhost:50000
--parallelism=1
--streaming

Some resources I've found suggest that the Kafka transform, being a cross-language transform, has its own environment that defaults to --environment_type=DOCKER and overrides whatever environment type you set for the pipeline, and that this is what causes the issue. However, I could be wrong, so please say so if I am.
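
For reference, the only Python-side knob I'm aware of is the expansion_service argument of ReadFromKafka, which lets you point the transform at an expansion service you run yourself instead of the one Beam starts from its bundled jar. A rough, untested sketch of what I mean (beam-expansion-service:8097, my-kafka:9092 and my-topic are placeholders, and I don't know whether this alone stops the expanded Java transform from falling back to a DOCKER environment):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.io.kafka import ReadFromKafka

options = PipelineOptions([
    "--job_name=beam_example_pipeline",
    "--runner=PortableRunner",
    "--job_endpoint=beam-flink-job-server:8099",
    "--artifact_endpoint=beam-flink-job-server:8098",
    "--environment_type=EXTERNAL",
    "--environment_config=localhost:50000",
    "--parallelism=1",
    "--streaming",
])

with beam.Pipeline(options=options) as pipeline:
    _ = (
        pipeline
        # Point the cross-language Kafka read at an externally managed
        # expansion service (placeholder address) instead of the default
        # jar-based one, whose expanded Java transform seems to default to
        # a DOCKER environment on the task managers.
        | "Read From Kafka" >> ReadFromKafka(
            consumer_config={"bootstrap.servers": "my-kafka:9092"},  # placeholder
            topics=["my-topic"],  # placeholder
            expansion_service="beam-expansion-service:8097")  # placeholder
    )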

All of this is taking place on a Kubernetes cluster, where, to my knowledge, Docker-in-Docker is not recommended. I do not want to use the PROCESS environment_type; I require EXTERNAL. How can I resolve this issue? Is this a bug in Beam?

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@harrymyburgh harrymyburgh changed the title Beam errors out when using PortableRunner (Flink Runner) – Cannot run program "docker" [Bug] Beam errors out when using PortableRunner (Flink Runner) – Cannot run program "docker" Mar 18, 2024
@felipecaputo

I'm currently facing the same problem myself, using a Beam YAML pipeline: when it comes to running the SQL transformation, it tries to run a Java SDK Docker container inside the flink-main-container.

WARN  org.apache.beam.runners.fnexecution.environment.DockerCommand [] - Unable to pull docker image apache/beam_java17_sdk:2.56.0, cause: Cannot run program "docker": error=2, No such file or directory
ERROR org.apache.flink.runtime.operators.BatchTask                 [] - Error in task code:  CHAIN MapPartition (MapPartition at [3]Sql/BeamCoGBKJoinRel_77/Join.Impl/CoGroup.ExpandCrossProduct/{extractKeylhs, CoGroupByKey}) -> FlatMap (FlatMap at ExtractOutput[0]) (1/1)
java.lang.Exception: The user defined 'open()' method caused an exception: java.io.IOException: Cannot run program "docker": error=2, No such file or directory
org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:508) ~[flink-dist-1.17.2.jar:1.17.2]
org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:357) ~[flink-dist-1.17.2.jar:1.17.2]
org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:952) [flink-dist-1.17.2.jar:1.17.2]
org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:931) [flink-dist-1.17.2.jar:1.17.2]
org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:745) [flink-dist-1.17.2.jar:1.17.2]
org.apache.flink.runtime.taskmanager.Task.run(Task.java:562) [flink-dist-1.17.2.jar:1.17.2]
java.lang.Thread.run(Unknown Source) [?:?]

The tasks executed in the python-harness container ran smoothly, but when it came to the SQL transformation, it failed.

@Raaaaaaaay86

Facing the same problem. Does anyone have a solution? I'm trying to run a basic Kafka pipeline on my M1 MacBook.

This is the docker-compose file which I'm running as the environment.

version: "3"

services:
  kafka-1:
    container_name: my_kafka_cluster_node_1
    image: 'bitnami/kafka:3.5.1'
    deploy:
      resources:
        limits:
          memory: 2G
    networks:
      my_network:
        ipv4_address: 175.26.0.5
    ports:
      - 9097:9097
    environment:
      - KAFKA_KRAFT_CLUSTER_ID=CqhzvKYNSvmn-8NmZWgOHb
      - KAFKA_ENABLE_KRAFT=true
      - KAFKA_CFG_NODE_ID=1
      - KAFKA_CFG_PROCESS_ROLES=broker,controller
      - KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=1@175.26.0.5:9093
      - KAFKA_CFG_LISTENERS=INTERNAL://175.26.0.5:9092,CONTROLLER://175.26.0.5:9093,EXTERNAL://:9097
      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=CONTROLLER:PLAINTEXT,INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
      - KAFKA_CFG_ADVERTISED_LISTENERS=INTERNAL://175.26.0.5:9092,EXTERNAL://localhost:9097
      - KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
      - KAFKA_CFG_INTER_BROKER_LISTENER_NAME=INTERNAL

  flink-jobmanager:
    image: flink:1.18
    networks:
      my_network:
        ipv4_address: 175.26.0.23
    ports:
      - "8081:8081"
    command: jobmanager
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: flink-jobmanager
        jobmanager.memory.process.size: 5gb

  flink-taskmanager:
    image: flink:1.18
    networks:
      my_network:
        ipv4_address: 175.26.0.24
    depends_on:
      - flink-jobmanager
    command: taskmanager
    scale: 1
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: flink-jobmanager
        taskmanager.numberOfTaskSlots: 20
        taskmanager.memory.process.size: 5gb
  
  beam-worker-pool:
    image: apache/beam_python3.11_sdk:latest
    networks:
      my_network:
        ipv4_address: 175.26.0.25
    ports:
      - "50000:50000"
    command: --worker_pool
    environment:
      - BEAM_WORKER_POOL_IN_DOCKER_VM=1

networks:
  my_network:
    driver: bridge
    ipam:
      driver: default
      config:
        - subnet: 175.26.0.0/24

and I run a beam_flink1.18_job_server separately with a docker command.

docker run -d --network my_network -p 8099:8099 -p 8098:8098 -p 8097:8097 -v /tmp/f-conf:/flink-conf apache/beam_flink1.18_job_server:latest --flink-conf-dir=/flink-conf --flink-master=cre-center-flink-jobmanager:8081

and this is my Beam pipeline

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.io.kafka import ReadFromKafka

if __name__ == "__main__":
  options = PipelineOptions([
    "--runner=PortableRunner",
    "--job_endpoint=localhost:8099",
    "--environment_type=EXTERNAL",
    "--environment_config=175.26.0.25:50000"
  ])

  with beam.Pipeline(options=options) as pipeline:
    (
    pipeline
    | "Read Data" >> ReadFromKafka(
      consumer_config={
        "bootstrap.servers": "175.26.0.5:9092",
        "auto.offset.reset": "earliest",
      },
      topics=["member_entered_store"])
    )

Finally, I execute the pipeline by running this command on my MacBook:

python beam_playground/main.py

and I receive the same error in the beam_flink1.18_job_server container:

Cannot run program "docker": error=2, No such file or directory
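
One variation I might try (an untested sketch; I'm assuming that port 8097 published by the job server container is its expansion service, and the Java harness on the Flink side may still need its own environment configured):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.io.kafka import ReadFromKafka

options = PipelineOptions([
    "--runner=PortableRunner",
    "--job_endpoint=localhost:8099",
    "--environment_type=EXTERNAL",
    "--environment_config=175.26.0.25:50000"
])

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "Read Data" >> ReadFromKafka(
            consumer_config={
                "bootstrap.servers": "175.26.0.5:9092",
                "auto.offset.reset": "earliest",
            },
            topics=["member_entered_store"],
            # Assumption: port 8097 published by the job server container
            # is its expansion service; untested.
            expansion_service="localhost:8097")
    )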

@liferoad
Collaborator

I think the Flink cluster should have Docker installed, since it needs to download the Python SDK worker image.
