-
Notifications
You must be signed in to change notification settings - Fork 39
Scaling Examples
The following example pipelines demonstrate how to use each of the transports to scale a Baleen process. A more detailed introduction can be found in Baleen scaling. It is assumed that the reader is familiar with setting up simple Baleen pipelines and so little explanation is given here regarding the basics of collection readers, annotators and consumers.
The configuration for each transport assumes the default (for example if working with the Docker examples at the foot of this page) but will likely need to be changed real applications.
pipelines: - name: receiver multiplicity: 2 file: ./receiver.yml - name: sender file: .sender.yml
pipelines: # Optional receiver on master #- name: receiver # multiplicity: 2 # file: ./receiver.yml - name: sender file: ./sender.yml
pipelines: - name: receiver multiplicity: 4 file: ./receiver.yml
(for vertical scaling only)
collectionreader: class: FolderReader folders: - ./files consumers: - class: uk.gov.dstl.baleen.transports.memory.MemoryTransportSender # topic: transport # capacity: # blacklist: # whitelist:
collectionreader: class: uk.gov.dstl.baleen.transports.memory.MemoryTransportReceiver # topic: transport # blacklist: # whitelist: annotators: - myAnnotators consumers: - print.Entities
activemq: # host: localhost # port: 61616 # protocol: tcp # brokerargs: # topic: baleen # user: # pass: collectionreader: class: FolderReader folders: - ./files consumers: - class: uk.gov.dstl.baleen.transports.activemq.ActiveMQTransportSender # topic: transport # capacity: # blacklist: # whitelist:
activemq: # host: localhost # port: 61616 # protocol: tcp # brokerargs: # topic: baleen # user: # pass: collectionreader: class: uk.gov.dstl.baleen.transports.activemq.ActiveMQTransportReceiver # topic: transport # blacklist: # whitelist: annotators: - myAnnotators consumers: - print.Entities
kafka: # host: localhost # port: 9092 # consumerGroupId: 1 collectionreader: class: FolderReader folders: - ./files consumers: - class: uk.gov.dstl.baleen.transports.kafka.KafkaTransportSender # topic: transport # capacity: # blacklist: # whitelist:
kafka: # host: localhost # port: 9092 # consumerGroupId: 1 collectionreader: class: uk.gov.dstl.baleen.transports.kafka.KafkaTransportReceiver # topic: transport # blacklist: # whitelist: # timeout: 1000 # maxPollDocs: 1 # offset: earliest annotators: - myAnnotators consumers: - print.Entities
rabbitmq: # host: localhost # port: 5672 # virtualHost: / user: guest pass: guest # https: false # Only relevent for http true # trustAll: false # Only relevent for http true and trustAll false # sslprotocol: TLSv1.1 # keystorePass: # keystorePath: # truststorePass: # truststorePath: collectionreader: class: FolderReader folders: - ./files consumers: - class: uk.gov.dstl.baleen.transports.rabbitmq.RabbitMQTransportSender # topic: transport # capacity: # blacklist: # whitelist: # exchange: # routingKey
rabbitmq: # host: localhost # port: 5672 # virtualHost: / user: guest pass: guest # https: false # Only relevent for http true # trustAll: false # Only relevent for http true and trustAll false # sslprotocol: TLSv1.1 # keystorePass: # keystorePath: # truststorePass: # truststorePath: collectionreader: class: uk.gov.dstl.baleen.transports.rabbitmq.RabbitMQTransportReceiver # topic: transport # blacklist: # whitelist: annotators: - myAnnotators consumers: - print.Entities
redis: # host: localhost # port: 6379 collectionreader: class: FolderReader folders: - ./files consumers: - class: uk.gov.dstl.baleen.transports.redis.RedisTransportSender # topic: transport # capacity: # blacklist: # whitelist:
redis: # host: localhost # port: 6379 collectionreader: class: uk.gov.dstl.baleen.transports.redis.RedisTransportReceiver # topic: transport # blacklist: # whitelist: annotators: - myAnnotators consumers: - print.Entities
For testing purposes with vertical scaling the following Docker commands may be useful to run the transports on the localhost. Note that in Windows it may be necessary to change the host to the Docker container IP in the sender and receiver yaml files.
docker run -d -e 'ACTIVEMQ_CONFIG_MINMEMORY=512' -e 'ACTIVEMQ_CONFIG_MAXMEMORY=2048' -p 61616:61616 -p 8161:8161 webcenter/activemq:5.14.3
You can then go to http://localhost:8161 in a browser to use the management console with credentials admin:admin
docker run -d -p 2181:2181 -p 9092:9092 -e ADVERTISED_HOST=`localhost` -e ADVERTISED_PORT=9092 spotify/kafka
or on Windows if kafka is not visible on the localhost:
docker run -d -p 2181:2181 -p 9092:9092 -e ADVERTISED_HOST=`docker-machine ip \`docker-machine active\`` -e ADVERTISED_PORT=9092 spotify/kafka
NB we use this image as it includes the Zookeeper service in the same image.
docker run -d --hostname my-rabbit -p 15672:15672 -p 5672:5672 rabbitmq:3-management
You can then go to http://localhost:15672 in a browser to use the management console with credentials guest:guest
docker run -d -p 6379:6379 redis:3.2.9