
Titanoboa Step Functions

This repository contains ready-made sample steps for titanoboa (see the titanoboa GitHub repository):

AWS

🧬 Bioinformatics 🔬

Http Client

JDBC Client

Kafka Producer & Consumer

PDF Generation

SFTP Client

Smtp Client

SSH Client


AWS EC2

Provides functions to list, start and stop EC2 instances. Primarily uses the amazonica library; refer to its documentation for detailed information on the supported properties.

Installation

  1. Add the library's Maven coordinates (as published on Clojars) to titanoboa's external dependencies file
  2. Require namespace: io.titanoboa.tasklet.aws.ec2

Usage

List EC2 Instances

:workload-fn

io.titanoboa.tasklet.aws.ec2/list-instances

Sample Step Definition

{:type :aws-ec2-list,
 :supertype :tasklet,
 :description "Lists all EC2 instances for all reservations.\nReturns :ec2-instances key with list of instances as a value:\n{:ec2-instances [{instance1 map} {instance2 map} ...]}",
 :properties {:credentials {:access-key "", :secret-key "", :endpoint "eu-central-1"}},
 :workload-fn #titanoboa.exp.Expression{:value "io.titanoboa.tasklet.aws.ec2/list-instances", :type "clojure"}}
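A listing step is usually followed by a step that acts on its result. The sketch below is a hypothetical downstream workload-fn (not part of io.titanoboa.tasklet.aws.ec2) that picks the IDs of stopped instances out of :ec2-instances. It assumes that titanoboa passes the job properties map to the function and merges the returned map back into the job properties, and that each instance map has amazonica-style :instance-id and :state {:name ...} keys; verify both against your titanoboa and amazonica versions.

```clojure
(ns example.ec2-filter
  "Hypothetical downstream step; not part of io.titanoboa.tasklet.aws.ec2.")

(defn stopped-instance-ids
  "Reads :ec2-instances from the job properties and returns the IDs of all
  instances whose state name is \"stopped\" under :instance-ids.
  Assumes amazonica-style instance maps, e.g.
  {:instance-id \"i-...\" :state {:name \"stopped\" :code 80}}."
  [{:keys [ec2-instances]}]
  {:instance-ids (->> ec2-instances
                      (filter #(= "stopped" (get-in % [:state :name])))
                      (mapv :instance-id))})
```

If the returned :instance-ids ends up in the job properties, a subsequent :aws-ec2-start step could presumably use it instead of a hard-coded :instance-ids vector.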

Start EC2 Instances

:workload-fn

io.titanoboa.tasklet.aws.ec2/start-instances

Sample Step Definition

{:type :aws-ec2-start,
 :supertype :tasklet,
 :description "Starts an EC2 instance.\nReturns :starting-instances key with status value map.",
 :workload-fn #titanoboa.exp.Expression{:value "io.titanoboa.tasklet.aws.ec2/start-instances", :type "clojure"}
 :properties {:credentials {:access-key "", :secret-key "", :endpoint "eu-central-1"}, :instance-ids ["i-0a123a454b678aeb6"]}
}

Stop EC2 Instances

:workload-fn

io.titanoboa.tasklet.aws.ec2/stop-instances

Sample Step Definition

{:type :aws-ec2-stop,
 :supertype :tasklet,
 :description "Stops an EC2 instance.\nReturns :stopping-instances key with status value map.",
 :properties {:credentials {:access-key "", :secret-key "", :endpoint "eu-central-1"}, :instance-ids ["i-0a123a454b678aeb6"]},
 :workload-fn #titanoboa.exp.Expression{:value "io.titanoboa.tasklet.aws.ec2/stop-instances", :type "clojure"}
}


AWS S3

Provides functions to read, download and upload S3 objects. Primarily uses the amazonica library; refer to its documentation for detailed information on the supported properties.

Installation

  1. Add the library's Maven coordinates (as published on Clojars) to titanoboa's external dependencies file
  2. Require namespace: io.titanoboa.tasklet.aws.s3

Usage

Read S3 Object

:workload-fn

io.titanoboa.tasklet.aws.s3/read

Sample Step Definition

{:type :aws-s3-read,
 :supertype :tasklet,
 :description "Reads the textual content of an S3 file and returns it as the job property :s3-object",
 :workload-fn #titanoboa.exp.Expression{:value "io.titanoboa.tasklet.aws.s3/read", :type "clojure"}
 :properties {:key "index.html", :credentials {:access-key "", :secret-key "", :endpoint "eu-central-1"}, :bucket ""}}

Download S3 Object

:workload-fn

io.titanoboa.tasklet.aws.s3/download

Sample Step Definition

{:type :aws-s3-download,
 :supertype :tasklet,
 :description "Downloads a file from an S3 bucket to the job directory under the specified name.",
 :properties {:key "index.html", :credentials {:access-key "", :secret-key "", :endpoint "eu-central-1"}, :save-as "path/to/file", :bucket "bucket-name"},
 :workload-fn #titanoboa.exp.Expression{:value "io.titanoboa.tasklet.aws.s3/download", :type "clojure"}}

Upload S3 Object

:workload-fn

io.titanoboa.tasklet.aws.s3/upload

Sample Step Definition

{:type        :aws-s3-upload,
 :supertype   :tasklet,
 :description "Uploads specified file from job directory into the given s3 bucket.",
 :properties  {:key "index.bkp", :credentials {:access-key "", :secret-key "", :endpoint "eu-central-1"}, :file-path "index.html", :bucket ""},
 :workload-fn #titanoboa.exp.Expression{:value "io.titanoboa.tasklet.aws.s3/upload", :type "clojure"}}
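Since both the download and the upload step resolve file paths against the job directory, they can be chained to back up an object under a new key. The job definition below is an illustrative, untested sketch: it mirrors the workflow format used in the K-mer section further down (:first-step, :steps, :next), the job name and step ids are hypothetical, and it assumes the job-level :properties (credentials, bucket) are visible to both steps.

```clojure
;; Illustrative sketch: job name "s3-backup" and step ids are hypothetical.
{:first-step "download",
 :name "s3-backup",
 :properties {:credentials {:access-key "", :secret-key "", :endpoint "eu-central-1"},
              :bucket "bucket-name"},
 :steps [{:id "download",
          :type :aws-s3-download,
          :supertype :tasklet,
          :next [["*" "upload"]],
          :workload-fn #titanoboa.exp/Expression{:value "io.titanoboa.tasklet.aws.s3/download",
                                                 :type "clojure"},
          ;; saved into the job directory...
          :properties {:key "index.html", :save-as "index.html"}}
         {:id "upload",
          :type :aws-s3-upload,
          :supertype :tasklet,
          :next [],
          :workload-fn #titanoboa.exp/Expression{:value "io.titanoboa.tasklet.aws.s3/upload",
                                                 :type "clojure"},
          ;; ...and uploaded from the same relative path under a backup key
          :properties {:key "index.bkp", :file-path "index.html"}}]}
```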


AWS SES

Provides a function to send email via AWS SES. Primarily uses the amazonica library; refer to its documentation for detailed information on the supported properties.

Installation

  1. Add the library's Maven coordinates (as published on Clojars) to titanoboa's external dependencies file
  2. Require namespace: io.titanoboa.tasklet.aws.ses

Usage

:workload-fn

io.titanoboa.tasklet.aws.ses/send-email

Sample Step Definition

{:type :aws-ses,
 :supertype :tasklet,
 :description "Sends an email via SES.\nReturns :message-id key with message id value.\n",
 :properties {:credentials {:access-key "", :secret-key "", :endpoint "eu-west-1"}, :from "sender@example.com",
              :message {:body {:html "testing 1-2-3-4", :text "testing 1-2-3-4"}, :subject "greetings from titanoboa"}, :to ["recipient@example.com"]},
 :workload-fn #titanoboa.exp.Expression{:value "io.titanoboa.tasklet.aws.ses/send-email", :type "clojure"}} 


AWS SNS

Provides a function to publish notifications to AWS SNS topics. Primarily uses the amazonica library; refer to its documentation for detailed information on the supported properties.

Installation

  1. Add the library's Maven coordinates (as published on Clojars) to titanoboa's external dependencies file
  2. Require namespace: io.titanoboa.tasklet.aws.sns

Usage

:workload-fn

io.titanoboa.tasklet.aws.sns/publish

Sample Step Definition

{:type :aws-sns,
 :supertype :tasklet,
 :description "Publishes a message into an SNS topic.",
 :workload-fn #titanoboa.exp/Expression{:value "io.titanoboa.tasklet.aws.sns/publish",
                                        :type "clojure"},
 :properties  {:topic-arn "arn:aws:sns:us-east-1:676820690883:my-topic",
               :subject "test",
               :message "",
               :message-attributes {"attr" "value"}}}


AWS SQS

Provides a function to send messages via AWS SQS. Primarily uses the amazonica library; refer to its documentation for detailed information on the supported properties.

Installation

  1. Add the library's Maven coordinates (as published on Clojars) to titanoboa's external dependencies file
  2. Require namespace: io.titanoboa.tasklet.aws.sqs

Usage

:workload-fn

io.titanoboa.tasklet.aws.sqs/send-message

Sample Step Definition

{:type :aws-sqs,
 :supertype :tasklet,
 :description "Sends a text message to a queue.",
 :workload-fn #titanoboa.exp/Expression{:value "io.titanoboa.tasklet.aws.sqs/send-message",
                                        :type "clojure"},
 :properties {:credentials {:access-key "",
                            :secret-key "",
                            :endpoint "eu-central-1"},
              :message-attributes {},
              :message-body "",
              :queue-url ""}}


JDBC Client

Performs a JDBC query and returns the corresponding data. Note that the code of the JDBC tasklet is part of the standard titanoboa distribution and is not in this repository.

Installation

  1. Add the JDBC driver you need to titanoboa's ./lib folder.
  2. Require the namespace titanoboa.tasklet.jdbc in titanoboa's external dependencies file. You may also need to require titanoboa.system.jdbc (see point 3).
  3. Do not forget to also define and configure a corresponding JDBC system for DB connection pooling in your server configuration (in this example, there is a connection pool system :test-db that uses titanoboa.system.jdbc/jdbc-pool).

Usage

:workload-fn

titanoboa.tasklet.jdbc/query

Sample Step Definition

{:type :jdbc
 :supertype :tasklet
 :workload-fn #titanoboa.exp/Expression {:value "titanoboa.tasklet.jdbc/query"}
 :properties {:response-property-name :db-data
              :data-source-ks [:test-db :system :pool]
              :query {:select [:o.ordernumber :o.TotalAmount :c.FirstName :c.LastName :c.City :c.Country],
                      :from [[:customers :c]]
                      :left-join [[:orders :o] [:= :c.id :o.customerid]]
                      :order-by [[:o.totalamount :desc :nulls-last]]
                      :limit 50}}}
              

Expected step properties are as follows:

  • :query - either a query string or a map in honeysql format
  • :data-source-ks - key path pointing to the JDBC data source object among the running systems; when used with titanoboa.system.jdbc/jdbc-pool, the format is [:<jdbc-pool-system> :system :pool], so e.g. if the JDBC system is :test-db, it is [:test-db :system :pool]
  • :response-property-name - name of the job property that holds the query result (:db-data in the sample above)
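Because :query also accepts a plain query string, the honeysql map in the sample can be replaced by SQL. The properties below are a sketch of a roughly equivalent string form; identifier case handling and NULLS LAST support depend on your database.

```clojure
{:response-property-name :db-data
 :data-source-ks [:test-db :system :pool]
 ;; roughly equivalent to the honeysql map in the sample step definition
 :query "SELECT o.ordernumber, o.TotalAmount, c.FirstName, c.LastName, c.City, c.Country
         FROM customers c
         LEFT JOIN orders o ON c.id = o.customerid
         ORDER BY o.totalamount DESC NULLS LAST
         LIMIT 50"}
```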


Http Client

Makes an HTTP(S) call and returns the (parsed) response. Primarily uses the clj-http library; refer to its documentation for detailed information on all supported properties.

Installation

  1. Add the library's Maven coordinates (as published on Clojars) to titanoboa's external dependencies file
  2. Require namespace: io.titanoboa.tasklet.httpclient

Usage

:workload-fn

io.titanoboa.tasklet.httpclient/request

Sample Step Definition

{:type :http-client
 :supertype :tasklet
 :workload-fn #titanoboa.exp/Expression{:value "io.titanoboa.tasklet.httpclient/request" :type "clojure"}
 :properties {:url "https://jsonplaceholder.typicode.com/posts/1"
              :request-method :get
              :as :json
              :proxy-host "127.0.0.1"
              :proxy-port 8118
              :response-property-name :rest-response
              :body-only? false
              :connection-pool {:timeout 5 :threads 4 :insecure? false :default-per-route 10}}}
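With :as :json, clj-http decodes the JSON body into a Clojure map with keyword keys. The sketch below is a hypothetical follow-up workload-fn that reads the stored response. It assumes that with :body-only? false the whole clj-http response map (:status, :headers, :body) is stored under :rest-response and that the returned map is merged into the job properties; the :post-title and :http-error keys are made up for illustration.

```clojure
(ns example.http-followup
  "Hypothetical follow-up step; not part of io.titanoboa.tasklet.httpclient.")

(defn extract-title
  "Reads the response stored under :rest-response and returns either
  {:post-title ...} for a 200 response or {:http-error status} otherwise.
  Assumes :body was already decoded into a keywordized map (:as :json)."
  [{:keys [rest-response]}]
  (let [{:keys [status body]} rest-response]
    (if (= 200 status)
      {:post-title (:title body)}
      {:http-error status})))
```

Note that clj-http throws on error statuses by default, so the error branch is only reached if :throw-exceptions false is set and, presumably, passed through by the tasklet.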
              


Smtp Client

Sends email via SMTP. Primarily uses the postal library; refer to its documentation for detailed information on all supported properties.

Installation

  1. Add the library's Maven coordinates (as published on Clojars) to titanoboa's external dependencies file
  2. Require namespace: io.titanoboa.tasklet.smtp

Usage

:workload-fn

io.titanoboa.tasklet.smtp/send

Sample Step Definition

{:type :smtp
 :supertype :tasklet
 :workload-fn #titanoboa.exp/Expression{:value "io.titanoboa.tasklet.smtp/send"}
 :properties {:connection {:host "localhost"
                           :port 25
                           :user ""
                           :pass ""
                           :ssl false
                           :tls false}
              :email {:from "me@example.com"
                      :to "foo@example.com"
                      :cc ["bar@example.com", "baz@example.com", "qux@example.com"]
                      :bcc "secret@example.com"
                      :subject "Cat!"
                      :date #titanoboa.exp/Expression{:value "(java.util.Date.)"}
                      :message-id ""
                      :user-agent ""
                      :body [{:type "text/plain"
                              :content "Hey folks,\n\nCheck out these pictures of my cat!"}
                             {:type :inline
                              :content #titanoboa.exp/Expression{:value "(java.io.File. \"/tmp/lester-flying-photoshop\")"}
                              :content-type "image/jpeg"
                              :file-name "lester-flying.jpeg"}
                             {:type :attachment
                              :content #titanoboa.exp/Expression{:value "(java.io.File. \"/tmp/lester-upside-down.jpeg\")"}}]}}}


SSH and SFTP

An SSH and SFTP client. Primarily uses the clj-ssh library; refer to its documentation for detailed information on all supported properties.

Installation

  1. Add the library's Maven coordinates (as published on Clojars) to titanoboa's external dependencies file
  2. Require namespace: io.titanoboa.tasklet.ssh

Usage

SSH

:workload-fn

io.titanoboa.tasklet.ssh/ssh

Sample Step Definition

{:type :ssh,
 :supertype :tasklet,
 :description "SSH Client",
 :properties {:ssh-agent-settings {:use-system-ssh-agent false},
              :identities {:private-key-path "/path/to/key.pem"},
              :ssh-cmd-map {:in "echo hello"},
              :host "xxx.eu-central-1.compute.amazonaws.com",
              :session-options {:username "ec2-user", :strict-host-key-checking "no", :preferred-authentications "publickey"}},
 :workload-fn #titanoboa.exp.Expression{:value "io.titanoboa.tasklet.ssh/ssh", :type "clojure"}}

SFTP

:workload-fn

io.titanoboa.tasklet.ssh/sftp

Sample Step Definition

{:type :sftp,
 :supertype :tasklet,
 :description "SFTP Client",
 :properties {:ssh-agent-settings {:use-system-ssh-agent false},
              :identities {:private-key-path "/path/to/key.pem"},
              :sftp-cmds-vec [[:ls "/home/ec2-user/"]],
              :host "xxx.eu-central-1.compute.amazonaws.com",
              :session-options {:username "ec2-user",
                                :strict-host-key-checking "no",
                                :preferred-authentications "publickey"}},
 :workload-fn #titanoboa.exp.Expression{:value "io.titanoboa.tasklet.ssh/sftp", :type "clojure"}}


PDF

Generates a PDF file based on job properties. Primarily uses the clj-pdf library; refer to its documentation for detailed information on the generation process and all supported properties.

Installation

  1. Add the library's Maven coordinates (as published on Clojars) to titanoboa's external dependencies file
  2. Require namespace: io.titanoboa.tasklet.pdf

Usage

:workload-fn

io.titanoboa.tasklet.pdf/generate-pdf

Sample Properties

{:pdf-sections [[:list {:roman true}
                 [:chunk {:style :bold} "a bold item"]
                 "another item"
                 "yet another item"]
                [:phrase "some text"]
                [:phrase "some more text"]
                [:paragraph "yet more text"]]
 :file-name "example.pdf"
 :pdf-metadata {:bottom-margin 10, :creator "Jane Doe", :doc-header ["inspired by" "William Shakespeare"],
                :right-margin 50, :left-margin 10, :footer "page", :header "page header", :size "a4",
                :title "Test doc", :author "John Doe", :top-margin 20, :subject "Some subject"}}

Sample Step Definition

{:type :pdf-generation
 :supertype :tasklet
 :properties
 {:pdf-sections [[:list {:roman true}
                  [:chunk {:style :bold} "a bold item"]
                  "another item"
                  "yet another item"]
                 [:phrase "some text"]
                 [:phrase "some more text"]
                 [:paragraph "yet more text"]]
  :file-name "example.pdf"
  :pdf-metadata {:bottom-margin 10, :creator "Jane Doe", :doc-header ["inspired by" "William Shakespeare"],
                 :right-margin 50, :left-margin 10, :footer "page", :header "page header", :size "a4",
                 :title "Test doc", :author "John Doe", :top-margin 20, :subject "Some subject"}}
 :workload-fn #titanoboa.exp.Expression{:value "io.titanoboa.tasklet.pdf/generate-pdf", :type "clojure"}}
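Since :pdf-sections is plain data, an upstream step can build it. The sketch below is a hypothetical workload-fn that turns rows (for example the :db-data produced by the JDBC step above) into a clj-pdf :table element. The column keys :firstname, :lastname and :totalamount are assumptions; the actual key names depend on your JDBC driver and query.

```clojure
(ns example.pdf-sections
  "Hypothetical upstream step that prepares :pdf-sections for generate-pdf.")

(defn rows->pdf-table
  "Builds a clj-pdf :table element: a header with the column names,
  then one row per map in rows, each cell rendered as a string."
  [cols rows]
  (into [:table {:header (mapv name cols)}]
        (map (fn [row] (mapv #(str (get row %)) cols)))
        rows))

(defn db-data->pdf-sections
  "Reads :db-data from the job properties and returns :pdf-sections with a
  heading paragraph followed by the table (column keys are assumptions)."
  [{:keys [db-data]}]
  {:pdf-sections [[:paragraph "Top orders"]
                  (rows->pdf-table [:firstname :lastname :totalamount] db-data)]})
```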


Kafka Producer & Consumer

A simple Kafka producer and consumer. Primarily uses the dvlopt/kafka library; refer to its documentation for detailed information on all supported properties.

Installation

  1. Add the library's Maven coordinates (as published on Clojars) to titanoboa's external dependencies file
  2. Require namespace: io.titanoboa.tasklet.kafka

Usage

Producer

:workload-fn

io.titanoboa.tasklet.kafka/produce

Sample Step Definition

{:type        :kafka-produce,
 :supertype   :tasklet,
 :workload-fn #titanoboa.exp/Expression{:value "io.titanoboa.tasklet.kafka/produce",
                                        :type  "clojure"},
 :properties  {:kafka-producer-config {:dvlopt.kafka/nodes             [["localhost"
                                                                         9092]],
                                       :dvlopt.kafka/serializer.key    :long,
                                       :dvlopt.kafka/serializer.value  :string,
                                       :dvlopt.kafka.out/configuration {"client.id"        "my-producer",
                                                                        "transactional.id" "some transaction id"}},
               :records               [{:topic "test-topic",
                                        :key   123,
                                        :value "Hello World!"}]}} 
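Each entry in :records is a map with :topic, :key and :value. The sketch below is a hypothetical upstream workload-fn that builds :records from a map of keys to messages; the :messages job property and the topic name are made up for illustration. Because the sample producer uses :dvlopt.kafka/serializer.key :long, keys are coerced to longs.

```clojure
(ns example.kafka-records
  "Hypothetical upstream step that prepares :records for kafka/produce.")

(defn ->records
  "Turns a map of key -> message into producer records for one topic.
  Keys are coerced with long to match a :long key serializer."
  [topic messages]
  (mapv (fn [[k v]] {:topic topic, :key (long k), :value (str v)}) messages))

(defn messages->records
  "Reads the hypothetical :messages job property and returns :records."
  [{:keys [messages]}]
  {:records (->records "test-topic" messages)})
```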

Consumer

:workload-fn

io.titanoboa.tasklet.kafka/consume

Sample Step Definition

{:type        :kafka-consume,
 :supertype   :tasklet,
 :workload-fn #titanoboa.exp/Expression{:value "io.titanoboa.tasklet.kafka/consume",
                                        :type  "clojure"},
 :properties  {:kafka-topics          ["test-topic"],
               :poll-options          {:dvlopt.kafka/timeout [1
                                                              :seconds]},
               :kafka-consumer-config {:dvlopt.kafka/nodes              [["localhost"
                                                                          9092]],
                                       :dvlopt.kafka/deserializer.key   :long,
                                       :dvlopt.kafka/deserializer.value :string,
                                       :dvlopt.kafka.in/configuration   {"auto.offset.reset"  "earliest",
                                                                         "enable.auto.commit" false,
                                                                         "max.poll.records"   "50",
                                                                         "group.id"           "my-group"}}}}


🧬 K-mer count

A few simple functions to help with k-mer counting and analysis of FASTQ data files. The namespace also contains functions for splitter (map) and aggregator (reduce) steps to help with parallel processing.

Note that some thought needs to go into which underlying file system will be used (e.g. HDFS, EFS, etc.) and whether the file will be physically split before counting.

Installation

  1. Add the library's Maven coordinates (as published on Clojars) to titanoboa's external dependencies file
  2. Require namespace: io.titanoboa.tasklet.kmer

Usage

K-mer count

:workload-fn

io.titanoboa.tasklet.kmer/kmer-count

Sample Job Properties

{:create-folder? false,
 :fastq-file "/path/to/fastq/file",
 :start 0,
 :end 12,
 :k 3,
 :top-n 10}
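A k-mer is every overlapping substring of length :k in a read; counting produces a frequency table from which the :top-n most common k-mers are taken. The sketch below illustrates the technique on in-memory FASTQ text and is not the io.titanoboa.tasklet.kmer implementation. It assumes the common four-line FASTQ layout (header, sequence, '+', quality) with each sequence on a single line, and it does not model :start, :end or :create-folder?.

```clojure
(ns example.kmer-count
  "Illustrative k-mer counting; not the io.titanoboa.tasklet.kmer code."
  (:require [clojure.string :as str]))

(defn fastq-sequences
  "Returns the sequence line of each FASTQ record, assuming four lines per
  record (header, sequence, '+', quality). A trailing partial record is dropped."
  [fastq-text]
  (->> (str/split-lines fastq-text)
       (partition 4)
       (map second)))

(defn count-kmers
  "Counts every overlapping substring of length k in the given sequences."
  [k sequences]
  (reduce (fn [counts s]
            ;; positions 0 .. (len - k) inclusive; empty when the read is shorter than k
            (reduce (fn [m i] (update m (subs s i (+ i k)) (fnil inc 0)))
                    counts
                    (range (inc (- (count s) k)))))
          {}
          sequences))

(defn top-n
  "Returns the n most frequent [kmer count] pairs, highest count first."
  [n counts]
  (take n (sort-by val > counts)))
```

With the sample properties this would be roughly (top-n 10 (count-kmers 3 (fastq-sequences (slurp "/path/to/fastq/file")))); slurping a whole file only suits small inputs, which is why splitting matters for real data.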

Map/Reduce Steps

Map :workload-fn

io.titanoboa.tasklet.kmer/split-fastq

Reduce :workload-fn

io.titanoboa.tasklet.kmer/reduce-kmers

Sample Job Properties

{:fastq-file "/path/to/fastq/file",
 :k 3,
 :split-to 12}

Sample Map/Reduce Workflow Definition

{:first-step "splitter",
 :name "kmer-map-reduce",
 :revision 4,
 :type nil,
 :properties {:fastq-file "/mnt/efs/sars2/reclojure.fastq",
              :k 3,
              :split-to 12,
              :top-n 10},
 :steps [{:id "splitter",
          :type :map,
          :supertype :map,
          :next [["*" "aggregator"]],
          :workload-fn #titanoboa.exp/Expression{:value "io.titanoboa.tasklet.kmer/split-fastq",
                                                 :type "clojure"},
          :properties {:jobdef-name "k-mer-count",
                       :sys-key :core,
                       :standalone-system? false},
          :revision 1}
         {:id "aggregator",
          :type :reduce,
          :supertype :reduce,
          :workload-fn #titanoboa.exp/Expression{:value "io.titanoboa.tasklet.kmer/reduce-kmers",
                                                 :type "clojure"},
          :next [],
          :properties {:map-step-id "splitter", :commit-interval 100},
          :revision 1}]}
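The two halves of this workflow can also be sketched in plain Clojure. This is an illustration under stated assumptions, not the split-fastq/reduce-kmers implementation: the hypothetical split-ranges divides a number of FASTQ records into at most :split-to contiguous {:start :end} ranges (one per sub-job, assuming :start/:end are record indices), and the hypothetical merge-kmer-counts combines partial k-mer frequency maps in the way a reducing function would.

```clojure
(ns example.kmer-map-reduce
  "Hypothetical sketch of the split (map) and merge (reduce) halves.")

(defn split-ranges
  "Splits total records into at most n contiguous half-open ranges
  [start, end), returned as {:start .. :end ..} maps."
  [total n]
  (let [size (max 1 (long (Math/ceil (/ total (double n)))))]
    (for [start (range 0 total size)]
      {:start start, :end (min total (+ start size))})))

(defn merge-kmer-counts
  "Reducing function: adds the counts of one partial result to the accumulator."
  [acc partial-counts]
  (merge-with + acc partial-counts))
```

For example, (reduce merge-kmer-counts {} partial-results) folds the per-chunk counts into one table, from which the :top-n k-mers can then be selected.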