This repository has been archived by the owner on Jan 6, 2023. It is now read-only.

onyx log problem for exception #563

Closed
ghost opened this issue Apr 6, 2016 · 13 comments

@ghost

ghost commented Apr 6, 2016

When I use Onyx to create a realtime job,
it seems to hang before the job is submitted whenever an exception occurs.
I would like the exception to be displayed, but I don't know where the problem is.

For example, when I put (/ 3 0) after start-env and start-peer-group, it hangs.
I can catch the exception myself, but I think that is not convenient.

Thanks for your great work.

(def env-config
  {:zookeeper/address "127.0.0.1:4188"
   :zookeeper/server? true
   :zookeeper.server/port 4188
   :onyx.bookeeper/server? true
   :onyx.bookeeper/local-quorum? true
   :onyx.bookeeper/local-quorum-ports [4196 4197 4198]
   :onyx/tenancy-id id
   :onyx.log/config {}})

(def peer-config
  {:zookeeper/address "127.0.0.1:4188"
   :onyx/tenancy-id id
   :onyx.peer/job-scheduler :onyx.job-scheduler/balanced
   :onyx.messaging/impl :aeron
   :onyx.messaging/peer-port 40200
   :onyx.messaging/bind-addr "localhost"
   :onyx.log/config {}})

(def env (onyx.api/start-env env-config))
(def peer-group (onyx.api/start-peer-group peer-config))
(timbre/set-config! {})
(println "hello1")
(/ 3 0)
(println "hello2")
@gardnervickers
Member

If you're evaluating a file and an exception occurs, Clojure will not continue evaluating the file. I do not understand what the issue is here. What are you trying to do?
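
Editorial sketch of the point above, under the assumption that a function body can stand in for top-level forms evaluated in order. The names `steps` and `run-script` are hypothetical and are not part of the reporter's project. Once `(/ 3 0)` throws, the form after it is never evaluated:

```clojure
;; Illustrative only: `steps` and `run-script` are hypothetical names.
(def steps (atom []))

(defn run-script
  "Mimics a sequence of top-level forms evaluated in order."
  []
  (swap! steps conj :hello1)
  (/ 3 0)                       ; throws ArithmeticException
  (swap! steps conj :hello2))   ; never reached

;; Catching the exception shows where evaluation stopped.
(try
  (run-script)
  (catch ArithmeticException e
    (swap! steps conj :caught)))
```

After this runs, `@steps` records `:hello1` and `:caught` but not `:hello2`, which mirrors the "hello1" / "hello2" output described in the report.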

@ghost
Author

ghost commented Apr 6, 2016

When I run the code and there is a typo anywhere after the start-env form,
the log does not print the exception information. It hangs instead,
without printing anything,
so I cannot tell exactly what happened.

In the example above,
only hello1 is printed; the "Divide by zero" exception after start-env is never shown.
But if I move (/ 3 0) to the beginning of the file, the "Divide by zero" exception is printed.

So I think onyx.api/start-env changes the logging behavior.
It captures the exception and hangs, but gives me no clue for debugging it.

Sorry for my poor English. I am Chinese and my English is very basic.

Thanks for your great work.

@ghost
Author

ghost commented Apr 6, 2016

I attach a test file so that you can reproduce it conveniently.
In the REPL, it reports:

CompilerException java.lang.ClassNotFoundException: onyx.plugin.sql, compiling:  (/data/larluo/git/9now/workspace/onyx-umeng/src/onyx_umeng/core.clj:78:0)

But when I run it directly with lein run -m onyx-umeng.core,
it hangs and tells me nothing. Where did the error information go?

(defproject onyx-umeng "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.8.0"]
                 [clj-http "2.1.0"]
                 [clj-time "0.11.0"]
                 [enlive "1.1.6"]
                 [cheshire "5.5.0"]
                 [dire "0.5.4"]
                 [org.clojure/core.async "0.2.374"]
                 [org.onyxplatform/onyx "0.9.0"]
                 [org.onyxplatform/onyx-datomic "0.9.0.0"]
                 [org.onyxplatform/onyx-redis "0.9.0.0"]
                 [org.onyxplatform/onyx-sql "0.9.0.0"]
                 [com.datomic/datomic-free "0.9.5173" :exclusions [joda-time]]
                 [com.oracle/ojdbc6 "11.2.0"]
                 [com.taoensso/timbre "4.3.1"] ])

(ns onyx-umeng.core
  (:require [cheshire.core :as cheshire]
            [clj-http.client :as http]
            [clojure.core.async :as async :refer [>!! <!! timeout chan close!]]
            [taoensso.timbre :as timbre]
            [onyx.plugin.core-async]
            [onyx.plugin.datomic]
            [onyx.plugin.redis]
            [onyx.api]
            [dire.core :as dire]
            [datomic.api :as datomic]
            [clojure.string :as str]
            [clj-time.core :as time]
            [clj-time.format :as time-format]
            [clj-time.local :as time-local]
            [onyx.static.planning]))

;;;;;;;;;;;;;;;;;;;
;; FUNCTION
;;;;;;;;;;;;;;;;;;
(def workflow [[:umeng-http    :oracle-prepare]
               [:oracle-prepare :out-to-oracle]])

(defn umeng-http [segment]
  (Thread/sleep 1000)
  {:ts (doto (time-local/format-local-time (time-local/local-now) :date) (timbre/info "@ts")) 
   :json {:installation 2827909
          :yesterday_new_users 2711
          :today_new_users 4
          :yesterday_active_users 20489
          :today_active_users 26
          :yesterday_launches 62668
          :today_launches 34}})

(defn oracle-prepare [{:keys [ts json] :as segment}]
  {:rows [json] :where [:day_id ts]} )

;;;;;;;;;;;;;;
;; setup env
;;;;;;;;;;;;;
(def id (java.util.UUID/randomUUID))
(def env-config
  {:zookeeper/address "127.0.0.1:4288"
   :zookeeper/server? true
   :zookeeper.server/port 4288
   :onyx.bookeeper/server? true
   :onyx.bookeeper/local-quorum? true
   :onyx.bookeeper/local-quorum-ports [4296 4297 4298]
   :onyx/tenancy-id id
   :onyx.log/config {}})

(def peer-config
  {:zookeeper/address "127.0.0.1:4288"
   :onyx/tenancy-id id
   :onyx.peer/job-scheduler :onyx.job-scheduler/balanced
   :onyx.messaging/impl :aeron
   :onyx.messaging/peer-port 40202
   :onyx.messaging/bind-addr "localhost"
   :onyx.log/config {}})

(def env (onyx.api/start-env env-config))
(def peer-group (onyx.api/start-peer-group peer-config))
(def n-peers (count (set (mapcat identity workflow))))
(def v-peers (onyx.api/start-peers n-peers peer-group))

#_(def db {:classname "oracle.jdbc.OracleDriver"
           :subprotocol "oracle"
           :subname "thin:@10.0.146.58:1521:orcl"
           :user "qsample"
           :password "qsample91"})
#_(clojure.java.jdbc/execute! db ["INSERT INTO DM_WX_DATAPUSH_UMENG_TEST (DAY_ID) VALUES ('2016-04-07')"])
#_(clojure.java.jdbc/query db ["select * from DM_WX_DATAPUSH_UMENG_TEST WHERE ROWNUM <= 10"] )
;;;;;;;;;;;;;;;;;;
;; catalog
;;;;;;;;;;;;;;;;
(timbre/info "catalog")
(def batch-size 10)
(def catalog
  [{:onyx/name :umeng-http
    :onyx/fn ::umeng-http
    :onyx/type :function
    :onyx/batch-timeout 10000
    :onyx/batch-size batch-size
    :onyx/max-peers 1
    :onyx/doc "get the eleme data from the rest api"}
   {:onyx/name :redis-prepare
    :onyx/fn ::redis-prepare
    :onyx/type :function
    :onyx/batch-size batch-size
    :onyx/max-peers 1}
   {:onyx/name :out-to-oracle
    :onyx/plugin onyx.plugin.sql/upsert-rows
    :onyx/type :out
    :onyx/medium :sql
    :sql/class "oracle.jdbc.driver.OracleDriver"
    :sql/subprotocol "oracle"
    :sql/subname "thin:@10.0.146.58:1521:orcl"
    :sql/user "qsample"
    :sql/password "qsample91"
    :onyx/max-peers 1
    :onyx/batch-size batch-size
    :onyx/doc "write data to file system"}])

;;;;;;;;;;;;;;;;;;;;;;;;
;; WORKFLOW + LIFECYCLE
;;;;;;;;;;;;;;;;;;;;;;;;
(timbre/info "workflow + lifecycle")
(def job-id
  (:job-id
   (onyx.api/submit-job
    peer-config
    {:catalog catalog
     :workflow workflow
     :lifecycles [{:lifecycle/task :out-to-oracle :lifecycle/calls :onyx.plugin.datomic/onyx.plugin.sql/write-rows-calls}]
     :task-scheduler :onyx.task-scheduler/balanced})))

;;;;;;;;;;;;;;;;;;;;
;; CLOSE
;;;;;;;;;;;;;;;;;;;;
(onyx.api/await-job-completion peer-config job-id)

(doseq [v-peer v-peers]
  (onyx.api/shutdown-peer v-peer))
(onyx.api/shutdown-peer-group peer-group)
(onyx.api/shutdown-env env)

@gardnervickers
Member

In the file you provided, it does not look like you imported onyx.plugin.sql.
Onyx produces a log file at onyx.log. Your exception may be getting logged inside that file.

@ghost
Author

ghost commented Apr 7, 2016

Yes, I know about the problem in my code, and I also checked the log file location.

The strange thing is that onyx.log did not log the exception.
It just hangs and tells me nothing,
so I have to add print statements to find the location.

Thanks.

@MichaelDrogalis
Contributor

Hi @larluo. I ran your example:

(def env-config
  {:zookeeper/address "127.0.0.1:4188"
   :zookeeper/server? true
   :zookeeper.server/port 4188
   :onyx.bookeeper/server? true
   :onyx.bookeeper/local-quorum? true
   :onyx.bookeeper/local-quorum-ports [4196 4197 4198]
   :onyx/tenancy-id id
   :onyx.log/config {}})

(def peer-config
  {:zookeeper/address "127.0.0.1:4188"
   :onyx/tenancy-id id
   :onyx.peer/job-scheduler :onyx.job-scheduler/balanced
   :onyx.messaging/impl :aeron
   :onyx.messaging/peer-port 40200
   :onyx.messaging/bind-addr "localhost"
   :onyx.log/config {}})

(def env (onyx.api/start-env env-config))
(def peer-group (onyx.api/start-peer-group peer-config))
(timbre/set-config! {})
(println "hello1")
(/ 3 0)
(println "hello2")

Evaluating the buffer line by line, Onyx starts, then you reset Timbre's logging to the empty map, then print a value, then evaluate a form that will throw, then print another line. Clojure throws java.lang.ArithmeticException: Divide by zero when I execute this code, both with and without the line (timbre/set-config! {}). It sounds like something in your environment is eating exceptions. Check onyx.log for a clue, but you should be getting exceptions interactively in your repl if they're not being thrown by Onyx at runtime.

See the Logging section of the User Guide, and the Logging file in Onyx, which controls Timbre.

I'm going to close since I couldn't reproduce it. Reopen if you have any more questions. Thanks!

@ghost
Author

ghost commented Apr 7, 2016

When I run it in the REPL, the exception is reported,
but when I run it with lein run -m onyx-test.core,
it simply hangs and the log does not record the exception.
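
Editorial sketch, not verified against this particular bug: one general way to make a failure visible when running outside the REPL is to catch everything at the entry point, print it to stderr directly, and exit with an explicit status. The helper name `run-guarded` and the `-main` body are hypothetical. This assumes stderr itself has not been redirected by a logging library.

```clojure
(defn run-guarded
  "Calls f. Returns 0 on success; on any Throwable, prints the
  message and stack trace to stderr and returns 1."
  [f]
  (try
    (f)
    0
    (catch Throwable t
      (binding [*out* *err*]
        (println "Fatal error:" (.getMessage t)))
      (.printStackTrace t)
      1)))

;; Hypothetical entry point: exiting explicitly means lingering
;; non-daemon threads cannot keep the JVM alive after a failure.
(defn -main [& args]
  (System/exit (run-guarded #(/ 3 0))))
```

The explicit `System/exit` matters because a JVM does not terminate on its own while non-daemon threads are still running, even after the main thread has died with an exception.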

@MichaelDrogalis
Contributor

Weird - I'm seeing this too. Thanks for the report. We saw this at one point and figured it was just happening on CircleCI. I'll let you know what I find.

@MichaelDrogalis
Contributor

I traced it down to launching the Curator Test Server. Still researching, but it's monkeying with the logs somehow.

@MichaelDrogalis
Contributor

I can confirm that this isn't Onyx messing with it. I created a new project, added the following deps:

[[org.clojure/clojure "1.8.0"]
 [org.apache.curator/curator-framework "2.9.1"]
 [org.apache.curator/curator-test "2.9.1"]
 [org.apache.zookeeper/zookeeper "3.4.6"]]

Then ran the following program with lein run -m:

(ns testing-server.core
  (:import [org.apache.curator.test TestingServer]))

(defn -main [& args]
  (prn "hi")
  (prn (TestingServer. 40231))
  (prn (/ 1 0))
  (prn "bye"))

It silently hangs.

@gardnervickers
Member

Looks like the exceptions are getting swallowed by the log4j config that Curator brings in. A fix for now would be to specify a logging configuration.

(import '[org.apache.log4j BasicConfigurator])
and then your main method (or somewhere) needs to call

(defn -main [& args]
  (BasicConfigurator/configure)
  (prn "hi")
  (prn (TestingServer. 40231))
  (prn (/ 1 0))
  (prn "bye"))

The process still hangs, though; something inside of TestingServer is not getting cleaned up properly.

@MichaelDrogalis
Contributor

@gardnervickers I reproduced what you said. The process came down when I called .close on the server after the presence of an exception. Our ZooKeeper component calls .close at all times, so I think we're safe here. I pushed a fix in #565. I guess the question is whether or not it's appropriate for us to invoke Log4J in the spot we are. I don't know enough about the Java logging ecosystem to say.

I'll test it out and see what happens to the logs.
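
Editorial sketch of the "always call .close" idea, assuming a plain `java.io.Closeable` as a stand-in for the test server so the example runs without Curator on the classpath. The names `fake-server`, `run-with-server`, `closed?`, and `outcome` are hypothetical; this is not the code from #565.

```clojure
(import '[java.io Closeable])

;; Hypothetical stand-in for a server that owns background threads.
(defn fake-server [closed?]
  (reify Closeable
    (close [_] (reset! closed? true))))

(defn run-with-server
  "Runs (f server) and always closes the server, even if f throws."
  [server f]
  (try
    (f server)
    (finally
      (.close ^Closeable server))))

(def closed? (atom false))

;; The body throws, yet the server is still closed before the
;; exception propagates to the outer catch.
(def outcome
  (try
    (run-with-server (fake-server closed?) (fn [_] (/ 1 0)))
    (catch ArithmeticException e
      :failed)))
```

Clojure's built-in `with-open` gives the same guarantee for a single resource; the explicit `try`/`finally` is spelled out here only to make the ordering visible.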

MichaelDrogalis added a commit that referenced this issue Apr 7, 2016
Addresses #563 by configuring Log4J not to swallow exceptions.
@MichaelDrogalis
Contributor

Fixed. Staged for release in 0.9.1.
