jmhsieh/elasticflume
Using ElasticSearch Flume integration

Pre-Conditions:

  * Have Flume installed, or at least cloned from the Flume git repo. If not, go to http://github.com/cloudera/flume and build it (currently using 'ant', but follow their docs). From here on, this Flume directory is referred to as FLUME_HOME.

  * Have ElasticSearch installed. For Getting Started purposes we assume an ElasticSearch server is running on your local machine. If not, go to http://github.com/elasticsearch/elasticsearch

Getting Started with elasticflume

0. First, set up some environment variables pointing to your local paths, to make the following steps simpler:

    export FLUME_HOME=<path to where you have Flume checked out/installed>
    export ELASTICSEARCH_HOME=<path to where you have ElasticSearch checked out>
    export ELASTICFLUME_HOME=<path to where you have elasticflume checked out>

   (Be careful with the last 2 env vars, because their names are deceptively similar.)

1. Build it using Maven:

1.1 Install the Flume library into your local Maven repo (because it's not available in central).

    Note: this assumes you have done a 'git clone' of the Flume source and have built it.

    mvn install:install-file -DgroupId=com.cloudera -DartifactId=flume -Dversion=0.9.1-dev -Dclassifier=core -Dfile=$FLUME_HOME/build/flume-0.9.1-dev-core.jar -Dpackaging=jar

1.2 Build elasticflume

    cd $ELASTICFLUME_HOME
    mvn package

2. Now add the elasticflume jar to the Flume classpath. I personally do this with a symlink for testing, but copying is probably a better idea.. :)

    ln -s $ELASTICFLUME_HOME/target/elasticflume-1.0.0-SNAPSHOT-jar-with-dependencies.jar $FLUME_HOME/lib/

3. Ensure your Flume config is correct. Check that $FLUME_HOME/conf/flume-conf.xml correctly identifies your local master. You may have to copy the template file in that directory to 'flume-conf.xml' and then add the following:

    <property>
      <name>flume.master.servers</name>
      <value>localhost</value>
      <description>A comma-separated list of hostnames, one for each machine in the Flume Master.
      </description>
    </property>
    ...

   (The above may not be necessary, because it's the default, but I had to do it for some reason.)

   You will also need to register the elasticflume plugin by creating a new property block:

    <property>
      <name>flume.plugin.classes</name>
      <value>org.elasticsearch.flume.ElasticSearchSink</value>
      <description>Comma separated list of plugins</description>
    </property>

4. Start up the Flume Master and a Flume node. You will need 2 different shells here.

   In the first shell, start the master:

    cd $FLUME_HOME
    bin/flume master

   VERIFY that the master's startup log contains the following line. If you don't see it, you've missed at least Step 3:

    2010-09-14 14:20:53,861 [main] INFO conf.SinkFactoryImpl: Found sink builder elasticSearchSink in org.elasticsearch.flume.ElasticSearchSink

   In the second shell, start a node:

    cd $FLUME_HOME
    bin/flume node_nowatch

5. Set up a basic console-based source so you can type in data manually and have it indexed (pretending to be a log message):

    cd $FLUME_HOME
    bin/flume shell -c localhost -e "exec config localhost 'console' 'elasticSearchSink'"

   NOTE: For some reason my local test installation of Flume used my IP address as the default node name, rather than the more usual 'localhost'. If things are not working properly, check the node names with:

    bin/flume shell -c localhost -e "getnodestatus"

   If you see a node listed under an IP address, you may need to map it to the logical name 'localhost' inside Flume:

    bin/flume shell -c localhost -e "map <IP ADDRESS> localhost"

6. NOW FOR THE TEST! :)

   In the console window where you started "node_nowatch" above, type the following (and yes, straight after all those log messages, just start typing, trust me..):

    hello world
    hello there good sir

   (That is, type the 2 lines, pressing return after each.)

7. Verify you can search for your "Hello World" log. In another console, use curl to search your local ElasticSearch node:

    curl -XGET 'http://localhost:9200/flume/_search?pretty=true' -d '
    {
        "query" : {
            "term" : { "message" : "hello" }
        }
    }
    '

   You should get pretty-printed JSON search results, something like:

    {
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 2,
        "max_score" : 1.1976817,
        "hits" : [ {
          "_index" : "flume",
          "_type" : "LOG",
          "_id" : "4e5a6f5b-1dd3-4bb6-9fd9-c8d785f39680",
          "_score" : 1.1976817,
          "_source" : {"message":"hello world","timestamp":"2010-09-14T03:19:36.857Z","host":"192.168.1.170","priority":"INFO"}
        }, {
          "_index" : "flume",
          "_type" : "LOG",
          "_id" : "c77c18cc-af40-4362-b20b-193e5a3f6ff5",
          "_score" : 0.8465736,
          "_source" : {"message":"hello there good sir","timestamp":"2010-09-14T03:28:04.168Z","host":"192.168.1.170","priority":"INFO"}
        } ]
      }
    }

8. Go to the ElasticSearch website and learn all about the REST and other APIs for searching an ElasticSearch index.
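
Because the ELASTICSEARCH_HOME and ELASTICFLUME_HOME names from Step 0 are so easy to mix up, a small preflight check before building can save some head-scratching. The following is only a sketch, not part of elasticflume: the function name check_elasticflume_env is hypothetical, and the only things taken from this README are the three variable names. It reports whether each variable is set and points at a directory, and flags the case where the two look-alike variables hold the same path.

```shell
# Preflight sketch for Step 0. The function name is an assumption made
# for illustration; only the three variable names come from the README.
check_elasticflume_env() {
    _ef_status=0
    for _ef_name in FLUME_HOME ELASTICSEARCH_HOME ELASTICFLUME_HOME; do
        # Look up the value of the variable whose name is in $_ef_name
        # (empty string if it is unset).
        eval "_ef_value=\${$_ef_name:-}"
        if [ -z "$_ef_value" ]; then
            echo "MISSING: $_ef_name is not set"
            _ef_status=1
        elif [ ! -d "$_ef_value" ]; then
            echo "BAD PATH: $_ef_name=$_ef_value is not a directory"
            _ef_status=1
        else
            echo "OK: $_ef_name=$_ef_value"
        fi
    done
    # The two similar-looking variables should not share one checkout.
    if [ -n "${ELASTICSEARCH_HOME:-}" ] &&
       [ "${ELASTICSEARCH_HOME:-}" = "${ELASTICFLUME_HOME:-}" ]; then
        echo "SUSPICIOUS: ELASTICSEARCH_HOME and ELASTICFLUME_HOME are identical"
        _ef_status=1
    fi
    return "$_ef_status"
}
```

After pasting the function into your shell (or sourcing a file that contains it), run check_elasticflume_env once you have exported the three variables. It returns a non-zero status if anything looks wrong, so it can also guard the Maven steps, for example: check_elasticflume_env && cd $ELASTICFLUME_HOME && mvn package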
About
Integration between Cloudera's Flume and ElasticSearch