GeoMesa Twitter provides three example projects to collect, ingest, and analyze Twitter data in GeoMesa.
You'll first need to clone and build GeoMesa 1.0.0-SNAPSHOT from the LocationTech GeoMesa GitHub repository.
Check out, build, and install GeoMesa with Maven:
git clone [email protected]:locationtech/geomesa.git
cd geomesa
mvn clean install
Next, clone and build geomesa-twitter:
git clone [email protected]:geomesa/geomesa-twitter.git
cd geomesa-twitter
mvn clean install
Sign up for a Twitter developer account and generate API keys...more info coming soon
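How the collector reads those keys is not documented yet; as a hedged sketch, assuming it is built on a Twitter4J-style client, the OAuth credentials would go in a twitter4j.properties file on the classpath (all values below are placeholders):

oauth.consumerKey=YOUR_CONSUMER_KEY
oauth.consumerSecret=YOUR_CONSUMER_SECRET
oauth.accessToken=YOUR_ACCESS_TOKEN
oauth.accessTokenSecret=YOUR_ACCESS_TOKEN_SECRET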
Run the ingest jar to see a list of arguments for ingest:
$> java -jar twitter-ingest/target/twitter-ingest-accumulo1.5-1.0-SNAPSHOT.jar
0 [main] INFO geomesa.example.twitter.ingest.Runner - Error parsing arguments: The following options are required: --instanceId, -i --zookeepers, -z --tableName, -t --password, -p --user, -u --featureName, -f
Usage: <main class> [options] files
  Options:
  * --featureName, -f
       featureName to assign to the data
    --indexSchemaFormat
       Schema for indexing data
  * --instanceId, -i
       Name of the Accumulo Instance
  * --password, -p
       Accumulo password
    --shards, -s
       number of shards to use for data
  * --tableName, -t
       Accumulo table name
    --useExtendedFeatures
       parse extended features or the minimal set
  * --user, -u
       Accumulo user name
  * --zookeepers, -z
       Comma separated list of zookeepers
After collecting some Twitter data, you can ingest files that contain one JSON-encoded tweet per line:
java -jar twitter-ingest/target/twitter-ingest-accumulo1.5-1.0-SNAPSHOT.jar -u user -p password -z zoo1,zoo2,zoo3 -t geomesa_catalog -i myInstance -f twitter tweet_file1.json tweet_file2.json tweet_file3.json
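Each line is a status object in the Twitter REST API v1.1 JSON format. An abbreviated example of a single geotagged tweet (all values are illustrative; note that GeoJSON coordinates are ordered longitude, latitude):

{"id":1234567890,"created_at":"Mon Jan 06 20:01:00 +0000 2014","text":"hello from downtown","coordinates":{"type":"Point","coordinates":[-77.036,38.897]},"user":{"id":42,"screen_name":"example_user"}}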
After setting up Spark on your cluster, copy the shaded jar over and run it. Note that spark-submit options such as --class must come before the application jar, and --master yarn-client already implies client deploy mode:
/opt/spark/bin/spark-submit --master yarn-client --num-executors 40 --executor-cores 4 --class geomesa.example.twitter.spark.Runner twitter-spark-accumulo1.5-1.0-SNAPSHOT.jar twitter
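The Runner class above holds the project's actual analysis. As a rough illustration of the same kind of Spark job, here is a hedged sketch in plain Spark (not the GeoMesa API; the class name and HDFS path are hypothetical) that counts geotagged versus untagged tweets straight from the raw JSON files:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class GeoTaggedCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("geotagged-tweet-count");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // hypothetical HDFS path; files hold one tweet as JSON per line
        JavaRDD<String> tweets = sc.textFile("hdfs:///data/twitter/*.json");

        // crude string test: a status object carries a non-null "coordinates"
        // field only when the tweet is geotagged
        JavaPairRDD<String, Integer> counts = tweets
            .mapToPair(line -> new Tuple2<>(
                line.contains("\"coordinates\":null") ? "untagged" : "geotagged", 1))
            .reduceByKey((a, b) -> a + b);

        counts.collect().forEach(t -> System.out.println(t._1() + ": " + t._2()));
        sc.stop();
    }
}

You would submit it the same way as the shaded jar above, with --class pointing at GeoTaggedCount.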