Commit 1c909ef (1 parent: 00b91c7)
Showing 1 changed file with 44 additions and 31 deletions.
@@ -1,47 +1,60 @@
-Taming Text, by Grant Ingersoll, Thomas Morton and Drew Farris is designed to teach software engineers the basic concepts of
-working with text to solve search and Natural Language Processing problems. The book focuses on teaching using existing
-open source libraries like Apache Solr, Apache Mahout and Apache OpenNLP to manipulate text. To learn more, visit http://www.manning.com/ingersoll.
+Taming Text, by Grant Ingersoll, Thomas Morton and Drew Farris is
+designed to teach software engineers the basic concepts of working
+with text to solve search and Natural Language Processing problems.
+The book focuses on teaching using existing open source libraries like
+Apache Solr, Apache Mahout and Apache OpenNLP to manipulate text. To
+learn more, visit http://www.manning.com/ingersoll.

 Getting Started
 ---------------

-Throughout this document, TT_HOME is the directory containing the checkout of the Taming Text code base.
+Throughout this document, TT_HOME is the directory containing the
+checkout of the Taming Text code base.

-Taming Text uses Maven for building and running the code. To get started, you will
-need:
+Taming Text uses Maven for building and running the code. To get
+started, you will need:

-1. JDK 1.6+
-2. Maven 3.0 or higher
-3. The OpenNLP English models, available at http://maven.tamingtext.com/opennlp-models/models-1.5. Place them in the TT_HOME directory in a directory named opennlp-models.
-This can be done by using the following commands on UNIX:
-From the TT_HOME directory:
-mkdir opennlp-models
-cd opennlp-models
-wget -nd -np -r http://maven.tamingtext.com/opennlp-models/models-1.5/
-rm index.html*
+1. JDK 1.6+
+2. Maven 3.0 or higher
+3. The OpenNLP English models, available at
+http://maven.tamingtext.com/opennlp-models/models-1.5.

-4. Get WordNet 3.0 and place it in the TT_HOME directory.
-This can be done by using the following commands on UNIX:
-From the TT_HOME directory:
-wget -nd -np -m http://maven.tamingtext.com/wordnet/
-rm index.html*
-tar -xf Wordnet-3.0.tar.gz
+Place the models in the TT_HOME directory in a directory named
+opennlp-models.

+This can be done by using the following commands on UNIX from the
+TT_HOME directory:

-Building the Source
+mkdir opennlp-models
+cd opennlp-models
+wget -nd -np -r http://maven.tamingtext.com/opennlp-models/models-1.5/
+rm index.html*

-To build the source, in TT_HOME:
+4. Get WordNet 3.0 and place it in the TT_HOME directory.

+This can be done by using the following commands on UNIX from the
+TT_HOME directory:

-1. mvn compile
+wget -nd -np -m http://maven.tamingtext.com/wordnet/
+rm index.html*
+tar -xf Wordnet-3.0.tar.gz

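Review note: the download steps in items 3 and 4 can fail quietly (for example when wget fetches only an index page), so a quick check before building may help. The following is a minimal sketch, not part of the commit. The function name check_tt_resources is hypothetical, and the sketch assumes that the WordNet archive is left in TT_HOME after extraction.

```shell
# Hypothetical helper: report which downloaded resources are missing from
# the directory given as $1 (normally TT_HOME).
# Prints one line per problem and returns non-zero if anything is missing.
check_tt_resources() {
    tt_home="$1"
    status=0

    # Item 3: opennlp-models must exist and contain at least one entry.
    if [ ! -d "$tt_home/opennlp-models" ] ||
       [ -z "$(ls -A "$tt_home/opennlp-models")" ]; then
        echo "missing: opennlp-models"
        status=1
    fi

    # Item 4: the WordNet archive named in the tar command.
    # Assumption: the archive is not deleted after it is extracted.
    if [ ! -f "$tt_home/Wordnet-3.0.tar.gz" ]; then
        echo "missing: Wordnet-3.0.tar.gz"
        status=1
    fi

    return "$status"
}
```

A typical call would be `check_tt_resources "$TT_HOME" && mvn clean package`, which runs the build only when both resources are present.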
-Running the Tests
+Building the Source
 -------------------

-1. mvn test
+To build the source, in TT_HOME:

-Next Steps
-=======
+mvn clean package

-* mvn package // Prepares the jar files, etc. for execution
+Running the Examples
+--------------------

-* Many of the examples can be run via the 'tt' script in the TT_HOME/bin directory. Running this script without arguments will display a list of the example names.
+Many of the examples can be run via the 'tt' script in the TT_HOME/bin
+directory. Running this script without arguments will display a list
+of the example names.

-* Some of the samples are powered by pre-configured instances of solr. These can be started with the TT_HOME/bin/start-solr.sh script, which takes a single argument, the name of the instance to start. Available instances include solr-qa, solr-clustering and solr-tagging.
+Some of the samples are powered by pre-configured instances of
+solr. These can be started with the TT_HOME/bin/start-solr.sh script,
+which takes a single argument, the name of the instance to
+start. Available instances include solr-qa, solr-clustering and
+solr-tagging.
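
Review note: since start-solr.sh takes the instance name as its only argument, a small wrapper can catch a mistyped name before anything is launched. This is a sketch under stated assumptions, not part of the commit. The function name start_tt_solr is hypothetical, and limiting the accepted names to the three listed in the README is an assumption, because the text says only that the available instances "include" them.

```shell
# Hypothetical wrapper around TT_HOME/bin/start-solr.sh.
start_tt_solr() {
    instance="$1"

    # Accept only the instance names listed in the README text.
    # Assumption: no other instances need to be started this way.
    case "$instance" in
        solr-qa|solr-clustering|solr-tagging) ;;
        *)
            echo "unknown instance: $instance" >&2
            return 2
            ;;
    esac

    # TT_HOME must point at the checkout, as described under Getting Started.
    if [ -z "${TT_HOME:-}" ]; then
        echo "TT_HOME is not set" >&2
        return 1
    fi

    # Hand the validated name to the script that ships with the code base.
    "$TT_HOME/bin/start-solr.sh" "$instance"
}
```

With TT_HOME exported, `start_tt_solr solr-qa` starts the question answering instance, while a name outside the list returns status 2 without running the script.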