more doc fixes

fjy committed Feb 17, 2016
1 parent 368988d · commit 7da6594

Showing 9 changed files with 56 additions and 30 deletions.
1 change: 1 addition & 0 deletions NOTICE
Druid - a distributed column store.
Copyright 2012-2016 Metamarkets Group Inc.
Copyright 2015-2016 Yahoo! Inc.
Copyright 2015-2016 Imply Data, Inc.

-------------------------------------------------------------------------------

2 changes: 1 addition & 1 deletion README.md
More information about Druid can be found on <http://www.druid.io>.

### Documentation

You can find the [documentation for the latest Druid release](http://druid.io/docs/latest/) on
the [project website](http://druid.io/docs/latest/).

If you would like to contribute documentation, please do so under
5 changes: 5 additions & 0 deletions docs/content/development/experimental.md
---
layout: doc_page
---

# About Experimental Features

Experimental features are features we have developed but have not fully tested in a production environment. If you choose to try them out, there will likely be edge cases that we have not covered. We would love feedback on any of these features, whether they are bug reports, suggestions for improvement, or letting us know they work as intended.

<div class="note caution">
APIs for experimental features may change in backwards incompatible ways.
</div>

To enable experimental features, include their artifacts in the configuration runtime.properties file, e.g.,
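The example itself is collapsed in this diff; a minimal sketch, assuming a 0.9.x-style `druid.extensions.loadList` property and the experimental `druid-histogram` extension (both names are assumptions, not taken from this commit):

```properties
# common.runtime.properties — load the experimental approximate-histogram extension
druid.extensions.loadList=["druid-histogram"]
```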

2 changes: 1 addition & 1 deletion docs/content/ingestion/data-formats.md
We welcome any contributions to new formats.

## Formatting the Data

The following are some samples of the data used in the [Wikipedia example](../tutorials/quickstart.html).

_JSON_

6 changes: 3 additions & 3 deletions docs/content/ingestion/index.md
A Druid ingestion spec consists of 3 components:

```json
{
"dataSchema" : {...},
"ioConfig" : {...},
"tuningConfig" : {...}
}
```
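A hedged sketch of what the three components might contain, based on a simple realtime spec (field values are illustrative, not a complete working spec):

```json
{
  "dataSchema" : {
    "dataSource" : "wikipedia",
    "parser" : { "type" : "string" },
    "metricsSpec" : [ { "type" : "count", "name" : "count" } ],
    "granularitySpec" : { "type" : "uniform", "segmentGranularity" : "DAY", "queryGranularity" : "NONE" }
  },
  "ioConfig" : { "type" : "realtime" },
  "tuningConfig" : { "type" : "realtime" }
}
```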
If `type` is not included, the parser defaults to `string`.

### Avro Stream Parser

This is for realtime ingestion. Make sure to include `druid-avro-extensions` as an extension.

| Field | Type | Description | Required |
|-------|------|-------------|----------|
25 changes: 16 additions & 9 deletions docs/content/operations/other-hadoop.md
---
layout: doc_page
---
# Working with different versions of Hadoop

## Including Hadoop dependencies

There are two different ways to let Druid pick up your Hadoop version; choose the one that fits your needs.

You can create a Hadoop dependency directory and tell Druid to load your Hadoop dependencies from it.

To make this work, follow the steps below:

(1) Specify `druid.extensions.hadoopDependenciesDir` (the root directory for Hadoop-related dependencies) in your `common.runtime.properties` file. If you don't specify it, Druid will use a default value. See [Configuration](../configuration/index.html) for more details.
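Step (1) might look like the following in `common.runtime.properties` (the path matches the example used later on this page):

```properties
druid.extensions.hadoopDependenciesDir=/usr/local/druid/hadoop-dependencies
```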

(2) Set up the Hadoop dependency directories under the root Hadoop dependency directory. Under the root directory, create a sub-directory for each Hadoop dependency. Inside each sub-directory, create a sub-sub-directory whose name is the version of Hadoop it contains, and put the Hadoop jars inside it. This file structure is almost the same as for normal Druid extensions described in [Including Extensions](../operations/including-extensions.html), except that there is an extra layer of folder that specifies the version of Hadoop. (If you don't want to set up this directory manually, Druid also provides a [pull-deps](../operations/pull-deps.html) tool that can generate these directories automatically.)
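The layout described in step (2) can be sketched with a couple of shell commands (using a local path here so the sketch is safe to run anywhere; in practice the root would be whatever `druid.extensions.hadoopDependenciesDir` points to):

```shell
# Root directory that druid.extensions.hadoopDependenciesDir would point to
ROOT="$PWD/hadoop-dependencies"

# One sub-directory per Hadoop dependency, one sub-sub-directory per version
mkdir -p "$ROOT/hadoop-client/2.3.0" "$ROOT/hadoop-client/2.4.0"

# Then copy the jars for each Hadoop version into its directory, e.g.
# cp /path/to/hadoop-2.4.0/jars/*.jar "$ROOT/hadoop-client/2.4.0/"
```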

Example:

Suppose you specify `druid.extensions.hadoopDependenciesDir=/usr/local/druid/hadoop-dependencies`, and you want to prepare both `hadoop-client` 2.3.0 and 2.4.0 for Druid,

Then you can either use [pull-deps](../operations/pull-deps.html) or manually set up the Hadoop dependency directories such that under ```hadoop-dependencies```, it looks like this:

```
hadoop-dependencies/
└── hadoop-client
    ├── 2.3.0
    │   └── ..... lots of jars
    └── 2.4.0
        └── ..... lots of jars
```

As you can see, under ```hadoop-client``` there are two sub-directories, each denoting a version of ```hadoop-client```. At runtime, Druid will look in these directories and load the appropriate ```hadoop-client``` based on the `hadoopDependencyCoordinates` passed to the [Hadoop Index Task](../ingestion/tasks.html).

### Append your Hadoop jars to the Druid classpath

If you really don't like the way above, and you just want to use one specific Hadoop version:

(2) Append your Hadoop jars to the classpath; Druid will load them into the system. This mechanism is relatively easy to reason about, but it also means that you have to ensure that all dependency jars on the classpath are compatible. That is, Druid makes no provisions while using this method to maintain class loader isolation, so you must make sure that the jars on your classpath are mutually compatible.

#### Hadoop 2.x

The default version of Hadoop bundled with Druid is 2.3.

To override the default Hadoop version, both the Hadoop Index Task and the standalone Hadoop indexer support the parameter `hadoopDependencyCoordinates` (see [Hadoop Index Task](../ingestion/tasks.html)). You can pass another set of Hadoop coordinates through this parameter (e.g., you can specify coordinates for Hadoop 2.4.0 as `["org.apache.hadoop:hadoop-client:2.4.0"]`), which will override the default Hadoop coordinates Druid uses.

The Hadoop Index Task takes this parameter as part of the task JSON, and the standalone Hadoop indexer takes this parameter as a command line argument.
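As a sketch, the task JSON form might look like this (only `hadoopDependencyCoordinates` is the point here; the `spec` contents — the usual dataSchema/ioConfig/tuningConfig — are elided):

```json
{
  "type" : "index_hadoop",
  "hadoopDependencyCoordinates" : ["org.apache.hadoop:hadoop-client:2.4.0"],
  "spec" : {}
}
```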

If you are still having problems, include all relevant Hadoop jars at the beginning of the classpath of your indexing or historical nodes.

#### CDH

Members of the community have reported dependency conflicts between the version of Jackson used in CDH and Druid. Currently, our best workaround is to edit Druid's pom.xml dependencies to match the version of Jackson in your Hadoop version and recompile Druid.

29 changes: 16 additions & 13 deletions docs/content/operations/pull-deps.md
---
layout: doc_page
---

# pull-deps Tool

`pull-deps` is a tool that can pull down dependencies to the local repository and lay dependencies out into the extension directory as needed.

`-c` or `--coordinate` (can be specified multiple times)

Extension coordinate to pull down, followed by a maven coordinate, e.g. io.druid.extensions:mysql-metadata-storage

`-h` or `--hadoop-coordinate` (can be specified multiple times)

Hadoop dependency to pull down, followed by a maven coordinate, e.g. org.apache.hadoop:hadoop-client:2.4.0

`--no-default-hadoop`

Don't pull down the default hadoop coordinate, i.e., org.apache.hadoop:hadoop-client:2.3.0. If the `-h` option is supplied, the default hadoop coordinate will not be downloaded.

`--clean`

Remove existing extension and hadoop dependency directories before pulling down dependencies.

`-l` or `--localRepository`

A local repository where Maven will put downloaded files. Then pull-deps will lay these files out into the extensions directory as needed.

`-r` or `--remoteRepository`

Add a remote repository to the default remote repository list, which includes https://repo1.maven.org/maven2/ and https://metamx.artifactoryonline.com/metamx/pub-libs-releases-local

`-d` or `--defaultVersion`

Version to use for an extension coordinate that doesn't have version information. For example, if the extension coordinate is `io.druid.extensions:mysql-metadata-storage` and the default version is `0.9.0`, then this coordinate will be treated as `io.druid.extensions:mysql-metadata-storage:0.9.0`.

To run `pull-deps`, you should


Example:

Suppose you want to download ```druid-examples```, ```mysql-metadata-storage``` and ```hadoop-client``` (both 2.3.0 and 2.4.0) with a specific version. You can run the `pull-deps` command with `-c io.druid.extensions:druid-examples:0.9.0`, `-c io.druid.extensions:mysql-metadata-storage:0.9.0`, `-h org.apache.hadoop:hadoop-client:2.3.0` and `-h org.apache.hadoop:hadoop-client:2.4.0`. An example command would be:

```
java -classpath "/my/druid/library/*" io.druid.cli.Main tools pull-deps --clean -c io.druid.extensions:mysql-metadata-storage:0.9.0 -c io.druid.extensions:druid-examples:0.9.0 -h org.apache.hadoop:hadoop-client:2.3.0 -h org.apache.hadoop:hadoop-client:2.4.0
```

Because `--clean` is supplied, this command will first remove the directories specified by `druid.extensions.directory` and `druid.extensions.hadoopDependenciesDir`, then recreate them and start downloading the extensions there. When the download finishes, if you go to the extension directories you specified, you will see

```
extensions
├── druid-examples
│   ├── commons-digester-1.8.jar
│   ├── commons-logging-1.1.1.jar
│   ├── commons-validator-1.4.0.jar
│   ├── druid-examples-0.9.0.jar
│   ├── twitter4j-async-3.0.3.jar
│   ├── twitter4j-core-3.0.3.jar
│   └── twitter4j-stream-3.0.3.jar
└── mysql-metadata-storage
    ├── jdbi-2.32.jar
    ├── mysql-connector-java-5.1.34.jar
    └── mysql-metadata-storage-0.9.0.jar
```

```
hadoop-dependencies/
└── hadoop-client
    ├── 2.3.0
    │   └── ..... lots of jars
    └── 2.4.0
        └── ..... lots of jars
```

Note that if you specify `--defaultVersion`, you don't have to put version information in the coordinate. For example, if you want both `druid-examples` and `mysql-metadata-storage` to use version `0.9.0`, you can change the command above to:

```
java -classpath "/my/druid/library/*" io.druid.cli.Main tools pull-deps --defaultVersion 0.9.0 --clean -c io.druid.extensions:mysql-metadata-storage -c io.druid.extensions:druid-examples -h org.apache.hadoop:hadoop-client:2.3.0 -h org.apache.hadoop:hadoop-client:2.4.0
```
4 changes: 4 additions & 0 deletions docs/content/querying/dimensionspecs.md
or without setting "locale" (in this case, the current value of the default locale is used).

### Lookup DimensionSpecs

<div class="note caution">
Lookups are an <a href="../development/experimental.html">experimental</a> feature.
</div>

Lookup DimensionSpecs can be used to define a lookup implementation directly as a dimension spec.
Generally speaking, there are two different kinds of lookup implementations.
The first kind is passed in at query time, like the `map` implementation.
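A minimal sketch of the query-time kind, assuming a `map` lookup (the dimension names and the mapping are illustrative, not from this commit):

```json
{
  "type" : "lookup",
  "dimension" : "dimensionName",
  "outputName" : "dimensionOutputName",
  "lookup" : {
    "type" : "map",
    "map" : { "oldValue" : "newValue" }
  }
}
```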
12 changes: 9 additions & 3 deletions docs/content/querying/lookups.md
---
layout: doc_page
---
# Lookups

<div class="note caution">
Lookups are an <a href="../development/experimental.html">experimental</a> feature.
</div>

Lookups are a concept in Druid where dimension values are (optionally) replaced with new values.
See [dimension specs](../querying/dimensionspecs.html) for more information. For the purpose of these documents,
a "key" refers to a dimension value to match, and a "value" refers to its replacement.
described as per the sections on this page.

Proper functionality of Namespaced lookups requires the following extension to be loaded on the broker, peon, and historical nodes: `druid-namespace-lookup`.

## Cache Settings

The following is the handling of Kafka consumer properties in `druid.query.rename.kafka.properties`:

To test this setup, you can send key/value pairs to a kafka stream via the following producer console:

```
./bin/kafka-console-producer.sh --property parse.key=true --property key.separator="->" --broker-list localhost:9092 --topic testTopic
```

Renames can then be published as `OLD_VAL->NEW_VAL` followed by a newline (enter or return).
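For instance (hypothetical values), typing these lines at the producer console would publish two renames:

```
old_value_1->new_value_1
old_value_2->new_value_2
```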
