From 7da6594bfe8347302eeab1044d1d9214a6e131c2 Mon Sep 17 00:00:00 2001 From: fjy Date: Fri, 12 Feb 2016 14:19:28 -0800 Subject: [PATCH] more doc fixes --- NOTICE | 1 + README.md | 2 +- docs/content/development/experimental.md | 5 ++++ docs/content/ingestion/data-formats.md | 2 +- docs/content/ingestion/index.md | 6 ++--- docs/content/operations/other-hadoop.md | 25 ++++++++++++-------- docs/content/operations/pull-deps.md | 29 +++++++++++++----------- docs/content/querying/dimensionspecs.md | 4 ++++ docs/content/querying/lookups.md | 12 +++++++--- 9 files changed, 56 insertions(+), 30 deletions(-) diff --git a/NOTICE b/NOTICE index 9460784b640f..0b11258a6038 100644 --- a/NOTICE +++ b/NOTICE @@ -1,6 +1,7 @@ Druid - a distributed column store. Copyright 2012-2016 Metamarkets Group Inc. Copyright 2015-2016 Yahoo! Inc. +Copyright 2015-2016 Imply Data, Inc. ------------------------------------------------------------------------------- diff --git a/README.md b/README.md index e4294e95852b..76f5ce8de48d 100644 --- a/README.md +++ b/README.md @@ -23,7 +23,7 @@ More information about Druid can be found on . ### Documentation -You can find the [latest Druid Documentation](http://druid.io/docs/latest/) on +You can find the [documentation for the latest Druid release](http://druid.io/docs/latest/) on the [project website](http://druid.io/docs/latest/). If you would like to contribute documentation, please do so under diff --git a/docs/content/development/experimental.md b/docs/content/development/experimental.md index 5d52309fd2fa..91c36ebc47ab 100644 --- a/docs/content/development/experimental.md +++ b/docs/content/development/experimental.md @@ -1,9 +1,14 @@ --- layout: doc_page --- + # About Experimental Features + Experimental features are features we have developed but have not fully tested in a production environment. If you choose to try them out, there will likely be edge cases that we have not covered. 
We would love feedback on any of these features, whether that's a bug report, a suggestion for improvement, or simply letting us know that they work as intended. +
+APIs for experimental features may change in backwards-incompatible ways. +
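In practice, turning on an experimental feature is the same as loading any other extension. A minimal sketch of the relevant `common.runtime.properties` entry, assuming the `druid.extensions.loadList` property described in the configuration docs (`druid-histogram` stands in for whichever experimental extension you want):

```
druid.extensions.loadList=["druid-histogram"]
```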
To enable experimental features, include their artifacts in the `runtime.properties` configuration file, e.g., diff --git a/docs/content/ingestion/data-formats.md b/docs/content/ingestion/data-formats.md index 598270f4e9ee..afdeb8564ab5 100644 --- a/docs/content/ingestion/data-formats.md +++ b/docs/content/ingestion/data-formats.md @@ -9,7 +9,7 @@ We welcome any contributions to new formats. ## Formatting the Data -The following are some samples of the data used in the [Wikipedia example](../tutorials/tutorial-loading-streaming-data.html). +The following are some samples of the data used in the [Wikipedia example](../tutorials/quickstart.html). _JSON_ diff --git a/docs/content/ingestion/index.md b/docs/content/ingestion/index.md index 986ef75b2638..3a4c424aecf6 100644 --- a/docs/content/ingestion/index.md +++ b/docs/content/ingestion/index.md @@ -8,8 +8,8 @@ A Druid ingestion spec consists of 3 components: ```json { - "dataSchema" : {...} - "ioConfig" : {...} + "dataSchema" : {...}, + "ioConfig" : {...}, "tuningConfig" : {...} } ``` @@ -93,7 +93,7 @@ If `type` is not included, the parser defaults to `string`. ### Avro Stream Parser -This is for realtime ingestion. Make sure to include "io.druid.extensions:druid-avro-extensions" as an extension. +This is for realtime ingestion. Make sure to include `druid-avro-extensions` as an extension. | Field | Type | Description | Required | |-------|------|-------------|----------| diff --git a/docs/content/operations/other-hadoop.md b/docs/content/operations/other-hadoop.md index e0c10859a1a0..6dac44cc5a57 100644 --- a/docs/content/operations/other-hadoop.md +++ b/docs/content/operations/other-hadoop.md @@ -1,9 +1,9 @@ --- layout: doc_page --- -# Work with different versions of Hadoop +# Working with different versions of Hadoop -## Include Hadoop dependencies +## Including Hadoop dependencies There are two different ways to let Druid pick up your Hadoop version; choose the one that fits your needs.
@@ -13,15 +13,22 @@ You can create a Hadoop dependency directory and tell Druid to load your Hadoop To make this work, follow the steps below -(1) Specify `druid.extensions.hadoopDependenciesDir` (root directory for Hadoop related dependencies). If you don't specify it, Druid will use its default value, see [Configuration](../configuration/index.html). +(1) Specify `druid.extensions.hadoopDependenciesDir` (the root directory for Hadoop-related dependencies) in your `common.runtime.properties` file. If you don't +specify it, Druid will use a default value. See [Configuration](../configuration/index.html) for more details. -(2) Set-up Hadoop dependencies directories under root Hadoop dependency directory. Under the root directory, you should create sub-directories for each Hadoop dependencies. Inside each sub-directory, created a sub-sub-directory whose name is the version of Hadoop it contains, and inside that sub-sub-directory, put Hadoop jars in it. This file structure is almost same as normal Druid extensions described in [Including-Extensions](../including-extensions.html), except that there is an extra layer of folder that specifies the version of Hadoop. (If you don't want to manually setup this directory, Druid also provides a [pull-deps](../pull-deps.html) tool that can help you generate these directories automatically) +(2) Set up the Hadoop dependency directories under the root Hadoop dependency directory. Under the root directory, you should +create a sub-directory for each Hadoop dependency. Inside each sub-directory, create a sub-sub-directory whose name +is the version of Hadoop it contains, and inside that sub-sub-directory, put the Hadoop jars. This file structure is +almost the same as that of normal Druid extensions described in [Including Extensions](../operations/including-extensions.html), +except that there is an extra layer of folders that specifies the version of Hadoop.
(If you don't want to manually set up +this directory, Druid also provides a [pull-deps](../operations/pull-deps.html) tool that can help you generate these +directories automatically). Example: Suppose you specify `druid.extensions.hadoopDependenciesDir=/usr/local/druid/hadoop-dependencies`, and you want to prepare both `hadoop-client` 2.3.0 and 2.4.0 for Druid, -Then you can either use [pull-deps](../pull-deps.html) or manually set up Hadoop dependencies directories such that under ```hadoop-dependencies```, it looks like this, +Then you can either use [pull-deps](../operations/pull-deps.html) or manually set up the Hadoop dependency directories such that under ```hadoop-dependencies```, the layout looks like this: ``` hadoop-dependencies/ @@ -44,7 +51,7 @@ hadoop-dependencies/ ..... lots of jars ``` -As you can see, under ```hadoop-client```, there are two sub-directories, each denotes a version of ```hadoop-client```. During runtime, Druid will look for these directories and load appropriate ```hadoop-client``` based on `hadoopDependencyCoordinates` passed to [Hadoop Index Task](../ingestion/tasks.html). +As you can see, under ```hadoop-client```, there are two sub-directories, each of which denotes a version of ```hadoop-client```. At runtime, Druid will look for these directories and load the appropriate ```hadoop-client``` based on the `hadoopDependencyCoordinates` passed to the [Hadoop Index Task](../ingestion/tasks.html). ### Append your Hadoop jars to the Druid classpath @@ -54,17 +61,17 @@ If you really don't like the way above, and you just want to use one specific Ha (2) Append your Hadoop jars to the classpath; Druid will load them into the system. This mechanism is relatively easy to reason about, but it also means that you have to ensure that all dependency jars on the classpath are compatible. That is, Druid makes no provisions while using this method to maintain class loader isolation, so you must make sure that the jars on your classpath are mutually compatible.
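As a sketch of what the classpath approach looks like (both directory paths are illustrative stand-ins for your own installation, and the commented-out launch command depends on your deployment):

```shell
# Build a classpath that places one specific Hadoop version's jars
# alongside the Druid jars (no class loader isolation, as noted above).
DRUID_LIB='/my/druid/lib/*'
HADOOP_JARS='/my/hadoop-2.4.0/share/hadoop/common/*'
CLASSPATH="${DRUID_LIB}:${HADOOP_JARS}"
echo "${CLASSPATH}"

# A node could then be launched with something like:
# java -classpath "${CLASSPATH}" io.druid.cli.Main server historical
```

Keeping the two path variables separate makes it easy to swap in a different Hadoop version without touching the Druid jars.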
-## Working with Hadoop 2.x +#### Hadoop 2.x The default version of Hadoop bundled with Druid is 2.3. -To override the default Hadoop version, both the Hadoop Index Task and the standalone Hadoop indexer support the parameter `hadoopDependencyCoordinates`(See [Index Hadoop Task](../ingestion/tasks.html). You can pass another set of Hadoop coordinates through this parameter (e.g. You can specify coordinates for Hadoop 2.4.0 as `["org.apache.hadoop:hadoop-client:2.4.0"]`), which will overwrite the default Hadoop coordinates Druid uses. +To override the default Hadoop version, both the Hadoop Index Task and the standalone Hadoop indexer support the parameter `hadoopDependencyCoordinates` (see [Index Hadoop Task](../ingestion/tasks.html)). You can pass another set of Hadoop coordinates through this parameter (e.g., you can specify coordinates for Hadoop 2.4.0 as `["org.apache.hadoop:hadoop-client:2.4.0"]`), which will overwrite the default Hadoop coordinates Druid uses. The Hadoop Index Task takes this parameter as part of the task JSON and the standalone Hadoop indexer takes this parameter as a command line argument. If you are still having problems, include all relevant Hadoop jars at the beginning of the classpath of your indexing or historical nodes. -## Working with CDH +#### CDH Members of the community have reported dependency conflicts between the version of Jackson used in CDH and Druid. Currently, our best workaround is to edit Druid's pom.xml dependencies to match the version of Jackson in your Hadoop version and recompile Druid. diff --git a/docs/content/operations/pull-deps.md b/docs/content/operations/pull-deps.md index 9c1a87b6b9c3..ad5c794f24b9 100644 --- a/docs/content/operations/pull-deps.md +++ b/docs/content/operations/pull-deps.md @@ -1,6 +1,7 @@ --- layout: doc_page --- + # pull-deps Tool `pull-deps` is a tool that can pull down dependencies into the local repository and lay them out in the extension directory as needed.
@@ -9,31 +10,31 @@ layout: doc_page `-c` or `--coordinate` (Can be specified multiple times) - Extension coordinate to pull down, followed by a maven coordinate, e.g. io.druid.extensions:mysql-metadata-storage +Extension coordinate to pull down, followed by a maven coordinate, e.g. io.druid.extensions:mysql-metadata-storage `-h` or `--hadoop-coordinate` (Can be specified multiple times) - Hadoop dependency to pull down, followed by a maven coordinate, e.g. org.apache.hadoop:hadoop-client:2.4.0 +Hadoop dependency to pull down, followed by a maven coordinate, e.g. org.apache.hadoop:hadoop-client:2.4.0 `--no-default-hadoop` - Don't pull down the default hadoop coordinate, i.e., org.apache.hadoop:hadoop-client:2.3.0. If `-h` option is supplied, then default hadoop coordinate will not be downloaded. +Don't pull down the default hadoop coordinate, i.e., org.apache.hadoop:hadoop-client:2.3.0. If the `-h` option is supplied, the default hadoop coordinate will not be downloaded. `--clean` - Remove exisiting extension and hadoop dependencies directories before pulling down dependencies. +Remove existing extension and hadoop dependency directories before pulling down dependencies. `-l` or `--localRepository` - A local repostiry that Maven will use to put downloaded files. Then pull-deps will lay these files out into the extensions directory as needed. +A local repository that Maven will use to put downloaded files in. Then pull-deps will lay these files out in the extensions directory as needed.
`-r` or `--remoteRepository` - Add a remote repository to the default remote repository list, which includes https://repo1.maven.org/maven2/ and https://metamx.artifactoryonline.com/metamx/pub-libs-releases-local +Add a remote repository to the default remote repository list, which includes https://repo1.maven.org/maven2/ and https://metamx.artifactoryonline.com/metamx/pub-libs-releases-local `-d` or `--defaultVersion` - Version to use for extension coordinate that doesn't have a version information. For example, if extension coordinate is `io.druid.extensions:mysql-metadata-storage`, and default version is `0.8.0`, then this coordinate will be treated as `io.druid.extensions:mysql-metadata-storage:0.8.0` +Version to use for an extension coordinate that doesn't have version information. For example, if the extension coordinate is `io.druid.extensions:mysql-metadata-storage` and the default version is `0.9.0`, then this coordinate will be treated as `io.druid.extensions:mysql-metadata-storage:0.9.0` To run `pull-deps`, you should @@ -43,9 +44,9 @@ To run `pull-deps`, you should Example: -Suppose you want to download ```druid-examples```, ```mysql-metadata-storage``` and ```hadoop-client```(both 2.3.0 and 2.4.0) with a specific version, you can run `pull-deps` command with `-c io.druid.extensions:druid-examples:0.8.0`, `-c io.druid.extensions:mysql-metadata-storage:0.8.0`, `-h org.apache.hadoop:hadoop-client:2.3.0` and `-h org.apache.hadoop:hadoop-client:2.4.0`, an example command would be: +Suppose you want to download specific versions of ```druid-examples```, ```mysql-metadata-storage``` and ```hadoop-client``` (both 2.3.0 and 2.4.0). You can run the `pull-deps` command with `-c io.druid.extensions:druid-examples:0.9.0`, `-c io.druid.extensions:mysql-metadata-storage:0.9.0`, `-h org.apache.hadoop:hadoop-client:2.3.0` and `-h org.apache.hadoop:hadoop-client:2.4.0`. An example command would be: -```java -classpath "/my/druid/library/*" io.druid.cli.Main tools pull-deps
--clean -c io.druid.extensions:mysql-metadata-storage:0.8.0 -c io.druid.extensions:druid-examples:0.8.0 -h org.apache.hadoop:hadoop-client:2.3.0 -h org.apache.hadoop:hadoop-client:2.4.0``` +```java -classpath "/my/druid/library/*" io.druid.cli.Main tools pull-deps --clean -c io.druid.extensions:mysql-metadata-storage:0.9.0 -c io.druid.extensions:druid-examples:0.9.0 -h org.apache.hadoop:hadoop-client:2.3.0 -h org.apache.hadoop:hadoop-client:2.4.0``` Because `--clean` is supplied, this command will first remove the directories specified by `druid.extensions.directory` and `druid.extensions.hadoopDependenciesDir`, then recreate them and start downloading the extensions there. After the downloads finish, if you go to the extension directories you specified, you will see @@ -57,14 +58,14 @@ extensions │   ├── commons-digester-1.8.jar │   ├── commons-logging-1.1.1.jar │   ├── commons-validator-1.4.0.jar -│   ├── druid-examples-0.8.0.jar +│   ├── druid-examples-0.9.0.jar │   ├── twitter4j-async-3.0.3.jar │   ├── twitter4j-core-3.0.3.jar │   └── twitter4j-stream-3.0.3.jar └── mysql-metadata-storage ├── jdbi-2.32.jar ├── mysql-connector-java-5.1.34.jar - └── mysql-metadata-storage-0.8.0.jar + └── mysql-metadata-storage-0.9.0.jar ``` ``` @@ -89,6 +90,8 @@ hadoop-dependencies/ ..... lots of jars ``` -Note that if you specify `--defaultVersion`, you don't have to put version information in the coordinate. For example, if you want both `druid-examples` and `mysql-metadata-storage` to use version `0.8.0`, you can change the command above to +Note that if you specify `--defaultVersion`, you don't have to put version information in the coordinate.
For example, if you want both `druid-examples` and `mysql-metadata-storage` to use version `0.9.0`, you can change the command above to -```java -classpath "/my/druid/library/*" io.druid.cli.Main tools pull-deps --defaultVersion 0.8.0 --clean -c io.druid.extensions:mysql-metadata-storage -c io.druid.extensions:druid-examples -h org.apache.hadoop:hadoop-client:2.3.0 -h org.apache.hadoop:hadoop-client:2.4.0``` +``` +java -classpath "/my/druid/library/*" io.druid.cli.Main tools pull-deps --defaultVersion 0.9.0 --clean -c io.druid.extensions:mysql-metadata-storage -c io.druid.extensions:druid-examples -h org.apache.hadoop:hadoop-client:2.3.0 -h org.apache.hadoop:hadoop-client:2.4.0 +``` diff --git a/docs/content/querying/dimensionspecs.md b/docs/content/querying/dimensionspecs.md index 1587326ae36d..a707fefc1ced 100644 --- a/docs/content/querying/dimensionspecs.md +++ b/docs/content/querying/dimensionspecs.md @@ -396,6 +396,10 @@ or without setting "locale" (in this case, the current value of the default loca ### Lookup DimensionSpecs +
+Lookups are an experimental feature. +
+ Lookup DimensionSpecs can be used to directly define a lookup implementation as a dimension spec. Generally speaking, there are two different kinds of lookup implementations. The first kind is passed in at query time, like the `map` implementation. diff --git a/docs/content/querying/lookups.md b/docs/content/querying/lookups.md index 812ef3757e0a..185954f8452c 100644 --- a/docs/content/querying/lookups.md +++ b/docs/content/querying/lookups.md @@ -3,6 +3,10 @@ layout: doc_page --- # Lookups +
+Lookups are an experimental feature. +
+ Lookups are a concept in Druid where dimension values are (optionally) replaced with new values. See [dimension specs](../querying/dimensionspecs.html) for more information. For the purposes of these documents, a "key" refers to a dimension value to match, and a "value" refers to its replacement. @@ -61,8 +65,8 @@ described as per the sections on this page. For example: ] ``` -Proper functionality of Namespaced lookups requires the following extension to be loaded on the broker, peon, and historical nodes: -`io.druid.extensions:druid-namespace-lookup` +Proper functionality of Namespaced lookups requires the following extension to be loaded on the broker, peon, and historical nodes: +`druid-namespace-lookup` ## Cache Settings @@ -287,6 +291,8 @@ The following are the handling for kafka consumer properties in `druid.query.ren To test this setup, you can send key/value pairs to a kafka stream via the following producer console: -`./bin/kafka-console-producer.sh --property parse.key=true --property key.separator="->" --broker-list localhost:9092 --topic testTopic` +``` +./bin/kafka-console-producer.sh --property parse.key=true --property key.separator="->" --broker-list localhost:9092 --topic testTopic +``` Renames can then be published as `OLD_VAL->NEW_VAL` followed by a newline (enter or return).
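Since each rename is just a `KEY->VALUE` line on the producer's stdin, publishing can be scripted. A sketch, assuming a POSIX shell (the helper function and the sample values are illustrative; the producer invocation mirrors the console command above):

```shell
# Emit rename mappings in the OLD_VAL->NEW_VAL form the producer expects,
# one mapping per line; "->" must match the key.separator property.
emit_renames() {
  printf '%s->%s\n' 'old_value_1' 'new_value_1'
  printf '%s->%s\n' 'old_value_2' 'new_value_2'
}

emit_renames

# To publish for real (requires a running Kafka broker):
# emit_renames | ./bin/kafka-console-producer.sh --property parse.key=true \
#   --property key.separator="->" --broker-list localhost:9092 --topic testTopic
```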