[DOCS] Streamlined GS indexing topic. (#45714) (#45867)
* Streamlined GS indexing topic.

* Incorporated review feedback

* Applied formatting per the style guidelines.
debadair authored Aug 22, 2019
1 parent fc786a4 commit 5ec7c85
Showing 1 changed file with 26 additions and 54 deletions: docs/reference/getting-started.asciidoc
@@ -22,7 +22,7 @@ how {es} works. If you're already familiar with {es} and want to see how it works
with the rest of the stack, you might want to jump to the
{stack-gs}/get-started-elastic-stack.html[Elastic Stack
Tutorial] to see how to set up a system monitoring solution with {es}, {kib},
{beats}, and {ls}.

TIP: The fastest way to get started with {es} is to
https://www.elastic.co/cloud/elasticsearch-service/signup[start a free 14-day
@@ -135,8 +135,8 @@ Windows:
The additional nodes are assigned unique IDs. Because you're running all three
nodes locally, they automatically join the cluster with the first node.

. Use the cat health API to verify that your three-node cluster is up and running.
The cat APIs return information about your cluster and indices in a
format that's easier to read than raw JSON.
+
You can interact directly with your cluster by submitting HTTP requests to
@@ -155,8 +155,8 @@ GET /_cat/health?v
--------------------------------------------------
// CONSOLE
+
The response should indicate that the status of the `elasticsearch` cluster
is `green` and it has three nodes:
+
[source,txt]
--------------------------------------------------
@@ -191,8 +191,8 @@ Once you have a cluster up and running, you're ready to index some data.
There are a variety of ingest options for {es}, but in the end they all
do the same thing: put JSON documents into an {es} index.

You can do this directly with a simple PUT request that specifies
the index you want to add the document to, a unique document ID, and one or more
`"field": "value"` pairs in the request body:

[source,js]
@@ -204,9 +204,9 @@ PUT /customer/_doc/1
--------------------------------------------------
// CONSOLE

This request automatically creates the `customer` index if it doesn't already
exist, adds a new document that has an ID of `1`, and stores and
indexes the `name` field.

Since this is a new document, the response shows that the result of the
operation was that version 1 of the document was created:
@@ -264,46 +264,22 @@ and shows the original source fields that were indexed.
// TESTRESPONSE[s/"_seq_no" : \d+/"_seq_no" : $body._seq_no/ ]
// TESTRESPONSE[s/"_primary_term" : \d+/"_primary_term" : $body._primary_term/]
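Under the hood, these console snippets are plain HTTP requests, so any HTTP client can drive them. As a minimal sketch using only the Python standard library, this builds the same request as `PUT /customer/_doc/1`; the `localhost:9200` address assumes a default local cluster, and the request is only constructed here, not sent:

```python
import json
import urllib.request

# The console snippet `PUT /customer/_doc/1` expressed as a plain HTTP
# request. localhost:9200 is assumed to be a default local cluster.
doc = {"name": "John Doe"}
request = urllib.request.Request(
    url="http://localhost:9200/customer/_doc/1",
    data=json.dumps(doc).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)

# Sending it would be: urllib.request.urlopen(request)
print(request.get_method(), request.full_url)
```

Retrieving the document afterwards follows the same pattern with `method="GET"` and no body.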


[float]
[[getting-started-batch-processing]]
=== Indexing documents in bulk

If you have a lot of documents to index, you can submit them in batches with
the {ref}/docs-bulk.html[bulk API]. Using bulk to batch document
operations is significantly faster than submitting requests individually as it minimizes network roundtrips.

The optimal batch size depends on a number of factors: the document size and complexity, the indexing and search load, and the resources available to your cluster. A good place to start is with batches of 1,000 to 5,000 documents
and a total payload between 5MB and 15MB. From there, you can experiment
to find the sweet spot.
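The sizing guidance above can be sketched in code. This is an illustrative helper, not part of any official client: it chunks documents into `_bulk`-style NDJSON payloads, capping each batch by both document count and payload size (the function name and default limits are assumptions for the example):

```python
import json

def bulk_batches(docs, max_docs=1000, max_bytes=5 * 1024 * 1024):
    """Yield NDJSON payloads for the _bulk API, capped by doc count and size."""
    lines, ndocs, nbytes = [], 0, 0
    for i, doc in enumerate(docs, start=1):
        # Each document becomes an action line plus a source line.
        pair = json.dumps({"index": {"_id": str(i)}}) + "\n" + json.dumps(doc) + "\n"
        # Flush the current batch before this pair would exceed a limit.
        if lines and (ndocs >= max_docs or nbytes + len(pair) > max_bytes):
            yield "".join(lines)
            lines, ndocs, nbytes = [], 0, 0
        lines.append(pair)
        ndocs += 1
        nbytes += len(pair)
    if lines:
        yield "".join(lines)

batches = list(bulk_batches(({"name": f"user-{n}"} for n in range(2500)),
                            max_docs=1000))
print(len(batches))  # 3 batches: 1000 + 1000 + 500 documents
```

Each yielded payload is ready to POST to the `_bulk` endpoint with a `Content-Type: application/json` header.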

[float]
=== Sample dataset

To get some data into {es} that you can start searching and analyzing:

. Download the https://github.com/elastic/elasticsearch/blob/master/docs/src/test/resources/accounts.json?raw=true[`accounts.json`] sample data set. The documents in this randomly generated data set represent user accounts with the following information:
+
[source,js]
--------------------------------------------------
{
@@ -322,31 +298,29 @@ Now that we've gotten a glimpse of the basics, let's try to work on a more realistic dataset.
--------------------------------------------------
// NOTCONSOLE



. Index the account data into the `bank` index with the following `_bulk` request:
+
[source,sh]
--------------------------------------------------
curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_doc/_bulk?pretty&refresh" --data-binary "@accounts.json"
curl "localhost:9200/_cat/indices?v"
--------------------------------------------------
// NOTCONSOLE

+
////
This replicates the above in a document-testing friendly way but isn't visible
in the docs:
+
[source,js]
--------------------------------------------------
GET /_cat/indices?v
--------------------------------------------------
// CONSOLE
// TEST[setup:bank]
////


+
The response indicates that 1,000 documents were indexed successfully.
+
[source,txt]
--------------------------------------------------
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
@@ -356,8 +330,6 @@ green open bank l7sSYV2cQXmu6_4rJWVIww 5 1 1000 0 128
// TESTRESPONSE[s/128.6kb/\\d+(\\.\\d+)?[mk]?b/]
// TESTRESPONSE[s/l7sSYV2cQXmu6_4rJWVIww/.+/ non_json]
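Because the `cat` APIs return whitespace-aligned columns, responses like the one above are easy to post-process. A minimal, illustrative parser; the sample row is abridged from the response shown here, with assumed store sizes:

```python
# Header and one data row from a `GET /_cat/indices?v` response
# (store.size values are assumptions for this example).
header = ("health status index uuid pri rep docs.count "
          "docs.deleted store.size pri.store.size")
row = "green open bank l7sSYV2cQXmu6_4rJWVIww 5 1 1000 0 128.6kb 128.6kb"

# Both lines are whitespace-delimited, so pair column names with values.
index_info = dict(zip(header.split(), row.split()))
print(index_info["index"], index_info["docs.count"])
```

For machine consumption you would normally skip this and request JSON directly by omitting the `cat` API's human-readable formatting, but the columnar form is convenient at the command line.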


[[getting-started-search]]
== Start searching

