diff --git a/docs/reference/transform/api-quickref.asciidoc b/docs/reference/transform/api-quickref.asciidoc
new file mode 100644
index 0000000000000..7750331a0273d
--- /dev/null
+++ b/docs/reference/transform/api-quickref.asciidoc
@@ -0,0 +1,21 @@
+[role="xpack"]
+[[df-api-quickref]]
+== API quick reference
+
+All {dataframe-transform} endpoints have the following base:
+
+[source,js]
+----
+/_data_frame/transforms/
+----
+// NOTCONSOLE
+
+* {ref}/put-data-frame-transform.html[Create {dataframe-transforms}]
+* {ref}/delete-data-frame-transform.html[Delete {dataframe-transforms}]
+* {ref}/get-data-frame-transform.html[Get {dataframe-transforms}]
+* {ref}/get-data-frame-transform-stats.html[Get {dataframe-transforms} statistics]
+* {ref}/preview-data-frame-transform.html[Preview {dataframe-transforms}]
+* {ref}/start-data-frame-transform.html[Start {dataframe-transforms}]
+* {ref}/stop-data-frame-transform.html[Stop {dataframe-transforms}]
+
+For the full list, see {ref}/data-frame-apis.html[{dataframe-transform-cap} APIs].
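+
+For example, assuming a transform named `ecommerce-customer-transform` exists
+(the ID here is purely illustrative), you could list all transforms and fetch
+that transform's statistics with:
+
+[source,js]
+----
+GET _data_frame/transforms
+GET _data_frame/transforms/ecommerce-customer-transform/_stats
+----
+// NOTCONSOLE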
diff --git a/docs/reference/transform/checkpoints.asciidoc b/docs/reference/transform/checkpoints.asciidoc
new file mode 100644
index 0000000000000..808ce071ede7d
--- /dev/null
+++ b/docs/reference/transform/checkpoints.asciidoc
@@ -0,0 +1,88 @@
+[role="xpack"]
+[[ml-transform-checkpoints]]
+== How {dataframe-transform} checkpoints work
+++++
+How checkpoints work
+++++
+
+beta[]
+
+Each time a {dataframe-transform} examines the source indices and creates or
+updates the destination index, it generates a _checkpoint_.
+
+If your {dataframe-transform} runs only once, there is logically only one
+checkpoint. If your {dataframe-transform} runs continuously, however, it creates
+checkpoints as it ingests and transforms new source data.
+
+To create a checkpoint, the {cdataframe-transform}:
+
+. Checks for changes to source indices.
++
+Using a simple periodic timer, the {dataframe-transform} checks for changes to
+the source indices. This check occurs at the interval defined in the
+transform's `frequency` property (see the configuration sketch after these
+steps).
++
+If the source indices remain unchanged or if a checkpoint is already in
+progress, the transform waits for the next timer.
+
+. Identifies which entities have changed.
++
+The {dataframe-transform} searches to see which entities have changed since the
+last time it checked. The transform's `sync` configuration object identifies a
+time field in the source indices. The transform uses the values in that field to
+synchronize the source and destination indices.
+
+. Updates the destination index (the {dataframe}) with the changed entities.
++
+--
+The {dataframe-transform} applies changes related to either new or changed
+entities to the destination index. The set of changed entities is paginated. For
+each page, the {dataframe-transform} performs a composite aggregation using a
+`terms` query. After all the pages of changes have been applied, the checkpoint
+is complete.
+--
+
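+As a sketch, the pieces of a {cdataframe-transform} configuration that drive
+this behavior might look like the following; the field name and values are
+illustrative:
+
+[source,js]
+----
+{
+  "frequency": "1m",                  <1>
+  "sync": {
+    "time": {
+      "field": "ingest_timestamp",    <2>
+      "delay": "60s"
+    }
+  }
+}
+----
+// NOTCONSOLE
+
+<1> The interval at which the transform checks for changes to the source
+indices.
+<2> The time field used to synchronize the source and destination indices.
+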
+This checkpoint process involves both search and indexing activity on the
+cluster. We have attempted to favor control over performance while developing
+{dataframe-transforms}. We decided it was preferable for the
+{dataframe-transform} to take longer to complete, rather than to finish quickly
+and take precedence in resource consumption. That being said, the cluster still
+requires enough resources to support both the composite aggregation search and
+the indexing of its results.
+
+TIP: If the cluster experiences unsuitable performance degradation due to the
+{dataframe-transform}, stop the transform. Consider whether you can apply a
+source query to the {dataframe-transform} to reduce the scope of data it
+processes. Also consider whether the cluster has sufficient resources in place
+to support both the composite aggregation search and the indexing of its
+results.
+
+[discrete]
+[[ml-transform-checkpoint-errors]]
+==== Error handling
+
+Failures in {dataframe-transforms} tend to be related to searching or indexing.
+To increase the resiliency of {dataframe-transforms}, the cursor positions of
+the aggregated search and the changed entities search are tracked in memory and
+persisted periodically.
+
+Checkpoint failures can be categorized as follows:
+
+* Temporary failures: The checkpoint is retried. If 10 consecutive failures
+occur, the {dataframe-transform} has a failed status. For example, this
+situation might occur when there are shard failures and queries return only
+partial results.
+* Irrecoverable failures: The {dataframe-transform} immediately fails. For
+example, this situation occurs when the source index is not found.
+* Adjustment failures: The {dataframe-transform} retries with adjusted settings.
+For example, if parent circuit breaker memory errors occur during the
+composite aggregation, the transform receives partial results. The aggregated
+search is retried with a smaller number of buckets. This retry is performed at
+the interval defined in the transform's `frequency` property. If the search
+is retried to the point where it reaches a minimal number of buckets, an
+irrecoverable failure occurs.
+
+If the node running the {dataframe-transforms} fails, the transform restarts
+from the most recent persisted cursor position. This recovery process might
+repeat some of the work the transform had already done, but it ensures data
+consistency.
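+
+If a {dataframe-transform} does reach a failed state, the
+{ref}/get-data-frame-transform-stats.html[get {dataframe-transforms} statistics API]
+shows its state and, where available, the failure reason. For example, with a
+hypothetical transform ID:
+
+[source,js]
+----
+GET _data_frame/transforms/suspicious-client-ips-transform/_stats
+----
+// NOTCONSOLE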
diff --git a/docs/reference/transform/dataframe-examples.asciidoc b/docs/reference/transform/dataframe-examples.asciidoc
new file mode 100644
index 0000000000000..31abb4787a9b5
--- /dev/null
+++ b/docs/reference/transform/dataframe-examples.asciidoc
@@ -0,0 +1,335 @@
+[role="xpack"]
+[testenv="basic"]
+[[dataframe-examples]]
+== {dataframe-transform-cap} examples
+++++
+Examples
+++++
+
+beta[]
+
+These examples demonstrate how to use {dataframe-transforms} to derive useful
+insights from your data. All the examples use one of the
+{kibana-ref}/add-sample-data.html[{kib} sample datasets]. For a more detailed,
+step-by-step example, see
+<<ecommerce-dataframes>>.
+
+* <<ecommerce-dataframes>>
+* <<example-best-customers>>
+* <<example-airline>>
+* <<example-clientips>>
+
+include::ecommerce-example.asciidoc[]
+
+[[example-best-customers]]
+=== Finding your best customers
+
+In this example, we use the eCommerce orders sample dataset to find the
+customers who spent the most in our hypothetical webshop. Let's transform the
+data such that the destination index contains the number of orders, the total
+price of the orders, the average price per order, the average number of unique
+products per order, and the total number of unique products for each customer.
+
+[source,console]
+----------------------------------
+POST _data_frame/transforms/_preview
+{
+ "source": {
+ "index": "kibana_sample_data_ecommerce"
+ },
+ "dest" : { <1>
+ "index" : "sample_ecommerce_orders_by_customer"
+ },
+ "pivot": {
+ "group_by": { <2>
+ "user": { "terms": { "field": "user" }},
+ "customer_id": { "terms": { "field": "customer_id" }}
+ },
+ "aggregations": {
+ "order_count": { "value_count": { "field": "order_id" }},
+ "total_order_amt": { "sum": { "field": "taxful_total_price" }},
+ "avg_amt_per_order": { "avg": { "field": "taxful_total_price" }},
+ "avg_unique_products_per_order": { "avg": { "field": "total_unique_products" }},
+ "total_unique_products": { "cardinality": { "field": "products.product_id" }}
+ }
+ }
+}
+----------------------------------
+// TEST[skip:setup kibana sample data]
+
+<1> This is the destination index for the {dataframe}. It is ignored by
+`_preview`.
+<2> Two `group_by` fields have been selected. This means the {dataframe} will
+contain a unique row per `user` and `customer_id` combination. Within this
+dataset both these fields are unique. Including both in the {dataframe} gives
+more context to the final results.
+
+NOTE: In the example above, condensed JSON formatting has been used for easier
+readability of the pivot object.
+
+The preview {dataframe-transforms} API enables you to see the layout of the
+{dataframe} in advance, populated with some sample values. For example:
+
+[source,js]
+----------------------------------
+{
+ "preview" : [
+ {
+ "total_order_amt" : 3946.9765625,
+ "order_count" : 59.0,
+ "total_unique_products" : 116.0,
+ "avg_unique_products_per_order" : 2.0,
+ "customer_id" : "10",
+ "user" : "recip",
+ "avg_amt_per_order" : 66.89790783898304
+ },
+ ...
+ ]
+}
+----------------------------------
+// NOTCONSOLE
+
+This {dataframe} makes it easier to answer questions such as:
+
+* Which customers spend the most?
+
+* Which customers spend the most per order?
+
+* Which customers order most often?
+
+* Which customers ordered the least number of different products?
+
+It's possible to answer these questions using aggregations alone; however,
+{dataframes} allow us to persist this data as a customer-centric index. This
+enables us to analyze data at scale and gives more flexibility to explore and
+navigate data from a customer-centric perspective. In some cases, it can even
+make creating visualizations much simpler.
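+
+If you are satisfied with the preview, you could persist the {dataframe} by
+creating and then starting a transform that reuses the same `source`, `dest`,
+and `pivot` objects. The transform ID below is only a suggestion:
+
+[source,js]
+----------------------------------
+PUT _data_frame/transforms/ecommerce-orders-by-customer
+{
+  "source": {
+    "index": "kibana_sample_data_ecommerce"
+  },
+  "dest" : {
+    "index" : "sample_ecommerce_orders_by_customer"
+  },
+  "pivot": {
+    "group_by": {
+      "user": { "terms": { "field": "user" }},
+      "customer_id": { "terms": { "field": "customer_id" }}
+    },
+    "aggregations": {
+      "order_count": { "value_count": { "field": "order_id" }},
+      "total_order_amt": { "sum": { "field": "taxful_total_price" }},
+      "avg_amt_per_order": { "avg": { "field": "taxful_total_price" }},
+      "avg_unique_products_per_order": { "avg": { "field": "total_unique_products" }},
+      "total_unique_products": { "cardinality": { "field": "products.product_id" }}
+    }
+  }
+}
+
+POST _data_frame/transforms/ecommerce-orders-by-customer/_start
+----------------------------------
+// NOTCONSOLE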
+
+[[example-airline]]
+=== Finding air carriers with the most delays
+
+In this example, we use the Flights sample dataset to find out which air carrier
+had the most delays. First, we filter the source data such that it excludes all
+the cancelled flights by using a query filter. Then we transform the data to
+contain the distinct number of flights, the sum of delayed minutes, and the sum
+of the flight minutes by air carrier. Finally, we use a
+{ref}/search-aggregations-pipeline-bucket-script-aggregation.html[`bucket_script`]
+to determine what percentage of the flight time was actually taken up by delays.
+
+[source,console]
+----------------------------------
+POST _data_frame/transforms/_preview
+{
+ "source": {
+ "index": "kibana_sample_data_flights",
+ "query": { <1>
+ "bool": {
+ "filter": [
+ { "term": { "Cancelled": false } }
+ ]
+ }
+ }
+ },
+ "dest" : { <2>
+ "index" : "sample_flight_delays_by_carrier"
+ },
+ "pivot": {
+ "group_by": { <3>
+ "carrier": { "terms": { "field": "Carrier" }}
+ },
+ "aggregations": {
+ "flights_count": { "value_count": { "field": "FlightNum" }},
+ "delay_mins_total": { "sum": { "field": "FlightDelayMin" }},
+ "flight_mins_total": { "sum": { "field": "FlightTimeMin" }},
+ "delay_time_percentage": { <4>
+ "bucket_script": {
+ "buckets_path": {
+ "delay_time": "delay_mins_total.value",
+ "flight_time": "flight_mins_total.value"
+ },
+ "script": "(params.delay_time / params.flight_time) * 100"
+ }
+ }
+ }
+ }
+}
+----------------------------------
+// TEST[skip:setup kibana sample data]
+
+<1> Filter the source data to select only flights that were not cancelled.
+<2> This is the destination index for the {dataframe}. It is ignored by
+`_preview`.
+<3> The data is grouped by the `Carrier` field which contains the airline name.
+<4> This `bucket_script` performs calculations on the results that are returned
+by the aggregation. In this particular example, it calculates what percentage of
+travel time was taken up by delays.
+
+The preview shows you that the new index would contain data like this for each
+carrier:
+
+[source,js]
+----------------------------------
+{
+ "preview" : [
+ {
+ "carrier" : "ES-Air",
+ "flights_count" : 2802.0,
+ "flight_mins_total" : 1436927.5130677223,
+ "delay_time_percentage" : 9.335543983955839,
+ "delay_mins_total" : 134145.0
+ },
+ ...
+ ]
+}
+----------------------------------
+// NOTCONSOLE
+
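+As a quick sanity check, the `delay_time_percentage` value shown for `ES-Air`
+is just the `bucket_script` formula applied to the other two aggregated values:
+
+[source,js]
+----------------------------------
+(134145.0 / 1436927.5130677223) * 100 = 9.3355...   // (params.delay_time / params.flight_time) * 100
+----------------------------------
+// NOTCONSOLE
+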
+This {dataframe} makes it easier to answer questions such as:
+
+* Which air carrier has the most delays as a percentage of flight time?
+
+NOTE: This data is fictional and does not reflect actual delays
+or flight stats for any of the featured destination or origin airports.
+
+
+[[example-clientips]]
+=== Finding suspicious client IPs by using scripted metrics
+
+With {dataframe-transforms}, you can use
+{ref}/search-aggregations-metrics-scripted-metric-aggregation.html[scripted
+metric aggregations] on your data. These aggregations are flexible and make
+it possible to perform very complex processing. Let's use scripted metrics to
+identify suspicious client IPs in the web log sample dataset.
+
+We transform the data such that the new index contains the sum of bytes and the
+number of distinct URLs, agents, incoming requests by location, and geographic
+destinations for each client IP. We also use a scripted metric aggregation to
+count the specific types of HTTP responses that each client IP receives.
+Ultimately, the example below transforms web log data into an entity-centric
+index where the entity is `clientip`.
+
+[source,console]
+----------------------------------
+POST _data_frame/transforms/_preview
+{
+ "source": {
+ "index": "kibana_sample_data_logs",
+ "query": { <1>
+ "range" : {
+ "timestamp" : {
+ "gte" : "now-30d/d"
+ }
+ }
+ }
+ },
+ "dest" : { <2>
+ "index" : "sample_weblogs_by_clientip"
+ },
+ "pivot": {
+ "group_by": { <3>
+ "clientip": { "terms": { "field": "clientip" } }
+ },
+ "aggregations": {
+ "url_dc": { "cardinality": { "field": "url.keyword" }},
+ "bytes_sum": { "sum": { "field": "bytes" }},
+ "geo.src_dc": { "cardinality": { "field": "geo.src" }},
+ "agent_dc": { "cardinality": { "field": "agent.keyword" }},
+ "geo.dest_dc": { "cardinality": { "field": "geo.dest" }},
+ "responses.total": { "value_count": { "field": "timestamp" }},
+ "responses.counts": { <4>
+ "scripted_metric": {
+ "init_script": "state.responses = ['error':0L,'success':0L,'other':0L]",
+ "map_script": """
+ def code = doc['response.keyword'].value;
+ if (code.startsWith('5') || code.startsWith('4')) {
+ state.responses.error += 1 ;
+ } else if(code.startsWith('2')) {
+ state.responses.success += 1;
+ } else {
+ state.responses.other += 1;
+ }
+ """,
+ "combine_script": "state.responses",
+ "reduce_script": """
+ def counts = ['error': 0L, 'success': 0L, 'other': 0L];
+ for (responses in states) {
+ counts.error += responses['error'];
+ counts.success += responses['success'];
+ counts.other += responses['other'];
+ }
+ return counts;
+ """
+ }
+ },
+ "timestamp.min": { "min": { "field": "timestamp" }},
+ "timestamp.max": { "max": { "field": "timestamp" }},
+ "timestamp.duration_ms": { <5>
+ "bucket_script": {
+ "buckets_path": {
+ "min_time": "timestamp.min.value",
+ "max_time": "timestamp.max.value"
+ },
+ "script": "(params.max_time - params.min_time)"
+ }
+ }
+ }
+ }
+}
+----------------------------------
+// TEST[skip:setup kibana sample data]
+
+<1> This range query limits the transform to documents that are within the last
+30 days at the point in time the {dataframe-transform} checkpoint is processed.
+For batch {dataframes} this occurs once.
+<2> This is the destination index for the {dataframe}. It is ignored by
+`_preview`.
+<3> The data is grouped by the `clientip` field.
+<4> This `scripted_metric` performs a distributed operation on the web log data
+to count specific types of HTTP responses (error, success, and other).
+<5> This `bucket_script` calculates the duration of the `clientip` access based
+on the results of the aggregation.
+
+The preview shows you that the new index would contain data like this for each
+client IP:
+
+[source,js]
+----------------------------------
+{
+ "preview" : [
+ {
+ "geo" : {
+ "src_dc" : 12.0,
+ "dest_dc" : 9.0
+ },
+ "clientip" : "0.72.176.46",
+ "agent_dc" : 3.0,
+ "responses" : {
+ "total" : 14.0,
+ "counts" : {
+ "other" : 0,
+ "success" : 14,
+ "error" : 0
+ }
+ },
+ "bytes_sum" : 74808.0,
+ "timestamp" : {
+ "duration_ms" : 4.919943239E9,
+ "min" : "2019-06-17T07:51:57.333Z",
+ "max" : "2019-08-13T06:31:00.572Z"
+ },
+ "url_dc" : 11.0
+ },
+    ...
+  ]
+}
+----------------------------------
+// NOTCONSOLE
+
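+After you create and start a transform with this configuration and it populates
+`sample_weblogs_by_clientip`, you could, for example, search the destination
+index for client IPs that received error responses. This follow-up query is
+only a sketch:
+
+[source,js]
+----------------------------------
+GET sample_weblogs_by_clientip/_search
+{
+  "query": {
+    "range": {
+      "responses.counts.error": {
+        "gt": 0
+      }
+    }
+  }
+}
+----------------------------------
+// NOTCONSOLE
+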
+This {dataframe} makes it easier to answer questions such as:
+
+* Which client IPs are transferring the most data?
+
+* Which client IPs are interacting with a high number of different URLs?
+
+* Which client IPs have high error rates?
+
+* Which client IPs are interacting with a high number of destination countries?
\ No newline at end of file
diff --git a/docs/reference/transform/ecommerce-example.asciidoc b/docs/reference/transform/ecommerce-example.asciidoc
new file mode 100644
index 0000000000000..ce4193aa66584
--- /dev/null
+++ b/docs/reference/transform/ecommerce-example.asciidoc
@@ -0,0 +1,262 @@
+[role="xpack"]
+[testenv="basic"]
+[[ecommerce-dataframes]]
+=== Transforming the eCommerce sample data
+
+beta[]
+
+<<ml-dataframes,{dataframe-transforms-cap}>> enable you to retrieve information
+from an {es} index, transform it, and store it in another index. Let's use the
+{kibana-ref}/add-sample-data.html[{kib} sample data] to demonstrate how you can
+pivot and summarize your data with {dataframe-transforms}.
+
+
+. If the {es} {security-features} are enabled, obtain a user ID with sufficient
+privileges to complete these steps.
++
+--
+You need `manage_data_frame_transforms` cluster privileges to preview and create
+{dataframe-transforms}. Members of the built-in `data_frame_transforms_admin`
+role have these privileges.
+
+You also need `read` and `view_index_metadata` index privileges on the source
+index and `read`, `create_index`, and `index` privileges on the destination
+index.
+
+For more information, see <<security-privileges>> and <<built-in-roles>>.
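+
+For example, a minimal sketch of a role that grants these privileges; the role
+name and the destination index name are illustrative:
+
+[source,js]
+--------------------------------------------------
+POST _security/role/dataframe_ecommerce_role
+{
+  "cluster": [ "manage_data_frame_transforms" ],
+  "indices": [
+    {
+      "names": [ "kibana_sample_data_ecommerce" ],
+      "privileges": [ "read", "view_index_metadata" ]
+    },
+    {
+      "names": [ "ecommerce-customers" ],
+      "privileges": [ "read", "create_index", "index" ]
+    }
+  ]
+}
+--------------------------------------------------
+// NOTCONSOLE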
+--
+
+. Choose your _source index_.
++
+--
+In this example, we'll use the eCommerce orders sample data. If you're not
+already familiar with the `kibana_sample_data_ecommerce` index, use the
+*Revenue* dashboard in {kib} to explore the data. Consider what insights you
+might want to derive from this eCommerce data.
+--
+
+. Play with various options for grouping and aggregating the data.
++
+--
+For example, you might want to group the data by product ID and calculate the
+total number of sales for each product and its average price. Alternatively, you
+might want to look at the behavior of individual customers and calculate how
+much each customer spent in total and how many different categories of products
+they purchased. Or you might want to take the currencies or geographies into
+consideration. What are the most interesting ways you can transform and
+interpret this data?
+
+_Pivoting_ your data involves using at least one field to group it and applying
+at least one aggregation. You can preview what the transformed data will look
+like, so go ahead and play with it!
+
+For example, go to *Machine Learning* > *Data Frames* in {kib} and use the
+wizard to create a {dataframe-transform}:
+
+[role="screenshot"]
+image::images/ecommerce-pivot1.jpg["Creating a simple {dataframe-transform} in {kib}"]
+
+In this case, we grouped the data by customer ID and calculated the sum of
+products each customer purchased.
+
+Let's add some more aggregations to learn more about our customers' orders. For
+example, let's calculate the total sum of their purchases, the maximum number of
+products that they purchased in a single order, and their total number of orders.
+We'll accomplish this by using the
+{ref}/search-aggregations-metrics-sum-aggregation.html[`sum` aggregation] on the
+`taxless_total_price` field, the
+{ref}/search-aggregations-metrics-max-aggregation.html[`max` aggregation] on the
+`total_quantity` field, and the
+{ref}/search-aggregations-metrics-cardinality-aggregation.html[`cardinality` aggregation]
+on the `order_id` field:
+
+[role="screenshot"]
+image::images/ecommerce-pivot2.jpg["Adding multiple aggregations to a {dataframe-transform} in {kib}"]
+
+TIP: If you're interested in a subset of the data, you can optionally include a
+{ref}/search-request-body.html#request-body-search-query[query] element. In this
+example, we've filtered the data so that we're only looking at orders with a
+`currency` of `EUR`. Alternatively, we could group the data by that field too.
+If you want to use more complex queries, you can create your {dataframe} from a
+{kibana-ref}/save-open-search.html[saved search].
+
+If you prefer, you can use the
+{ref}/preview-data-frame-transform.html[preview {dataframe-transforms} API]:
+
+[source,js]
+--------------------------------------------------
+POST _data_frame/transforms/_preview
+{
+ "source": {
+ "index": "kibana_sample_data_ecommerce",
+ "query": {
+ "bool": {
+ "filter": {
+ "term": {"currency": "EUR"}
+ }
+ }
+ }
+ },
+ "pivot": {
+ "group_by": {
+ "customer_id": {
+ "terms": {
+ "field": "customer_id"
+ }
+ }
+ },
+ "aggregations": {
+ "total_quantity.sum": {
+ "sum": {
+ "field": "total_quantity"
+ }
+ },
+ "taxless_total_price.sum": {
+ "sum": {
+ "field": "taxless_total_price"
+ }
+ },
+ "total_quantity.max": {
+ "max": {
+ "field": "total_quantity"
+ }
+ },
+ "order_id.cardinality": {
+ "cardinality": {
+ "field": "order_id"
+ }
+ }
+ }
+ }
+}
+--------------------------------------------------
+// CONSOLE
+// TEST[skip:set up sample data]
+--
+
+. When you are satisfied with what you see in the preview, create the
+{dataframe-transform}.
++
+--
+.. Supply a transform ID and the name of the target (or _destination_) index.
+
+.. Decide whether you want the {dataframe-transform} to run once or continuously.
+--
++
+--
+Since this sample data index is unchanging, let's use the default behavior and
+just run the {dataframe-transform} once.
+
+[role="screenshot"]
+image::images/ecommerce-batch.jpg["Specifying the {dataframe-transform} options in {kib}"]
+
+If you want to try it out, however, go ahead and click on *Continuous mode*.
+You must choose a field that the {dataframe-transform} can use to check which
+entities have changed. In general, it's a good idea to use the ingest timestamp
+field. In this example, however, you can use the `order_date` field.
+
+If you prefer, you can use the
+{ref}/put-data-frame-transform.html[create {dataframe-transforms} API]. For
+example:
+
+[source,js]
+--------------------------------------------------
+PUT _data_frame/transforms/ecommerce-customer-transform
+{
+ "source": {
+ "index": [
+ "kibana_sample_data_ecommerce"
+ ],
+ "query": {
+ "bool": {
+ "filter": {
+ "term": {
+ "currency": "EUR"
+ }
+ }
+ }
+ }
+ },
+ "pivot": {
+ "group_by": {
+ "customer_id": {
+ "terms": {
+ "field": "customer_id"
+ }
+ }
+ },
+ "aggregations": {
+ "total_quantity.sum": {
+ "sum": {
+ "field": "total_quantity"
+ }
+ },
+ "taxless_total_price.sum": {
+ "sum": {
+ "field": "taxless_total_price"
+ }
+ },
+ "total_quantity.max": {
+ "max": {
+ "field": "total_quantity"
+ }
+ },
+ "order_id.cardinality": {
+ "cardinality": {
+ "field": "order_id"
+ }
+ }
+ }
+ },
+ "dest": {
+ "index": "ecommerce-customers"
+ }
+}
+--------------------------------------------------
+// CONSOLE
+// TEST[skip:setup kibana sample data]
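+
+If you choose *Continuous mode* instead, a sketch of the additional properties
+you might include in the create request is shown below; the values are
+illustrative:
+
+[source,js]
+--------------------------------------------------
+  "frequency": "5m",
+  "sync": {
+    "time": {
+      "field": "order_date",
+      "delay": "60s"
+    }
+  }
+--------------------------------------------------
+// NOTCONSOLE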
+--
+
+. Start the {dataframe-transform}.
++
+--
+
+TIP: Even though resource utilization is automatically adjusted based on the
+cluster load, a {dataframe-transform} increases search and indexing load on your
+cluster while it runs. If you're experiencing an excessive load, however, you
+can stop it.
+
+You can start, stop, and manage {dataframe-transforms} in {kib}:
+
+[role="screenshot"]
+image::images/dataframe-transforms.jpg["Managing {dataframe-transforms} in {kib}"]
+
+Alternatively, you can use the
+{ref}/start-data-frame-transform.html[start {dataframe-transforms}] and
+{ref}/stop-data-frame-transform.html[stop {dataframe-transforms}] APIs. For
+example:
+
+[source,js]
+--------------------------------------------------
+POST _data_frame/transforms/ecommerce-customer-transform/_start
+--------------------------------------------------
+// CONSOLE
+// TEST[skip:setup kibana sample data]
+
+--
+
+. Explore the data in your new index.
++
+--
+For example, use the *Discover* application in {kib}:
+
+[role="screenshot"]
+image::images/ecommerce-results.jpg["Exploring the new index in {kib}"]
+
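+Or, as a hypothetical API alternative, the following query returns the three
+customers with the highest total spend from the new index:
+
+[source,js]
+--------------------------------------------------
+GET ecommerce-customers/_search
+{
+  "size": 3,
+  "sort": [
+    { "taxless_total_price.sum": { "order": "desc" } }
+  ]
+}
+--------------------------------------------------
+// NOTCONSOLE
+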
+--
+
+TIP: If you do not want to keep the {dataframe-transform}, you can delete it in
+{kib} or use the
+{ref}/delete-data-frame-transform.html[delete {dataframe-transform} API]. When
+you delete a {dataframe-transform}, its destination index and {kib} index
+patterns remain.
diff --git a/docs/reference/transform/images/dataframe-transforms.jpg b/docs/reference/transform/images/dataframe-transforms.jpg
new file mode 100644
index 0000000000000..927678f894d4b
Binary files /dev/null and b/docs/reference/transform/images/dataframe-transforms.jpg differ
diff --git a/docs/reference/transform/images/ecommerce-batch.jpg b/docs/reference/transform/images/ecommerce-batch.jpg
new file mode 100644
index 0000000000000..bed3fedd4cf01
Binary files /dev/null and b/docs/reference/transform/images/ecommerce-batch.jpg differ
diff --git a/docs/reference/transform/images/ecommerce-continuous.jpg b/docs/reference/transform/images/ecommerce-continuous.jpg
new file mode 100644
index 0000000000000..f144fc8cb9541
Binary files /dev/null and b/docs/reference/transform/images/ecommerce-continuous.jpg differ
diff --git a/docs/reference/transform/images/ecommerce-pivot1.jpg b/docs/reference/transform/images/ecommerce-pivot1.jpg
new file mode 100644
index 0000000000000..b55b88b8acfb0
Binary files /dev/null and b/docs/reference/transform/images/ecommerce-pivot1.jpg differ
diff --git a/docs/reference/transform/images/ecommerce-pivot2.jpg b/docs/reference/transform/images/ecommerce-pivot2.jpg
new file mode 100644
index 0000000000000..9af5a3c46b740
Binary files /dev/null and b/docs/reference/transform/images/ecommerce-pivot2.jpg differ
diff --git a/docs/reference/transform/images/ecommerce-results.jpg b/docs/reference/transform/images/ecommerce-results.jpg
new file mode 100644
index 0000000000000..f483c3b3c3627
Binary files /dev/null and b/docs/reference/transform/images/ecommerce-results.jpg differ
diff --git a/docs/reference/transform/images/ml-dataframepivot.jpg b/docs/reference/transform/images/ml-dataframepivot.jpg
new file mode 100644
index 0000000000000..c0c7946cf4441
Binary files /dev/null and b/docs/reference/transform/images/ml-dataframepivot.jpg differ
diff --git a/docs/reference/transform/index.asciidoc b/docs/reference/transform/index.asciidoc
new file mode 100644
index 0000000000000..2a45e1709dd01
--- /dev/null
+++ b/docs/reference/transform/index.asciidoc
@@ -0,0 +1,82 @@
+[role="xpack"]
+[[ml-dataframes]]
+= {dataframe-transforms-cap}
+
+[partintro]
+--
+
+beta[]
+
+{es} aggregations are a powerful and flexible feature that enable you to
+summarize and retrieve complex insights about your data. You can summarize
+complex things like the number of web requests per day on a busy website, broken
+down by geography and browser type. If you use the same data set to try to
+calculate something as simple as a single number for the average duration of
+visitor web sessions, however, you can quickly run out of memory.
+
+Why does this occur? A web session duration is an example of a behavioral
+attribute not held on any one log record; it has to be derived by finding the
+first and last records for each session in our weblogs. This derivation requires
+some complex query expressions and a lot of memory to connect all the data
+points. If you have an ongoing background process that fuses related events from
+one index into entity-centric summaries in another index, you get a more useful,
+joined-up picture--this is essentially what _{dataframes}_ are.
+
+
+[discrete]
+[[ml-dataframes-usage]]
+== When to use {dataframes}
+
+You might want to consider using {dataframes} instead of aggregations when:
+
+* You need a complete _feature index_ rather than a top-N set of items.
++
+In {ml}, you often need a complete set of behavioral features rather than just
+the top-N. For example, if you are predicting customer churn, you might look at
+features such as the number of website visits in the last week, the total number
+of sales, or the number of emails sent. The {stack} {ml-features} create models
+based on this multi-dimensional feature space, so they benefit from full feature
+indices ({dataframes}).
++
+This scenario also applies when you are trying to search across the results of
+an aggregation or multiple aggregations. Aggregation results can be ordered or
+filtered, but there are
+{ref}/search-aggregations-bucket-terms-aggregation.html#search-aggregations-bucket-terms-aggregation-order[limitations to ordering]
+and
+{ref}/search-aggregations-pipeline-bucket-selector-aggregation.html[filtering by bucket selector]
+is constrained by the maximum number of buckets returned. If you want to search
+all aggregation results, you need to create the complete {dataframe}. If you
+need to sort or filter the aggregation results by multiple fields, {dataframes}
+are particularly useful.
+
+* You need to sort aggregation results by a pipeline aggregation.
++
+{ref}/search-aggregations-pipeline.html[Pipeline aggregations] cannot be used
+for sorting. Technically, this is because pipeline aggregations are run during
+the reduce phase after all other aggregations have already completed. If you
+create a {dataframe}, you can effectively perform multiple passes over the data.
+
+* You want to create summary tables to optimize queries.
++
+For example, if you
+have a high-level dashboard that is accessed by a large number of users and it
+uses a complex aggregation over a large dataset, it may be more efficient to
+create a {dataframe} to cache results. Thus, each user doesn't need to run the
+aggregation query.
+
+Though there are multiple ways to create {dataframes}, this content pertains
+to one specific method: _{dataframe-transforms}_.
+
+* <<ml-transform-overview>>
+* <<df-api-quickref>>
+* <<dataframe-examples>>
+* <<dataframe-troubleshooting>>
+* <<dataframe-limitations>>
+--
+
+include::overview.asciidoc[]
+include::checkpoints.asciidoc[]
+include::api-quickref.asciidoc[]
+include::dataframe-examples.asciidoc[]
+include::troubleshooting.asciidoc[]
+include::limitations.asciidoc[]
\ No newline at end of file
diff --git a/docs/reference/transform/limitations.asciidoc b/docs/reference/transform/limitations.asciidoc
new file mode 100644
index 0000000000000..c019018bce509
--- /dev/null
+++ b/docs/reference/transform/limitations.asciidoc
@@ -0,0 +1,219 @@
+[role="xpack"]
+[[dataframe-limitations]]
+== {dataframe-transform-cap} limitations
+[subs="attributes"]
+++++
+Limitations
+++++
+
+beta[]
+
+The following limitations and known problems apply to the 7.4 release of
+the Elastic {dataframe} feature:
+
+[float]
+[[df-compatibility-limitations]]
+=== Beta {dataframe-transforms} do not have guaranteed backwards or forwards compatibility
+
+Whilst {dataframe-transforms} are beta, it is not guaranteed that a
+{dataframe-transform} created in a previous version of the {stack} will be able
+to start and operate in a future version. Nor is support provided for
+{dataframe-transform} tasks operating in a cluster with mixed node versions.
+Please note that the output of a {dataframe-transform} is persisted to a
+destination index. This is a normal {es} index and is not affected by the beta
+status.
+
+[float]
+[[df-ui-limitation]]
+=== {dataframe-cap} UI will not work during a rolling upgrade from 7.2
+
+If your cluster contains mixed version nodes, for example during a rolling
+upgrade from 7.2 to a newer version, and {dataframe-transforms} have been
+created in 7.2, the {dataframe} UI will not work. Please wait until all nodes
+have been upgraded to the newer version before using the {dataframe} UI.
+
+
+[float]
+[[df-datatype-limitations]]
+=== {dataframe-cap} data type limitation
+
+{dataframes-cap} do not (yet) support fields containing arrays – in the UI or
+the API. If you try to create one, the UI will fail to show the source index
+table.
+
+[float]
+[[df-ccs-limitations]]
+=== {ccs-cap} is not supported
+
+{ccs-cap} is not supported for {dataframe-transforms}.
+
+[float]
+[[df-kibana-limitations]]
+=== Up to 1,000 {dataframe-transforms} are supported
+
+A single cluster will support up to 1,000 {dataframe-transforms}.
+When using the
+{ref}/get-data-frame-transform.html[GET {dataframe-transforms} API], a total
+`count` of transforms is returned. Use the `size` and `from` parameters to
+enumerate through the full list.
+
+[float]
+[[df-aggresponse-limitations]]
+=== Aggregation responses may be incompatible with destination index mappings
+
+When a {dataframe-transform} is first started, it will deduce the mappings
+required for the destination index. This process is based on the field types of
+the source index and the aggregations used. If the fields are derived from
+{ref}/search-aggregations-metrics-scripted-metric-aggregation.html[`scripted_metrics`]
+or {ref}/search-aggregations-pipeline-bucket-script-aggregation.html[`bucket_scripts`],
+{ref}/dynamic-mapping.html[dynamic mappings] will be used. In some instances the
+deduced mappings may be incompatible with the actual data. For example, numeric
+overflows might occur or dynamically mapped fields might contain both numbers
+and strings. Please check {es} logs if you think this may have occurred. As a
+workaround, you may define custom mappings prior to starting the
+{dataframe-transform}. For example,
+{ref}/indices-create-index.html[create a custom destination index] or
+{ref}/indices-templates.html[define an index template].
+
+[float]
+[[df-batch-limitations]]
+=== Batch {dataframe-transforms} may not account for changed documents
+
+A batch {dataframe-transform} uses a
+{ref}/search-aggregations-bucket-composite-aggregation.html[composite aggregation]
+which allows efficient pagination through all buckets. Composite aggregations
+do not yet support a search context; therefore, if the source data is changed
+(deleted, updated, added) while the batch {dataframe} is in progress, the
+results may not include these changes.
+
+[float]
+[[df-consistency-limitations]]
+=== {cdataframe-cap} consistency does not account for deleted or updated documents
+
+While the process for {cdataframe-transforms} allows the continual recalculation
+of the {dataframe-transform} as new data is being ingested, it does also have
+some limitations.
+
+Changed entities will only be identified if their time field has also been
+updated and falls within the time range of the check for changes. This has been
+designed in principle for, and is suited to, the use case where new data is
+given a timestamp for the time of ingest.
+
+If the indices that fall within the scope of the source index pattern are
+removed, for example when deleting historical time-based indices, then the
+composite aggregation performed in consecutive checkpoint processing will search
+over different source data, and entities that only existed in the deleted index
+will not be removed from the {dataframe} destination index.
+
+Depending on your use case, you may wish to recreate the {dataframe-transform}
+entirely after deletions. Alternatively, if your use case is tolerant to
+historical archiving, you may wish to include a max ingest timestamp in your
+aggregation. This will allow you to exclude results that have not been recently
+updated when viewing the {dataframe} destination index.
+
+
+[float]
+[[df-deletion-limitations]]
+=== Deleting a {dataframe-transform} does not delete the {dataframe} destination index or {kib} index pattern
+
+When deleting a {dataframe-transform} using `DELETE _data_frame/transforms/index`
+neither the {dataframe} destination index nor the {kib} index pattern, should
+one have been created, are deleted. These objects must be deleted separately.
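+
+For example, to remove both a transform and its destination index, you could
+issue the following requests; the names here are illustrative:
+
+[source,js]
+----
+DELETE _data_frame/transforms/ecommerce-customer-transform
+DELETE ecommerce-customers
+----
+// NOTCONSOLE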
+
+[float]
+[[df-aggregation-page-limitations]]
+=== Handling dynamic adjustment of aggregation page size
+
+During the development of {dataframe-transforms}, control was favored over
+performance. In the design considerations, it is preferred for the
+{dataframe-transform} to take longer to complete quietly in the background
+rather than to finish quickly and take precedence in resource consumption.
+
+Composite aggregations are well suited for high cardinality data, enabling
+pagination through results. If a {ref}/circuit-breaker.html[circuit breaker]
+memory exception occurs when performing the composite aggregation search, the
+search is retried with a smaller number of buckets requested. This circuit
+breaker is calculated based upon all activity within the cluster, not just
+activity from {dataframe-transforms}, so it may only be a temporary resource
+availability issue.
+
+For a batch {dataframe-transform}, the number of buckets requested is only ever
+adjusted downwards. Lowering this value may result in a longer duration for the
+transform checkpoint to complete. For {cdataframes}, the number of
+buckets requested is reset back to its default at the start of every checkpoint
+and it is possible for circuit breaker exceptions to occur repeatedly in the
+{es} logs.
+
+The {dataframe-transform} retrieves data in batches, which means it calculates
+several buckets at once. By default, this is 500 buckets per search/index
+operation. The default can be changed using `max_page_search_size` and the
+minimum value is 10. If failures still occur once the number of buckets
+requested has been reduced to its minimum, then the {dataframe-transform} will
+be set to a failed state.
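+
+As a sketch, `max_page_search_size` is set within the `pivot` object when the
+transform is created; the value shown here is illustrative:
+
+[source,js]
+----
+  "pivot": {
+    "max_page_search_size": 200,
+    "group_by": { ... },
+    "aggregations": { ... }
+  }
+----
+// NOTCONSOLE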
+
+[float]
+[[df-dynamic-adjustments-limitations]]
+=== Handling dynamic adjustments for many terms
+
+For each checkpoint, entities are identified that have changed since the last
+time the check was performed. This list of changed entities is supplied as a
+{ref}/query-dsl-terms-query.html[terms query] to the {dataframe-transform}
+composite aggregation, one page at a time. Then updates are applied to the
+destination index for each page of entities.
+
+The page `size` is defined by `max_page_search_size` which is also used to
+define the number of buckets returned by the composite aggregation search. The
+default value is 500, the minimum is 10.
+
+The index setting
+{ref}/index-modules.html#dynamic-index-settings[`index.max_terms_count`] defines
+the maximum number of terms that can be used in a terms query. The default value
+is 65536. If `max_page_search_size` exceeds `index.max_terms_count` the
+transform will fail.
+
+Using smaller values for `max_page_search_size` may result in a longer duration
+for the transform checkpoint to complete.
+
+[float]
+[[df-scheduling-limitations]]
+=== {cdataframe-cap} scheduling limitations
+
+A {cdataframe} periodically checks for changes to source data. The functionality
+of the scheduler is currently limited to a basic periodic timer which can be
+within the `frequency` range from 1s to 1h. The default is 1m. This is designed
+to run little and often. When choosing a `frequency` for this timer, consider
+your ingest rate along with the impact that the {dataframe-transform}
+search/index operations have on other users in your cluster. Also note that
+retries occur at the `frequency` interval.
+
+[float]
+[[df-failed-limitations]]
+=== Handling of failed {dataframe-transforms}
+
+A failed {dataframe-transform} remains as a persistent task and should be
+handled appropriately, either by deleting it or by resolving the root cause of
+the failure and restarting it.
+
+When using the API to delete a failed {dataframe-transform}, first stop it using
+`_stop?force=true`, then delete it.
+
+To start a failed {dataframe-transform} after the root cause has been resolved,
+you must specify the `_start?force=true` parameter.
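+
+For example, with a hypothetical transform ID:
+
+[source,js]
+----
+POST _data_frame/transforms/failed-transform-example/_stop?force=true    <1>
+DELETE _data_frame/transforms/failed-transform-example                   <2>
+
+POST _data_frame/transforms/failed-transform-example/_start?force=true   <3>
+----
+// NOTCONSOLE
+
+<1> Force-stop the failed transform before deleting it.
+<2> Delete the stopped transform.
+<3> Alternatively, force-start it once the root cause has been resolved.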
+
+[float]
+[[df-availability-limitations]]
+=== {cdataframes-cap} may give incorrect results if documents are not yet available to search
+
+After a document is indexed, there is a very small delay until it is available
+to search.
+
+A {cdataframe-transform} periodically checks for changed entities between the
+time it last checked and `now` minus `sync.time.delay`. This time window moves
+forward without overlapping. If the timestamp of a recently indexed document
+falls within this time window but the document is not yet available to search,
+then that entity will not be updated.
+
+If you use a `sync.time.field` that represents the data ingest time and a zero
+second or very small `sync.time.delay`, this issue is more likely to occur.
\ No newline at end of file
diff --git a/docs/reference/transform/overview.asciidoc b/docs/reference/transform/overview.asciidoc
new file mode 100644
index 0000000000000..c0a7856e28314
--- /dev/null
+++ b/docs/reference/transform/overview.asciidoc
@@ -0,0 +1,71 @@
+[role="xpack"]
+[[ml-transform-overview]]
+== {dataframe-transform-cap} overview
+++++
+Overview
+++++
+
+beta[]
+
+A _{dataframe}_ is a two-dimensional tabular data structure. In the context of
+the {stack}, it is a transformation of data that is indexed in {es}. For
+example, you can use {dataframes} to _pivot_ your data into a new entity-centric
+index. By transforming and summarizing your data, it becomes possible to
+visualize and analyze it in alternative and interesting ways.
+
+A lot of {es} indices are organized as a stream of events: each event is an
+individual document, for example a single item purchase. {dataframes-cap} enable
+you to summarize this data, bringing it into an organized, more
+analysis-friendly format. For example, you can summarize all the purchases of a
+single customer.
+
+You can create {dataframes} by using {dataframe-transforms}.
+{dataframe-transforms-cap} enable you to define a pivot, which is a set of
+features that transform the index into a different, more digestible format.
+Pivoting results in a summary of your data, which is the {dataframe}.
+
+To define a pivot, first you select one or more fields that you will use to
+group your data. You can select categorical fields (terms) and numerical fields
+for grouping. If you use numerical fields, the field values are bucketed using
+an interval that you specify.
+
+The second step is deciding how you want to aggregate the grouped data. When
+using aggregations, you practically ask questions about the index. There are
+different types of aggregations, each with its own purpose and output. To learn
+more about the supported aggregations and group-by fields, see
+{ref}/data-frame-transform-resource.html[{dataframe-transform-cap} resources].
+
+As an optional step, you can also add a query to further limit the scope of the
+aggregation.
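+
+Put together, a minimal pivot definition might look like the following sketch;
+the field names are illustrative:
+
+[source,js]
+----
+"pivot": {
+  "group_by": {
+    "customer_id": { "terms": { "field": "customer_id" } }        <1>
+  },
+  "aggregations": {
+    "total_spend": { "sum": { "field": "taxful_total_price" } }   <2>
+  }
+}
+----
+// NOTCONSOLE
+
+<1> The field that is used to group the data.
+<2> The aggregation that is applied to each group.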
+
+The {dataframe-transform} performs a composite aggregation that
+paginates through all the data defined by the source index query. The output of
+the aggregation is stored in a destination index. Each time the
+{dataframe-transform} queries the source index, it creates a _checkpoint_. You
+can decide whether you want the {dataframe-transform} to run once (batch
+{dataframe-transform}) or continuously ({cdataframe-transform}). A batch
+{dataframe-transform} is a single operation that has a single checkpoint.
+{cdataframe-transforms-cap} continually increment and process checkpoints as new
+source data is ingested.
+
+.Example
+
+Imagine that you run a webshop that sells clothes. Every order creates a document
+that contains a unique order ID, the name and the category of the ordered product,
+its price, the ordered quantity, the exact date of the order, and some customer
+information (name, gender, location, etc). Your dataset contains all the transactions
+from last year.
+
+If you want to check the sales in the different categories in your last fiscal
+year, define a {dataframe-transform} that groups the data by the product
+categories (women's shoes, men's clothing, etc.) and the order date. Use the
+last year as the interval for the order date. Then add a sum aggregation on the
+ordered quantity. The result is a {dataframe} that shows the number of sold
+items in every product category in the last year.
+
+[role="screenshot"]
+image::images/ml-dataframepivot.jpg["Example of a data frame pivot in {kib}"]
+
+IMPORTANT: The {dataframe-transform} leaves your source index intact. It
+creates a new index that is dedicated to the {dataframe}.
+
diff --git a/docs/reference/transform/troubleshooting.asciidoc b/docs/reference/transform/troubleshooting.asciidoc
new file mode 100644
index 0000000000000..4ea0dd8cc830d
--- /dev/null
+++ b/docs/reference/transform/troubleshooting.asciidoc
@@ -0,0 +1,29 @@
+[[dataframe-troubleshooting]]
+== Troubleshooting {dataframe-transforms}
+[subs="attributes"]
+++++
+Troubleshooting
+++++
+
+Use the information in this section to troubleshoot common problems.
+
+include::{stack-repo-dir}/help.asciidoc[tag=get-help]
+
+If you encounter problems with your {dataframe-transforms}, you can gather more
+information from the following files and APIs:
+
+* Lightweight audit messages are stored in `.data-frame-notifications-*`. Search
+by your `transform_id` (see the example after this list).
+* The
+{ref}/get-data-frame-transform-stats.html[get {dataframe-transform} statistics API]
+provides information about the transform status and failures.
+* If the {dataframe-transform} exists as a task, you can use the
+{ref}/tasks.html[task management API] to gather task information. For example:
+`GET _tasks?actions=data_frame/transforms*&detailed`. Typically, the task exists
+when the transform is in a started or failed state.
+* The {es} logs from the node that was running the {dataframe-transform} might
+also contain useful information. You can identify the node from the notification
+messages. Alternatively, if the task still exists, you can get that information
+from the get {dataframe-transform} statistics API. For more information, see
+{ref}/logging.html[Logging configuration].
+
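+For example, a hypothetical search for the audit messages that relate to a
+specific transform:
+
+[source,js]
+----
+GET .data-frame-notifications-*/_search
+{
+  "query": {
+    "term": {
+      "transform_id": "ecommerce-customer-transform"
+    }
+  }
+}
+----
+// NOTCONSOLE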