From e17e8d0e06e1994f0e2db406e3683a1dd70e3edd Mon Sep 17 00:00:00 2001
From: Kensei Nakada <handbomusic@gmail.com>
Date: Mon, 18 Dec 2023 16:00:03 +0900
Subject: [PATCH] improve documentations for end users

---
 README.md                                 |  57 +++++---
 api/v1beta3/tortoise_types.go             |   2 +-
 docs/{configuration.md => admin-guide.md} |   8 +-
 docs/concept.md                           |  50 -------
 docs/emergency.md                         |   9 +-
 docs/horizontal.md                        |  21 ++-
 docs/user-guide.md                        | 155 ++++++++++++++++++++++
 7 files changed, 215 insertions(+), 87 deletions(-)
 rename docs/{configuration.md => admin-guide.md} (98%)
 delete mode 100644 docs/concept.md
 create mode 100644 docs/user-guide.md
diff --git a/README.md b/README.md
index 3b0a9e53..ca31ea38 100644
--- a/README.md
+++ b/README.md
@@ -1,20 +1,12 @@
-# tortoise
+# Tortoise
 
 <img alt="Tortoise" src="docs/images/tortoise_big.jpg" width="400px"/> 
 
-Tortoise, they are living in the Kubernetes cluster. 
-
-Tortoise, you need to feed only very few parameters to them.
-
-Tortoise, they will soon start to eat historical usage data of Pods.
-
-Tortoise, once you start to live with them, you no longer need to configure autoscaling by yourself.
+Get a cute Tortoise into your Kubernetes garden and say goodbye to the days of optimizing your rigid autoscalers.
 
 ## Install
 
-Tortoise, you cannot get it from the breeder.
-
-Tortoise, you need to get it from GitHub instead.
+You cannot get it from the breeder; you need to get it from GitHub instead.
 
 ```shell
 # Install CRDs into the K8s cluster specified in ~/.kube/config.
@@ -23,41 +15,64 @@ make install
 make deploy
 ```
 
-Tortoise, you don't need a rearing cage, but need VPA in your Kubernetes cluster before installing it.
+You don't need a rearing cage, but you do need VPA in your Kubernetes cluster before installing it.
+
+## Motivation
+
+Many developers work at Mercari, and not all of them are Kubernetes experts.
+The platform provides many tools and guides to simplify the task of optimizing resource requests,
+but it still takes a lot of human effort because the situation around the applications changes very frequently, and we have to re-optimize them every time
+(e.g., an implementation change could alter the resource consumption, the amount of traffic could change, etc.).
+
+There is also another important component to optimize: HorizontalPodAutoscaler.
+It is not as simple as setting the target utilization as high as possible;
+there are many scenarios where the actual resource utilization doesn't reach the target resource utilization
+(because of multiple containers, minReplicas, container size, etc.).
+
+To reduce the human effort needed to keep optimizing the workloads,
+the platform team introduced Tortoise, which is designed to simplify the interface of autoscaling.
+
+It aims to move the responsibility of optimizing the workloads from the application teams to tortoises.
+Application teams just need to set up Tortoise, and the platform team will never bother them again about resource optimization:
+all actual optimization is done by Tortoise automatically.
 
 ## Usage
 
-Tortoise, they only need the deployment name.
+Tortoise has a very simple interface:
 
 ```yaml
-apiVersion: autoscaling.mercari.com/v1beta2
+apiVersion: autoscaling.mercari.com/v1beta3
 kind: Tortoise
 metadata:
   name: lovely-tortoise
   namespace: zoo
 spec:
-  updateMode: Auto
+  updateMode: Auto 
   targetRefs:
     scaleTargetRef:
       kind: Deployment
       name: sample
 ```
 
-Tortoise, then they'll prepare/keep adjusting HPA and VPA to achieve efficient autoscaling based on the past behavior of the workload.
+Yet beneath its unassuming shell lies a wealth of historical resource usage data, cunningly harnessed
+to deftly orchestrate HPA and VPA with finely tuned parameters.
+
+Please refer to [User guide](./docs/user-guide.md) for other parameters.
 
 ## Documentations 
 
-- [Concept](./docs/concept.md): describes a brief overview of tortoise.
-- [Horizontal scaling](./docs/horizontal.md): describes how the Tortoise does the horizontal autoscaling.
-- [Vertical scaling](./docs/vertical.md): describes how the Tortoise does the vertical autoscaling.
+- [User guide](./docs/user-guide.md): describes the minimum knowledge that end users need,
+and how they can configure Tortoise to let tortoises autoscale their workloads.
+- [Admin guide](./docs/admin-guide.md): describes how the cluster admin can configure the global behavior of tortoise. 
 - [Emergency mode](./docs/emergency.md): describes the emergency mode.
-- [Configurations for admin](./docs/configuration.md): describes how the cluster admin can configure the global behavior via the configuration file. 
+- [Horizontal scaling](./docs/horizontal.md): describes how Tortoise does horizontal autoscaling internally.
+- [Vertical scaling](./docs/vertical.md): describes how Tortoise does vertical autoscaling internally.
 - [Technically details](./docs/internal.md): describes the technically details of Tortoise. (mostly for the contributors)
 - [Contributor guide](./docs/contributor-guide.md): describes other stuff for the contributor. (testing etc)
 
 ## API definition
 
-- [Tortoise](./api/v1beta2/tortoise_types.go)
+- [Tortoise](./api/v1beta3/tortoise_types.go)
 
 ## Contribution
 
diff --git a/api/v1beta3/tortoise_types.go b/api/v1beta3/tortoise_types.go
index 5fa4b0d6..f16f0571 100644
--- a/api/v1beta3/tortoise_types.go
+++ b/api/v1beta3/tortoise_types.go
@@ -141,7 +141,7 @@ type TargetRefs struct {
 	// HorizontalPodAutoscalerName is the name of the target HPA.
 	// The target of this HPA should be the same as the ScaleTargetRef above.
 	// The target HPA should have the ContainerResource type metric that refers to the container resource utilization.
-	// Please check out the document for more detail: https://github.com/mercari/tortoise/blob/master/docs/horizontal.md#supported-metrics-in-hpa
+	// Please check out the document for more detail: https://github.com/mercari/tortoise/blob/master/docs/horizontal.md#attach-your-hpa
 	// Also, note that you must not edit the HPA directly after you attach the HPA to the tortoise of Auto mode.
 	// Even if you edit your HPA in that case, tortoise will overwrite the HPA with the metrics/values.
 	//
diff --git a/docs/configuration.md b/docs/admin-guide.md
similarity index 98%
rename from docs/configuration.md
rename to docs/admin-guide.md
index 5216545c..ffbee339 100644
--- a/docs/configuration.md
+++ b/docs/admin-guide.md
@@ -1,9 +1,11 @@
-## Configuration for admin
+## Admin guide
 
 <img alt="Tortoise" src="images/eating.jpg" width="400px"/>
 
-The cluster admin can set the global configurations via the configuration file.
-The configuration file is passed via `--config` flag.
+Tortoise exposes many flags to configure the behavior of tortoises in the cluster.
+
+The cluster admin can set the global configurations via the configuration file,
+which is passed via the `--config` flag.
 
 ```
 RangeOfMinMaxReplicasRecommendationHours:     The time (hours) range of minReplicas and maxReplicas recommendation (default: 1)
diff --git a/docs/concept.md b/docs/concept.md
deleted file mode 100644
index cc4881a5..00000000
--- a/docs/concept.md
+++ /dev/null
@@ -1,50 +0,0 @@
-## Concept
-
-<img alt="Tortoise" src="images/tortoise.jpg" width="400px"/>
-
-The resource management in Kubernetes world is difficult today,
-there are many options on your table (HPA, VPA, KEDA, etc) at first, 
-there are many parameters on them,
-and you want to reduce the wasted resources as long as possible with any of them, 
-but at the same time, you need to keep the reliability of workloads.
-
-Tortoise, it aims to solve such complicated situation by system
-- give recommended values to Autoscalers from the controller and keep update them.
-- use historical resource usage of target workloads to calculate the recommended values on parameters while ensuring the safety.
-- expose only few configurations to users.
-
-### General design
-
-We only allow users to configure:
-- The way to do autoscaling (vertical or horizontal) for each container.
-  - In most cases, it should be OK to leave this configuration empty. Tortoise will use `Horizontal` for CPU and `Vertical` for memory. 
-- The minimum amount of resources given to each container. (optional)
-  - In most cases, it should be OK to leave this configuration empty as well. Tortoise will ensure safety of the resource reduction based on the values suggested by VPA.
-  - But, the application developers may want to increase the resource request before they bring something big to workloads which will affect the resource usage very much.
-
-But, for the cluster admin, we allow some global configurations 
-so that the cluster admin can make Tortoises fit their general workloads characteristic.
-
-See [Flag configurations for admin](./flag-configuration.md).
-
-### How do workloads exactly get scaled?
-
-See each document:
-- [Horizontal scaling](./horizontal.md) 
-- [Vertical scaling](./vertical.md)
-
-### Emergency mode
-
-We also have the concept "emergency mode" in Tortoise, 
-which can be used when the workloads need to get scaled up in an unusual case.
-
-See the document for more detail: [The emergency mode](./emergency.md)
-
-## Side Notes
-
-It's implemented based on our experience in mercari.com
-
-- Our workloads are mostly Golang HTTP/GRPC server.
-- Our workloads mostly get traffic from people in the same timezone, and the demand of resources is usually very similar to the same time one week ago.
-
-Depending on how your workloads look like, tortoise may or may not fit your workloads.
diff --git a/docs/emergency.md b/docs/emergency.md
index 93e0c8f3..98cb2d2a 100644
--- a/docs/emergency.md
+++ b/docs/emergency.md
@@ -11,17 +11,16 @@ you can turn on the emergency mode by setting `Emergency` on `.spec.UpdateMode`
 
 ### How emergency mode works
 
-When emergency mode is enabled, tortoise increases the `minReplicas` to the same value as `maxReplicas`.
+When emergency mode is enabled, tortoise increases the `minReplicas` of the HPA to the same value as `maxReplicas`.
 
 As described in [Horizontal scaling](./horizontal.md), `maxReplicas` gets changed to be fairly higher value every hour.
 So, during emergency mode, the replicas will be kept fairly high value calculated from the past behavior for the safety.
 
-### turning emergency mode off
+### Turn off emergency mode 
 
 Also, for the safety, after reverting `UpdateMode` from `Emergency` to `Auto`,
-
 Tortoise tries to reduce the number of replicas to the original value gradually.
-(A sudden decrease is mostly dangerous.)
+(A sudden decrease in the number of replicas is often dangerous.)
 
 Specifically, the controller reduces `minReplicas` to the original value gradually by the following formula in one reconciliation:
 
@@ -33,5 +32,5 @@ During gradually reducing the `minReplicas`, the Tortoise is in the `BackToNorma
 
 ### Note
 
-Emergency mode is available for tortoises with `Running` or `BackToNormal` phase.
+Emergency mode is only available for tortoises in the `Running` or `BackToNormal` phase.
 (because it requires enough historical data to work on)
diff --git a/docs/horizontal.md b/docs/horizontal.md
index c98c0032..1650ce3e 100644
--- a/docs/horizontal.md
+++ b/docs/horizontal.md
@@ -7,7 +7,18 @@ by setting `Horizontal` in `Spec.ResourcePolicy[*].AutoscalingPolicy`
 
 For `Horizontal` resources, Tortoise keeps changing the corresponding HPA's fields with the recommendation value calculated from the historical usage.
 
-Let's get into detail how each field gets changed.
+### Configure Horizontal scaling
+
+#### Attach your HPA
+
+You can attach your HPA via `.spec.targetRefs.horizontalPodAutoscalerName`.
+
+Currently, Tortoise supports only `type: ContainerResource` metric. 
+
+If the HPA has `type: Resource` metrics, Tortoise removes them because they would conflict with the `type: ContainerResource` metrics managed by Tortoise.
+If the HPA has metrics other than `Resource` or `ContainerResource`, Tortoise keeps them as they are.
+
+### How Tortoise changes each HPA field
 
 ### MaxReplicas
 
@@ -21,7 +32,7 @@ max{replica numbers at the same time on the same day of week} * MaxReplicasFacto
 max{replica numbers at the same time} * MaxReplicasFactor
 ```
 
-(refer to [configuration.md](./configuration.md) about each parameter)
+(refer to [admin-guide.md](./admin-guide.md) about each parameter)
 
 It only takes the num of replicas of the last 4 weeks into consideration.
 
@@ -37,7 +48,7 @@ max{replica numbers at the same time on the same day of week} * MinReplicasFacto
 max{replica numbers at the same time} * MinReplicasFactor
 ```
 
-(refer to [configuration.md](./configuration.md) about each parameter)
+(refer to [admin-guide.md](./admin-guide.md) about each parameter)
 
 It only takes the num of replicas of the last 4 weeks into consideration.
 
@@ -72,10 +83,6 @@ Looking back the above formula,
   - make all container's resource utilization below 100%.
 - Thus, finally `100 - (max{recommended resource usage from VPA}/{current resource request} - {current target utilization})` means the target utilization which only give the bare minimum additional resources.
 
-#### Supported metrics in HPA
-
-Currently, Tortoise supports only `type: ContainerResource` metric. 
-
 ### The container right sizing
 
 Although it says "Horizontal", 
diff --git a/docs/user-guide.md b/docs/user-guide.md
new file mode 100644
index 00000000..e268ec68
--- /dev/null
+++ b/docs/user-guide.md
@@ -0,0 +1,155 @@
+## User guide
+
+<img alt="Tortoise" src="images/tortoise.jpg" width="400px"/>
+
+This page describes the minimum knowledge that end users need,
+and how they can configure Tortoise to let tortoises autoscale their workloads.
+
+### How tortoise works
+
+Tortoise itself doesn't directly change your Pods' resource requests or the number of replicas.
+It has HorizontalPodAutoscaler and VerticalPodAutoscaler under the hood,
+and your tortoise keeps updating them so that they stay well optimized based on your workload's historical resource usage.
+
+### Configuration overview
+
+Tortoise is designed to have a very simple configuration:
+
+```yaml
+apiVersion: autoscaling.mercari.com/v1beta3
+kind: Tortoise
+metadata:
+  name: lovely-tortoise
+  namespace: zoo
+spec:
+  updateMode: Auto # enable autoscaling.
+  targetRefs:      # which workload this tortoise autoscales.
+    scaleTargetRef:
+      kind: Deployment
+      name: sample 
+```
+
+This example shows the minimum required configuration.
+
+### updateMode
+
+```yaml
+apiVersion: autoscaling.mercari.com/v1beta3
+kind: Tortoise
+spec:
+...
+  updateMode: Auto 
+```
+
+`.spec.updateMode` accepts one of three values:
+- `Off` (default): DryRun mode. The tortoise doesn't change anything in your workload or autoscaler.
+- `Auto`: The tortoise keeps updating your workload or autoscaler so that it stays optimized.
+- `Emergency`: The tortoise scales your workload up/out so that it is big enough to handle unexpectedly heavy traffic.
+
+#### updateMode: `Off`
+
+`Off` is the default value of `updateMode`.
+It means a DryRun mode: the tortoise doesn't change anything in your workload or autoscaler.
+
+However, even in `Off` mode, the tortoise still generates recommendations for your workload's resource requests and your HPA's target utilization.
+
+You can observe the recommendation values with these metrics:
+- `mercari.tortoise.proposed_cpu_request`: CPU request that a tortoise proposes.
+- `mercari.tortoise.proposed_memory_request`: memory request that a tortoise proposes.
+- `mercari.tortoise.proposed_hpa_minreplicas`: HPA `.spec.minReplicas` that a tortoise proposes.
+- `mercari.tortoise.proposed_hpa_maxreplicas`: HPA `.spec.maxReplicas` that a tortoise proposes.
+- `mercari.tortoise.proposed_hpa_utilization_target`: HPA `.spec.metrics[*].containerResource.target.averageUtilization` that a tortoise proposes.
+
+#### updateMode: `Auto`
+
+`Auto` is an update mode that lets the tortoise keep updating your workload or autoscaler so that it stays optimized.
+
+#### updateMode: `Emergency`
+
+`Emergency` is the update mode that enables the emergency mode.
+Please refer to [Emergency mode](./emergency.md) for more details.
+
+### `.spec.AutoscalingPolicy`
+
+There are two primary options for configuring resource scaling within containers:
+1. Allow Tortoise to automatically determine the appropriate autoscaling policy for each resource.
+2. Manually define the autoscaling policy for each resource.
+
+The AutoscalingPolicy field is mutable; you can modify it at any time, whether from an empty state to populated or vice versa.
+
+#### 1. Allow Tortoise to automatically determine the appropriate autoscaling policy for each resource
+
+To do this, you simply leave `.spec.AutoscalingPolicy` unset. 
+
+In this case, Tortoise will adjust the autoscaling policies using the following logic:
+- If `.spec.TargetRefs.HorizontalPodAutoscalerName` is not provided, the policies default to "Horizontal" for CPU and "Vertical" for memory across all containers.
+- If `.spec.TargetRefs.HorizontalPodAutoscalerName` is specified, resources governed by the referenced Horizontal Pod Autoscaler will use a "Horizontal" policy,
+while those not managed by the HPA will use a "Vertical" policy.
+Note that Tortoise supports only the `ContainerResource` metric type for HPAs; other metric types will be disregarded.
+Additionally, if a `ContainerResource` metric is later added to an HPA associated with Tortoise,
+Tortoise will automatically update relevant resources to utilize a `Horizontal` policy in AutoscalingPolicy.
+
+#### 2. Manually define the autoscaling policy for each resource
+
+With the second option, you must manually specify the AutoscalingPolicy for the resources of each container within this field.
+
+```yaml
+apiVersion: autoscaling.mercari.com/v1beta3
+kind: Tortoise
+spec:
+...
+  autoscalingPolicy: 
+    - containerName: istio-proxy
+      policy:
+        cpu: Horizontal
+        memory: Vertical
+    - containerName: app
+      policy:
+        cpu: Horizontal
+        memory: Vertical
+```
+
+AutoscalingPolicy is an optional field for specifying the scaling approach for each resource within each container.
+- `Horizontal`: Tortoise increases the replica number when the resource utilization goes up.
+- `Vertical`: Tortoise scales up the resource given to the container when the resource utilization goes up.
+- `Off` (default): Tortoise doesn't look at the resource of the container at all.
+
+If policies are defined for some but not all containers or resources, Tortoise will assign a default `Off` policy to unspecified resources.
+Be aware that when new containers are introduced to the workload, the AutoscalingPolicy configuration must be manually updated 
+if you want to configure autoscaling for a new container,
+as Tortoise will default to an `Off` policy for resources within the new container, preventing scaling.
+
+### `.spec.DeletionPolicy`
+
+```yaml
+apiVersion: autoscaling.mercari.com/v1beta3
+kind: Tortoise
+spec:
+...
+  deletionPolicy: "DeleteAll"
+```
+
+DeletionPolicy defines how the controller handles the associated HPA and VPAs when the tortoise is removed.
+
+- `DeleteAll`: tortoise deletes all associated HPA and VPAs that were created by tortoise.
+However, if the associated HPA was not created by tortoise, that is, it was attached via `spec.targetRefs.horizontalPodAutoscalerName`,
+tortoise doesn't delete the HPA even with `DeleteAll`.
+- `NoDelete` (default): tortoise doesn't delete any associated HPA and VPAs.
+
+### `.spec.ResourcePolicy`
+
+```yaml
+apiVersion: autoscaling.mercari.com/v1beta3
+kind: Tortoise
+spec:
+...
+  resourcePolicy:
+    - containerName: istio-proxy
+      minAllocatedResources:
+        cpu: "4"
+```
+
+ResourcePolicy defines how each resource is updated.
+It currently only contains `minAllocatedResources`, which indicates the minimum amount of resources given to the container.
+For example, if `minAllocatedResources` is configured as in the above example, Tortoise won't set cpu smaller than `4` in the `istio-proxy` container
+even if the autoscaling policy for `istio-proxy` cpu is `Vertical` and VPA suggests changing cpu to a value smaller than `4`.