docs update
Bringing the READMEs in line with recent code changes. Need to update the wiki next.

Also some tweaks to the example YAML.
joel-bluedata committed Jan 14, 2019
1 parent 3008211 commit 5f66315
Showing 7 changed files with 62 additions and 41 deletions.
14 changes: 1 addition & 13 deletions deploy/example_clusters/cr-cluster-centos7-stor.yaml
@@ -5,19 +5,7 @@ metadata:
spec:
  app: centos7x
  roles:
-  - id: utility
-    members: 1
-    resources:
-      requests:
-        memory: "4Gi"
-        cpu: "2"
-      limits:
-        memory: "4Gi"
-        cpu: "2"
-    storage:
-      size: "40Gi"
-      storageClassName: standard
-  - id: extra
+  - id: singlehost
    members: 1
    resources:
      requests:
16 changes: 16 additions & 0 deletions deploy/example_clusters/cr-cluster-centos7.yaml
@@ -0,0 +1,16 @@
+apiVersion: "kubedirector.bluedata.io/v1alpha1"
+kind: "KubeDirectorCluster"
+metadata:
+  name: "centos7-persistent"
+spec:
+  app: centos7x
+  roles:
+  - id: singlehost
+    members: 1
+    resources:
+      requests:
+        memory: "4Gi"
+        cpu: "2"
+      limits:
+        memory: "4Gi"
+        cpu: "2"
6 changes: 3 additions & 3 deletions doc/app-authoring.md
@@ -4,7 +4,7 @@ This doc assumes that you are familiar with the topics covered on the [KubeDirec

You should also be familiar with the process of [creating and managing virtual clusters with KubeDirector](virtual-clusters.md).

The "deploy/example_catalog" directory contains several KubeDirectorApp resources that are applied when you do "make deploy". These determine what kinds of virtual clusters can be deployed using KubeDirectorCluster resources. Each resource also identifies the Docker image(s) and app setup package(s) that it uses. Before authoring new app definitions, examine these current examples and the contents of each component. Currently the Cassandra example is the easiest to understand, with TensorFlow a close runner-up.
The "deploy/example_catalog" directory contains several KubeDirectorApp resources that are applied when you do "make deploy". These determine what kinds of virtual clusters can be deployed using KubeDirectorCluster resources. Each resource also identifies the Docker image(s) and app setup package(s) that it uses. Before authoring new app definitions, examine these current examples and the contents of each component. Currently the Cassandra example is the easiest non-trivial example to understand, with TensorFlow a close runner-up.

The simplest authoring task would involve making a modification to an existing image or setup package, and then making a modified KubeDirectorApp to reference the modified artifact (and possibly accommodate other roles or services). A modified version of an existing KubeDirectorApp should keep the same "distro_id" value but have a new "version" and a new metadata name; currently there is not a more sophisticated framework for KubeDirectorApp versioning.
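A rough sketch of that flow, using kubectl directly (the file path and app name below are placeholders, not artifacts that ship with this repo):
```bash
# Register a modified app definition alongside the original; path and name are hypothetical.
kubectl create -f my-catalog/cr-app-centos7x-custom.yaml

# Confirm that both the original and modified KubeDirectorApp resources are registered.
# (Assumes the plural resource name defined by the KubeDirectorApp CRD.)
kubectl get kubedirectorapps
```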

@@ -18,9 +18,9 @@ The KubeDirectorApp resource is the only component that will be hosted by the K8

A Docker image must be hosted at a registry that is accessible to the K8s nodes, since K8s will need to pull that image in order to deploy containers.
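For example, a build-and-push to a hypothetical registry reachable from the K8s nodes might look like this (image and registry names are placeholders):
```bash
# Build the app image and push it to a registry the K8s nodes can pull from.
docker build -t registry.example.com/myorg/centos7x-app:0.1 .
docker push registry.example.com/myorg/centos7x-app:0.1
```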

-An app setup package must be hosted on a webserver that is accessible to the container network, since a process within the container will download it. (The hosting and network-accessibility requirements for app setup packages are under discussion.)
+An app setup package will usually be hosted on a webserver that is accessible to the container network, since a process within the container will download it. (The hosting and network-accessibility requirements for app setup packages are under discussion.) Alternately this package can reside on the Docker image.

-Part of establishing a successful app definition authoring workflow is the ability to quickly revise these hosted components. For app setup package in particular, S3 bucket hosting has proven useful.
+Part of establishing a successful app definition authoring workflow is the ability to quickly revise these hosted components. For the app setup package in particular, S3 bucket hosting has proven useful. An app setup package stored on the Docker image is less amenable to quick revision. The examples later in this document will assume a web-hosted package.
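A minimal sketch of such a hosting workflow, assuming the AWS CLI and an S3 bucket you control (bucket and package names are placeholders):
```bash
# Upload a revised app setup package and make it web-accessible.
aws s3 cp appconfig.tgz s3://my-kd-packages/centos7x/appconfig.tgz --acl public-read
# The resulting object URL is what the KubeDirectorApp resource would then reference.
```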

#### REGISTERING THE KUBEDIRECTORAPP

29 changes: 18 additions & 11 deletions doc/gke-notes.md
@@ -4,35 +4,42 @@ If you're starting from scratch with GKE, the first few sections of the [GKE Qui

With gcloud configured to use the appropriate project, you can then launch a GKE cluster. For example, this gcloud command will create a 3-node GKE cluster named "my-gke":
```bash
gcloud container clusters create my-gke --machine-type n1-highmem-4
```
(See [the Machine Types list](https://cloud.google.com/compute/docs/machine-types) for the details of the available GKE node resources.)

If you need to grow your GKE cluster you can use gcloud to do that as well; for example, growing to 5 nodes:
```bash
gcloud container clusters resize my-gke --size=5
```

Once your GKE cluster has been created, you will need to set up your kubectl credentials to access it. First, create a kubectl config context for the cluster:
```bash
gcloud container clusters get-credentials my-gke
```

To deploy KubeDirector into this cluster, the user in that kubectl context (which is tied to your Google account credentials) must have the cluster-admin role in the cluster.
```bash
# This should be the email that is associated with the Google account that
# gcloud is using.
ACCOUNT="[email protected]"
kubectl create clusterrolebinding cluster-admin-binding --clusterrole=cluster-admin --user=${ACCOUNT}
```
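You can verify that the binding exists:
```bash
kubectl get clusterrolebinding cluster-admin-binding
```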

-From here you can proceed to deploy KubeDirector and work with virtual clusters normally. Cf. the other doc files such as [quickstart.md](quickstart.md) and [virtual-clusters.md](virtual-clusters.md).
+From here you can proceed to deploy KubeDirector as described in [quickstart.md](quickstart.md).
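In outline, and assuming a clone of the KubeDirector repo with kubectl now pointing at the GKE cluster (see quickstart.md for the full procedure):
```bash
make deploy
kubectl get pods   # the KubeDirector pod should appear here once deployed
```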

-When you're finished, you can destroy the GKE cluster:
+Note that after deploying KubeDirector but before creating virtual clusters, you will want to apply a KubeDirector configuration suitable for GKE:
```bash
-gcloud container clusters delete my-gke
+kubectl create -f deploy/example_config/cr-config-gke.yaml
```
```

-This will also delete the related context from the kubectl config.
+Now you can deploy virtual clusters as described in [virtual-clusters.md](virtual-clusters.md).

+When you're finished working with KubeDirector, you can destroy the GKE cluster:
+```bash
+gcloud container clusters delete my-gke
+```
+
+This will also delete the related context from your kubectl config.

If you have some other context that you wish to return to using at this point, you will want to run "kubectl config get-contexts" to see which contexts exist, and then use "kubectl config use-context" to select one.
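For example (the context name below is a placeholder):
```bash
kubectl config get-contexts
kubectl config use-context my-other-context
```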
13 changes: 12 additions & 1 deletion doc/quickstart.md
@@ -48,7 +48,18 @@ If you have set the repo to a commit tagged with a KubeDirector release version,

Once KubeDirector is deployed, you may wish to observe its activity by using "kubectl logs -f" with the KubeDirector pod name (which is printed for you at the end of "make deploy"). This will continuously tail the KubeDirector log.
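For example (the pod name below is a placeholder; substitute the name printed by "make deploy"):
```bash
kubectl get pods                                # locate the KubeDirector pod
kubectl logs -f kubedirector-58c7bfd967-abcde   # placeholder pod name
```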

-KubeDirector is now running. You can create and manage virtual clusters as described in [virtual-clusters.md](virtual-clusters.md).
+KubeDirector is now running. You can create and manage virtual clusters as described in [virtual-clusters.md](virtual-clusters.md). But first you may want to set a default configuration for some cluster properties.
+
+#### CONFIGURING KUBEDIRECTOR
+
+Before creating any virtual clusters, you should configure KubeDirector to set some defaults. This is done by creating a [KubeDirectorConfig object](https://github.com/bluek8s/kubedirector/wiki/App-Definition-Authoring-for-KubeDirector). Example KubeDirectorConfig objects are provided in the "deploy/example_config" directory for GKE ("cr-config-gke.yaml"), for a generic local K8s installation ("cr-config.yaml"), and for OpenShift ("cr-config-okd.yaml"). (Note however that OpenShift deployments are not currently officially supported; cf. the [known issues](https://github.com/bluek8s/kubedirector/issues/1)). You can use one of these example configs or create one that is tailored to your environment.
+
+For example, typically for a GKE deployment you would execute this command:
+```bash
+kubectl create -f deploy/example_config/cr-config-gke.yaml
+```
+
+If you want to change this configuration at any time, you can edit the config file and use "kubectl apply" to apply the changes. Keep in mind that the defaults specified in this config are only referenced at the time a virtual cluster is created; changing this config will not retroactively affect any existing virtual clusters.
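For example, after editing the GKE example config shown above:
```bash
kubectl apply -f deploy/example_config/cr-config-gke.yaml
```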

#### TEARDOWN

25 changes: 12 additions & 13 deletions doc/virtual-clusters.md
@@ -1,19 +1,17 @@
#### DEPLOYING VIRTUAL CLUSTERS

+Before you deploy your first virtual cluster, make sure that appropriate defaults have been set as described in the "CONFIGURING KUBEDIRECTOR" section of [quickstart.md](quickstart.md).

The "deploy/example_clusters" directory contains examples of YAML files that can be used to create virtual clusters that instantiate the defined app types. Currently these virtual clusters must be created in the same namespace as the KubeDirector deployment (a restriction that should be relaxed in later development).

For example, this would create an instance of a virtual cluster from the spark221e2 app type:
```bash
kubectl create -f deploy/example_clusters/cr-cluster-spark221e2.yaml
```
-or
-```bash
-kubectl create -f deploy/example_clusters/cr-cluster-spark221e2-gke.yaml
-```

-Those two example files illustrate a couple of important differences in how you choose to deploy a cluster, depending on your K8s environment:
-1. The "serviceType" property at the top level of the virtual cluster spec, which defaults to "NodePort" if not specified. For GKE environments you should almost always include this property and set its value to "LoadBalancer". This indicates to KubeDirector that the member services of the virtual cluster should be exposed as LoadBalancer services rather than NodePort.
-2. Persistent storage for roles. If you choose to request this through the "storage" role property, make sure that the nested "storageClass" property is set to a persistent storage class that is valid in your K8s environment (verify using "kubectl get storageclasses").
+You will see that some of the YAML file basenames have the "-stor" suffix. This is just a convention used among these example files to indicate that the virtual cluster spec requests persistent storage. Several of the examples have both persistent and non-persistent variants.

+Note that if you are using persistent storage, you should declare a valid defaultStorageClassName when configuring KubeDirector; the example virtual cluster specs will use that default. Alternately you can declare a storageClassName in the persistent storage spec section of the virtual cluster spec.
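For example, to check which storage classes are available and then create one of the persistent ("-stor") example clusters:
```bash
kubectl get storageclasses
kubectl create -f deploy/example_clusters/cr-cluster-centos7-stor.yaml
```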

For more details, see the KubeDirector wiki for a [complete spec of the KubeDirectorCluster resource type](https://github.com/bluek8s/kubedirector/wiki/Type-Definitions-for-KubeDirectorCluster).

@@ -33,19 +31,20 @@ To get a report on all services related to a specific virtual cluster, you can u
kubectl get services -l kubedirectorcluster=spark-instance
```

-Below is a line from the output of such a query. It shows that port 8080 (the Spark master Web dashboard) on the controller host of a virtual Spark cluster is available on port 30311 of any of the K8s nodes:
+Below is a line from the output of such a query, in a case where KubeDirector was configured to use LoadBalancer services (as on GKE). In this case the Spark master Web dashboard (port 8080) is available through the load-balancer IP 35.197.55.117. The port exposed on the load balancer will be the same as the native container port, 8080. The other information in this line is not relevant for access through the LoadBalancer.
```bash
-svc-kd-ggzpd-0   NodePort       10.107.133.249   <none>          22:31394/TCP,8080:30311/TCP,7077:30106/TCP,8081:30499/TCP   12m
+svc-kd-rmh58-0   LoadBalancer   10.55.240.105    35.197.55.117   22:30892/TCP,8080:31786/TCP,7077:32194/TCP,8081:31026/TCP   2m48s
```

-As another example, below is a line associated with a different virtual cluster, running in GKE and configured to request a serviceType of LoadBalancer. In this case the Spark master Web dashboard is available through the load-balancer IP 35.197.55.117. The port exposed on the load balancer will be the same as the native container port, 8080.
+As another example, below is a line from a cluster in a different setup where KubeDirector was configured to use NodePort services. It shows that port 8080 on the controller host of a virtual Spark cluster is available on port 30311 of any of the K8s nodes:
```bash
-svc-kd-rmh58-0   LoadBalancer   10.55.240.105    35.197.55.117   22:30892/TCP,8080:31786/TCP,7077:32194/TCP,8081:31026/TCP   2m48s
+svc-kd-ggzpd-0   NodePort       10.107.133.249   <none>          22:31394/TCP,8080:30311/TCP,7077:30106/TCP,8081:30499/TCP   12m
```


You can use kubectl to examine a specific service resource in order to see more explicitly which ports are for service endpoints. Using "get -o yaml" or "get -o json", rather than "describe", will format the array of endpoints a little more clearly. For example, examining that LoadBalancer service above:
```bash
kubectl get -o yaml service svc-kd-rmh58-0
```
will result in output that (among other things) contains an array that explicitly names the various endpoints, such as:
```
@@ -102,7 +101,7 @@ It may happen that a virtual cluster refuses to go away, either on explicit manu

You can use "kubectl logs" on the KubeDirector pod to see if it stopped (and why). If you are working on KubeDirector development yourself, it may be possible to rescue KubeDirector at this point. However if you simply need to allow the virtual clusters to be deleted, without restoring KubeDirector, you need to remove the "finalizers" from each such cluster resource. Below is a kubectl command that could be used to clear the finalizers from a virtual cluster named "spark-instance":
```bash
kubectl patch kubedirectorcluster spark-instance --type json --patch '[{"op": "remove", "path": "/metadata/finalizers"}]'
```

Once the finalizers are removed, any already-requested deletion should then complete.
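You can confirm that the resource is gone, using the same example name:
```bash
# Expect a NotFound error once the deletion has completed.
kubectl get kubedirectorcluster spark-instance
```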
