installer/pkg/config: Support loading InstallConfig YAML #236

Closed
wants to merge 6 commits

Conversation

@wking (Member) commented Sep 12, 2018

So callers can start transitioning to the new approach. For a short transitional period, we'll continue to support the old config format as well.

There are a few smaller commits in this PR to set up for the meat in the final commit. Details are in the commit messages.

I've only run the Go unit tests locally, so I've stuck a WIP tag on this PR until we get some runs through the integration tests; I wouldn't be surprised if lots of things are still broken ;).

CC @dgoodwin.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 12, 2018
@openshift-ci-robot openshift-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Sep 12, 2018
@openshift-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 12, 2018
@wking (Member, Author) commented Sep 12, 2018

@crawford, @abhinavdahiya, while working this up, I had a few thoughts about the InstallConfig struct. I'm happy to address some or all of these in this PR, or drop them, or punt them to follow-up work.

Thoughts?

"net"
"reflect"

yaml "gopkg.in/yaml.v2"
Contributor:

We should be using github.com/ghodss/yaml

Kubernetes only does JSON marshal/unmarshal; github.com/ghodss/yaml does the JSON-to-YAML conversion.
No need for special YAML parsing.
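
A minimal sketch of the point being made here, using a hypothetical InstallConfig-like struct (the canonical import path github.com/ghodss/yaml is used so the example runs): ghodss/yaml converts YAML to JSON and back, so plain json tags are enough and no yaml tags are needed.

package main

import (
    "fmt"

    "github.com/ghodss/yaml"
)

// installConfigExample is an illustrative struct, not the installer's type.
type installConfigExample struct {
    BaseDomain string `json:"baseDomain,omitempty"` // only a json tag
}

func main() {
    var cfg installConfigExample
    // ghodss/yaml converts the YAML to JSON and then uses encoding/json,
    // so the json tag above is what names the YAML field.
    if err := yaml.Unmarshal([]byte("baseDomain: example.com\n"), &cfg); err != nil {
        panic(err)
    }
    fmt.Println(cfg.BaseDomain) // example.com

    out, _ := yaml.Marshal(cfg)
    fmt.Print(string(out)) // baseDomain: example.com
}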

@wking (Member, Author), Sep 12, 2018:

Kubernetes only does JSON marshal/unmarshal; github.com/ghodss/yaml does the JSON-to-YAML conversion.

Adding explicit YAML support to pkg/ipnet was only 40 lines of non-test additions, so it seemed reasonable to me. Are we sure this is only ever going to be consumed internally?

The immediate motivation was that installer/pkg/config currently uses github.com/ghodss/yaml, presumably to support legacy properties with different YAML and JSON serialization (like BaseDomain). If we wanted to only support github.com/ghodss/yaml with ipnet and pkg/types, then we'd need to import both packages in installer/pkg/config with distinct names and use the right name depending on whether we wanted the nominally-JSON or nominally-YAML tags for a given (un)marshal. It seemed less confusing to just add support for the library installer/pkg/config was already using.
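
For context, a rough sketch (an assumption about shape, not the PR's exact code) of what explicit yaml.v2 support for an IPNet wrapper involves: the type implicitly satisfies gopkg.in/yaml.v2's Marshaler and Unmarshaler interfaces by round-tripping through the CIDR string form.

package ipnet

import "net"

// IPNet wraps net.IPNet so we can attach YAML (un)marshalling methods.
type IPNet struct {
    net.IPNet
}

// MarshalYAML emits the CIDR string form, e.g. "10.0.0.0/16".
func (n IPNet) MarshalYAML() (interface{}, error) {
    return n.String(), nil
}

// UnmarshalYAML parses a CIDR string into the wrapped net.IPNet.
func (n *IPNet) UnmarshalYAML(unmarshal func(interface{}) error) error {
    var s string
    if err := unmarshal(&s); err != nil {
        return err
    }
    _, parsed, err := net.ParseCIDR(s)
    if err != nil {
        return err
    }
    n.IPNet = *parsed
    return nil
}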

@abhinavdahiya (Contributor), Sep 12, 2018:

BaseDomain string `json:"tectonic_base_domain,omitempty" yaml:"baseDomain,omitempty"`
this was done because they didn't want to write code to create terraform.tfvars.
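
A sketch of the legacy dual-tag pattern being referenced, with an illustrative value: the same field marshals to the terraform.tfvars name via encoding/json and to the user-facing name via gopkg.in/yaml.v2, because each library reads its own tag.

package main

import (
    "encoding/json"
    "fmt"

    yaml "gopkg.in/yaml.v2"
)

type legacyConfig struct {
    BaseDomain string `json:"tectonic_base_domain,omitempty" yaml:"baseDomain,omitempty"`
}

func main() {
    c := legacyConfig{BaseDomain: "example.com"}
    j, _ := json.Marshal(c) // the tfvars-style name
    y, _ := yaml.Marshal(c) // the user-facing name
    fmt.Println(string(j))  // {"tectonic_base_domain":"example.com"}
    fmt.Print(string(y))    // baseDomain: example.com
}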

@wking (Member, Author):

this was done because they didn't want to write code to create terraform.tfvars.

So if I drop our direct gopkg.in/yaml.v2 use, do you want me to write separate code to create terraform.tfvars?

@abhinavdahiya (Contributor), Sep 12, 2018:

terraform.tfvars is definitely a separate structure.
We shouldn't have done what we did with installer.Config, making it marshal directly into terraform.tfvars.
installer.Config ended up with fields like IgnitionMasters that have yaml:"-" because we didn't want users to specify them.
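
A short sketch of the pattern being described (the IgnitionMasters name comes from the comment above; the rest is illustrative): internally computed fields are hidden from user YAML with a "-" tag.

package config

// Config is an illustrative stand-in for the legacy installer.Config.
type Config struct {
    Name string `yaml:"name"`
    // IgnitionMasters is computed by the installer; the "-" tag keeps
    // yaml.v2 from reading it from (or writing it to) user-facing YAML.
    IgnitionMasters []string `yaml:"-"`
}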


metav1.ObjectMeta `json:"metadata"`
metav1.ObjectMeta `json:"metadata" yaml:"metadata"`
Contributor:

Please don't add specific YAML tags; Kubernetes doesn't do that.

examples/aws.yaml (review thread outdated; resolved)
@abhinavdahiya (Contributor)

We don't have a field for extra AWS tags (previously here). I think these are useful for setting up things like expirationDate (openshift/release#1103). Do we want to add them to AWSPlatform? Or are folks who need this supposed to reach in and tweak generated assets (once we get to the full next-gen workflow)?

I think a single platform-level extraargs makes sense.

Setting the libvirt image path for each machine pool is tedious. Can we add a cross-pool fallback to LibvirtPlatform?

The config is a one-time thing. We set the instance type per pool in AWS too. I would want uniformity.

platformConfig is redundant (all of these are configs). Can we rename it to just platform?

sure, sounds okay.

We have several properties which are structures instead of pointers (e.g. Networking). If we make those pointers, we can get working omitempty handling for them (golang/go#11939) at the cost of slightly more complicated Go handling (but with slightly more efficient shallow copies ;).

We should try to have no embedded structs; explicit fields are clearer when marshalling, and pointers make sense only when they are optional or one-of (AWS or Libvirt). Networking doesn't seem optional.
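
A minimal illustration of the omitempty point (golang/go#11939), with throwaway types: omitempty never drops a non-pointer struct field, but a nil pointer is dropped, which is what would let callers say "use the defaults".

package main

import (
    "encoding/json"
    "fmt"
)

type networking struct {
    PodCIDR string `json:"podCIDR,omitempty"`
}

type asValue struct {
    Networking networking `json:"networking,omitempty"`
}

type asPointer struct {
    Networking *networking `json:"networking,omitempty"`
}

func main() {
    v, _ := json.Marshal(asValue{})
    p, _ := json.Marshal(asPointer{})
    fmt.Println(string(v)) // {"networking":{}} -- struct value is still emitted
    fmt.Println(string(p)) // {}                -- nil pointer is omitted
}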

@wking (Member, Author) commented Sep 12, 2018

Setting the libvirt image path for each machine pool is tedious. Can we add a cross-pool fallback to LibvirtPlatform?

The config is a one-time thing.

It's still nice to be convenient ;). And with #205, it seems like it might be something that is edited occasionally over the life of the cluster.

We set the instance type per pool in AWS too.

But I don't have to tweak those because it's easy to give them sane defaults. I do have to provide an image path, at least until we get code that automatically grabs this from the network (and caches it somewhere for future runs?) when the caller doesn't provide a path.

I would want uniformity.

I'm fine with AWS-wide defaults for those as well, if you're ok with the cross-pool fallback approach. Should I add a LibvirtMachinePoolPlatformConfig property (as defaultMachinePlatform?) to Libvirt (and similarly for AWS)? That would allow something like:

platform:
  libvirt:
    uri: qemu:///system
    defaultMachinePlatform:
      qcowImagePath: /path/to/default-image

machines:
  - name: master
    replicas: 1
    platform:
      libvirt:
        qcowImagePath: /path/to/special-case-image

  - name: worker
    replicas: 2
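
A hedged Go sketch of the fallback being proposed above (type and field names are illustrative, not the installer's): a pool-level setting wins, otherwise the platform-wide defaultMachinePlatform value is used.

package main

import "fmt"

type libvirtMachinePool struct {
    QCOWImagePath string
}

type libvirtPlatform struct {
    URI                    string
    DefaultMachinePlatform *libvirtMachinePool
}

// imagePath prefers the per-pool path and falls back to the platform default.
func imagePath(platform libvirtPlatform, pool *libvirtMachinePool) string {
    if pool != nil && pool.QCOWImagePath != "" {
        return pool.QCOWImagePath
    }
    if platform.DefaultMachinePlatform != nil {
        return platform.DefaultMachinePlatform.QCOWImagePath
    }
    return ""
}

func main() {
    platform := libvirtPlatform{
        URI:                    "qemu:///system",
        DefaultMachinePlatform: &libvirtMachinePool{QCOWImagePath: "/path/to/default-image"},
    }
    master := &libvirtMachinePool{QCOWImagePath: "/path/to/special-case-image"}
    fmt.Println(imagePath(platform, master)) // /path/to/special-case-image
    fmt.Println(imagePath(platform, nil))    // /path/to/default-image
}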

Networking doesn't seem like optional.

No? We currently have defaults for all of its properties, and making it omitempty (and a pointer) would allow us to write configs from Go that said "use whatever default networking you like, I don't care".

@abhinavdahiya (Contributor)

No? We currently have defaults for all of its properties, and making it omitempty (and a pointer) would allow us to write configs from Go that said "use whatever default networking you like, I don't care".

We default it so that all the operators that depend on the networking block don't need to separately make decisions about the defaults.

@abhinavdahiya (Contributor)

I'm fine with AWS-wide defaults for those as well, if you're ok with the cross-pool fallback approach. Should I add a LibvirtMachinePoolPlatformConfig property (as defaultMachinePlatform?) to Libvirt (and similarly for AWS). That would allow something like:

I'm fine if we make that default for all platforms.

@wking (Member, Author) commented Sep 12, 2018

No? We currently have defaults for all of its properties, and making it omitempty (and a pointer) would allow us to write configs from Go that said "use whatever default networking you like, I don't care".

We default it so that all the operators that depend on the networking block don't need to separately make decisions about the defaults.

This makes sense to me when we write to the cluster (#205). I'm less clear on whether it makes sense in all cases. What if someone wants to use our package to write a cluster config before feeding it into the installer? We don't want to have to make them create their own defaults for all of these settings.

@abhinavdahiya (Contributor)

We don't want to have to make them create their own defaults for all of these settings.

Only the new openshift-install init will make defaulting decisions. Everybody else coming with an InstallConfig directly must come with the Networking options complete.

@wking (Member, Author) commented Sep 12, 2018

We don't want to have to make them create their own defaults for all of these settings.

Only the new openshift-install init will make defaulting decisions. Everybody else coming with an InstallConfig directly must come with the Networking options complete.

I think you're thinking of other consumers who are loading an InstallConfig. What about other consumers who are writing an install config to feed into init? For example:

  1. Some personal generator uses our package to create an InstallConfig YAML. They feed that into...
  2. openshift-install init, which injects default opinions for networking, etc.
  3. openshift-install something-else pushes the fully-defaulted YAML into the cluster.
  4. Operators fetch the fully-defaulted YAML from the cluster.

You want to make sure (4) doesn't need local defaults. I'm fine with that. I'd also like (1) to not need local defaults, and for that, I think we need omitempty for everything that can have a default in (2).

@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 13, 2018
@@ -42,6 +36,126 @@ func ParseConfig(data []byte) (*Cluster, error) {
return &cluster, nil
}

func parseInstallConfig(data []byte, cluster *Cluster) (err error) {
installConfig := &types.InstallConfig{}
Contributor:

It would be nice if the defaulting and validation were both extracted separately here, such that external callers could vendor and use that code. We're still trying to determine whether we can re-use your InstallConfig or whether we have to use our own type and generate an InstallConfig, but it might be good to do this regardless for futureproofing.

@wking (Member, Author):

It would be nice if the defaulting and validation were both extracted separately here, such that external callers could vendor and use that code.

New external consumers should be using pkg/types directly; everything under installer/pkg should be considered deprecated. Having a validator for these types under pkg/ makes sense (currently pkg/asset/installconfig has user-input-time validation, but that's not useful for downstream Go consumers reading this from the cluster, e.g. openshift/machine-config-operator#47). And yeah, we'll probably need to expose the defaulting for tools that are monitoring InstallConfig to pick up post-install changes. But that will also be a pkg/ change, and this PR is trying to focus on getting support in the legacy installer/pkg/config (which has its own validation and defaulting approach). Can I punt the new APIs to follow-up work?

Contributor:

Totally fine with me.

}
}
default:
return fmt.Errorf("unrecognized machine pool %q", machinePool.Name)
Contributor:

These names feel more like types. Additionally, it looks like there's not a lot of validation here: do we require two pools with these exact names? What should happen if we see multiple worker or master pool names?

Maybe too preliminary for that, but I just wanted to mention it.

@wking (Member, Author):

These names feel more like types.

I agree. @abhinavdahiya, can I replace:

// Machines is the list of MachinePools that need to be installed.
Machines []MachinePool `json:"machines"`

with:

Master MachinePool `json:"master"`

Worker MachinePool `json:"worker"`

or similar?

Contributor:

This would leave us with a little issue in Hive. If we use InstallConfig directly in our ClusterDeployment, this would be the long-term definition of all machine pools. If we go to just one master and one worker, then we would need to duplicate this info higher up in our own portion of ClusterDeployment for the long-term definition of what machine pools should exist. That would mean a little duplication of data, and a strange section of the install config that was only used during install.

Kinda feels like this should be capable of being the canonical source of machine sets that should exist in the cluster.

Contributor:

@wking I think we decided that machine pools remain as-is for now, with no changes. We can expand them later if needed.

@wking (Member, Author):

If we go to just one master, and one worker...

Just to be clear, I was suggesting one master pool (and one worker pool).

I think we decided that machine pools remain as-is for now, with no changes.

So just die screaming if master (say) is listed multiple times or not listed at all? And silently ignore entries with names that aren't master or worker?

Contributor:

I would suggest leaving name as a freeform field and adding an explicit type; technically the two types should probably be "control-plane" and "compute". Then verify you have exactly one control-plane pool and at least one compute pool.
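
A sketch of that suggestion with hypothetical types (not code from the PR): keep name freeform, add an explicit pool type, and validate exactly one control-plane pool and at least one compute pool.

package main

import (
    "errors"
    "fmt"
)

type machinePool struct {
    Name string
    Type string // "control-plane" or "compute"
}

func validatePools(pools []machinePool) error {
    controlPlane, compute := 0, 0
    for _, p := range pools {
        switch p.Type {
        case "control-plane":
            controlPlane++
        case "compute":
            compute++
        default:
            return fmt.Errorf("unrecognized machine pool type %q for pool %q", p.Type, p.Name)
        }
    }
    if controlPlane != 1 {
        return errors.New("exactly one control-plane pool is required")
    }
    if compute < 1 {
        return errors.New("at least one compute pool is required")
    }
    return nil
}

func main() {
    pools := []machinePool{
        {Name: "master", Type: "control-plane"},
        {Name: "worker", Type: "compute"},
    }
    fmt.Println(validatePools(pools)) // <nil>
}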

@wking wking force-pushed the install-config-for-old-cmd branch from 13e0f52 to 2faf6de Compare September 13, 2018 20:26
@openshift-bot openshift-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 13, 2018
@wking wking force-pushed the install-config-for-old-cmd branch from 2faf6de to 3de9cef Compare September 13, 2018 20:35
@wking (Member, Author) commented Sep 13, 2018

I've rebased this around #231 and #239 with 13e0f52 -> 3de9cef, which also adds support for loading userTags from InstallConfig YAML files.

@wking wking force-pushed the install-config-for-old-cmd branch from 3de9cef to 41f66f4 Compare September 13, 2018 20:38
@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 14, 2018
@wking wking force-pushed the install-config-for-old-cmd branch from 41f66f4 to a87bab7 Compare September 14, 2018 10:19
@openshift-bot openshift-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 14, 2018
@wking (Member, Author) commented Sep 14, 2018

Rebased around #119 with 41f66f4 -> a87bab7.

@wking (Member, Author) commented Sep 14, 2018

The smoke error was:

2018/09/14 10:27:17 Container cli in pod e2e-aws-smoke completed successfully
2018/09/14 13:27:13 Copying artifacts from e2e-aws-smoke into /logs/artifacts/e2e-aws-smoke
2018/09/14 13:27:13 error: unable to signal to artifacts container to terminate in pod e2e-aws-smoke, triggering deletion: could not run remote command: unable to upgrade connection: container not found ("artifacts")
2018/09/14 13:27:13 error: unable to retrieve artifacts from pod e2e-aws-smoke: could not read gzipped artifacts: unable to upgrade connection: container not found ("artifacts")
E0914 13:27:18.024001      11 event.go:200] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:".1554472e885dec97", GenerateName:"", Namespace:"ci-op-w3lsmhv3", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, InvolvedObject:v1.ObjectReference{Kind:"", Namespace:"ci-op-w3lsmhv3", Name:"", UID:"", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"CiJobFailed", Message:"Running job pull-ci-openshift-installer-e2e-aws-smoke for PR https://github.com/openshift/installer/pull/236 in namespace ci-op-w3lsmhv3 from author wking", Source:v1.EventSource{Component:"ci-op-w3lsmhv3", Host:""}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xbedf0bad81515097, ext:11210761228709, loc:(*time.Location)(0x19b8fc0)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xbedf0bad81515097, ext:11210761228709, loc:(*time.Location)(0x19b8fc0)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events ".1554472e885dec97" is forbidden: unable to create new content in namespace ci-op-w3lsmhv3 because it is being terminated' (will not retry!)
2018/09/14 13:27:19 Ran for 3h6m51s
error: could not run steps: could not wait for pod to complete: could not wait for pod completion: pod e2e-aws-smoke was already deleted

Looks like something just hung. In case it was a flake, I'll kick it again:

/retest

but this may be something I've broken ;).

@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 15, 2018
For folks using gopkg.in/yaml.v2.  Folks using github.com/ghodss/yaml
will use the JSON (un)marshallers.  But supporting yaml.v2 is
convenient for consumers who are already importing that package
(e.g. because they want different JSON and YAML serializations).
I'd missed this while rerolling 89f05da (pkg/types/installconfig: Add
AWSPlatform.UserTags, 2018-09-12, openshift#239).

I've also updated the UserTags comment to use "for the cluster"
instead of "by the cluster".  Resources can be created by the
installer (not part of the cluster) or by operators living inside the
cluster, but regardless of the creator, these cluster resources should
get UserTags.

github.com/ghodss/yaml leans on JSON tags and (un)marshallers, but
yaml.v2 depends on distinct yaml tags like the ones I'm adding here.

This commit also adds a number of omitempty tags for a number of
properties that could have reasonable defaults.  This keeps the output
more compact when serializing a partially-initialized structure.

This is now the OpenShift installer, not the Tectonic installer.  And
the context of the examples is clear enough without the prefix.

So callers can start transitioning to the new approach.  For a short
transitional period, we'll continue to support the old config format
as well.

Generated with:

  $ bazel run //:gazelle

using:

  $ bazel version
  Build label: 0.16.1- (@Non-Git)
  Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
  Build time: Mon Aug 13 16:42:29 2018 (1534178549)
  Build timestamp: 1534178549
  Build timestamp as int: 1534178549
@wking wking force-pushed the install-config-for-old-cmd branch from a87bab7 to 48f3a4f Compare September 15, 2018 04:08
@openshift-bot openshift-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 15, 2018
@wking wking changed the title WIP: installer/pkg/config: Support loading InstallConfig YAML installer/pkg/config: Support loading InstallConfig YAML Sep 15, 2018
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 15, 2018
@wking (Member, Author) commented Sep 15, 2018

Rebased around #206 with a87bab7 -> 48f3a4f.

@wking (Member, Author) commented Sep 15, 2018

Smoke passed, so the e2e-aws:

Waiting for API at https://ci-op-fw6dxmqd-5849d-api.origin-ci-int-aws.dev.rhcloud.com:6443 to respond ...
Interrupted

must be a flake.

/retest

@wking (Member, Author) commented Sep 17, 2018

We're intending to switch to cmd/openshift-install this week, which means we don't need to bother with this interim step. I've spun off #260 and #261 for the commits that will still be useful in the cmd/openshift-install world.
