
Openshift 4.2 installation on vSphere/ESXI 6.7 #2537

Closed
stefanskotte opened this issue Oct 20, 2019 · 28 comments

Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@stefanskotte

stefanskotte commented Oct 20, 2019

Version

$ openshift-install version
openshift-install v4.2.0
built from commit 90ccb37ac1f85ae811c50a29f9bb7e779c5045fb
release image quay.io/openshift-release-dev/ocp-release@sha256:c5337afd85b94c93ec513f21c8545e3f9e36a227f55d41bc1dfb8fcc3f2be129

Platform: vSphere

What happened?

Installation on vSphere fails for OpenShift 4.2, even when following the documentation exactly.

vSphere (6.7.0 Build 14368073)
VMware ESXi, 6.7.0, 13006603

For some reason the ovfEnv variables for Ignition are not picked up. I booted a RHEL 8 VM, and from it I could successfully read the vApp variables using the vmtoolsd command.
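
(As an aside for anyone debugging the same thing: a minimal check from inside the booted guest, assuming open-vm-tools/vmtoolsd is available, is to query the OVF environment and the Ignition guestinfo keys directly; empty output means the hypervisor never passed the data.)

# dump the whole OVF environment handed to the guest
vmtoolsd --cmd "info-get guestinfo.ovfEnv"
# check the Ignition-specific keys directly
vmtoolsd --cmd "info-get guestinfo.ignition.config.data"
vmtoolsd --cmd "info-get guestinfo.ignition.config.data.encoding"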

I have tried numerous times to reimport the CoreOS OVA (at this time 4.2) as a template and clone it exactly as described in the instructions. The only thing I see is that CoreOS gets the correct IP+DNS from my DHCP server, but then it is just stuck at the login screen (without my ssh key provisioned into it).

(I also tried setting the kernel argument "core.first_boot=detected", but it doesn't make Ignition trigger the installation.)

In the meantime, I have booted and installed a complete OCP 4.2 cluster using the bare-metal instructions here (https://blog.openshift.com/deploying-a-user-provisioned-infrastructure-environment-for-openshift-4-1-on-vsphere/), together with the latest 4.2 documentation.

What you expected to happen?

Installation on vSphere should work, with CoreOS picking up the OVF environment.

How to reproduce it (as minimally and precisely as possible)?

Follow the current OpenShift 4.2 vSphere documentation: import the OVA and clone it to bootstrap-0.

Insert vApp variables as described:
(screenshot of the vApp properties omitted)

Boot the cloned VM: it stalls and just boots to the login screen, where the SSH keys are not deployed and the installation doesn't start.

@rayabueg

rayabueg commented Nov 1, 2019

I may have the same issue: a DHCP address is assigned, then no more boot progress.

@dav1x
Contributor

dav1x commented Nov 8, 2019

I'm going to try and recreate this. I'll let you know what I find out.

@crissonpl

I"m hitting the same issue. Any updates @dav1x ?

@dav1x
Contributor

dav1x commented Nov 12, 2019

I just attempted to recreate this with rhcos-4.2 and VMware ESXi, 6.7.0, 10764712 with VCSA 10244857, and I was not able to recreate the issue. I imported the OVA and left it on vCenter as a VM.

[core@bootstrap-0 ~]$ rpm-ostree status
...omitted...
        Version: 42.80.20191002.0

I followed these steps exactly:
https://blog.openshift.com/deploying-a-user-provisioned-infrastructure-environment-for-openshift-4-1-on-vsphere/

[root@rhel-d ocp-42]# ./openshift-install wait-for bootstrap-complete
INFO Waiting up to 30m0s for the Kubernetes API at https://api.example.com:6443... 
INFO API v1.14.6+868bc38 up                       
INFO Waiting up to 30m0s for bootstrapping to complete... 
INFO It is now safe to remove the bootstrap resources 

Does anyone want to share their ignition config files or install-config.yaml here or via a DM?

@rayabueg

rayabueg commented Nov 21, 2019

@dav1x thanks for attempting to repro...I will attempt to do so again and provide you details

@crissonpl

I think this issue is the same as #2552 (comment)
@dav1x ?

I will try the install one more time and update the status.

@bortek
Contributor

bortek commented Nov 28, 2019

That would be great since I am stuck on this.

If I manually add the base64-encoded Ignition data into the guestinfo.ignition.config.data variable in the advanced properties, then the nodes do boot up and fetch data from the bootstrap URL. But then what's the point of the Terraform automation? Even after that manual step is performed, the static IPs provided in the config are not being set in the ens192 interface file under /etc/sysconfig/network-scripts. Something else is broken there too.
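
For reference, the manual step above can also be scripted; a rough sketch using the govc CLI (the VM inventory path and the bootstrap.ign filename here are placeholders, not something this repo's Terraform produces for you) might look like:

# base64-encode the bootstrap Ignition config
IGN_B64=$(base64 -w0 bootstrap.ign)
# attach it as guestinfo ExtraConfig keys on the (powered-off) VM
govc vm.change -vm /dc1/vm/bootstrap-0 \
  -e "guestinfo.ignition.config.data=${IGN_B64}" \
  -e "guestinfo.ignition.config.data.encoding=base64"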

@bortek
Contributor

bortek commented Nov 28, 2019

I have managed to get past this and set the base64 variables on the VMs using

extra_config {
  "guestinfo.ignition.config.data"          = "${base64encode(data.ignition_config.ign.*.rendered[count.index])}"
  "guestinfo.ignition.config.data.encoding" = "base64"
}

which was suggested at hashicorp/terraform-provider-vsphere#243

"vapp properties" might not be working dues to some missing license vCenter/vSphere as suggested in the same link. Perhaps someone can update the code to use extra_config instead.

Still need to get static IPs to work...

@crissonpl

Maybe a stupid question: does Terraform still require DHCP for the initial boot phase?

@bortek
Contributor

bortek commented Nov 28, 2019

I am still trying to figure out how DHCP/static IP is supposed to work, which is why I opened #2733 .

With DHCP disabled, the bootstrap node cannot get an IP address and therefore cannot boot properly. But to get an IP via DHCP, the DHCP server has to be pre-provisioned with the MAC address, so it is a manual step. Static IP provisioning should be working, since there is a setting for it in the TF config file, but it does not seem to work.

@joaquinfll

@bortek There is no need to have the MAC address pre-provisioned. You can assign an IP from a range, and after the initial Ignition download the node will reboot and use the static IP.

@mostafahussein

mostafahussein commented Jan 8, 2020

@bortek, have you solved this issue? I am in the same boat but I have not solved it yet. Can you tell me what you have done in order to fix it?

@bortek
Contributor

bortek commented Jan 8, 2020

Nope. Right now I am using a half-manual process for IP/MAC provisioning. I hope to have time soon to look into automating it too.

@joaquinfll

joaquinfll commented Jan 9, 2020

For the static IP install, I'm using a DHCP server with an IP range in a specific subnet. That is sufficient for the DHCP requirements of the OCP install process. On the first RHCOS boot, the server catches a temporary IP, downloads the Ignition file and reboots with the fixed IP.

Example config for the DHCP server (/etc/dhcp/dhcpd.conf):

option domain-name "example.cluster.local";
option domain-name-servers 10.1.1.1, 10.1.1.2;

default-lease-time 600;
max-lease-time 7200;

log-facility local7;

subnet 10.1.1.0 netmask 255.255.255.0 {
    range 10.1.1.210 10.1.1.220;
    option routers 10.1.1.254;
}

@rayabueg

@nodanero Thanks for this. I'm revisiting this issue. Would you mind sharing the RHCOS and vSphere versions used?

@joaquinfll

I have tested this static IP procedure with the DHCP server on most of the versions from 4.1.x to 4.3.5, and the procedure works for me.

For RHCOS I'm currently using the template rhcos-4.3.0-x86_64-vmware.ova, but I can't tell you at the moment which version it is (43?).

For vSphere I've only used version 6.5.

@rayabueg

rayabueg commented Mar 19, 2020 via email

@rayabueg

rayabueg commented Apr 1, 2020

@nodanero, would you mind sharing your ign files? I'd like to attempt to repro manually by feeding your working ign files into the vApp properties and then booting. We're still stuck: the machine boots, gets a DHCP address, but makes no progress afterward.

@joaquinfll

@rayabueg Sorry, I can't share the working files, but I can give you an example source file to be ingested by the openshift-install binary.

I would focus on the bootstrap server. Aside from the variable in the vApp properties, it needs to boot with the Ignition config pulled from a web server.
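
To illustrate (the URL and filenames below are placeholders, not the actual working files): the small pointer config that goes into the bootstrap VM's vApp property typically just appends the full bootstrap.ign from that web server, using the Ignition 2.2 spec that OCP 4.2/4.3 RHCOS expects, and is then base64-encoded:

cat > append-bootstrap.ign <<'EOF'
{
  "ignition": {
    "version": "2.2.0",
    "config": {
      "append": [
        { "source": "http://webserver.example.com:8080/bootstrap.ign" }
      ]
    }
  }
}
EOF
# base64-encode it before pasting into guestinfo.ignition.config.data
base64 -w0 append-bootstrap.ign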

I'm in the #openshift channel on freenode.
https://webchat.freenode.net/#openshift

Example install-config.yaml (be careful with the quote marks):

apiVersion: v1
baseDomain: example.com
metadata:
  name: dev1
networking:
  machineCIDR: "10.10.10.0/24"
platform:
  vsphere:
    vCenter: vcenter.example.com
    username: "vcenter-user"
    password: "vcenter-password"
    datacenter: datacenter
    defaultDatastore: datastore
pullSecret: 'pull-secret-content'
sshKey: 'ssh-rsa AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'

@rayabueg

rayabueg commented Apr 1, 2020

Thanks for the detail @nodanero and also for the freenode webchat link. Hope to find you there!

Your install config is no different from what we've customized for our environment, so we're still scratching our heads over why the CoreOS VM is behaving differently from yours. I have a few more questions if you don't mind:

Are you using the terraform process as generally prescribed in this project? We've customized it for our IPAM but follow the process for the most part.

Regarding the coreos boot process:

  • Is the bootstrap node getting a DHCP address first, pulling down its ign from an http server, then setting static ip and rebooting?

  • Are the master and worker nodes using DHCP first, or simply booting with the provided ign data then assigning a static ip and rebooting?

  • As a coreos newb, I've got to ask...why even use DHCP if we can assign a static IP via ign in the first place?

It feels to me like we're running into an environment issue, perhaps with DHCP, maybe even vSphere itself, but we're hoping it's simply us not configuring CoreOS properly. Thanks for any feedback on your particular boot process that allows static IPs. According to RH, to do static IPs in vSphere we need to apply them as kernel IP arguments (manually interrupting the boot process to input the IP) or through a boot ISO (Edit: I've been informed this can be automated!), which we obviously won't be doing, since the goal is to automate the OCP cluster build via Terraform.
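
For reference, the kernel-argument variant mentioned above uses the dracut ip= syntax, typed at the boot prompt or baked into a boot ISO/PXE entry; the values below are placeholders, roughly ip=<client-ip>::<gateway>:<netmask>:<hostname>:<interface>:none:

ip=10.1.1.21::10.1.1.254:255.255.255.0:bootstrap-0.example.com:ens192:none nameserver=10.1.1.1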

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 1, 2020
@dwagelaar

I've noticed that Fedora CoreOS (OKD 4.5) and Flatcar Container Linux (manual install) both fail to pick up any VMware guestinfo data from our vSphere 6.7. It still worked on our previous vSphere 4.5 cluster.

@dwagelaar

/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 17, 2020
@frenchtoasters

I can also confirm that on vSphere 6.7 we are no longer able to have Fedora CoreOS or Flatcar Container Linux pick up the guestinfo data.

vSphere: 6.7.0
Build: 15679289
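
One way to narrow down where the data is being lost (a sketch assuming the govc CLI; the inventory path is a placeholder) is to dump the VM's ExtraConfig as vCenter stores it and compare it with what vmtoolsd reports inside the guest. If the guestinfo keys are present in ExtraConfig but not visible in the guest, the problem is on the guest/tools side rather than in Terraform or vCenter:

# list ExtraConfig entries (including guestinfo.*) for the VM
govc vm.info -e /dc1/vm/bootstrap-0 | grep guestinfo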

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 26, 2020
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 26, 2020
@openshift-ci-robot openshift-ci-robot added the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Dec 26, 2020
@openshift-bot
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot
Contributor

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
