
RHCOS VMDK in AWS Private region #407

Closed
darkdatter opened this issue Jul 2, 2020 · 12 comments


darkdatter commented Jul 2, 2020

I am attempting to follow @jaredhocutt's work on converting a RHCOS VMDK into an EBS snapshot, which can then be registered as an AMI in AWS:

https://github.com/openshift/os/pull/396

Reference to the image's EFI issue that forces this workaround:
https://bugzilla.redhat.com/show_bug.cgi?id=1794157

I am using: rhcos-44.81.202004260825-0-aws.x86_64.vmdk
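For context, the steps below assume the decompressed VMDK has already been uploaded to an S3 bucket in the target region, and that the account has the vmimport service role that EC2 VM import requires. A rough sketch of the upload step, using the same <bucketname> placeholder:

$ aws s3 cp rhcos-44.81.202004260825-0-aws.x86_64.vmdk s3://<bucketname>/ --region <region>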

Here are my steps:

$ aws ec2 import-snapshot --region <region> --disk-container Format=vmdk,UserBucket="{S3Bucket=<bucketname>,S3Key=rhcos-44.81.202004260825-0-aws.x86_64.vmdk}"
{
    "ImportTaskId": "import-snap-04c1.........",
    "SnapshotTaskDetail": {
        "DiskImageSize": 0.0,
        "Format": "VMDK",
        "Progress": "3",
        "Status": "active",
        "StatusMessage": "pending",
        "UserBucket": {
            "S3Bucket": "<bucketname>",
            "S3Key": "rhcos-44.81.202004260825-0-aws.x86_64.vmdk"
        }
    }
}
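If scripting this, the ImportTaskId can be captured directly with the CLI's --query option rather than copied out of the JSON. A rough sketch, same placeholders as above:

$ TASK_ID=$(aws ec2 import-snapshot --region <region> \
    --disk-container Format=vmdk,UserBucket="{S3Bucket=<bucketname>,S3Key=rhcos-44.81.202004260825-0-aws.x86_64.vmdk}" \
    --query 'ImportTaskId' --output text)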

- Check the status of the load:

$ aws ec2 describe-import-snapshot-tasks --region <region>
{
    "ImportSnapshotTasks": [
        {
            "ImportTaskId": "import-snap-04c1................",
            "SnapshotTaskDetail": {
                "DiskImageSize": 881044992.0,
                "Format": "VMDK",
                "SnapshotId": "snap-0728...........",
                "Status": "completed",
                "UserBucket": {
                    "S3Bucket": "<bucketname>",
                    "S3Key": "rhcos-44.81.202004260825-0-aws.x86_64.vmdk"
                }
            },
            "Tags": []
        }
    ]
}
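The same check can be scripted to wait for the import to finish and to pull out the snapshot ID, reusing the $TASK_ID captured above; a rough sketch (the 30-second interval is arbitrary):

$ while [ "$(aws ec2 describe-import-snapshot-tasks --region <region> --import-task-ids "$TASK_ID" \
    --query 'ImportSnapshotTasks[0].SnapshotTaskDetail.Status' --output text)" != "completed" ]; do
    sleep 30
  done
$ SNAPSHOT_ID=$(aws ec2 describe-import-snapshot-tasks --region <region> --import-task-ids "$TASK_ID" \
    --query 'ImportSnapshotTasks[0].SnapshotTaskDetail.SnapshotId' --output text)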

- Once completed, use the SnapshotId from above to register the image:

$ aws ec2 register-image --region <region> --architecture x86_64 --description "RHCOS 4.4 VMDK" --ena-support --name "RHCOS 4.4 VMDK" --virtualization-type hvm --root-device-name '/dev/sda1' --block-device-mappings 'DeviceName=/dev/sda1,Ebs={DeleteOnTermination=true,SnapshotId=snap-0728.............}'
{
    "ImageId": "ami-4530...."
}
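To sanity-check the registration, the new AMI can be described before booting anything; a rough sketch, where ami-4530.... is the truncated ID from the output above:

$ aws ec2 describe-images --region <region> --image-ids ami-4530.... \
    --query 'Images[0].[State,RootDeviceName,EnaSupport]' --output text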

However, I don't think this is working: I cannot SSH into machines created from this AMI with my Ignition configs, the ELB health checks for the API/etcd ports are failing, and I can't ping the machines from my bastion. Any ideas whether this is something I am doing wrong? Has anyone else tested this workaround on OCP 4.4?


jlebon commented Jul 2, 2020

> --root-device-name '/dev/sda1'

Hmm, that looks off. Check out the values we use here: https://github.com/coreos/coreos-assembler/blob/60bc058e37452b6406ad50aff4149a23945b9ade/mantle/platform/api/aws/images.go#L332-L355.
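For reference, a register-image call along those lines, using /dev/xvda as the root device and /dev/xvdb as the ephemeral mapping (as discussed further down in this thread), might look roughly like this, with the snapshot ID placeholder carried over from above:

$ aws ec2 register-image --region <region> \
    --architecture x86_64 --virtualization-type hvm --ena-support \
    --name "RHCOS 4.4 VMDK" \
    --root-device-name /dev/xvda \
    --block-device-mappings \
        'DeviceName=/dev/xvda,Ebs={DeleteOnTermination=true,SnapshotId=snap-0728...........}' \
        'DeviceName=/dev/xvdb,VirtualName=ephemeral0'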


jlebon commented Jul 2, 2020

(Or better, you should be able to use ore aws upload the same way we do, too.)

jaredhocutt commented

@darkdatter Here's a cleaned up set of instructions from what I posted in the original investigation in #396

https://github.com/jaredhocutt/openshift4-deploy/blob/master/docs/rhcos.md

darkdatter commented

> --root-device-name '/dev/sda1'
>
> Hmm, that looks off. Check out the values we use here: https://github.com/coreos/coreos-assembler/blob/60bc058e37452b6406ad50aff4149a23945b9ade/mantle/platform/api/aws/images.go#L332-L355.

I was curious about that myself, but stuck with it since that's what @jaredhocutt's solution used/tested. Is it just '/dev/xvda' for the main device, and '/dev/xvdb' for ephemeral?

> (Or better, you should be able to use ore aws upload the same way we do, too.)

I am assuming ore would come with Mantle if I installed it? (sorry, not familiar with the CoreOS side)

> @darkdatter Here's a cleaned up set of instructions from what I posted in the original investigation in #396
>
> https://github.com/jaredhocutt/openshift4-deploy/blob/master/docs/rhcos.md

That's neat, thanks! Did you have no issues with --root-device-name '/dev/sda1'? You don't need to specify '/dev/xvda'?

jaredhocutt commented

@darkdatter

> That's neat, thanks! Did you have no issues with --root-device-name '/dev/sda1'? You don't need to specify '/dev/xvda'?

I have not had any issues using /dev/sda1, but it's likely more appropriate to use /dev/xvda. I'll update my document to use /dev/xvda instead.

Off the top of my head, I don't recall where I got /dev/sda1, but AWS usually does a good job of figuring out what you meant so I guess that's why it worked.

darkdatter commented

> I have not had any issues using /dev/sda1, but it's likely more appropriate to use /dev/xvda. I'll update my document to use /dev/xvda instead.
>
> Off the top of my head, I don't recall where I got /dev/sda1, but AWS usually does a good job of figuring out what you meant so I guess that's why it worked.

/dev/sda1 is what I've used in the past for bare metal RHCOS installs, and it works fine. I just figured the AWS VMDK may differ for preferred root device. Do you have any plans to test 4.4 with your method?

jaredhocutt commented

@darkdatter My apologies for not responding to your question about that earlier. I have done those same steps with 4.4 and it worked. I just haven't updated the examples in my document.

darkdatter commented

> @darkdatter My apologies for not responding to your question about that earlier. I have done those same steps with 4.4 and it worked. I just haven't updated the examples in my document.

Would it be possible for you (or anyone else) to validate this with the VMDK I am using?

https://mirror.openshift.com/pub/openshift-v4/x86_64/dependencies/rhcos/4.4/4.4.3/rhcos-4.4.3-x86_64-aws.x86_64.vmdk.gz
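For anyone trying to reproduce with that exact artifact, fetching and decompressing it would look something like this (the S3 upload and import steps are then the same as in the original post):

$ curl -LO https://mirror.openshift.com/pub/openshift-v4/x86_64/dependencies/rhcos/4.4/4.4.3/rhcos-4.4.3-x86_64-aws.x86_64.vmdk.gz
$ gunzip rhcos-4.4.3-x86_64-aws.x86_64.vmdk.gz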

openshift-bot commented

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 24, 2020
openshift-bot commented

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 23, 2020
openshift-bot commented

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci-robot commented

@openshift-bot: Closing this issue.

In response to this:

> Rotten issues close after 30d of inactivity.
>
> Reopen the issue by commenting /reopen.
> Mark the issue as fresh by commenting /remove-lifecycle rotten.
> Exclude this issue from closing again by commenting /lifecycle frozen.
>
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
