Create a CI workflow that creates new AMIs using packer #258

gaiksaya · 2023-03-13T22:10:44Z

Is your feature request related to a problem? Please describe

Currently, the AMIs used by agent nodes are built from a specific base image that can go out of date or need updates as new kernel updates come in.
This happens as often as once per quarter. Even though we run yum update, apt update, etc., we still need to reboot the EC2 instance to apply those updates, which does not fit the lifecycle management of Jenkins agent nodes: if the SSH connection is lost (as it is when we reboot), a new agent is brought up.

Describe the solution you'd like

To apply regular updates to the base AMI, we need to build a new AMI.
With Packer this is a fairly straightforward process: https://github.com/opensearch-project/opensearch-ci/tree/main/packer

Below are 2 possible approaches:

Use a GHA (GitHub Actions) workflow that creates the new AMIs and opens a pull request to update them in this repository.
Use a Jenkins workflow that does the same.

Please keep in mind that this needs to be a blue-green deployment: old AMIs should be deprecated (made private) only after confirming that the new AMIs work. This can be a manual process to start with, but it can also be automated via GHA if we maintain a list of AMIs somewhere.
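
A minimal sketch of the blue-green rule described above, assuming a hand-maintained list of AMI pairs. The `AmiRecord` fields and both function names are hypothetical and are not taken from this repository. `make_private` expects a boto3 EC2 client to be passed in and removes the public launch permission through `modify_image_attribute`.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AmiRecord:
    """One entry in a hypothetical, hand-maintained AMI list."""
    name: str           # label for the agent type this AMI serves
    old_ami: str        # AMI currently in use (blue)
    new_ami: str        # freshly built replacement (green)
    new_verified: bool  # True once the new AMI is confirmed working


def amis_to_deprecate(records: Iterable[AmiRecord]) -> List[str]:
    """Return the old AMI IDs that are safe to make private.

    An old AMI qualifies only when its replacement has been verified,
    so an unverified build never causes the old image to disappear.
    """
    return [
        r.old_ami
        for r in records
        if r.new_verified and r.old_ami != r.new_ami
    ]


def make_private(ec2_client, ami_id: str) -> None:
    """Remove the public launch permission from one AMI.

    `ec2_client` is assumed to be a boto3 EC2 client; it is injected so
    that this sketch does not import boto3 itself.
    """
    ec2_client.modify_image_attribute(
        ImageId=ami_id,
        LaunchPermission={"Remove": [{"Group": "all"}]},
    )
```

The verification flag is the only gate here. How an AMI gets marked as verified (manually or by a CI job) is left open, as in the request itself.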

Describe alternatives you've considered

Do the entire process manually. However, building the AMIs takes more than half a day for the number of AMIs we have, even with Packer.

Additional context

No response

peterzhuamazon · 2023-03-17T19:06:04Z

Either way would work, but I prefer using Jenkins.
The only part we still need is role assumption.

peterzhuamazon · 2023-03-21T18:07:32Z

We need to update one Docker CI image to include Packer.

We can use the docker-builder image for Packer:
https://developer.hashicorp.com/packer/downloads

We need to create a Jenkins workflow to build the Packer templates here: https://github.com/opensearch-project/opensearch-ci/tree/main/packer

peterzhuamazon · 2023-03-21T18:15:48Z

More to consider:

An IAM role to assume, with either full EC2 access or just the access needed for EC2 instance creation and AMI creation.
The SG must allow access from the SG of the Jenkins main node on ports 22 / 5985 for the EC2 instance connection during the build.

peterzhuamazon · 2023-03-21T18:30:27Z

We need 3 secrets to hold these values:

The VPC of the Jenkins production cluster
A public subnet of the Jenkins production cluster
The SG mentioned above, preferably the agent node SG, since it already meets all the requirements
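
A rough sketch of how the three values could be passed to Packer, assuming they are exposed to the job as environment variables. The environment variable names and the Packer user variable names (`vpc_id`, `subnet_id`, `security_group_id`) are assumptions for illustration, and the template would need to declare matching variables. The only Packer feature relied on is the `-var 'key=value'` option of `packer build`.

```python
import os
from typing import Dict, List, Mapping

# Hypothetical mapping: secret (environment variable) -> Packer user variable.
SECRET_TO_PACKER_VAR: Dict[str, str] = {
    "PACKER_VPC_ID": "vpc_id",
    "PACKER_SUBNET_ID": "subnet_id",
    "PACKER_SECURITY_GROUP_ID": "security_group_id",
}


def packer_build_command(
    template: str, env: Mapping[str, str] = os.environ
) -> List[str]:
    """Build the argv for `packer build`, failing early on missing secrets."""
    missing = [name for name in SECRET_TO_PACKER_VAR if not env.get(name)]
    if missing:
        raise KeyError(f"missing secrets: {', '.join(missing)}")

    cmd = ["packer", "build"]
    for secret, packer_var in SECRET_TO_PACKER_VAR.items():
        # Each secret becomes one `-var key=value` pair.
        cmd += ["-var", f"{packer_var}={env[secret]}"]
    cmd.append(template)
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)`. Returning argv as a list rather than a shell string avoids quoting problems with the secret values.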

peterzhuamazon · 2023-03-21T18:38:27Z

There might be another problem: the node that runs Packer is the source and needs to connect to the destination on ports 22/5985. This means that if the workflow runs on an agent node, the connection would be agent -> agent, whereas our existing SG only allows connections from main -> agent.

Either we add a new SG to allow agent -> agent connections (which is strongly discouraged for security reasons), or we restrict the AMI/Packer builder workflow to run only on the main node (main -> agent).
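
The connectivity concern can be illustrated with a toy model of the ingress rules. This is a simplification for illustration only, not the actual security group definition. The `main` and `agent` labels and the rule set are assumptions based on the description above, with 22 as SSH and 5985 as WinRM over HTTP.

```python
from typing import FrozenSet, NamedTuple


class IngressRule(NamedTuple):
    source_sg: str  # security group the connection originates from
    dest_sg: str    # security group of the instance being reached
    port: int


# Assumed rule set: only main -> agent on SSH (22) and WinRM (5985).
EXISTING_RULES: FrozenSet[IngressRule] = frozenset({
    IngressRule("main", "agent", 22),
    IngressRule("main", "agent", 5985),
})


def can_connect(
    rules: FrozenSet[IngressRule], source_sg: str, dest_sg: str, port: int
) -> bool:
    """Return True when an ingress rule allows this source, destination and port."""
    return IngressRule(source_sg, dest_sg, port) in rules


# Packer on the main node can reach the build instance over SSH ...
print(can_connect(EXISTING_RULES, "main", "agent", 22))
# ... but Packer on an agent node cannot, because there is no agent -> agent rule.
print(can_connect(EXISTING_RULES, "agent", "agent", 22))
```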

peterzhuamazon · 2023-03-21T18:40:36Z

Adding @gaiksaya @rishabh6788 @prudhvigodithi to the conversation on the issues above ^^.

Thanks.

gaiksaya · 2023-03-21T18:43:03Z

Why are we using Jenkins? GHA can do all of this using roles. All you need to provide is the right VPC and subnet, right?
Anything we use just needs the right credentials to build the AMI and push it to prod, right?

peterzhuamazon · 2023-03-21T18:45:12Z

Why are we using Jenkins? GHA can do all of this using roles. All you need to provide is the right VPC and subnet, right? Anything we use just needs the right credentials to build the AMI and push it to prod, right?

In our discussion yesterday we were already talking about doing this in Jenkins.
If we are OK with using GHA I have no issues, but @prudhvigodithi raised the point that GHA can throttle if the run is too long.

The average Mac build time is 2+ hours and the average Windows build time is 1+ hour, which would cause inconsistency in the builds overall.

Thanks.

prudhvigodithi · 2023-03-21T18:47:34Z

@gaiksaya AMI building is an expensive task: it requires resources and, above all, can take a lot of time to complete end to end. GH runners would run into the same issues we had with the manifest workflow failure, so it is better to use a Jenkins job.

gaiksaya · 2023-03-21T18:54:01Z

Got it! I forgot about the resources part. Even so, all a machine needs is the right credentials, which has nothing to do with whether it is an agent node or the main node. If the agent node or AMI build is provided with the right credentials, we should be good. Running anything on the main node is restricted as a security measure, so we cannot and should not run on the main node.
Regarding

The SG must allow access from the SG of the Jenkins main node on ports 22 / 5985 for the EC2 instance connection during the build.

We already have that in place.
https://github.com/opensearch-project/opensearch-ci/blob/main/lib/security/ci-security-groups.ts#L45-L47

peterzhuamazon · 2023-03-21T18:56:31Z

Got it! I forgot about the resources part. Even so, all a machine needs is the right credentials, which has nothing to do with whether it is an agent node or the main node. If the agent node or AMI build is provided with the right credentials, we should be good. Running anything on the main node is restricted as a security measure, so we cannot and should not run on the main node. Regarding

The SG must allow access from the SG of the Jenkins main node on ports 22 / 5985 for the EC2 instance connection during the build.

We already have that in place. https://github.com/opensearch-project/opensearch-ci/blob/main/lib/security/ci-security-groups.ts#L45-L47

See #258 (comment)

zelinh · 2023-03-21T23:36:32Z

A new Docker image, ubuntu2004-x64-docker-buildx0.6.3-qemu5.0-awscli1.22-jdk11-v2, which includes Packer, has been built and pushed to here.

peterzhuamazon · 2023-03-30T16:44:39Z

We will use the same agent node SG after making sure all Jenkins agents are running on a private subnet.

peterzhuamazon · 2023-04-13T17:29:53Z

This is completed.

gaiksaya added the enhancement, untriaged, and good first issue labels and removed the untriaged label Mar 13, 2023

peterzhuamazon assigned peterzhuamazon and zelinh Mar 21, 2023

peterzhuamazon added this to OpenSearch Engineering Effectiveness Mar 21, 2023

peterzhuamazon moved this to In Progress in OpenSearch Engineering Effectiveness Mar 21, 2023

This was referenced Mar 29, 2023

Make Jenkins use private subnet for all nodes #262

Merged

Set all packer templates to use private ip #263

Merged

peterzhuamazon mentioned this issue Apr 3, 2023

[Enhancement] Better handle of switching node version for OSD core opensearch-project/opensearch-build#3362

Closed

zelinh mentioned this issue Apr 4, 2023

Add new Jenkins workflow for build packer. opensearch-project/opensearch-build#3368

Merged

peterzhuamazon closed this as completed Apr 13, 2023

github-project-automation bot moved this from In Progress to Done in OpenSearch Engineering Effectiveness Apr 13, 2023

peterzhuamazon linked a pull request Jul 10, 2024 that will close this issue

Set all packer templates to use private ip #263

Merged
