[AWS SageMaker] Add CodeBuild Steps #3668

Merged 23 commits on May 4, 2020
Changes from 21 commits
12 changes: 12 additions & 0 deletions components/aws/sagemaker/codebuild/deploy.buildspec.yml
@@ -0,0 +1,12 @@
version: 0.2
phases:
  pre_build:
    commands:
      # Log in to Dockerhub
      - mkdir -p ~/.docker
      - echo $DOCKER_CONFIG > ~/.docker/config.json

  build:
    commands:
      - cd components/aws/sagemaker
      - ./codebuild/scripts/deploy.sh -d "${DRY_RUN}"
18 changes: 18 additions & 0 deletions components/aws/sagemaker/codebuild/integration-test.buildspec.yml
@@ -0,0 +1,18 @@
version: 0.2
phases:
  build:
    commands:
      - cd components/aws
      - docker build . -f ./sagemaker/tests/integration_tests/Dockerfile -t amazon/integration-test-image --quiet

      # Run the container and copy the results to /tmp
      # Passes all host environment variables through to the container
      - docker run --name integration-test-container $(env | cut -f1 -d= | sed 's/^/-e /') amazon/integration-test-image
Contributor

Design questions:

  1. Why do we need to run a container inside a container here?
  • We could have a CodeBuild step that first builds all the images needed for the pipeline (release, unit test, integration test), so that each step can use its image directly as the base image of the CodeBuild job.
    This matters especially for the integration test and deploy steps: the test container needs IAM permissions, and possibly access to resources outside the CodeBuild job, so we could avoid passing them in explicitly here.
    We would also maintain only one image per job rather than a base image plus a step-specific image.
  2. Placement of these CodeBuild spec files
  • Why do we need a separate codebuild directory? Can we add the specs to their respective directories, creating a release directory if needed?

Contributor Author

RedbackThomson, May 1, 2020

> We could have a CodeBuild step that first builds all the images needed for the pipeline (release, unit test, integration test), so that each step can use its image directly as the base image of the CodeBuild job.
> This matters especially for the integration test and deploy steps: the test container needs IAM permissions, and possibly access to resources outside the CodeBuild job, so we could avoid passing them in explicitly here.
> We would also maintain only one image per job rather than a base image plus a step-specific image.

I worry that this would add complexity to the build process and make it more prone to failure. We would need another CodeBuild project and an ECR repository to build and host the test images, and our integration test project would need to point to a specific tag in that repository. Developers would need to know to run the build step before the integration test step on every run, rather than simply running one step. I don't think this would save time either: spinning up the CodeBuild run, building the container, uploading it to ECR and then running the integration test CodeBuild project would surely take longer than building and running the container in one step.

My ideal world would be this:

  • The developer of a new integration test can edit a .env file and then build and run the integration test Dockerfile. Everything would be created automatically (except data maybe) and all of the tools needed for testing would be included in that Docker container.
  • All the pipeline needs to do is build and run the same container. The scripts would be written so that we could programmatically override any values (role names, bucket names, etc.) to work with either system.
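
One way to picture the ".env file with programmatic overrides" idea is the sketch below. It is not part of this PR: the function name `load_env_defaults` and the variable names `ROLE_NAME` and `BUCKET_NAME` are made up for illustration, and it assumes a simple `KEY=VALUE` file without quoting. Values already present in the environment (for example, set by the pipeline) win over the file's defaults.

```shell
# Hypothetical helper (not in this PR): export KEY=VALUE pairs from a file,
# keeping any value that is already set in the environment.
load_env_defaults() {
  defaults_file="$1"
  while IFS='=' read -r key value || [ -n "$key" ]; do
    # Skip blank lines and comment lines
    case "$key" in
      ''|'#'*) continue ;;
    esac
    # Only fill in the default when the variable is not in the environment,
    # so a pipeline can override it by exporting the variable beforehand
    if ! printenv "$key" > /dev/null; then
      export "$key=$value"
    fi
  done < "$defaults_file"
}

# Example with made-up names and values
example_env="$(mktemp)"
cat > "$example_env" <<'EOF'
# defaults for local runs
ROLE_NAME=local-test-role
BUCKET_NAME=local-test-bucket
EOF

unset ROLE_NAME                       # a developer running locally sets nothing
export BUCKET_NAME=pipeline-bucket    # pretend the pipeline set this one
load_env_defaults "$example_env"
echo "$ROLE_NAME $BUCKET_NAME"        # prints "local-test-role pipeline-bucket"
rm -f "$example_env"
```

The same container could then be started locally or from the pipeline, with only the exported variables differing between the two.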

The current implementation of CodeBuild uses the standard:3.0 base image, which has Docker and Python in it if we need those. I have already included in this CodeBuild spec the ability for the outer CodeBuild container to pass all of its environment variables through to the inner (integration test) container. This allows us to simply pass any values in from our CDK infra to the CodeBuild container and the integration test container would be able to access them - this includes role permissions.
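
To make the pass-through concrete, here is a small sketch of what the `$(env | cut -f1 -d= | sed 's/^/-e /')` expression expands to. `docker run -e NAME` (with no value) copies `NAME` from the calling environment into the container, so the expression only needs to produce variable names. The function names and the `EXAMPLE_ROLE_ARN` variable are made up for illustration, and the awk variant is an alternative that is not part of this PR.

```shell
# Same pipeline as the buildspec: keep the text before the first "=" on each
# line of `env`, then prefix it with "-e ".
env_flags_simple() {
  env | cut -f1 -d= | sed 's/^/-e /'
}

# Hypothetical alternative (not part of this PR): read the names from awk's
# ENVIRON array, so a value that contains a newline cannot be mistaken for a name.
env_flags_from_awk() {
  awk 'BEGIN { for (name in ENVIRON) print "-e " name }' | sort
}

# Example: a variable exported in the outer shell shows up as a flag.
export EXAMPLE_ROLE_ARN="arn:aws:iam::123456789012:role/example"   # made-up value
env_flags_simple | grep -x -- '-e EXAMPLE_ROLE_ARN'

# In the buildspec the flags are spliced into the command unquoted, e.g.:
#   docker run --name some-container $(env_flags_simple) some-image
```

The simple pipeline assumes every line of `env` output starts a new variable; a multi-line value would yield extra, bogus names, which is the case the awk variant avoids.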

Contributor Author

> Why do we need a separate codebuild directory? Can we add the specs to their respective directories, creating a release directory if needed?

It is fairly common for projects to have a "codebuild" directory, but I've seen both solutions. I think because each of these codebuild steps does roughly the same thing (build and run a container), they should stay fairly close together so we remember to update them. They also don't really have anything to do with the actual unit or integration tests themselves - I wouldn't expect anyone else to care about them except us.

      - docker cp integration-test-container:/app/integration_tests/integration_tests.log /tmp/results.xml
      - docker rm -f integration-test-container

reports:
  IntegrationTestReport:
    files:
      - "results.xml"
    base-directory: "/tmp"
77 changes: 77 additions & 0 deletions components/aws/sagemaker/codebuild/scripts/deploy.sh
@@ -0,0 +1,77 @@
#!/usr/bin/env bash

set -e

REMOTE_REPOSITORY="amazon/aws-sagemaker-kfp-components"
DRYRUN="true"
FULL_VERSION_TAG=""

while getopts ":d:v:" opt; do
  case ${opt} in
    d)
      if [[ "${OPTARG}" = "false" ]]; then
        DRYRUN="false"
      else
        DRYRUN="true"
      fi
      ;;
    v)
      FULL_VERSION_TAG="${OPTARG}"
      ;;
  esac
done

function docker_tag_exists() {
  curl --silent -f -lSL https://index.docker.io/v1/repositories/$1/tags/$2 > /dev/null 2> /dev/null
}

if [[ ! -z "${FULL_VERSION_TAG}" && ! "${FULL_VERSION_TAG}" =~ ^[0-9]+\.[0-9]+\.[0-9]+ ]]; then
  >&2 echo "Version tag does not match SEMVER style (X.Y.Z)"
  exit 1
fi

# Check version does not already exist
VERSION_LICENSE_FILE="THIRD-PARTY-LICENSES.txt"
if [[ -z "${FULL_VERSION_TAG}" ]]; then
  FULL_VERSION_TAG="$(cat ${VERSION_LICENSE_FILE} | head -n1 | grep -Po '(?<=version )\d.\d.\d')"
fi

if [ -z "$FULL_VERSION_TAG" ]; then
  >&2 echo "Could not find version inside ${VERSION_LICENSE_FILE} file."
  exit 1
fi

echo "Deploying version ${FULL_VERSION_TAG}"

if docker_tag_exists "$REMOTE_REPOSITORY" "$FULL_VERSION_TAG"; then
  >&2 echo "Tag ${REMOTE_REPOSITORY}:${FULL_VERSION_TAG} already exists. Cannot overwrite an existing image."
  exit 1
fi

# Build the image
FULL_VERSION_IMAGE="${REMOTE_REPOSITORY}:${FULL_VERSION_TAG}"
docker build . -f Dockerfile -t "${FULL_VERSION_IMAGE}"

# Get the minor and major versions
[[ $FULL_VERSION_TAG =~ ^[0-9]+\.[0-9]+ ]] && MINOR_VERSION_IMAGE="${REMOTE_REPOSITORY}:${BASH_REMATCH[0]}"
[[ $FULL_VERSION_TAG =~ ^[0-9]+ ]] && MAJOR_VERSION_IMAGE="${REMOTE_REPOSITORY}:${BASH_REMATCH[0]}"

# Re-tag the image with major and minor versions
docker tag "${FULL_VERSION_IMAGE}" "${MINOR_VERSION_IMAGE}"
echo "Tagged image with ${MINOR_VERSION_IMAGE}"
docker tag "${FULL_VERSION_IMAGE}" "${MAJOR_VERSION_IMAGE}"
echo "Tagged image with ${MAJOR_VERSION_IMAGE}"

# Push to the remote repository
if [ "${DRYRUN}" == "false" ]; then
  docker push "${FULL_VERSION_IMAGE}"
  echo "Successfully pushed tag ${FULL_VERSION_IMAGE} to Docker Hub"

  docker push "${MINOR_VERSION_IMAGE}"
  echo "Successfully pushed tag ${MINOR_VERSION_IMAGE} to Docker Hub"

  docker push "${MAJOR_VERSION_IMAGE}"
  echo "Successfully pushed tag ${MAJOR_VERSION_IMAGE} to Docker Hub"
else
  echo "Dry run detected. Not pushing images."
fi
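
The version handling in deploy.sh (read the version from the first line of the license file, validate it, derive the minor and major tags) can be sketched in isolation as follows. This is an illustrative variant, not the script's own code: the function names are made up, it uses `sed` and parameter expansion instead of `grep -P` and `BASH_REMATCH`, it accepts multi-digit version parts, and it is stricter than the script in that it rejects anything after `X.Y.Z`. The sample license line is invented.

```shell
# Hypothetical helpers (not in this PR).

# Print the X.Y.Z that follows "version " on the first line of a file.
extract_version() {
  head -n1 "$1" | sed -n 's/.*version \([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Succeed only for a plain X.Y.Z version with numeric parts.
is_semver() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'
}

# Print the full, minor and major tags, one per line.
version_tags() {
  full="$1"
  is_semver "$full" || { echo "Version tag does not match X.Y.Z: $full" >&2; return 1; }
  echo "$full"         # X.Y.Z
  echo "${full%.*}"    # X.Y (drop the last ".Z")
  echo "${full%%.*}"   # X   (drop everything from the first ".")
}

# Example with an invented license header line
license_file="$(mktemp)"
echo "** Example Components; version 0.3.1 --" > "$license_file"
version_tags "$(extract_version "$license_file")"
# prints:
# 0.3.1
# 0.3
# 0
rm -f "$license_file"
```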
18 changes: 18 additions & 0 deletions components/aws/sagemaker/codebuild/unit-test.buildspec.yml
@@ -0,0 +1,18 @@
version: 0.2
phases:
  build:
    commands:
      - cd components/aws
      - docker build . -f ./sagemaker/tests/unit_tests/Dockerfile -t amazon/unit-test-image --quiet

      # Run the container and copy the results to /tmp
      # Passes all host environment variables through to the container
      - docker run --name unit-test-container $(env | cut -f1 -d= | sed 's/^/-e /') amazon/unit-test-image
      - docker cp unit-test-container:/app/unit_tests/unit_tests.log /tmp/results.xml
      - docker rm -f unit-test-container

reports:
  UnitTestReport:
    files:
      - "results.xml"
    base-directory: "/tmp"
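
Both test buildspecs follow the same run, copy, remove sequence. The sketch below wraps that sequence in a function so the results file is still copied out when the test container exits with a failure, on the assumption (worth verifying against the CodeBuild documentation) that a failing command stops the remaining commands in the phase. The function name `run_and_collect` is made up, the environment pass-through flags are left out for brevity, and none of this is part of the PR.

```shell
# Hypothetical wrapper (not in this PR).
# Usage: run_and_collect IMAGE CONTAINER_NAME PATH_IN_CONTAINER DEST_PATH
run_and_collect() {
  image="$1"
  container="$2"
  src="$3"
  dest="$4"

  # Remember the test exit status instead of stopping at the first failure
  status=0
  docker run --name "$container" "$image" || status=$?

  # Copy the results even when the tests failed; tolerate a missing file
  docker cp "$container:$src" "$dest" || echo "No results file to copy" >&2

  # Always clean up the container, then report the original test status
  docker rm -f "$container" > /dev/null
  return "$status"
}

# Possible use in a buildspec command (image and paths as in the unit test spec):
#   run_and_collect amazon/unit-test-image unit-test-container \
#     /app/unit_tests/unit_tests.log /tmp/results.xml
```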