
Image Builder Improvements #500

Merged (52 commits) on Dec 11, 2023


@ariefrahmansyah (Contributor) commented Nov 30, 2023

  • refactor pyfunc base image

What this PR does / why we need it:

Image building is a mandatory process in the deployment of PyFunc-based Merlin components. The process abstracts the complexity of building and publishing a ready-to-use Docker image that contains the user's model source code, artifacts, and dependencies. However, the current image-building approach has several issues that lead to a poor development experience and inefficient resource utilization. This PR addresses them by:

  1. Refactoring the Kaniko job specification generated by Merlin so that:
    a. it is more configurable: Kaniko's build arguments (executor flags) can be set in the config and are propagated to the Kaniko job spec
    b. it supports caching
  2. The Merlin API now generates a separate file containing the user's model dependencies and uploads it to GCS. These dependencies are then installed in their own step during the image build, as defined by the new Dockerfile
    a. To achieve this, we introduce a new gsutil package to interface with GCS
  3. Refactoring the base and application Dockerfiles used to build Docker images for pyfunc-server and batch prediction jobs
    a. Previously, the base Dockerfile had a step to install the pyfunc-server or batch prediction job dependencies. This step is now executed by the application Dockerfile
      i. Consequently, we need only one base image instead of one base image per Python version.
    b. The new base Dockerfile only installs platform components such as conda, Spark, and the gRPC library, but not the component itself
    c. The new application Dockerfile first downloads and installs the user's model dependencies, then installs the component's dependencies, downloads the model source code and artifacts, and finally dry-runs the model
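Item 1 can be pictured as a function that turns config values into Kaniko executor flags. The sketch below is hypothetical: `KanikoConfig`, `kanikoArgs`, the `PYTHON_VERSION` build arg, and the registry and bucket names are illustrative and not taken from Merlin's code. The flags themselves (`--dockerfile`, `--context`, `--destination`, `--build-arg`, `--cache`, `--cache-repo`, `--snapshot-mode`) are standard Kaniko executor flags.

```go
package main

import (
	"fmt"
	"sort"
)

// KanikoConfig is a hypothetical config shape for illustration; Merlin's actual
// config fields may be named and structured differently.
type KanikoConfig struct {
	BuildArgs      map[string]string // passed to Kaniko as --build-arg=KEY=VALUE
	CacheEnabled   bool              // turns on Kaniko layer caching (--cache=true)
	CacheRepo      string            // registry repository holding cached layers
	AdditionalArgs []string          // extra executor flags copied verbatim from config
}

// kanikoArgs assembles the executor args that would go into the Kaniko
// container of the build job.
func kanikoArgs(cfg KanikoConfig, dockerfile, buildContext, destination string) []string {
	args := []string{
		"--dockerfile=" + dockerfile,
		"--context=" + buildContext,
		"--destination=" + destination,
	}

	// Sort build-arg keys so the generated job spec is deterministic.
	keys := make([]string, 0, len(cfg.BuildArgs))
	for k := range cfg.BuildArgs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		args = append(args, "--build-arg="+k+"="+cfg.BuildArgs[k])
	}

	// Cache flags are only emitted when caching is enabled in config.
	if cfg.CacheEnabled {
		args = append(args, "--cache=true")
		if cfg.CacheRepo != "" {
			args = append(args, "--cache-repo="+cfg.CacheRepo)
		}
	}

	// Append config-supplied flags verbatim, so new Kaniko flags can be
	// enabled without code changes.
	return append(args, cfg.AdditionalArgs...)
}

func main() {
	cfg := KanikoConfig{
		BuildArgs:      map[string]string{"PYTHON_VERSION": "3.10"}, // hypothetical build arg
		CacheEnabled:   true,
		CacheRepo:      "registry.example.com/merlin/cache",
		AdditionalArgs: []string{"--snapshot-mode=redo"},
	}
	for _, arg := range kanikoArgs(cfg, "Dockerfile", "gs://example-bucket/context.tar.gz", "registry.example.com/models/my-model:1") {
		fmt.Println(arg)
	}
}
```

Sorting the build-arg keys keeps the generated spec stable across runs, since Go map iteration order is randomized; otherwise two identical configs could produce job specs that differ only in flag order.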
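The ordering in item 3c is what lets caching pay off: steps that depend only on dependencies run before steps that depend on the model. The sketch below models this with a chained hash per build step. It is a simplified illustration of how a layer cache keyed on preceding steps behaves, not Kaniko's actual cache-key computation, and `step`, `cacheKeys`, `appSteps`, and the inputs are placeholders.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// step is one build instruction plus the content it consumes (for example,
// a downloaded dependencies file or copied model artifacts).
type step struct {
	instruction string
	input       []byte
}

// cacheKeys chains a hash through the steps: each key covers the previous key,
// the instruction, and its input. Simplified model, not Kaniko's exact algorithm.
func cacheKeys(baseImage string, steps []step) []string {
	keys := make([]string, len(steps))
	prev := baseImage
	for i, s := range steps {
		h := sha256.New()
		h.Write([]byte(prev))
		h.Write([]byte{0}) // separator between fields
		h.Write([]byte(s.instruction))
		h.Write([]byte{0})
		h.Write(s.input)
		prev = hex.EncodeToString(h.Sum(nil))
		keys[i] = prev
	}
	return keys
}

// appSteps follows the application Dockerfile order described in item 3c.
func appSteps(userDeps, componentDeps, model []byte) []step {
	return []step{
		{instruction: "install user model dependencies", input: userDeps},
		{instruction: "install component dependencies", input: componentDeps},
		{instruction: "download model source code and artifacts", input: model},
		{instruction: "dry-run model", input: nil},
	}
}

func main() {
	deps := []byte("user model dependencies file")
	comp := []byte("pyfunc-server requirements")
	before := cacheKeys("base-image", appSteps(deps, comp, []byte("model artifacts v1")))
	after := cacheKeys("base-image", appSteps(deps, comp, []byte("model artifacts v2")))
	for i := range before {
		fmt.Printf("step %d reused: %v\n", i+1, before[i] == after[i])
	}
}
```

Running it shows the first two keys (the dependency installs) unchanged between the two model versions while the last two change, so in this model a cache-enabled build only redoes the model download and dry run. Changing the dependencies file changes every key, which is the expected trade-off.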

Which issue(s) this PR fixes:

Improves image building performance

Does this PR introduce a user-facing change?:

NONE

Checklist

  • Added unit test, integration, and/or e2e tests
  • Tested locally
  • Updated documentation
  • Update Swagger spec if the PR introduces API changes
  • Regenerated Golang and Python client if the PR introduces API changes

.github/workflows/merlin.yml (outdated, resolved)
.github/workflows/merlin.yml (outdated, resolved)
api/pkg/gsutil/gsutil.go (outdated, resolved)
@ariefrahmansyah (Contributor, Author) commented

> Why is the python/sdk/test/pyfunc/model_2.joblib file removed in this PR? Isn't it needed by the integration tests?

When pyfunc_integration_test.py is run on a local machine, it retrains the model and therefore overwrites the file. I deleted it but forgot to restore the old one. I've restored it now.

api/config/config.go (outdated, resolved)

@krithika369 (Collaborator) left a comment


@ariefrahmansyah the PR largely LGTM (we need to remember to revert some of the TODOs in the GitHub workflows before merging). However, I left some new comments, especially about reusing MLP's artifact.Service instead of gsutil, which I believe would be a lot simpler. Will re-review this part once it's addressed. Thanks!

Arief Rahmansyah added 3 commits December 8, 2023 14:16
2. Add SupportedPythonVersions field
3. Refactor BuildContextSubPath to be part of BaseImageConfig
@krithika369 (Collaborator) left a comment


Left a couple of small comments. The rest LGTM. Thanks for this PR and the CI steps refactoring, @ariefrahmansyah!

api/config/config.go (resolved)
api/pkg/imagebuilder/imagebuilder.go (outdated, resolved)
@ariefrahmansyah merged commit a135bc2 into main on Dec 11, 2023 (28 checks passed)
@ariefrahmansyah deleted the image-builder-improvements branch on December 11, 2023 02:52
ariefrahmansyah added a commit that referenced this pull request Dec 15, 2023

**What this PR does / why we need it**:

Existing models that were previously built successfully with Python
3.7.* can no longer be deployed, because the new
validatePythonVersion() introduced in #500 rejects them.
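The failure mode above can be illustrated with a minimal, hypothetical version of the check. It assumes that model Python versions are stored as strings such as `3.7.*` and that the SupportedPythonVersions field added in #500 lists only 3.8 and newer; neither the list nor the function body is taken from Merlin's code.

```go
package main

import (
	"fmt"
	"strings"
)

// majorMinor reduces "3.10.*" or "3.10.12" to "3.10".
func majorMinor(version string) string {
	parts := strings.SplitN(strings.TrimSpace(version), ".", 3)
	if len(parts) < 2 {
		return strings.TrimSpace(version)
	}
	return parts[0] + "." + parts[1]
}

// validatePythonVersion is a hypothetical sketch of the check; Merlin's actual
// validatePythonVersion() may differ.
func validatePythonVersion(version string, supported []string) error {
	want := majorMinor(version)
	for _, s := range supported {
		if majorMinor(s) == want {
			return nil
		}
	}
	return fmt.Errorf("python version %q is not supported (supported: %s)", version, strings.Join(supported, ", "))
}

func main() {
	supported := []string{"3.8.*", "3.9.*", "3.10.*"} // hypothetical SupportedPythonVersions
	for _, v := range []string{"3.10.*", "3.7.*"} {
		if err := validatePythonVersion(v, supported); err != nil {
			fmt.Println("rejected:", err)
		} else {
			fmt.Println("accepted:", v)
		}
	}
}
```

Comparing the major.minor components instead of doing a string-prefix match keeps `3.1` from matching `3.10`. With a list like this one, a model pinned to `3.7.*` is rejected even though its image was built successfully before.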

**Which issue(s) this PR fixes**:

Fixes failed redeployment or re-execution of existing models with Python
3.7.*

**Does this PR introduce a user-facing change?**:

```release-note
NONE
```

**Checklist**

- [ ] Added unit test, integration, and/or e2e tests
- [x] Tested locally
- [ ] Updated documentation
- [ ] Update Swagger spec if the PR introduces API changes
- [ ] Regenerated Golang and Python client if the PR introduces API changes
3 participants