Image Builder Improvements #500
- refactor pyfunc base image
…e sure this layer got cached
… env value is always defined, hence the next run will not be invalidated
> Why is the python/sdk/test/pyfunc/model_2.joblib file removed in this PR? Isn't it needed by the integration tests?

When running pyfunc_integration_test.py from a local machine, it re-trains the model and therefore updates the model file. I deleted it but forgot to restore the old one. I've restored it now.
@ariefrahmansyah the PR largely LGTM (need to remember to revert some of the TODOs in the GitHub workflows before merging). However, I left some new comments, especially around reusing MLP's artifact.Service instead of gsutil, which I believe would be a lot simpler. Will re-review this part once it's addressed. Thanks!
2. Add SupportedPythonVersions field
3. Refactor BuildContextSubPath to be part of BaseImageConfig
Left a couple of small comments. The rest LGTM. Thanks for this PR and the CI steps refactoring, @ariefrahmansyah !
**What this PR does / why we need it**:

Existing models that were previously built successfully using Python 3.7.* can no longer be deployed, because the new validatePythonVersion() introduced in #500 rejects them.

**Which issue(s) this PR fixes**:

Fixes failed redeployment or re-execution of existing models with Python 3.7.*

**Does this PR introduce a user-facing change?**:

```release-note
NONE
```

**Checklist**

- [ ] Added unit test, integration, and/or e2e tests
- [x] Tested locally
- [ ] Updated documentation
- [ ] Update Swagger spec if the PR introduces API changes
- [ ] Regenerated Golang and Python client if the PR introduces API changes
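For readers unfamiliar with the check, here is a minimal Python sketch of how a supported-version validation of this kind might behave. It is not Merlin's actual validatePythonVersion(): the function name `validate_python_version`, the `SUPPORTED_PYTHON_VERSIONS` list, and the wildcard matching are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase
from typing import Sequence

# Hypothetical allow-list; the real values live in Merlin's configuration.
SUPPORTED_PYTHON_VERSIONS = ("3.7.*", "3.8.*", "3.9.*", "3.10.*")


def validate_python_version(
    version: str,
    supported: Sequence[str] = SUPPORTED_PYTHON_VERSIONS,
) -> None:
    """Raise ValueError if `version` matches none of the supported patterns."""
    if not any(fnmatchcase(version, pattern) for pattern in supported):
        raise ValueError(
            f"Python version {version!r} is not supported; "
            f"expected one of {list(supported)}"
        )
```

With an allow-list that omits `3.7.*`, a model built on Python 3.7.12 would be rejected at redeployment, which is the failure this follow-up PR describes.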
What this PR does / why we need it:
Image building is a mandatory process in the deployment of PyFunc-based Merlin components. The process abstracts the complexity of building and publishing a ready-to-be-used Docker image that contains the user's model source code, artifacts, and dependencies. However, there are some issues with the current image-building approach that lead to poor development experience and inefficient resource utilization. This PR addresses them by:
a. it is more configurable -- we can set Kaniko's build arguments flags in the config and they will be propagated to the kaniko job spec
b. it supports caching
a. To achieve it, we introduce a new gsutil package to interface with GCS
a. Previously, the base Dockerfile had a step to install the pyfunc-server or batch prediction jobs dependencies. This step is now executed by the application Dockerfile
i. Subsequently, this allows us to have only one base image instead of building base images for each Python version.
b. The new base Dockerfile only installs platform components such as conda, Spark, and the gRPC library, but not the component itself
c. The new application Dockerfile first downloads and installs the user's model dependencies, then installs the component's dependencies, downloads the model source code and artifacts, and finally dry-runs the model
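As a rough illustration of the configurable Kaniko arguments mentioned above, the sketch below assembles an executor argument list from a config object. The `KanikoConfig` class and its field names are hypothetical and do not reflect Merlin's actual config schema; the flags themselves (`--dockerfile`, `--context`, `--destination`, `--cache`, `--cache-repo`) are standard Kaniko executor flags.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KanikoConfig:
    """Hypothetical config shape; field names are illustrative only."""
    dockerfile_path: str
    build_context_uri: str
    destination: str
    cache_repo: str = ""  # empty string means caching is disabled
    additional_args: List[str] = field(default_factory=list)


def kaniko_args(cfg: KanikoConfig) -> List[str]:
    """Build the argument list passed to the Kaniko executor container."""
    args = [
        f"--dockerfile={cfg.dockerfile_path}",
        f"--context={cfg.build_context_uri}",
        f"--destination={cfg.destination}",
    ]
    if cfg.cache_repo:
        # Enable layer caching and point Kaniko at the cache repository.
        args += ["--cache=true", f"--cache-repo={cfg.cache_repo}"]
    # Operator-supplied flags from the config are appended unchanged.
    args += cfg.additional_args
    return args
```

Keeping the extra flags as an opaque list means new Kaniko options can be adopted through configuration alone, without a code change.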
Which issue(s) this PR fixes:
Improves image building performance
Does this PR introduce a user-facing change?:
Checklist