fix: fix cross-platform build image dependency package architecture i… #260

Merged
merged 3 commits into microsoft:main from the fix-cross-compilation branch
Apr 17, 2024

Conversation

wenhuwang
Contributor

@wenhuwang wenhuwang commented Apr 12, 2024

Description

Fixes a retina-controller startup failure caused by dependency packages in cross-platform image builds being installed for the wrong architecture.

Related Issue

fixes #130

Checklist

  • I have read the contributing documentation.
  • I signed and signed-off the commits (git commit -S -s ...). See this documentation on signing commits.
  • I have correctly attributed the author(s) of the code.
  • I have tested the changes locally.
  • I have followed the project's style guidelines.
  • I have updated the documentation, if necessary.
  • I have added tests, if applicable.

Testing Done

Built amd64 and arm64 images and deployed the amd64 image to K8s cluster nodes.
The agent pod is up and running on the nodes.

@wenhuwang wenhuwang requested a review from a team as a code owner April 12, 2024 06:24
@rbtr rbtr added type/fix Fixes something priority/1 P1 area/infra Test, Release, or CI Infrastructure labels Apr 12, 2024
timraymond
timraymond previously approved these changes Apr 12, 2024
Member

@timraymond timraymond left a comment

Cool stuff, TIL about $TARGETPLATFORM :)

@rbtr rbtr added this pull request to the merge queue Apr 12, 2024
@rbtr rbtr removed this pull request from the merge queue due to a manual request Apr 12, 2024
Collaborator

@rbtr rbtr left a comment

Sorry, I have to block this. It has significantly regressed our ARM image build.
Before: [screenshot]
After: [screenshot]
Build logs here.

@wenhuwang I think this is due to the issue I mentioned in this comment - Go builds on ARM64 are much much slower (10X!) than Go builds on AMD64 targeting ARM64 via GOARCH. Please adjust these Dockerfile changes so that the Go builds still happen in an AMD64 image and let's see if that resolves the platform issue while keeping the builds fast 🙂
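
The approach suggested above (run the Go toolchain on the native build platform and target the other architecture via GOARCH) could be sketched roughly as follows. This is a minimal illustration, not the project's actual Dockerfile: the image tags, the `./cmd/app` package path, and the output paths are hypothetical, and it assumes a pure-Go build with CGO disabled. `BUILDPLATFORM`, `TARGETPLATFORM`, `TARGETOS`, and `TARGETARCH` are the build arguments BuildKit provides automatically.

```dockerfile
# syntax=docker/dockerfile:1
# Sketch only: image tags, paths, and the package path are hypothetical.

# The builder always runs on the build host's native platform
# (for example linux/amd64), so the Go toolchain is never emulated.
FROM --platform=$BUILDPLATFORM golang:1.22 AS builder

# Populated by BuildKit from the --platform value passed to the build.
ARG TARGETOS
ARG TARGETARCH

WORKDIR /src
COPY . .

# Cross-compile for the target. Assumes CGO is not needed, so no
# target-architecture C toolchain has to be installed in this stage.
RUN CGO_ENABLED=0 GOOS=$TARGETOS GOARCH=$TARGETARCH \
    go build -o /out/app ./cmd/app

# Only the final stage is pulled for the target platform.
FROM --platform=$TARGETPLATFORM debian:bookworm-slim
COPY --from=builder /out/app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]
```

With a layout like this, something like `docker buildx build --platform linux/arm64 .` would compile on the host architecture and produce an arm64 binary in an arm64 final image.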

@wenhuwang
Contributor Author

@rbtr The build time is indeed a concern. I looked carefully at the build log and found that most of the time was spent downloading dependency packages. I am not sure whether the build process uses cached data, or whether the ARM64 build took longer because nothing was cached on its first run.
Is it possible to trigger a rebuild to verify this? Thank you.

@rbtr
Collaborator

rbtr commented Apr 13, 2024

@wenhuwang I re-queued it but no improvement

@rbtr
Collaborator

rbtr commented Apr 14, 2024

I did a little more investigation here and I think we're not using the cache correctly in main. The setup-go Action sets GOCACHE to /home/runner/.cache/go-build, but the Dockerfile is hardcoded to /root/home/.cache/go-build. So, while the Action reports a cache hit, the containerized go build is not using the right cache directory and would not benefit from the cache.
The bad news is that I have fixed this in my fork here, but the ARM builds do not get any quicker 🙁
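
One way to keep the in-container cache path from drifting is to set `GOCACHE` explicitly and mount a cache at that same path. The sketch below is an assumption-laden illustration rather than the fix in the fork: the image tag, `/root/.cache/go-build`, and `./cmd/app` are hypothetical, and a BuildKit cache mount is separate from the runner's setup-go cache directory, so it does not import that cache on its own. As noted above, correcting the path did not make the ARM builds faster.

```dockerfile
# syntax=docker/dockerfile:1
# Sketch only: image tag, cache path, and package path are hypothetical.
FROM --platform=$BUILDPLATFORM golang:1.22 AS builder

# Declare the build cache location once, so the mount target below and
# the directory `go build` actually uses cannot silently disagree.
ENV GOCACHE=/root/.cache/go-build

WORKDIR /src
COPY . .

# Persist the Go build cache across builds on the same BuildKit builder.
RUN --mount=type=cache,target=/root/.cache/go-build \
    go build -o /out/app ./cmd/app
```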

@wenhuwang
Contributor Author

wenhuwang commented Apr 14, 2024

@rbtr Thanks for your help. I will try to fix this issue by building the ARM64 image on an AMD64 base.

@wenhuwang
Contributor Author

wenhuwang commented Apr 15, 2024

I tested setting --platform=$TARGETPLATFORM only on the agent image. The target image runs normally, but some of the library files it depends on do not match the target image's architecture unless --platform=$TARGETPLATFORM is also set on the tools image. I'm not sure whether all of these dependency files are necessary, and I hope someone can help explain.
I haven't figured out how to deal with this problem yet. If you have any suggestions, please let me know. Thank you.
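
The mismatch described above can be illustrated with a tools stage that is itself pulled for the target platform, plus a guard that fails the build if the stage's package architecture differs from the target. This is a hedged sketch, not the project's Dockerfile: the stage name is hypothetical, and `debian:bookworm` with `clang-14` is borrowed from the follow-up commit description as a stand-in. The guard assumes amd64 or arm64 targets, where Debian's architecture name and `TARGETARCH` coincide.

```dockerfile
# syntax=docker/dockerfile:1
# Sketch only: the stage name and package choice are hypothetical.

# Pull the tools stage for the target platform so that everything
# installed here (binaries and shared libraries) matches the final image.
FROM --platform=$TARGETPLATFORM debian:bookworm AS tools

# Populated by BuildKit from the --platform value passed to the build.
ARG TARGETARCH

RUN apt-get update \
    && apt-get install -y --no-install-recommends clang-14 \
    && rm -rf /var/lib/apt/lists/*

# Fail early if this stage's package architecture is not the target's.
# Valid for amd64 and arm64 only; other targets use different names.
RUN test "$(dpkg --print-architecture)" = "$TARGETARCH"

# Later stages that COPY --from=tools then receive files built for the
# target architecture rather than for the build host.
```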

@rbtr
Collaborator

rbtr commented Apr 16, 2024

Hey @wenhuwang this architecture issue is showing up in some other scenarios, so I think we will take your fix despite the performance impact on the pipeline. We can try to make that faster again later.
Would you reset this branch to your initial $TARGETPLATFORM change and we can go from there? Thanks!

@wenhuwang
Contributor Author

@rbtr Reverted to initial $TARGETPLATFORM changes

@rbtr rbtr enabled auto-merge April 17, 2024 04:03
@rbtr rbtr added this pull request to the merge queue Apr 17, 2024
Merged via the queue into microsoft:main with commit e785347 Apr 17, 2024
21 checks passed
@wenhuwang wenhuwang deleted the fix-cross-compilation branch April 17, 2024 13:04
github-merge-queue bot pushed a commit that referenced this pull request Apr 25, 2024
# Description

This should address the arm/amd64 binary errors in the built image while
keeping the Go build stage fast by using the
BUILDPLATFORM/TARGETPLATFORM and cross-compiling.

It also moves the tools stage from bullseye to bookworm: bookworm has
clang-14 available in the package manager and directly installable, so
all of the manual downloads are removed, and the explicit installations
are cut down significantly.

Removes some unnecessary Docker cruft which may have been
well-intentioned but isn't useful.

## Related Issue

This builds on #260 which fixed #130.

## Checklist

- [x] I have read the [contributing
documentation](https://retina.sh/docs/contributing).
- [x] I signed and signed-off the commits (`git commit -S -s ...`). See
[this
documentation](https://docs.github.com/en/authentication/managing-commit-signature-verification/about-commit-signature-verification)
on signing commits.
- [x] I have correctly attributed the author(s) of the code.
- [x] I have tested the changes locally.
- [x] I have followed the project's style guidelines.
- [x] I have updated the documentation, if necessary.
- [x] I have added tests, if applicable.

## Screenshots (if applicable) or Testing Completed

Please add any relevant screenshots or GIFs to showcase the changes
made.

## Additional Notes

Add any additional notes or context about the pull request here.

---

Please refer to the [CONTRIBUTING.md](../CONTRIBUTING.md) file for more
information on how to contribute to this project.

---------

Signed-off-by: Evan Baker <[email protected]>
nddq pushed a commit that referenced this pull request May 6, 2024
#260)

# Description
Fixed the issues of retina-controller startup failure caused by abnormal
cross-platform build image dependency package architecture.


## Related Issue

fixes #130 

## Checklist

- [x] I have read the [contributing
documentation](https://retina.sh/docs/contributing).
- [x] I signed and signed-off the commits (`git commit -S -s ...`). See
[this
documentation](https://docs.github.com/en/authentication/managing-commit-signature-verification/about-commit-signature-verification)
on signing commits.
- [x] I have correctly attributed the author(s) of the code.
- [x] I have tested the changes locally.
- [ ] I have followed the project's style guidelines.
- [ ] I have updated the documentation, if necessary.
- [ ] I have added tests, if applicable.

# Testing Done
Built amd64 and arm64 image and deployed amd64 image in K8s cluster
nodes
agent pod is up and running in the nodes

---------

Signed-off-by: wenhuwang <[email protected]>
nddq pushed a commit that referenced this pull request May 6, 2024
Labels
area/infra Test, Release, or CI Infrastructure priority/1 P1 type/fix Fixes something
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Unable to fork /bin/clang.
3 participants