Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ci] Add tests for non-native architectures #622

Merged
merged 10 commits into from
Sep 11, 2023

Conversation

tiyash-basu-frequenz
Copy link
Contributor

@tiyash-basu-frequenz tiyash-basu-frequenz commented Aug 30, 2023

This change introduces a new job to the CI to test the SDK on non-native architectures. Github CI ATM provides amd64 runners by default, so the new job targets testing on arch64 for now.

The tests are run in containers running on the target platform(s). The container definitions are supposed to exist in the directory .github/containers/nox-cross-arch. A definition for an arm64-ubuntu20.04-python3.11 container can be found there.

Since the tests take quite a while to run, given that they run on emulations, this step runs only on the merge_group event.

@tiyash-basu-frequenz tiyash-basu-frequenz added this to the v1.0.0-rc milestone Aug 30, 2023
@tiyash-basu-frequenz tiyash-basu-frequenz self-assigned this Aug 30, 2023
@tiyash-basu-frequenz tiyash-basu-frequenz requested a review from a team as a code owner August 30, 2023 14:37
@github-actions github-actions bot added part:docs Affects the documentation part:tooling Affects the development tooling (CI, deployment, dependency management, etc.) labels Aug 30, 2023
@tiyash-basu-frequenz tiyash-basu-frequenz added the cmd:skip-release-notes It is not necessary to update release notes for this PR label Aug 30, 2023
@tiyash-basu-frequenz tiyash-basu-frequenz linked an issue Aug 30, 2023 that may be closed by this pull request
@tiyash-basu-frequenz
Copy link
Contributor Author

It is a false positive. The warning reported in the CI (check here) will disappear in future CI runs once the migration to BackgroundService is completed.

For now, we can ignore the CI issue here.

Copy link
Contributor

@llucax llucax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Just a few comments and questions...

Also, wish for the future: Have our own GitHub ARM runners (https://actuated.dev/blog/native-arm64-for-github-actions)

.github/workflows/ci.yaml Outdated Show resolved Hide resolved
.github/workflows/ci.yaml Show resolved Hide resolved
.github/workflows/ci.yaml Show resolved Hide resolved
.github/containers/nox-cross-arch/README.md Outdated Show resolved Hide resolved
.github/workflows/ci.yaml Outdated Show resolved Hide resolved
@tiyash-basu-frequenz tiyash-basu-frequenz removed the cmd:skip-release-notes It is not necessary to update release notes for this PR label Sep 4, 2023
@tiyash-basu-frequenz
Copy link
Contributor Author

tiyash-basu-frequenz commented Sep 5, 2023

The CI complains:

WARNING -  Doc file 'CONTRIBUTING.md' contains a relative link '.github/containers/nox-cross-arch/README.md', but the target is not found among documentation files.

but the file exists, and I have tested the link in my local environment.
Looking into it.

Edit: Fixed. Needed an entry in the docs directory.

@tiyash-basu-frequenz tiyash-basu-frequenz force-pushed the ci_test_arm64 branch 2 times, most recently from b1bbb97 to ec6ecd6 Compare September 5, 2023 08:40
docs/.github/containers/nox-cross-arch/README.md Outdated Show resolved Hide resolved
.github/workflows/ci.yaml Outdated Show resolved Hide resolved
.github/workflows/ci.yaml Outdated Show resolved Hide resolved
.github/workflows/ci.yaml Show resolved Hide resolved
@llucax
Copy link
Contributor

llucax commented Sep 5, 2023

@tiyash-basu-frequenz

The CI complains:

WARNING -  Doc file 'CONTRIBUTING.md' contains a relative link '.github/containers/nox-cross-arch/README.md', but the target is not found among documentation files.

but the file exists, and I have tested the link in my local environment. Looking into it.

Edit: Fixed. Needed an entry in the docs directory.

There are still issues with this:

INFO - The following pages are being built only for the preview but will be excluded from mkdocs build per exclude_docs:
- http://127.0.0.1:8000/.github/containers/nox-cross-arch/

Also the URL looks very weird. I think the easiest solution for this would be to move the contents of .github/containers/nox-cross-arch/README.md to CONTRIBUTING.md and maybe leave only there a comment saying "Look at CONTRIBUTING.md for details.

Copy link
Contributor

@llucax llucax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the messy review. So as a summary, I would:

  • Try the NOP approach for running the job in PRs (and running on push too) so we can have a branch protection rule.
  • Add the pytest_max nox session to the matrix.
  • Remove -R from nox invocation.

With this I think we are ready to merge (unless the CI is unbearably slow and we need to address when to run it before merging).

--net=host \
--platform linux/${{ matrix.arch }} \
localhost/nox-cross-arch:latest \
nox -R -e ${{ matrix.nox-session }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A way to still get the debug info about dependencies would be something like /bin/sh -c "nox --install-only -e ${{ matrix.nox-session }}; pip freeze; nox -e ${{ matrix.nox-session }}.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Done

@llucax
Copy link
Contributor

llucax commented Sep 5, 2023

One more comment, if you can maybe let the job run once in this PR it might be interesting to know how much time does it get to run in GitHub Actions.

@tiyash-basu-frequenz
Copy link
Contributor Author

Also the URL looks very weird. I think the easiest solution for this would be to move the contents of .github/containers/nox-cross-arch/README.md to CONTRIBUTING.md and maybe leave only there a comment saying "Look at CONTRIBUTING.md for details.

Done

@tiyash-basu-frequenz
Copy link
Contributor Author

  • Try the NOP approach for running the job in PRs (and running on push too) so we can have a branch protection rule.
  • Add the pytest_max nox session to the matrix.
  • Remove -R from nox invocation.

As mentioned in threads, pytest_max has been added already, and, also explained in thread, there is no need to remove -R from nox.

@llucax llucax removed this from the v1.0.0-rc milestone Sep 5, 2023
@tiyash-basu-frequenz
Copy link
Contributor Author

tiyash-basu-frequenz commented Sep 6, 2023

I removed the filter to run it only on merge_group events, to get some test results.
Here is a run without caches: https://github.com/frequenz-floss/frequenz-sdk-python/actions/runs/6096928673/job/16543523647?pr=622
Here is a reattempt of it, with caches: https://github.com/frequenz-floss/frequenz-sdk-python/actions/runs/6096928673/job/16546405568?pr=622

Copy link
Contributor

@llucax llucax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just 2 more comments, one small and one that might be important.

CONTRIBUTING.md Outdated Show resolved Hide resolved
CONTRIBUTING.md Outdated Show resolved Hide resolved
RELEASE_NOTES.md Outdated Show resolved Hide resolved
.github/workflows/ci.yaml Outdated Show resolved Hide resolved
@llucax
Copy link
Contributor

llucax commented Sep 7, 2023

I removed the filter to run it only on merge_group events, to get some test results.
Here is a run without caches: frequenz-floss/frequenz-sdk-python/actions/runs/6096928673/job/16543523647?pr=622
Here is a reattempt of it, with caches: frequenz-floss/frequenz-sdk-python/actions/runs/6096928673/job/16546405568?pr=622

Great to see it green! So the cache makes a big difference but still with cache it is in the 15+m so I think we should proceed with only running on the merge queue. As we talked about, push is not necessary as the exact same commit tested in the merge queue will be pushed, so we'll be testing the exact same code twice.

@llucax llucax removed this pull request from the merge queue due to a manual request Sep 8, 2023
@tiyash-basu-frequenz
Copy link
Contributor Author

Pushing the workaround in a bit.

@llucax
Copy link
Contributor

llucax commented Sep 8, 2023

OK, another option I guess it would be to create a fake non matrix job with the same name, maybe that's easier and less confusing as it could still be marked as skipped

@llucax
Copy link
Contributor

llucax commented Sep 8, 2023

I'm thinking about:

  # TODO: Add comment explaining what is this for
  nox-cross-arch-dummy:
    name: Cross-arch tests with nox
    if: never()
    runs-on: ${{ matrix.os }}

    steps:
      - name: Dummy step
        run: true

  nox-cross-arch:
    name: Cross-arch tests with nox
    if: github.event_name == 'merge_group'
    strategy:
      fail-fast: false
      # Before adding new items to this matrix, make sure that a dockerfile
      # exists for the combination of items in the matrix.
      # Refer to .github/containers/nox-cross-arch/README.md to learn how to
      # add and name new dockerfiles.
      matrix:
        arch:
          - arm64
        os:
          - ubuntu-20.04
        python:
          - "3.11"
        nox-session:
          - "pytest_min"
          - "pytest_max"
    runs-on: ${{ matrix.os }}

    steps:
      - name: Fetch sources
        uses: actions/checkout@v3
    #...

@llucax
Copy link
Contributor

llucax commented Sep 8, 2023

Oh, no, we should fake the ones that run in the queue, so it is more boilerplate, as we need 2 dummy jobs (actually one per matrix item in the future).

@tiyash-basu-frequenz
Copy link
Contributor Author

tiyash-basu-frequenz commented Sep 8, 2023

Another workaround would be to create a dummy job that runs only on merge_group events, and marking it as a dependency for the nox-cross-arch job.

I need to read a bit more about the issue. Maybe the CI handles if and needs in different ways.

@llucax
Copy link
Contributor

llucax commented Sep 8, 2023

Not sure if it is worth it in this case, but just in case:

  # TODO: Add comment explaining what is this for
  nox-cross-arch-dummy-min:
    name: Cross-arch tests with nox (arm64, ubuntu-20.04, 3.11, pytest_min)
    if: never()
    runs-on: ${{ matrix.os }}

    steps:
      - name: Dummy step
        run: true

  nox-cross-arch-dummy-max:
    name: Cross-arch tests with nox (arm64, ubuntu-20.04, 3.11, pytest_max)
    if: never()
    runs-on: ${{ matrix.os }}

    steps:
      - name: Dummy step
        run: true

  nox-cross-arch:
    name: Cross-arch tests with nox
    if: github.event_name == 'merge_group'
    strategy:
      fail-fast: false
      # Before adding new items to this matrix, make sure that a dockerfile
      # exists for the combination of items in the matrix.
      # Refer to .github/containers/nox-cross-arch/README.md to learn how to
      # add and name new dockerfiles.
      matrix:
        arch:
          - arm64
        os:
          - ubuntu-20.04
        python:
          - "3.11"
        nox-session:
          - "pytest_min"
          - "pytest_max"
    runs-on: ${{ matrix.os }}

    steps:
      - name: Fetch sources
        uses: actions/checkout@v3
    #...

@llucax
Copy link
Contributor

llucax commented Sep 8, 2023

Ah, yeah, that's not bad to have a job that "merges" all the matrix jobs for cross-arch and we add that as the branch protection 👍

@llucax
Copy link
Contributor

llucax commented Sep 8, 2023

  # TODO: Add comment explaining what is this for
  nox-cross-arch-all:
    needs: ["nox-cross-arch"]
    name: Cross-arch tests with nox
    if: github.event_name == 'merge_group'
    runs-on: ubuntu-20.04

    steps:
      - name: Dummy step
        run: true

  nox-cross-arch:
    name: Cross-arch tests with nox
    if: github.event_name == 'merge_group'
    strategy:
      fail-fast: false
      # Before adding new items to this matrix, make sure that a dockerfile
      # exists for the combination of items in the matrix.
      # Refer to .github/containers/nox-cross-arch/README.md to learn how to
      # add and name new dockerfiles.
      matrix:
        arch:
          - arm64
        os:
          - ubuntu-20.04
        python:
          - "3.11"
        nox-session:
          - "pytest_min"
          - "pytest_max"
    runs-on: ${{ matrix.os }}

    steps:
      - name: Fetch sources
        uses: actions/checkout@v3
    #...

I think this should work. For merge queues nox-cross-arch-all will depend on nox-cross-arch-all and run, so when used to protect the branch merge queues should be merged.

For PRs nox-cross-arch-all should be skipped, also fulfilling the branch protection rule.

@tiyash-basu-frequenz
Copy link
Contributor Author

Yep, that's exactly what I was thinking of.

@tiyash-basu-frequenz
Copy link
Contributor Author

Added the workaround.

.github/workflows/ci.yaml Outdated Show resolved Hide resolved
llucax
llucax previously approved these changes Sep 8, 2023
@llucax
Copy link
Contributor

llucax commented Sep 8, 2023

Jobs are not being created, it seems like something is wrong:

Invalid workflow file: .github/workflows/ci.yaml#L183
The workflow is not valid. .github/workflows/ci.yaml (Line: 183, Col: 5): Required property is missing: runs-on

@@ -176,7 +284,7 @@ jobs:

publish-docs:
name: Publish documentation website to GitHub pages
needs: ["nox", "test-installation"]
needs: ["nox", "nox-cross-arch", "test-installation"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this, just to make the jobs graph look prettier :P

Suggested change
needs: ["nox", "nox-cross-arch", "test-installation"]
needs: ["nox", "nox-cross-arch-all", "test-installation"]

# https://github.com/orgs/community/discussions/9141
nox-cross-arch-all:
name: Cross-arch tests with nox
needs: ["nox-cross-arch"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
needs: ["nox-cross-arch"]
needs: ["nox-cross-arch"]
runs-on: ubuntu-20.04

@tiyash-basu-frequenz
Copy link
Contributor Author

Jobs are not being created, it seems like something is wrong:

Invalid workflow file: .github/workflows/ci.yaml#L183
The workflow is not valid. .github/workflows/ci.yaml (Line: 183, Col: 5): Required property is missing: runs-on

Oops that looks like a bug. Checking.

This job runs if `nox-cross-arch` ran and succeeded.
This is required because, when the `nox-cross-arch` job is skipped, its
inner matrix is not expanded, and branch protection rules on the
inner-matrix jobs get stuck. So instead of `nox-cross-arch`, this job can
be used in the branch protection rules.

Signed-off-by: Tiyash Basu <[email protected]>
@tiyash-basu-frequenz
Copy link
Contributor Author

Fixed.

@llucax llucax enabled auto-merge September 11, 2023 08:36
@llucax llucax added this pull request to the merge queue Sep 11, 2023
Merged via the queue into frequenz-floss:v0.x.x with commit d34aea6 Sep 11, 2023
@tiyash-basu-frequenz tiyash-basu-frequenz deleted the ci_test_arm64 branch September 11, 2023 09:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
part:docs Affects the documentation part:tooling Affects the development tooling (CI, deployment, dependency management, etc.)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable CI for ARM
2 participants