Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: build and push workflow failing due to missing -f option buildx build command #425

Merged
merged 4 commits into from
Jun 20, 2024

Conversation

micmarty-deepsense
Copy link
Contributor

I noticed that images on main branch are failing to build (and push) due to missing -f parameter in docker buildx build. By default it expects Dockerfile to exist, but we only have Dockerfile-amd64 and Dockerfile-arm64

image

Copy link
Contributor

@MthwRobinson MthwRobinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -68,8 +69,8 @@ jobs:
# Clear some space (https://github.com/actions/runner-images/issues/2840)
sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/share/boost

ARCH=$(cut -d "/" -f2 <<< ${{ matrix.docker-platform }})
DOCKER_BUILDKIT=1 docker buildx build --platform=$ARCH --load \
DOCKER_BUILDKIT=1 docker buildx build --load -f Dockerfile-${{ matrix.arch }} \
Copy link
Contributor Author

@micmarty-deepsense micmarty-deepsense Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this single line fixes the issue, the rest is just a refactor

ARCH=$(cut -d "/" -f2 <<< ${{ matrix.docker-platform }})
DOCKER_BUILDKIT=1 docker buildx build --platform=$ARCH --load \
DOCKER_BUILDKIT=1 docker buildx build --load -f Dockerfile-${{ matrix.arch }} \
--platform=$DOCKER_PLATFORM \
Copy link
Contributor Author

@micmarty-deepsense micmarty-deepsense Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically this should be e.g. "linux/amd64" (current change), but "amd64" (as it was) would also work

if [ "${{ matrix.docker-platform }}" == "linux/arm64" ]; then
DOCKER_PLATFORM="${{ matrix.docker-platform }}" DOCKER_IMAGE="$DOCKER_BUILD_REPOSITORY:$ARCH-$SHORT_SHA" \
DOCKER_IMAGE="$DOCKER_BUILD_REPOSITORY:${{ matrix.arch }}-$SHORT_SHA" \
if [ "$DOCKER_PLATFORM" == "linux/arm64" ]; then
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DOCKER_PLATFORM env variable is already set, so no need to provide it here. If you expect being explicit, please let me know

@@ -1,5 +1,5 @@
# syntax=docker/dockerfile:experimental
FROM quay.io/unstructured-io/base-images:wolfi-base@sha256:6c00a236c648ffdaf196ccbc446f5c6cc9eb4e3ab9e437178abcfac710b2b373 as base
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this image does not exist anymore, I had to update SHA

@micmarty-deepsense
Copy link
Contributor Author

FYI This PR is blocked by some random issue with apt repository being not responsive (timeout)
https://github.com/Unstructured-IO/unstructured-api/actions/runs/9598390479/job/26473331895?pr=425

Copy link
Contributor

@christinestraub christinestraub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the ARCH env variable also already set or do we need to replace it with {{ matrix.arch }}?

@micmarty-deepsense
Copy link
Contributor Author

Is the ARCH env variable also already set or do we need to replace it with {{ matrix.arch }}?

It's no longer being set, but technically we could, it depends what we prefer. We could go with this:

...
somejob:
    env:
        ARCH: ${{ matrix.arch }}
    run: echo $ARCH

but IMO it's better to stick with only one way of accessing the value we want to use.

My primary goal was to eliminate redundant usages of cut and fd in many places. Even though we all understand what the command does, it's unnecessarily "complex". It's easier to have "amd64" matrix value and prepend it with "linux/" in env var. WDYT? @christinestraub

@christinestraub
Copy link
Contributor

It's no longer being set, but technically we could, it depends what we prefer. We could go with this:

...
somejob:
    env:
        ARCH: ${{ matrix.arch }}
    run: echo $ARCH

but IMO it's better to stick with only one way of accessing the value we want to use.

My primary goal was to eliminate redundant usages of cut and fd in many places. Even though we all understand what the command does, it's unnecessarily "complex". It's easier to have "amd64" matrix value and prepend it with "linux/" in env var. WDYT? @christinestraub

Sounds reasonable!

@micmarty-deepsense
Copy link
Contributor Author

It's no longer being set, but technically we could, it depends what we prefer. We could go with this:

...
somejob:
    env:
        ARCH: ${{ matrix.arch }}
    run: echo $ARCH

but IMO it's better to stick with only one way of accessing the value we want to use.
My primary goal was to eliminate redundant usages of cut and fd in many places. Even though we all understand what the command does, it's unnecessarily "complex". It's easier to have "amd64" matrix value and prepend it with "linux/" in env var. WDYT? @christinestraub

Sounds reasonable!

if everything else looks good, please approve

@MthwRobinson MthwRobinson enabled auto-merge (squash) June 20, 2024 20:23
@MthwRobinson MthwRobinson disabled auto-merge June 20, 2024 20:31
@MthwRobinson MthwRobinson enabled auto-merge (squash) June 20, 2024 20:40
@MthwRobinson MthwRobinson merged commit e8c6fa9 into main Jun 20, 2024
6 checks passed
@MthwRobinson MthwRobinson deleted the mike/fix-build-and-push-workflow branch June 20, 2024 21:12
tjtanaa added a commit to EmbeddedLLM/unstructured-api-executable that referenced this pull request Jul 11, 2024
…indows (#2)

## PR Summary
1. Merged changes from upstream.
2. Update `unstructuredio_api.spec`.
3. Update `unstructuredio_api.py`.
4. Add additional setup dependencies to the `docs/Windows.md`. 

* build(deps): version bumps for maintenance (Unstructured-IO#424)

### Summary
Version bumps for regular maintenance and to address moderate CVEs from
security scans.
- bump `unstructured` to `0.14.6`
- bump `unstructured-inference` to `0.7.35`

* build: replace rockylinux with chainguard/wolfi as a base image (Unstructured-IO#423)

### Summary
Updates the Dockerfile to use the Chainguard wolfi-base image to reduce
CVEs. Also adds a step in the docker publish job that scans the images
and checks for CVEs before publishing.

### Testing
Run `make docker-build` and  `make docker-start-api`, then try:
```
from unstructured.partition.api import partition_via_api

elements = partition_via_api(
    filename=filename,
    api_url="http://localhost:8000/general/v0/general",
    api_key="<API-KEY>",
    strategy="hi_res",
)

print("\n\n".join([str(el) for el in elements]))
```

* fix: build and push workflow failing due to missing `-f` option `buildx build` command (Unstructured-IO#425)

I noticed that images on main branch are failing to build (and push) due
to missing `-f` parameter in `docker buildx build`. By default it
expects `Dockerfile` to exist, but we only have `Dockerfile-amd64` and
`Dockerfile-arm64`


![image](https://github.com/Unstructured-IO/unstructured-api/assets/64484917/4527165a-909e-498d-b0ee-8bba4b1a13e4)

---------

Co-authored-by: christinestraub <[email protected]>

* fix: update SHA for the base images (both architectures) after `base-images` repo update (Unstructured-IO#427)

build and publish CI steps are failing, because the base images have
changed in quay (their SHAs)

![image](https://github.com/Unstructured-IO/unstructured-api/assets/64484917/fc4e9aac-0820-4c90-9ad9-68cc6d9aad03)


![image](https://github.com/Unstructured-IO/unstructured-api/assets/64484917/fafe2ca4-dab2-4610-a26b-a7a4d56723a5)

* fix: revert to rockylinux SHA that works (arm64) (Unstructured-IO#428)

unnecessary SHA update introduced in
Unstructured-IO#427 that needs
to be reverted

* fix: re-add `DOCKER_IMAGE` env var in `Test image` step (Unstructured-IO#429)

shell syntax error occurs in docker-publish.yml workflow

* fix: invalid env var setting in `docker-publish` workflow (Unstructured-IO#430)

bug introduced in previous PR causing build failure on main

* fix: `docker-publish` workflow failing on main due to inexisting `ARCH` env var (Unstructured-IO#431)

* build(deps): bump dependency versions (Unstructured-IO#434)

### Summary

Bumps dependency versions for the API. Closes Unstructured-IO#432.

* fix/Fix MS Office filetype errors and harden docker smoketest (Unstructured-IO#436)

# Changes
**Fix for docx and other office files returning `{"detail":"File type
None is not supported."}`**
After moving to the wolfi base image, the `mimetypes` lib no longer
knows about these file extensions. To avoid issues like this, let's add
an explicit mapping for all the file extensions we care about. I added a
`filetypes.py` and moved `get_validated_mimetype` over. When this file
is imported, we'll call `mimetypes.add_type` for all file extensions we
support.

**Update smoke test coverage**
This bug snuck past because we were already providing the mimetype in
the docker smoke test. I updated `test_happy_path` to test against the
container with and without passing `content_type`. I added some missing
filetypes, and sorted the test params by extension so we can see when
new types are missing.

# Testing
The new smoke test will verify that all filetypes are working. You can
also `make docker-build && make docker-start-api`, and test out the docx
in the sample docs dir. On `main`, this file will give you the error
above.
```
curl 'http://localhost:8000/general/v0/general' \
--form 'files=@"fake.docx"'
```

* merge main; validated format: xml, txv, csv, xml, json, html, docs, docx, ppt, pptx, xlsx, xls, pdf

* compilable setting

* update Windows markdown

* Disable debug mode in `unstructuredio_api.spec`

* Enable pdf test case in `test_app.py`

---------

Co-authored-by: Christine Straub <[email protected]>
Co-authored-by: Michał Martyniak <[email protected]>
Co-authored-by: Matt Robinson <[email protected]>
Co-authored-by: Austin Walker <[email protected]>
Co-authored-by: tjtanaa <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants