Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: share images between backends #708

Merged
merged 5 commits into from
Aug 17, 2023

Conversation

SdgJlbl
Copy link
Contributor

@SdgJlbl SdgJlbl commented Aug 10, 2023

Description

How has this been tested?

Checklist

  • changelog was updated with notable changes
  • documentation was updated

Signed-off-by: SdgJlbl <[email protected]>
@SdgJlbl SdgJlbl force-pushed the share-images-between-backends branch 2 times, most recently from a0f0626 to b0442ba Compare August 14, 2023 09:11
@github-actions github-actions bot added the api label Aug 14, 2023
@SdgJlbl SdgJlbl force-pushed the share-images-between-backends branch from 6456bfc to 2785c44 Compare August 16, 2023 09:09
@SdgJlbl
Copy link
Contributor Author

SdgJlbl commented Aug 16, 2023

/e2e --benchmarks mnist, camelyon

@Owlfred
Copy link

Owlfred commented Aug 16, 2023

End to end tests: ❌ FAILURE

Jobs status:

  • Camelyon: ⏭️
  • Dispatch Jobs: ✔️
  • Distributed / distributed-sdk:
  • MNIST / standalone-mnist: ✔️
  • Standalone / standalone-sdk:

Oh well.

@SdgJlbl
Copy link
Contributor Author

SdgJlbl commented Aug 16, 2023

/e2e --benchmarks camelyon

@Owlfred
Copy link

Owlfred commented Aug 16, 2023

End to end tests: ❌ FAILURE

Jobs status:

  • Camelyon / standalone-camelyon:
  • Dispatch Jobs: ✔️
  • Distributed / distributed-sdk:
  • MNIST: ⏭️
  • Standalone / standalone-sdk:

It'll stay between us, no one needs to know.

Signed-off-by: SdgJlbl <[email protected]>
@SdgJlbl SdgJlbl force-pushed the share-images-between-backends branch from ceb6df7 to 78685ff Compare August 16, 2023 14:55
@SdgJlbl SdgJlbl marked this pull request as ready for review August 17, 2023 14:51
@SdgJlbl SdgJlbl requested a review from a team as a code owner August 17, 2023 14:51
@SdgJlbl SdgJlbl changed the title Share images between backends feat: share images between backends Aug 17, 2023
@SdgJlbl SdgJlbl merged commit aaafc44 into poc-decoupled-builder Aug 17, 2023
@SdgJlbl SdgJlbl deleted the share-images-between-backends branch August 17, 2023 14:53
ThibaultFy pushed a commit that referenced this pull request Aug 22, 2023
guilhem-barthes pushed a commit that referenced this pull request Oct 6, 2023
guilhem-barthes pushed a commit that referenced this pull request Oct 6, 2023
thbcmlowk pushed a commit that referenced this pull request Oct 11, 2023
thbcmlowk pushed a commit that referenced this pull request Oct 11, 2023
SdgJlbl added a commit that referenced this pull request Oct 20, 2023
thbcmlowk pushed a commit that referenced this pull request Oct 24, 2023
SdgJlbl added a commit that referenced this pull request Oct 25, 2023
* feat: decouple image builder from worker

Signed-off-by: SdgJlbl <[email protected]>

* fix: update skaffold config

Signed-off-by: Guilhem Barthes <[email protected]>

* feat: add `ServiceAccount` and modify role

Signed-off-by: Guilhem Barthes <[email protected]>

* fix: improve `wait_for_image_built`

Signed-off-by: Guilhem Barthes <[email protected]>

* feat: build image in new pod

Signed-off-by: Guilhem Barthes <[email protected]>

* chore: rename `deployment-builder.yaml` to `stateful-builder.yaml`

Signed-off-by: Guilhem Barthes <[email protected]>

* chore: rename `stateful-builder.yaml` to `statefulset-builder.yaml`

Signed-off-by: Guilhem Barthes <[email protected]>

* chore: centralize params

Signed-off-by: Guilhem Barthes <[email protected]>

* feat: create `BuildTask`

Signed-off-by: Guilhem Barthes <[email protected]>

* feat: move some values to `builder` module

Signed-off-by: Guilhem Barthes <[email protected]>

* feat: move more code to `builder`

Signed-off-by: Guilhem Barthes <[email protected]>

* fix: remove TaskProfiling as Celery task + save Entrypoint in DB

Signed-off-by: SdgJlbl <[email protected]>

* fix: extract entrypoint from registry

Signed-off-by: SdgJlbl <[email protected]>

* fix: make doc for helm chart

Signed-off-by: SdgJlbl <[email protected]>

* feat: build function at registration (#707)

<!-- Please reference issue if any. -->

<!-- Please include a summary of your changes. -->

<!-- Please describe the tests that you ran to verify your changes.  -->

- [ ] [changelog](../CHANGELOG.md) was updated with notable changes
- [ ] documentation was updated

---------

Signed-off-by: SdgJlbl <[email protected]>
Signed-off-by: Guilhem Barthes <[email protected]>
Co-authored-by: SdgJlbl <[email protected]>

* feat: share images between backends (#708)



Signed-off-by: SdgJlbl <[email protected]>

* chore: update helm worklfow

Signed-off-by: ThibaultFy <[email protected]>

* chore: add .DS_Store to gitignore

Signed-off-by: ThibaultFy <[email protected]>

* chore: rm DS_Store

Signed-off-by: ThibaultFy <[email protected]>

* chore: rm .DS_Store

Signed-off-by: ThibaultFy <[email protected]>

* [sub]fix: add missing migration poc (#728)

## Description

Add a migration missing in the poc. 
This migration alters two things:

-  modify `ComputeTaskFailureReport.logs` 
-  modify `FunctionImage.file`

This migration has been generated automatically with `make migrations`

## How has this been tested?

<!-- Please describe the tests that you ran to verify your changes.  -->

## Checklist

- [ ] [changelog](../CHANGELOG.md) was updated with notable changes
- [ ] documentation was updated

Signed-off-by: Guilhem Barthes <[email protected]>

* [sub]feat: add function events (#714)

- Substra/orchestrator#263

Add function events, used now we decoupled the building of the function
with the execution of the compute task. For that it add a status field
on the Function. It also includes another PR (merged here), to have
functions build logs working again.

In a future PR, we will change the compute task execution to avoid
having to wait_for_function_built in compute_task()

Fixes FL-1160

As this is going to be merged on a branch that is going to be merged to
a POC branch, we use MNIST as a baseline of a working model. We will
deal with failing tests on the POC before merging on main.

- [x] [changelog](../CHANGELOG.md) was updated with notable changes
- [ ] documentation was updated

---------

Signed-off-by: SdgJlbl <[email protected]>
Signed-off-by: Guilhem Barthes <[email protected]>
Signed-off-by: Guilhem Barthés <[email protected]>
Co-authored-by: SdgJlbl <[email protected]>

* [sub]fix(app/orchestrator/resources): FunctionStatus.FUNCTION_STATUS_CREATED -> FunctionStatus.FUNCTION_STATUS_WAITING (#742)

# Issue

Backend FunctionStatus are not aligned with [orchestrator
definitions](https://github.com/Substra/orchestrator/blob/poc-decoupled-builder/lib/asset/function.proto#L29-L36).
In particular, `FunctionStatus.FUNCTION_STATUS_CREATED` leading to the
following error:

```txt
ValueError: 'FUNCTION_STATUS_WAITING' is not a valid FunctionStatus
```

## Description

FunctionStatus.FUNCTION_STATUS_CREATED ->
FunctionStatus.FUNCTION_STATUS_WAITING

## How has this been tested?

Running Camelyon benchmark on
[poc-builder-flpc](https://substra.org-1.poc-builder-flpc.cg.owkin.tech/compute_plans/a420306f-5719-412b-ab9c-688b7bed9c70/tasks?page=1&ordering=-rank)
environment.

## Checklist

- [ ] [changelog](../CHANGELOG.md) was updated with notable changes
- [ ] documentation was updated

---------

Signed-off-by: Thibault Camalon <[email protected]>

* fix: builder using builder SA (#754)

* fix: builder using builder SA

Signed-off-by: Guilhem Barthés <[email protected]>

* docs: changelog

Signed-off-by: Guilhem Barthés <[email protected]>

---------

Signed-off-by: Guilhem Barthés <[email protected]>

* fix: rebase changelog

Signed-off-by: Guilhem Barthés <[email protected]>

* fix: adapt to pydantic 2.x.x (#758)

Signed-off-by: Guilhem Barthés <[email protected]>

* [sub]fix(backend/image_transfert/encoder): update pydantic method (#763)

* fix(backend/image_transfert/encoder): update pydantic method

Signed-off-by: Thibault Camalon <[email protected]>

* fix(backend/image_transfer/decoder): parse_raw -> model_validate_json

Signed-off-by: Thibault Camalon <[email protected]>

---------

Signed-off-by: Thibault Camalon <[email protected]>

* [sub]chore: upgrade chart (#765)

* chore(charts): bump chart version

Signed-off-by: Thibault Camalon <[email protected]>

* chore(charts/substra-backend/CHANGELOG): bring back unreleased section

Signed-off-by: Thibault Camalon <[email protected]>

---------

Signed-off-by: Thibault Camalon <[email protected]>

* fix: post-rebase

Signed-off-by: SdgJlbl <[email protected]>

* chore: rationalize migrations

Signed-off-by: SdgJlbl <[email protected]>

* [sub]chore(builder): waitPostgresqlInitContainer (#764)

* fix: builder using builder SA (#754)

* fix: builder using builder SA

Signed-off-by: Guilhem Barthés <[email protected]>

* docs: changelog

Signed-off-by: Guilhem Barthés <[email protected]>

---------

Signed-off-by: Guilhem Barthés <[email protected]>

* chore(charts/substra-backend/templates/statefulset-builder): add init-container waitPostgresqlInitContainer

Signed-off-by: Thibault Camalon <[email protected]>

---------

Signed-off-by: Guilhem Barthés <[email protected]>
Signed-off-by: Thibault Camalon <[email protected]>
Co-authored-by: Guilhem Barthés <[email protected]>

---------

Signed-off-by: SdgJlbl <[email protected]>
Signed-off-by: Guilhem Barthes <[email protected]>
Signed-off-by: ThibaultFy <[email protected]>
Signed-off-by: Guilhem Barthés <[email protected]>
Signed-off-by: Thibault Camalon <[email protected]>
Co-authored-by: SdgJlbl <[email protected]>
Co-authored-by: ThibaultFy <[email protected]>
Co-authored-by: Thibault Camalon <[email protected]>
guilhem-barthes pushed a commit that referenced this pull request Oct 25, 2023
guilhem-barthes pushed a commit that referenced this pull request Feb 8, 2024
guilhem-barthes pushed a commit that referenced this pull request Feb 12, 2024
guilhem-barthes added a commit that referenced this pull request Feb 12, 2024
* feat: decouple image builder from worker

Signed-off-by: SdgJlbl <[email protected]>

* fix: update skaffold config

Signed-off-by: Guilhem Barthes <[email protected]>

* feat: add `ServiceAccount` and modify role

Signed-off-by: Guilhem Barthes <[email protected]>

* feat: build image in new pod

Signed-off-by: Guilhem Barthes <[email protected]>

* chore: rename `deployment-builder.yaml` to `stateful-builder.yaml`

Signed-off-by: Guilhem Barthes <[email protected]>

* chore: rename `stateful-builder.yaml` to `statefulset-builder.yaml`

Signed-off-by: Guilhem Barthes <[email protected]>

* chore: centralize params

Signed-off-by: Guilhem Barthes <[email protected]>

* feat: create `BuildTask`

Signed-off-by: Guilhem Barthes <[email protected]>

* feat: move more code to `builder`

Signed-off-by: Guilhem Barthes <[email protected]>

* fix: remove TaskProfiling as Celery task + save Entrypoint in DB

Signed-off-by: SdgJlbl <[email protected]>

* feat: build function at registration (#707)

<!-- Please reference issue if any. -->

<!-- Please include a summary of your changes. -->

<!-- Please describe the tests that you ran to verify your changes.  -->

- [ ] [changelog](../CHANGELOG.md) was updated with notable changes
- [ ] documentation was updated

---------

Signed-off-by: SdgJlbl <[email protected]>
Signed-off-by: Guilhem Barthes <[email protected]>
Co-authored-by: SdgJlbl <[email protected]>

* feat: share images between backends (#708)



Signed-off-by: SdgJlbl <[email protected]>

* chore: update helm worklfow

Signed-off-by: ThibaultFy <[email protected]>

* [sub]fix: add missing migration poc (#728)

## Description

Add a migration missing in the poc. 
This migration alters two things:

-  modify `ComputeTaskFailureReport.logs` 
-  modify `FunctionImage.file`

This migration has been generated automatically with `make migrations`

## How has this been tested?

<!-- Please describe the tests that you ran to verify your changes.  -->

## Checklist

- [ ] [changelog](../CHANGELOG.md) was updated with notable changes
- [ ] documentation was updated

Signed-off-by: Guilhem Barthes <[email protected]>

* [sub]feat: add function events (#714)

- Substra/orchestrator#263

Add function events, used now we decoupled the building of the function
with the execution of the compute task. For that it add a status field
on the Function. It also includes another PR (merged here), to have
functions build logs working again.

In a future PR, we will change the compute task execution to avoid
having to wait_for_function_built in compute_task()

Fixes FL-1160

As this is going to be merged on a branch that is going to be merged to
a POC branch, we use MNIST as a baseline of a working model. We will
deal with failing tests on the POC before merging on main.

- [x] [changelog](../CHANGELOG.md) was updated with notable changes
- [ ] documentation was updated

---------

Signed-off-by: SdgJlbl <[email protected]>
Signed-off-by: Guilhem Barthes <[email protected]>
Signed-off-by: Guilhem Barthés <[email protected]>
Co-authored-by: SdgJlbl <[email protected]>

* [sub]fix(app/orchestrator/resources): FunctionStatus.FUNCTION_STATUS_CREATED -> FunctionStatus.FUNCTION_STATUS_WAITING (#742)

# Issue

Backend FunctionStatus are not aligned with [orchestrator
definitions](https://github.com/Substra/orchestrator/blob/poc-decoupled-builder/lib/asset/function.proto#L29-L36).
In particular, `FunctionStatus.FUNCTION_STATUS_CREATED` leading to the
following error:

```txt
ValueError: 'FUNCTION_STATUS_WAITING' is not a valid FunctionStatus
```

## Description

FunctionStatus.FUNCTION_STATUS_CREATED ->
FunctionStatus.FUNCTION_STATUS_WAITING

## How has this been tested?

Running Camelyon benchmark on
[poc-builder-flpc](https://substra.org-1.poc-builder-flpc.cg.owkin.tech/compute_plans/a420306f-5719-412b-ab9c-688b7bed9c70/tasks?page=1&ordering=-rank)
environment.

## Checklist

- [ ] [changelog](../CHANGELOG.md) was updated with notable changes
- [ ] documentation was updated

---------

Signed-off-by: Thibault Camalon <[email protected]>

* fix: rebase changelog

Signed-off-by: Guilhem Barthés <[email protected]>

* feat: decouple image builder from worker

Signed-off-by: SdgJlbl <[email protected]>

* feat: add `ServiceAccount` and modify role

Signed-off-by: Guilhem Barthes <[email protected]>

* feat: build function at registration (#707)

<!-- Please reference issue if any. -->

<!-- Please include a summary of your changes. -->

<!-- Please describe the tests that you ran to verify your changes.  -->

- [ ] [changelog](../CHANGELOG.md) was updated with notable changes
- [ ] documentation was updated

---------

Signed-off-by: SdgJlbl <[email protected]>
Signed-off-by: Guilhem Barthes <[email protected]>
Co-authored-by: SdgJlbl <[email protected]>

* feat: save status update in orc

Signed-off-by: Guilhem Barthes <[email protected]>

* feat: use status for build waiting

Signed-off-by: Guilhem Barthes <[email protected]>

* fix: re-add `container_image_exists`

Signed-off-by: Guilhem Barthes <[email protected]>

* fix: rebase errors

Signed-off-by: Guilhem Barthes <[email protected]>

* fix: format

Signed-off-by: Guilhem Barthes <[email protected]>

* fix: tests

Signed-off-by: Guilhem Barthes <[email protected]>

* fix: add `si` to building invokations

Signed-off-by: Guilhem Barthes <[email protected]>

* fix: tests

Signed-off-by: Guilhem Barthes <[email protected]>

* fix: apply feedback

Signed-off-by: Guilhem Barthes <[email protected]>

* fix: only import during typing

Signed-off-by: Guilhem Barthes <[email protected]>

* [sub]feat: modify computetask failure report (#727)

## Companion PR

- Substra/orchestrator#277
- Substra/substra-frontend#240

## Description

The aim is to allow registering failure reports not only for compute
task but for other kind of assets (for now, functions which are not
building as part of the execution of a compute task)

- Modifies `ComputeTaskFailureReport`:
    - renamed the model to `AssetFailureReport`
- renamed field `compute_task_key` to `asset_key` (as we can now have a
function key)
    - added field `asset_type` to provide 
- Updates protobuf reflecting the previous changes
- refactor `download_file` in `PermissionMixin` to provide mroe
flexibility (and decouple from DRF)
- create new `FailableTask` (Celery task):
  - centralize the logic to submit asset failure

## How has this been tested?

As this is going to be merged on a branch that is going to be merged to
a POC branch, we use MNIST as a baseline of a working model. We will
deal with failing tests on the POC before merging on main.

## Checklist

- [x] [changelog](../CHANGELOG.md) was updated with notable changes
- [ ] documentation was updated

---------

Signed-off-by: Guilhem Barthes <[email protected]>

* feat: add config to run celery in tests

Signed-off-by: Guilhem Barthés <[email protected]>

* feat: add tests

Signed-off-by: Guilhem Barthés <[email protected]>

* fix: remove rebqse duplicate

Signed-off-by: Guilhem Barthés <[email protected]>

* docs: changelog

Signed-off-by: Guilhem Barthés <[email protected]>

* fix: adapt to pydantic 2.x.x

Signed-off-by: Guilhem Barthés <[email protected]>

* fix: remove rebase artifacts

Signed-off-by: Guilhem Barthés <[email protected]>

* fix: update to pydantic 2.x.x

Signed-off-by: Guilhem Barthés <[email protected]>

---------

Signed-off-by: SdgJlbl <[email protected]>
Signed-off-by: Guilhem Barthes <[email protected]>
Signed-off-by: ThibaultFy <[email protected]>
Signed-off-by: Guilhem Barthés <[email protected]>
Signed-off-by: Thibault Camalon <[email protected]>
Co-authored-by: SdgJlbl <[email protected]>
Co-authored-by: ThibaultFy <[email protected]>
Co-authored-by: Thibault Camalon <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants