feat(ml): composable ml #9973
Conversation
- simplify
- server fixes
- fix typing
- formatting and typing
- rename
- fix detection-only response
- no need for defaultdict
- update api
- linting
I'm going to approve this; I don't think any of us will really be able to review it properly. IMO, if you have tested this and are confident in the change, go ahead and merge it.
LGTM
Description
This PR addresses some limitations of the ML service design. Currently, detection and recognition models for facial recognition are bundled in the same class, and likewise for textual and visual CLIP models. As a result, they duplicate certain shared behaviors for their own set of models. Moreover, there is no good way to choose particular detection and recognition models. This is a big limitation for OCR, as it is common to mix and match unrelated detection and recognition models. Lastly, there is no way to query an individual detection or recognition model. CLIP models have a custom `mode` option to do this, but it is specific to that model task.

This PR redesigns the ML service such that each model session is its own class, and broader tasks like facial recognition are modeled as dependencies (recognition being dependent on detection, etc.). This lays a solid foundation for a composable set of models with separate settings for each.
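To illustrate the idea (a minimal sketch, not the actual implementation; the class and method names here are assumptions), a recognizer can take its detector as an explicit dependency, so each session stays its own class while the broader task composes them:

```python
# Sketch of "tasks as dependencies" (hypothetical names, not immich's code):
# each model session is its own class; facial recognition composes a
# recognizer on top of the detector it depends on.
class FaceDetector:
    def predict(self, image) -> list[dict]:
        # stand-in for a detection session; returns bounding boxes
        return [{"box": (0, 0, 112, 112)}]


class FaceRecognizer:
    def __init__(self, detector: FaceDetector) -> None:
        # the dependency is explicit instead of being bundled in one class
        self.detector = detector

    def predict(self, image) -> list[dict]:
        faces = self.detector.predict(image)
        # one embedding per detected face (dummy values here)
        return [{"box": f["box"], "embedding": [0.0] * 512} for f in faces]


recognizer = FaceRecognizer(FaceDetector())
results = recognizer.predict(object())
```

Because the detector is injected, swapping in a different detection model (as is common for OCR) requires no change to the recognizer.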
As part of this change, the API has also been updated. A given request can look like this:
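The original example payload was not captured here; the following is an illustrative sketch only, with assumed field and model names: each top-level key selects a task, and each task names the model sessions it should use.

```python
# Hypothetical request body (field and model names are assumptions,
# not the exact API): tasks select their own detection/recognition models.
request_entries = {
    "facial-recognition": {
        "detection": {"modelName": "buffalo_l"},
        "recognition": {"modelName": "buffalo_l"},
    },
}
```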
And the response can look like this:
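The original response example was also not captured; a plausible shape, keyed by the same task names as the request (all field names here are assumptions), might be:

```python
# Hypothetical response shape (illustrative only): one entry per requested
# task, with per-face detection and recognition outputs combined.
response = {
    "facial-recognition": [
        {
            "boundingBox": {"x1": 12, "y1": 8, "x2": 104, "y2": 110},
            "score": 0.97,
            "embedding": [0.01] * 512,  # recognition output for this face
        }
    ],
}
```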
Some implementation notes:
For facial recognition, a side effect of this change is drastically better performance when there are multiple faces. This is because they are handled in one model pass instead of fed sequentially. With 10 faces in an image, it was 70%+ faster on CPU and 6x faster on GPU. A smaller gain comes from always using Pillow to decode images, as it is several times faster than OpenCV.
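The batching change described above can be sketched as follows (a toy stand-in for the recognition model, not the real inference code): instead of calling the model once per face crop, the crops are stacked into a single batch and handled in one pass.

```python
import numpy as np

# Illustrative sketch: N face crops recognized in one batched model pass
# instead of N sequential passes. embed() is a dummy stand-in for an
# ONNX recognition session mapping (N, 112, 112, 3) -> (N, 512).
def embed(batch: np.ndarray) -> np.ndarray:
    return batch.reshape(batch.shape[0], -1)[:, :512].astype(np.float32)

faces = [np.random.rand(112, 112, 3).astype(np.float32) for _ in range(10)]

# sequential: one model call per face (the old behavior)
seq = np.concatenate([embed(face[None]) for face in faces])

# batched: a single call over the stacked crops (the new behavior)
batched = embed(np.stack(faces))

assert np.allclose(seq, batched)  # same results, far fewer model calls
```

The speedup comes from amortizing per-call overhead and letting the runtime parallelize across the batch, which is why the gain grows with the number of faces and is largest on GPU.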
While unlikely to be relevant in practice, a nice perk of this change is that it allows one to query any number of tasks at once, like so:
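The concrete example was not captured here; as an illustrative sketch (task and model names are assumptions), a combined request would simply list several tasks side by side:

```python
# Hypothetical multi-task request (illustrative names): unrelated tasks
# queried together in a single request.
multi_task_request = {
    "clip": {"visual": {"modelName": "ViT-B-32__openai"}},
    "facial-recognition": {
        "detection": {"modelName": "buffalo_l"},
        "recognition": {"modelName": "buffalo_l"},
    },
}
```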