This repository has been archived by the owner on Sep 13, 2023. It is now read-only.

Requirements aren't found when using FunctionTransformer #666

Open
aguschin opened this issue Apr 27, 2023 · 1 comment
Labels
bug Something isn't working customer Request from customer requirements Finding requirements and dependencies needed to properly serialize objects

Comments

@aguschin
Contributor

aguschin commented Apr 27, 2023

When using an sklearn Pipeline with a FunctionTransformer step, some requirements used in that step aren't found:

  • if a function from a custom library is passed to FunctionTransformer, MLEM can't find that library;
  • if a decorator function for logging (Azure) is used, MLEM doesn't always find that dependency.

I have an example from a customer; now I need to reproduce and investigate it. Off the top of my head, there are three possible solutions:

  1. Allow passing a (list of) libraries to mlem.api.save to be added as requirements
  2. Enable a "deep inspection mode" via an option to mlem.api.save
  3. Fix both cases while keeping the existing mechanics in place
@aguschin aguschin added bug Something isn't working requirements Finding requirements and dependencies needed to properly serialize objects customer Request from customer labels Apr 27, 2023
@aguschin aguschin moved this to In Progress in DVC May 24, 2023
@aguschin aguschin added this to DVC May 24, 2023
@aguschin aguschin self-assigned this May 24, 2023
@aguschin
Contributor Author

aguschin commented May 24, 2023

OK, here are instructions on how to make MLEM find these requirements. First, here's a script that reproduces the problem:

# run.py
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer
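from func import f  # local module (func.py), not an installed package
from mlem.api import save

# a single FunctionTransformer step wrapping the local function f
pipe = make_pipeline(FunctionTransformer(f))

# writes the model plus its metadata file pipeline.mlem (incl. requirements)
save(pipe, "pipeline", sample_data=0)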

It imports f from func.py:

# func.py
def f(x):
    return x

If you create both files and run python run.py, you'll see that pipeline.mlem lists no dependencies other than sklearn:

# pipeline.mlem
...
requirements:
- module: sklearn
  package_name: scikit-learn
  version: 1.1.1

There is already a way to find them: the DEEP_INSPECTION setting in the MLEM config.

For example, if you run MLEM_DEEP_INSPECTION=true python run.py, they're found:

# pipeline.mlem
...
requirements:
- module: numpy
  version: 1.23.5
- module: sklearn
  package_name: scikit-learn
  version: 1.1.1
- is_package: false
  module: func
  name: func
  source64zip: eJxLSU1TSNOo0LTiUgCCotSS0qI8hQouAE2bBoU=
  type: custom
- module: scipy
  version: 1.9.3
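The source64zip field of the custom entry carries the code of func.py itself. Judging by the field name and the data (it starts with eJ, the base64 form of a zlib header), it looks like zlib-compressed source encoded as base64; the snippet below is a sketch under that assumption, decoding the value copied from the output above.

```python
import base64
import zlib

# Value copied from the pipeline.mlem excerpt above.
source64zip = "eJxLSU1TSNOo0LTiUgCCotSS0qI8hQouAE2bBoU="

# Assumption: base64 on top of zlib compression.
source = zlib.decompress(base64.b64decode(source64zip)).decode("utf-8")
print(source)
# def f(x):
#     return x
```

This is a quick way to check which local code a saved model will carry with it.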

You can also set this option in the MLEM project config, so MLEM picks it up automatically and you don't need to set a shell variable as in the example above:

$ mlem init
$ mlem config set core.DEEP_INSPECTION true

This generates a .mlem.yaml config file with the setting enabled.
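If you'd rather not change the project config, the environment-variable form can also be applied per process from Python. A minimal sketch using only the standard library; run_with_deep_inspection is a hypothetical helper, and it assumes, as in the shell example, that MLEM reads MLEM_DEEP_INSPECTION from the environment of the process that calls save.

```python
import os
import subprocess
import sys


def run_with_deep_inspection(script="run.py"):
    """Hypothetical helper: run a script with MLEM_DEEP_INSPECTION=true set.

    Equivalent to `MLEM_DEEP_INSPECTION=true python run.py` in a POSIX shell:
    only the child process sees the variable; this process's env is unchanged.
    """
    env = dict(os.environ, MLEM_DEEP_INSPECTION="true")
    return subprocess.run(
        [sys.executable, script],
        env=env,
        check=True,  # raise CalledProcessError if the script fails
        capture_output=True,
        text=True,
    )
```

Calling run_with_deep_inspection("run.py") from the directory containing run.py and func.py should then produce the longer requirements list.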
