Failed to import fitz on AWS lambda #430

Closed
gavinLow8128 opened this issue Jan 17, 2020 · 30 comments

@gavinLow8128

I am trying to develop a serverless PDF-to-image function on AWS Lambda.
The import statement is import fitz

However, I get the following error when triggering the Lambda function.

[ERROR] Runtime.ImportModuleError: Unable to import module 'lambda_function': cannot import name '_fitz' from partially initialized module 'fitz' (most likely due to a circular import) (/var/task/fitz/__init__.py)

Thank you very much!
I am using Python 3.8 and PyMuPDF 1.16.10.

@JorjMcKie
Collaborator

Hm, I have never seen this error before, and I also do not understand what you are actually trying to do.
But the error looks like you are importing fitz from within the installation folder of PyMuPDF (where __init__.py lives). This can never work.

@gavinLow8128
Author

Thank you for the quick response.
I tried to install PyMuPDF by following the instructions at the following link.
AWS Lambda Deployment Package in Python

It seems that PyMuPDF and my program (app.py, the Python script that imports fitz) will be placed together.
Is that the main cause of the error?
Please let me know if there is any solution.

[Screenshot 2020-01-20 at 6 18 45 PM]

Thank you very much

@JorjMcKie
Collaborator

Hm, actually not. The above structure works if executed locally on your computer - just tried it out (of course, the dist-info folder is not required). The imported fitz is confirmed to be taken from the fitz subfolder next to app.py.

So the problem must lie in how AWS Lambda supports this type of thing. I am not a user, so I do not know anything about it.

Code – The code and dependencies of your function. For scripting languages, you can edit your function code in the embedded editor. To add libraries, or for languages that the editor doesn't support, upload a deployment package. If your deployment package is larger than 50 MB, choose Upload a file from Amazon S3.

This quotation from the AWS Lambda website seems to suggest that you must upload PyMuPDF as a deployment package. Did you do that?

@JorjMcKie
Collaborator

Also have a look at this:

Note:
For libraries that use extension modules written in C or C++, build your deployment package in an Amazon Linux environment. You can use the SAM CLI build command, which uses Docker, or build your deployment package on Amazon EC2 or AWS CodeBuild.

PyMuPDF falls under this category ...
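
One way to see why this matters: the compiled `_fitz` file has to carry a file-name suffix that the running interpreter accepts, and a binary built on another OS or for another Python version will not. The sketch below is illustrative only. `find_loadable_extension` and its default module name are hypothetical, and it relies only on the standard library's `importlib.machinery.EXTENSION_SUFFIXES`.

```python
import importlib.machinery
from pathlib import Path


def find_loadable_extension(package_dir, module_name="_fitz"):
    """Return (loadable, present) file names for a compiled module in package_dir.

    `present` lists every file called `<module_name>.<something>`.
    `loadable` keeps only those whose full name is one this interpreter
    would look for when importing `module_name`.
    """
    # File names the current interpreter tries, e.g. "_fitz" + ".abi3.so"
    candidates = {module_name + s for s in importlib.machinery.EXTENSION_SUFFIXES}
    present = sorted(p.name for p in Path(package_dir).glob(module_name + ".*"))
    loadable = [name for name in present if name in candidates]
    return loadable, present
```

Run inside the function against the folder that holds `fitz/__init__.py` (for example `/var/task/fitz`, the path in the error above), an empty `loadable` list alongside a non-empty `present` list would suggest the packaged binary was built for a different platform or Python version.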

@gavinLow8128
Author

The problem was solved by deploying the package through AWS CodeBuild.
Thank you for your help!

@ale-de-vries

@gavinLow8128 I am running into the same issue. Could you share your steps for deploying the package through AWS CodeBuild? Thanks!

@gavinLow8128
Author

gavinLow8128 commented Apr 7, 2020

@ale-de-vries

Step 1: Go to CodeCommit and "create repository", then upload your project to CodeCommit.

Step 2: Create a file called "buildspec.yml". Here is my buildspec.yml for your reference.
[Screenshot 2020-04-08 at 2 17 43 AM]

Step 3: Go to CodeBuild and "Create Build Project". You may watch this YouTube video as a reference for how to complete the project configuration.
https://www.youtube.com/watch?v=6YQFcd_z4gk

Step 4: After completing the configuration, "start build" the project.

Step 5: If the project builds successfully, go to your S3 bucket. CodeBuild uploads the artifact file there. Find it and copy the "Object URL".

Step 6: Go to your Lambda function. Select "Upload a file from Amazon S3" for "Code entry type" and paste the "Object URL" into "Amazon S3 link URL". Then "Save".
[Screenshot 2020-04-08 at 2 37 38 AM]

I hope it helps you.
Please let me know if you have any other questions.

@knightfall

Hi @gavinLow8128 ,

I tried your solution but I am still getting the same error as you. Any idea what I am doing wrong?
Would you be able to share your folder structure?

@jmac105

jmac105 commented Jul 17, 2020

For anyone else that ends up here with the same problem:

You can use lambda layers to provide the pymupdf dependency, instead of building it into the deployment package. There's already a maintained layer for it, see here: https://github.com/keithrozario/Klayers

That way you don't need to bother with building on amazon linux, just reference the arn of the layer you need in your lambda function creation (region specific) and import as normal.

@anilomanwar

anilomanwar commented Nov 26, 2020

I am facing the same issue for IBM Cloud Function (similar to AWS Lambda)

I used the same build step,
"pip install PyMuPDF -t .", in the deployment step and can see the folder structure
mentioned in #430 (comment)

In my code,

import fitz

and I get the error below:
"2020-11-26T07:23:33.653912Z stderr: Traceback (most recent call last):",
"2020-11-26T07:23:33.653966Z stderr: File "exec__.py", line 42, in <module>",
"2020-11-26T07:23:33.653971Z stderr: from main__ import main as main",
"2020-11-26T07:23:33.653976Z stderr: File "/action/1/bin/main__.py", line 30, in <module>",
"2020-11-26T07:23:33.653980Z stderr: import fitz",
"2020-11-26T07:23:33.653984Z stderr: File "/action/1/bin/fitz/__init__.py", line 3, in <module>",
"2020-11-26T07:23:33.653988Z stderr: from fitz.fitz import *",
"2020-11-26T07:23:33.653992Z stderr: File "/action/1/bin/fitz/fitz.py", line 17, in <module>",
"2020-11-26T07:23:33.653996Z stderr: from . import _fitz",
"2020-11-26T07:23:33.654001Z stderr: ImportError: cannot import name '_fitz' from 'fitz' (/action/1/bin/fitz/__init__.py)",
"2020-11-26T07:23:33.785866Z stderr: Command exited abruptly during initialization.",
"2020-11-26T07:23:33.786Z stderr: The action did not initialize or run as expected. Log data might be missing."

I tried two or three ways of doing this but get the same issue, for example:
"pip install PyMuPDF"
"pip install PyMuPDF==1.16.10 -t ."
"pip install PyMuPDF==1.18.10 -t ."

I am using other packages such as pypdf and pdfminer in the same way and they work fine, but not this one.

I did not get any issues during the build step; the issue appears only at the import statement.

@MisterMahuron

> For anyone else that ends up here with the same problem:
>
> You can use lambda layers to provide the pymupdf dependency, instead of building it into the deployment package. There's already a maintained layer for it, see here: https://github.com/keithrozario/Klayers
>
> That way you don't need to bother with building on amazon linux, just reference the arn of the layer you need in your lambda function creation (region specific) and import as normal.

@jmac105 Any idea how this individual got it to work? I have PyMuPDF as a package in my layer but I am still getting the exact same error as the individual who opened this ticket. All other packages in my layer are importing correctly. Any help would be greatly appreciated. I also appreciate the arn repo but am hoping to avoid using this if at all possible.

@jmac105

jmac105 commented Jan 13, 2021

> > For anyone else that ends up here with the same problem:
> > You can use lambda layers to provide the pymupdf dependency, instead of building it into the deployment package. There's already a maintained layer for it, see here: https://github.com/keithrozario/Klayers
> > That way you don't need to bother with building on amazon linux, just reference the arn of the layer you need in your lambda function creation (region specific) and import as normal.
>
> @jmac105 Any idea how this individual got it to work? I have PyMuPDF as a package in my layer but I am still getting the exact same error as the individual who opened this ticket. All other packages in my layer are importing correctly. Any help would be greatly appreciated. I also appreciate the arn repo but am hoping to avoid using this if at all possible.

I'd recommend contacting the maintainer of that repo and asking them, but it looks like they are using the Serverless Framework to build the layers. I do believe that you need to build your layer on the same OS it will run on in Lambda (Amazon Linux or Amazon Linux 2, depending on the Python version).

@thematheusgomes

Hey guys,

I have the same issue when deploying with the Serverless Framework.

I tried to create a layer, but the issue is still the same.

This is the error I'm getting on the Lambda console:

[image]

@amiantos

amiantos commented Jun 9, 2021

Just in case it helps anyone, I was using Fitz in Lambda just fine for the past several months (I automate the build this way) under the Python 3.6 runtime. When I switched to the Python 3.8 run time, I started getting this import error. I switched back to 3.6 and everything is working fine again.

@carlosgaonad

carlosgaonad commented Jul 10, 2021

Hi everyone

I had the same issue deploying my lambda.

I tried many ways to solve this problem, but the only solution was to use a virtual machine with Python 3.7 and install PyMuPDF there.

The next step was to download the library from the virtual machine and then create the zip file using that library.

[image]

This file, _fitz.cpython-37m-x86_64-linux-gnu.so, is probably too heavy, but it is necessary.

This method finally worked for me!!

@Konstantina-Paraskevopoulou

Konstantina-Paraskevopoulou commented Aug 4, 2021

I had a similar problem when I was trying to import some tensorflow probability modules like below:
import tensorflow_probability as tfp
tfp = tfp.substrates.numpy
tfd = tfp.distributions

At least for me, I realized that the problem was not related to lambda but it was a Python circular import error. Have a look at this:
https://stackabuse.com/python-circular-imports

Changing the position of the imports solved my issue. Basically I switched the tfp and tfd
import tensorflow_probability as tfp
tfd = tfp.distributions
tfp = tfp.substrates.numpy

@VanntheRed

I'm stuck on the same problem for a python sftp package that requires paramiko. I compiled a package and tested it on a windows EC2 instance without issue. When I then tried to make it a lambda to see if I could accomplish the task serverlessly I got the same error about non-native packages. I'm trying to recreate the CodeCommit/CodeBuild solution but I'm getting an error with the buildspec.yaml: Phase context status code: YAML_FILE_ERROR Message: mapping values are not allowed in this context at line 2

My buildspec.yaml is:
version: 0.1

phases:
  install:
    runtime-versions:
      python: 3.8
  pre_build:
    commands:
  build:
    commands:
      - echo Compiling the python code ...
      - pip install paramiko==2.7.2 -t .
  post-build:
    commands:
artifacts:
  files:
    - '**/*'

I'm not certain if the problem is the yaml (it passed a yaml parser) or the contents of my CodeCommit. All I have there is my python script and the yaml document. Does a download of the package need to be there?

TIA,
VtR

@Ricardomol

Ricardomol commented Aug 16, 2021

I'm getting this exact error as well:

Runtime.ImportModuleError: Unable to import module 'lambda': cannot import name '_fitz' from partially initialized module 'fitz' (most likely due to a circular import) (/var/task/fitz/__init__.py) 

I'm already using KLayers to generate the Layer.

This is what my zip file contains:
[Screenshot 2021-08-16 at 19 09 39]

Anyone else who made it work with KLayers, give us more details please.

@RodPienaar

If you use pip to install a python package locally, which contains compiled code, the package (wheel) that is downloaded may not be compatible with AWS Lambda (in my case it was mac rather than linux). So if you deploy this locally installed file to your lambda this will cause the error, fitz not found, when you run your lambda, even if your code works locally. This will be the case with any binary package, not just fitz.

Lambdas need a linux compatible binary. As noted in some of the answers above the solution is to package the binary and load it as a lambda layers. This is easy to do in three steps that worked for me:

  1. download the relevant binary, some aws advice here. Basically you need to unzip the relevant wheel file.
  2. re-package the binary in a zip file with the right structure. See this stackoverflow answer
  3. create a layer and upload the zip file using this aws page.
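
Before uploading, it may help to check which compiled files the zip actually contains and whether their names point at Linux. The following is a rough sketch, not an official tool: `list_binaries` and `wrong_platform` are hypothetical helpers, and the `x86_64-linux-gnu` fragment is an assumption based on the file name `_fitz.cpython-37m-x86_64-linux-gnu.so` mentioned earlier in this thread.

```python
import zipfile

# File endings commonly used by compiled extension modules and shared libraries
BINARY_ENDINGS = (".so", ".pyd", ".dylib")


def list_binaries(zip_file):
    """Return the sorted names of compiled files inside a deployment or layer zip."""
    with zipfile.ZipFile(zip_file) as archive:
        return sorted(n for n in archive.namelist() if n.endswith(BINARY_ENDINGS))


def wrong_platform(names, expected_fragment="x86_64-linux-gnu"):
    """Return the binaries whose file name lacks the expected platform fragment.

    Untagged files (a plain "libfoo.so") are reported as well, so treat the
    result as a list to review by hand rather than a definite failure.
    """
    return [n for n in names if expected_fragment not in n]
```

For a zip built from a macOS install, `wrong_platform(list_binaries("my_layer.zip"))` would be expected to list the `darwin` binaries; `my_layer.zip` is only an example file name.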

@MiniMarvin

> For anyone else that ends up here with the same problem:
>
> You can use lambda layers to provide the pymupdf dependency, instead of building it into the deployment package. There's already a maintained layer for it, see here: https://github.com/keithrozario/Klayers
>
> That way you don't need to bother with building on amazon linux, just reference the arn of the layer you need in your lambda function creation (region specific) and import as normal.

This answer seems to be the best one so far, and it currently works. For anyone still receiving the error [ERROR] Runtime.ImportModuleError: Unable to import module 'lambda_function': cannot import name '_fitz' from partially initialized module 'fitz' (most likely due to a circular import) (/var/task/fitz/__init__.py) with this solution, the fix is to use the python3.8 Lambda environment. My function definition is as follows:

parse_pdf:
    runtime: python3.8
    handler: pdfparse/pdfparse/handler.pdf_parse
    layers:
      - arn:aws:lambda:${self:provider.region}:770693421928:layer:Klayers-p38-PyMUPDF:1

It works just fine after defining it this way
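
Since the layer ARN is region specific, a small check like the one below can catch a mismatch between the layer and the function before deployment. This is only a sketch: `layer_region` and `layer_matches_region` are hypothetical helper names, and the field positions assume the ARN shape used in the definition above.

```python
def layer_region(layer_arn):
    """Extract the region from a Lambda layer version ARN.

    Assumed shape: arn:<partition>:lambda:<region>:<account>:layer:<name>:<version>
    """
    parts = layer_arn.split(":")
    if len(parts) != 8 or parts[0] != "arn" or parts[2] != "lambda" or parts[5] != "layer":
        raise ValueError(f"not a Lambda layer version ARN: {layer_arn!r}")
    return parts[3]


def layer_matches_region(layer_arn, function_region):
    """Return True when the layer lives in the same region as the function."""
    return layer_region(layer_arn) == function_region
```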

@anai-s

anai-s commented Oct 27, 2022

> If you use pip to install a python package locally, which contains compiled code, the package (wheel) that is downloaded may not be compatible with AWS Lambda (in my case it was mac rather than linux). So if you deploy this locally installed file to your lambda this will cause the error, fitz not found, when you run your lambda, even if your code works locally. This will be the case with any binary package, not just fitz.
>
> Lambdas need a linux compatible binary. As noted in some of the answers above the solution is to package the binary and load it as a lambda layers. This is easy to do in three steps that worked for me:
>
>   1. download the relevant binary, some aws advice here. Basically you need to unzip the relevant wheel file.
>   2. re-package the binary in a zip file with the right structure. See this stackoverflow answer
>   3. create a layer and upload the zip file using this aws page.

I'm working on mac and it works perfectly. For those who need an example of how to install the package you can try this:
pip install pyMUPDF --upgrade --only-binary=:all: --platform manylinux_2_17_x86_64 --python-version 38

@pschlank

Hey all, @anai-s's answer is right on point. I ran this command in the terminal:

pip install \
  --platform manylinux2014_x86_64 \
  --target=/Users/schlank/Documents/Code/pythonlayers/upload/python \
  --implementation cp \
  --python-version 3.9 \
  --only-binary=:all: --upgrade \
  --ignore-installed \
  PyMuPDF

Some additional detail. Make sure you:

  1. Take note of your target directory
  2. It's key you put it inside a folder named "python"
  3. Compress the folder named "python" into a .zip
  4. Go to Lambdas > Layers in your AWS console
  5. Create a new layer
  6. Select the x86_64 architecture and python 3.9 compatibility
  7. Add your new layer ARN to your serverless YAML (or attach the layer to the Lambda itself in the console)

Then you're good to go.
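
The zipping step in the list above can also be scripted so the archive always has "python" as its top-level folder. A minimal sketch, assuming the packages were already installed into a folder named "python"; `build_layer_zip` is a hypothetical helper and the paths in the usage note are placeholders.

```python
import zipfile
from pathlib import Path


def build_layer_zip(python_dir, zip_path):
    """Zip `python_dir` so every entry in the archive starts with "python/"."""
    python_dir = Path(python_dir).resolve()
    if python_dir.name != "python":
        raise ValueError("the folder to compress must be named 'python'")
    root = python_dir.parent
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(python_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the parent so "python/" is kept
                archive.write(path, path.relative_to(root).as_posix())
    return zip_path
```

For example, `build_layer_zip("upload/python", "layer.zip")` would produce an archive ready to upload as a layer. Keep the output zip outside the "python" folder so it does not end up inside itself.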

@ecumene

ecumene commented Feb 13, 2023

For those trying to get fitz working for Lambda using the python3.9 runtime, and you're on an M1 Mac... Try installing them with docker. This worked for me:

I put them in a folder named requirements/python. Then I zip that up for the layer

mkdir -p requirements/python;
docker run \
  -v "$(pwd)":/var/task "public.ecr.aws/sam/build-python3.9" \
  /bin/sh -c "yum install -y mysql-devel && \
  pip install -r requirements.txt  --only-binary=:all: --platform manylinux_2_17_x86_64 -t requirements/python; \
  exit";

@RaviWittyBrains

I'm encountering an issue while attempting to add the Fitz library (PyMuPDF) to a Lambda layer. The error message I'm getting is:

{
  "errorMessage": "Unable to import module 'lambda_function': cannot import name '_fitz' from partially initialized module 'fitz' (most likely due to a circular import) (/opt/python/lib/python3.11/site-packages/fitz/__init__.py)",
  "errorType": "Runtime.ImportModuleError",
}

Function Logs
[ERROR] Runtime.ImportModuleError: Unable to import module 'lambda_function': cannot import name '_fitz' from partially initialized module 'fitz' (most likely due to a circular import) (/opt/python/lib/python3.11/site-packages/fitz/__init__.py)

I'm seeking guidance on successfully utilizing the Fitz library within a Lambda function. Below is the Lambda code snippet:

import fitz

def lambda_handler(event, context):
    try:
        print("Hello World")
    except Exception as e:
        print(f"Error in fileToTextract: {str(e)}")

Is there anyone who has successfully integrated this library into their Lambda and can offer advice on resolving this issue?
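
One way to narrow this down is to move the import inside the handler, so the function still loads and the failure can be reported together with the runtime details. A minimal sketch using only the standard library; `describe_import` is a hypothetical helper, not part of PyMuPDF or of the Lambda runtime.

```python
import importlib
import platform
import sys


def describe_import(module_name):
    """Try to import a module and report the outcome with runtime details."""
    report = {
        "module": module_name,
        "python": "%d.%d" % sys.version_info[:2],
        "machine": platform.machine(),
        "system": platform.system(),
    }
    try:
        importlib.import_module(module_name)
        report["imported"] = True
    except ImportError as exc:
        # Covers "cannot import name '_fitz' ..." as well as a missing module
        report["imported"] = False
        report["error"] = str(exc)
    return report


def lambda_handler(event, context):
    # Diagnostic handler: returns the report instead of failing at load time
    return describe_import("fitz")
```

Comparing the reported Python version and machine type with the ones the layer was built for (here the path shows python3.11) may reveal the mismatch.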

@JorjMcKie
Collaborator

This may help you.

@egill

egill commented Feb 19, 2024

Not sure it will help everyone, but this works for me with the latest PyMuPDF (1.3.23) and Python 3.12:

import fitz_old as fitz

I tracked my problems to the recent rebase in PyMuPDF, and using the pre-rebased version worked like a charm for me.

It's not the best solution but works for now.

@Elliotmrgn

Elliotmrgn commented Mar 26, 2024

Worked for me when I added it as a layer. As mentioned before, the key is to install a compatible version and use the correct path in your .zip file.

pip install \
--platform manylinux2014_x86_64 \
--target=./python/lib/python3.12/site-packages \
--implementation cp \
--python-version 3.12 \
--only-binary=:all: --upgrade \
pymupdf

Then zip the code to upload as a layer. The file structure will look like:

my_layer.zip
└── python/
    └── lib/
        └── python3.12/
            └── site-packages/
                └── fitz/
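
To keep the target path and the Python version in sync when the runtime changes, the install command can be assembled programmatically. This is a sketch only: `pip_layer_args` is a hypothetical helper, and it uses just the pip flags that already appear in the command above.

```python
import sys


def pip_layer_args(package, python_version, target_root="python",
                   platform_tag="manylinux2014_x86_64"):
    """Build a pip command that installs Linux wheels into a layer folder."""
    target = f"{target_root}/lib/python{python_version}/site-packages"
    return [
        sys.executable, "-m", "pip", "install",
        "--platform", platform_tag,
        "--target", target,
        "--implementation", "cp",
        "--python-version", python_version,
        "--only-binary=:all:",
        "--upgrade",
        package,
    ]
```

The list can be passed to `subprocess.run(pip_layer_args("pymupdf", "3.12"), check=True)`, after which the "python" folder is zipped as described.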

@Jacer7

Jacer7 commented Mar 31, 2024

No need for a layer, no need for CodeBuild!
This problem exists because the fitz library is written in C/C++ with a Python layer on top, so its binaries are built for a specific architecture and OS.
So we need to build our deployment_package.zip with all of these specified.
Here is an article about it. I am sure it will solve all the pain I've seen above 😄
https://medium.com/@jayshwor.khadka/lambda-deployment-package-with-dependencies-and-local-built-distribution-wheels-with-different-affe82b982fa

@jacksonkasi1

> No need for a layer, no need for CodeBuild! This problem exists because the fitz library is written in C/C++ with a Python layer on top, so its binaries are built for a specific architecture and OS. So we need to build our deployment_package.zip with all of these specified. Here is an article about it. I am sure it will solve all the pain I've seen above 😄 https://medium.com/@jayshwor.khadka/lambda-deployment-package-with-dependencies-and-local-built-distribution-wheels-with-different-affe82b982fa

Thanks, it works for me :)

@Jacer7

Jacer7 commented Sep 18, 2024

@jacksonkasi1 Glad the solution proposed worked for you :D !! 👍
