[FEATURE] Implement Image Loading Function for Image Search and CLIP Support #3152

Open
mingshl opened this issue Oct 23, 2024 · 8 comments
Labels: enhancement (New feature or request)

@mingshl
Collaborator

mingshl commented Oct 23, 2024

Is your feature request related to a problem?
To support CLIP models and image search, we need a connector-level function that can load images from a URL or file path, similar to what PIL (Python Imaging Library) does.

This function should support image search and be compatible with CLIP (Contrastive Language-Image Pre-training) for image-text understanding.

What solution would you like?
Similar to:

from PIL import Image
import requests

from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)


With the image loaded, we can pass it to the CLIP model as input and run a prediction:

inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True)

outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # this is the image-text similarity score
probs = logits_per_image.softmax(dim=1)  # we can take the softmax to get the label probabilities

Objectives:

  • Create a function that takes a URL as input and returns a PIL Image object.
  • Ensure the function can handle various image formats (JPEG, PNG, etc.).
  • Implement error handling for invalid URLs or unsupported image types.
  • Optimize the function for performance, considering potential high-volume usage in image search scenarios.
  • Ensure compatibility with CLIP for further processing and analysis.

Acceptance Criteria:

  • The function successfully loads images from valid URLs.
  • It properly handles errors for invalid URLs or unsupported image types.
  • The loaded images are compatible with our image search pipeline.
  • The function's output can be directly used with CLIP models.
  • Performance tests show the function can handle high-volume requests efficiently.
  • Code is well-documented and follows our coding standards.
  • Unit tests are implemented to cover various scenarios (successful loads, error cases, etc.).

Related issue
#3054

mingshl added the enhancement and untriaged labels on Oct 23, 2024
mingshl assigned and then unassigned mingshl on Oct 23, 2024
@mingshl
Collaborator Author

mingshl commented Oct 23, 2024

The connector level already has a toString() method that converts lists, maps, and other data types to String. This feature could call loadImage() from there. Please see PR #2871 for reference.

@brianf-aws
Contributor

Hi, this looks interesting. Could I be assigned to this, please?

@dhrubo-os
Collaborator

Just a heads up, we might need to talk with Security about the implementation plan for this.

@brianf-aws
Contributor

> Just a heads up, we might need to talk with Security about this with the implementation plan.

Yeah, I was talking to @ylwu-amzn, who mentioned that it's a security issue to have users download from an external site. We may need some sort of design review to find ways to implement this feature defensively.

@brianf-aws
Contributor

Created a ticket with Security to get their advice. We also talked to the Flow Framework team about this, and their understanding is that something like downloading from a URL within ML-Commons will probably not be approved by Security.

dblock removed the untriaged label on Nov 11, 2024
@dblock
Member

dblock commented Nov 11, 2024

[Catch All Triage - 1, 2, 3, 4]

@brianf-aws
Contributor

brianf-aws commented Nov 19, 2024

Hey everyone, I talked with Security and they said this would likely not pass review. It would be better for the client to convert the image to base64 and for us to provide validation. What we can do now is work in these phases:

  1. (Accept base64 strings with validation) The client sends the image as a base64 string.
  2. (Accept URLs on safe endpoints) We would only consider this if we really need the functionality, because Security mentioned that a regex-based endpoint check can be spoofed to point to a malicious URL.
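To illustrate the spoofing concern in phase 2, here is a minimal Python sketch (matching the language of the snippets above, not the ML-Commons codebase). The host `images.example.com`, the `ALLOWED_HOSTS` set, and both function names are hypothetical. It contrasts a prefix regex on the raw URL string with a check on the parsed hostname:

```python
import re
from urllib.parse import urlsplit

# Hypothetical allowlist; the thread does not name any trusted hosts.
ALLOWED_HOSTS = {"images.example.com"}

# A naive prefix regex of the kind that can be spoofed.
NAIVE_PATTERN = re.compile(r"^https://images\.example\.com")


def naive_is_allowed(url: str) -> bool:
    """Prefix match on the raw string: easy to write, easy to fool."""
    return NAIVE_PATTERN.match(url) is not None


def strict_is_allowed(url: str) -> bool:
    """Parse the URL and compare the actual hostname against the allowlist."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False
    if parts.scheme != "https" or host is None:
        return False
    return host.lower() in ALLOWED_HOSTS


spoofed = [
    # Allowed name reused as the leading labels of an attacker-controlled domain.
    "https://images.example.com.evil.test/cat.jpg",
    # Allowed name placed in the userinfo part; the real host is evil.test.
    "https://images.example.com@evil.test/cat.jpg",
]
for url in spoofed:
    print(url, "naive:", naive_is_allowed(url), "strict:", strict_is_allowed(url))
```

Even the stricter check is only a first layer: it does not address redirects or DNS tricks, which is presumably part of why Security prefers the base64 route.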

They also mentioned that, in addition to a malicious script, a client could send a very large file and stall ML-Commons from doing anything else.
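A rough Python sketch of what phase 1 validation with a size guard might look like follows. The helper name `validate_base64_image`, the 5 MiB limit, and the choice to recognize only JPEG and PNG by their leading bytes are all assumptions for illustration, not anything decided in this thread:

```python
import base64
import binascii
from typing import Tuple

# Assumed limit; the thread does not specify one.
MAX_IMAGE_BYTES = 5 * 1024 * 1024

# Leading "magic" bytes for the two formats named in the issue.
SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
}


def validate_base64_image(payload: str, max_bytes: int = MAX_IMAGE_BYTES) -> Tuple[str, bytes]:
    """Return (format, raw_bytes) or raise ValueError. Hypothetical helper."""
    # Reject oversized input before decoding: every 4 base64 chars encode 3 bytes.
    max_chars = 4 * ((max_bytes + 2) // 3)
    if len(payload) > max_chars:
        raise ValueError("payload exceeds size limit")
    try:
        # validate=True rejects characters outside the base64 alphabet.
        raw = base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise ValueError("payload is not valid base64") from exc
    if len(raw) > max_bytes:
        raise ValueError("payload exceeds size limit")
    for name, magic in SIGNATURES.items():
        if raw.startswith(magic):
            return name, raw
    raise ValueError("unsupported or unrecognized image type")
```

Checking the length before decoding keeps a huge payload from being expanded in memory at all. The magic-byte check only confirms the header, so a real implementation would likely still need a full decode step to reject truncated or malformed images.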

@brianf-aws
Contributor

brianf-aws commented Dec 9, 2024

Loading images from a URL has to be put on pause because of the security concerns.

ML Commons currently supports base64 strings for invoking multi-modal models. Full end-to-end workflows were run to confirm that this works without downloading from a URL.
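For reference, the client-side half of the base64 approach can be sketched in a few lines of Python. The function names and the `image_base64` field are placeholders, not the actual ML Commons request schema, which depends on the connector and model being invoked:

```python
import base64
import json
from pathlib import Path


def image_file_to_base64(path: str) -> str:
    """Read an image file and return its contents as an ASCII base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def build_request_body(path: str) -> str:
    """Wrap the encoded image in a JSON body. The field name is a placeholder."""
    return json.dumps({"image_base64": image_file_to_base64(path)})
```

With this split, the client fetches or reads the image itself, so the server never has to make an outbound request on the user's behalf.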

@mingshl can we close this RFC?

Projects
Status: In Progress