A simple script and service for running an image through third party computer vision services that extract faces, text, colors, and tags.
- Python 3.9.*
- Virtualenv
> git clone https://github.com/harvardartmuseums/data-grinder.git
We recommend creating a virtual environment with Virtualenv and running everything within it.
> cd data-grinder
> virtualenv venv
> venv\Scripts\activate.bat
> pip install -r requirements.txt
- Clone .env-template to .env
- Open .env in a text editor.
- Enter API keys and credentials for the services you want to use.
- Google Vision requires a oauth credentials for authentication. Generate a sevice account key through your Google account API dashboard. Read more at https://cloud.google.com/vision/docs/common/auth. Paste the key as json into the .env file.
HAM Color Service: extract colors
Hashing: compute average, color, perceptual, difference, wavelet hashes
Clarifai: tag features, extract colors
Imagga: tag features, extract colors, categorize
Google Vision: tag features, find faces, find text
Microsoft Cognitive Services: categories, tags, description, faces, color
AWS Rekognition: labels, faces, text
OpenAI GPT-4 on Azure: description
OpenAI GPT-4o on Azure: description
Anthropic Claude 3 Haiku on AWS Bedrock: description
Anthropic Claude 3 Opus on AWS Bedrock: description
Anthropic Claude 3.5 Sonnet on AWS Bedrock: description
Anthropic Claude 3.5 Sonnet v2 on AWS Bedrock: description
Meta Llama 3.2 11b on AWS Bedrock: description
Meta Llama 3.2 90b on AWS Bedrock: description
Run as a script from the command line:
$ python main.py -url https://some.image/url
OR
Run as a local service:
$ flask run
Then open a browser to http://127.0.0.1:5000/extract
Parameters | Values | |
---|---|---|
url | Any Harvard NRS URL that resolves to a IIIF compatible image | |
services | (optional, default uses all services) One or more from the list of valid services separated by spaces | imagga, gv, mcs, clarifai, color, aws, hash, openai, claude, gpt-4, gpt-4o, claude-3-haiku, claude-3-opus, claude-3-5-sonnet, claude-3-5-sonnet-v-2, llama-3-2-11b, llama-3-2-90b |
Example response:
{
"lastupdated": "2018-02-07 21:58:28",
"drsstatus": "ok",
"width": 581,
"height": 768,
"widthFull": 775,
"heightFull": 1024,
"scalefactor": 1.333907056798623,
"url": "https://nrs.harvard.edu/urn-3:huam:75033B_dynmc",
"iiifbaseuri": "https://ids.lib.harvard.edu/ids/iiif/14178676",
"idsid": "14178676",
"colors": [],
"hashes": {},
"clarifai": {},
"microsoftvision": {},
"googlevision": {},
"imagga": {},
"aws": {},
"openai": {},
"gpt-4": {},
"gpt-4o": {},
"claude": {},
"claude-3-haiku": {},
"claude-3-opus": {},
"claude-3-5-sonnet": {},
"claude-3-5-sonnet-v-2": {},
"llama-3-2-11b": {},
"llama-3-2-90b": {}
}
Data in the response:
url
: The original image URL passed in to this script. This must be in the form of a NRS URL for an image in the Harvard DRS.
drsstatus
: The response from the DRS. The value will be "ok" or "bad".
width
: The width in pixels of the image supplied on the command line.
height
: The height in pixels of the image supplied on the command line.
widthFull
: The width in pixels of the full image file in the DRS.
heightFull
: The height in pixels of the full image file in the DRS.
scalefactor
: The propotional difference between teh supplied image and the full image. In the example response, the full image is 1.334 times larger than the supplied image.
iiifbaseuri
: A fully formed IIIF URI for the image in the DRS.
idsid
: The image file ID in the DRS returned when requesting the original image URL.
colors
:
hashes
:
clarifai
:
microsoftvision
:
googlevision
:
imagga
:
aws
:
openai
:
gpt-4
:
gpt-4o
:
claude
:
claude-3-haiku
:
claude-3-opus
:
claude-3-5-sonnet
:
claude-3-5-sonnet-v-2
:
llama-3-2-11b
:
llama-3-2-90b
: