NumPyCLIP

This is a pure NumPy implementation of OpenAI's CLIP neural network (https://github.com/openai/CLIP).

You can use NumPyCLIP to embed images or text as 512-dimensional feature vectors. The cosine similarity between feature vectors should be high when a text matches an image.

For example, the image data/CLIP.png shows a diagram. When embedding the texts "a diagram", "a dog" and "a cat", the similarity between the image feature vector and the text feature vector for "a diagram" will be the largest.

Possible applications include image classification, image captioning, visual question answering, text-based image search, image-based text search and image filtering.

import numpyclip
import numpy as np
from PIL import Image

# Load the ViT-B/32 model and its image preprocessing function
model, preprocess = numpyclip.load("ViT-B/32")

# Preprocess the image and add a batch dimension
image = preprocess(Image.open("data/CLIP.png"))[np.newaxis, :, :, :]

# Convert the texts to sequences of token IDs
text = numpyclip.tokenize(["a diagram", "a dog", "a cat"])

# Embed image and texts as 512-dimensional feature vectors
image_features = model.encode_image(image)
text_features = model.encode_text(text)

# Compute scaled cosine similarities and turn them into probabilities
logits_per_image, logits_per_text = model(image, text)
probs = numpyclip.softmax(logits_per_image, axis=-1)

print("Label probs:", probs)  # prints: [[0.99279356 0.00421067 0.00299573]]

Install dependencies, download and run on Debian/Ubuntu

sudo apt update
sudo apt install python3 python3-pip git
pip3 install numpy pillow
git clone --depth 1 https://github.com/99991/NumPyCLIP.git
cd NumPyCLIP
python3 example.py
python3 tests.py

This will install Python, git, NumPy and Pillow (for image loading), download NumPyCLIP and run example.py and tests.py. On the first run, the model weights will be downloaded to ~/.cache/clip/ViT-B-32.pt (337.6 MiB), which may take a few minutes.

By default, the model weights will be downloaded to ~/.cache/clip, but you can also specify the directory with the CLIP_DIR environment variable:

# Download weights to "my/weights/directory"
CLIP_DIR=my/weights/directory python3 tests.py

Limitations

  • NumPyCLIP runs on the CPU and is therefore slower than the original PyTorch implementation on a powerful GPU.
  • To reduce dependencies, ftfy.fix_text has been removed from the tokenization step. This may cause differences when running NumPyCLIP on badly formatted text.
  • The preprocessing of the image might differ from the official implementation by an offset of 1 pixel or so.
  • So far, only the ViT-B/32 model has been ported.
  • This library has not been tested much yet.
  • During preprocessing, the input image is resized to 224x224 and center-cropped (see the sketch after this list). For best results, make sure that all important content is near the center of the image and large enough that it is not lost when the image is scaled down.
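
For reference, the following is a minimal sketch of the kind of preprocessing CLIP applies, assuming bicubic resizing of the shorter side to 224 pixels, a 224x224 center crop and per-channel normalization with the standard CLIP mean and standard deviation. NumPyCLIP's own preprocess function may differ in details (see the pixel-offset limitation above).

import numpy as np
from PIL import Image

# Standard CLIP normalization constants (per RGB channel)
MEAN = np.array([0.48145466, 0.4578275, 0.40821073])
STD = np.array([0.26862954, 0.26130258, 0.27577711])

# Hypothetical stand-in for the preprocess function returned by numpyclip.load
def preprocess_sketch(image, size=224):
    # Resize so that the shorter side has length `size` (bicubic)
    w, h = image.size
    scale = size / min(w, h)
    image = image.resize((round(w * scale), round(h * scale)), Image.BICUBIC)

    # Center-crop a size x size region; content outside the crop is lost
    w, h = image.size
    left, top = (w - size) // 2, (h - size) // 2
    image = image.crop((left, top, left + size, top + size))

    # Scale to [0, 1], normalize per channel and convert to channels-first
    x = np.asarray(image.convert("RGB"), dtype=np.float32) / 255.0
    x = (x - MEAN) / STD
    return x.transpose(2, 0, 1).astype(np.float32)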

TODO

  • Implement other models
  • Package for PyPI
