Merge pull request #50 from jina-ai/readme-update-v005
docs: update readme
ZiniuYu authored Jun 13, 2023
2 parents cbebaa0 + 036ddcd commit 29a95af
Showing 1 changed file with 47 additions and 120 deletions: README.md
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/inference-client)](https://pypi.org/project/inference-client/)
[![PyPI - License](https://img.shields.io/pypi/l/inference-client)](https://pypi.org/project/inference-client/)

Inference-Client is a Python library for interacting with [Jina AI Inference](https://cloud.jina.ai/user/inference), a platform that offers a range of AI models for common tasks.
It provides a simple and intuitive API to perform tasks such as image captioning, encoding, ranking, visual question answering (VQA), and image upscaling, letting you pick a model and integrate the API call into your workflow with minimal overhead.

The current version of Inference Client includes methods to call the following tasks:

📷 **Caption**: Generate captions for images

📈 **Encode**: Encode data into embeddings using various models

🔍 **Rank**: Re-rank cross-modal matches according to their joint likelihood

🆙 **Upscale**: Increase the resolution of an image while preserving its quality and details

🤔 **VQA**: Answer questions related to images

## Installation

```bash
pip install inference-client
```

## Getting Started

Before using Inference-Client, please create an inference on [Jina AI Cloud](https://cloud.jina.ai/user/inference).

After logging in with your Jina AI Cloud account, you can create an inference by clicking the "Create" button on the Inference page.
From there, you can select the model you want to use.

Once the inference is created and its status is "Serving", you can use Inference-Client to connect to it.
Reaching that status could take a few minutes, depending on the model you selected.

<p align="center">
<img src=".github/README-img/jac.png" width="100%">
</p>

### Client Initialization

To use Inference-Client, import the `Client` class and create an instance of it, passing the authentication token of your Jina AI Cloud account:

```python
from inference_client import Client

client = Client(token='<your auth token>')
```

The access token can be generated at [Jina AI Cloud](https://cloud.jina.ai/settings/tokens), or via the CLI as described in [this guide](https://docs.jina.ai/jina-ai-cloud/login/#create-a-new-pat):
```bash
jina auth token create <name of PAT> -e <expiration days>
```
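For example, a minimal sketch of initialization that reads the token from an environment variable instead of hard-coding it (the variable name `JINA_AUTH_TOKEN` below is just an illustration, not something the client requires):

```python
import os

from inference_client import Client

# read the personal access token from an environment variable (name is arbitrary)
client = Client(token=os.environ['JINA_AUTH_TOKEN'])
```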


### Connecting to Models

Once the client is initialized, you can connect to the model you want to use by calling the `get_model` method of the `Client` object, passing the name of the model as it appears on Jina AI Cloud:

```python
model = client.get_model('<model of your selection>')
```

For example, passing `'ViT-B-32::openai'` connects to the CLIP model of that name on Jina AI Cloud.

You can connect to as many inference models as you want once they have been created on Jina AI Cloud, and you can use them for multiple tasks.
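For instance, a single client can hold several model connections at once; the model names below are placeholders for whatever inferences you have created:

```python
# hypothetical model names; substitute the ones shown on your Jina AI Cloud page
clip_model = client.get_model('ViT-B-32::openai')
blip2_model = client.get_model('Salesforce/blip2-opt-2.7b')
```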


## Performing tasks

Now that you have connected to the models, you can use them to perform the tasks they support.

### Image Captioning

The `caption` method of the `Model` object takes an image as input and returns a caption as output.

```python
image = 'path/to/image.jpg'
caption = model.caption(image=image)
```
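Besides a file path, the image can also be supplied as raw bytes; a minimal sketch, assuming `caption` accepts the same image formats (path, bytes, or tensor) as `encode`:

```python
# read the image as raw bytes and caption it directly
with open('path/to/image.jpg', 'rb') as f:
    image_bytes = f.read()

caption = model.caption(image=image_bytes)
print(caption)
```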

### Encoding

The `encode` method of the `Model` object takes text or image data as input and returns an embedding as output.

```python
text = 'a sentence describing the beautiful nature'
embedding = model.encode(text=text)

# OR
image = 'path/to/image.jpg'
embedding = model.encode(image=image)
```
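A common follow-up is to compare a text embedding with an image embedding. Below is a minimal sketch, assuming the returned embeddings behave like 1-D numeric arrays and that the model (e.g. a CLIP-style model) places text and images in one shared embedding space:

```python
import numpy as np

text_emb = np.asarray(model.encode(text='a photo of a cat'))
image_emb = np.asarray(model.encode(image='path/to/image.jpg'))

# cosine similarity between the text and image embeddings
similarity = float(
    np.dot(text_emb, image_emb) / (np.linalg.norm(text_emb) * np.linalg.norm(image_emb))
)
print(similarity)
```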

### Ranking

The `rank` method of the `Model` object takes text or image data as the query and a list of candidates as input, and returns the candidates reordered by their similarity to the query, along with their scores.

```python
image = 'path/to/image.jpg'
candidates = [
    'a colorful photo of nature',
    'a photo of blue scenery',
    'a black and white photo of a cat',
    'an image about dogs',
    'an image about cats',
    'an image about birds',
]

result = model.rank(image=image, candidates=candidates)
```

You may also input images as bytes or tensors similarly to the encode method.
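For example, a minimal sketch of the tensor variant, assuming an RGB tensor produced with PIL and numpy is accepted just like a file path:

```python
import numpy as np
from PIL import Image

# load the query image as an RGB tensor
image_tensor = np.asarray(Image.open('path/to/image.jpg'))

result = model.rank(image=image_tensor, candidates=candidates)
```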

### Image Upscaling


The `upscale` method of the `Model` object takes an image and optional configurations as input, and returns the upscaled image bytes as output.

```python
image = 'path/to/image.jpg'
result = model.upscale(image=image, output_path='upscaled_image.png', scale='800:600')
```
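If you prefer to handle the result yourself, here is a minimal sketch without `output_path`, assuming (per the description above) that `output_path` is optional and the call returns the upscaled image bytes:

```python
# upscale and write the returned bytes to disk manually
upscaled_bytes = model.upscale(image='path/to/image.jpg', scale='800:600')

with open('upscaled_image.png', 'wb') as f:
    f.write(upscaled_bytes)
```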

### Visual Question Answering (VQA)

Visual Question Answering (VQA) answers natural language questions about visual content such as images.
The `vqa` method of the `Model` object takes an image and a question as input and returns an answer as output.

```python
image = 'path/to/image.jpg'
question = 'Question: What is the name of this place? Answer:'
answer = model.vqa(image=image, question=question)
```
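As a small usage sketch, the same image can be queried with several prompts (assuming each call returns the answer text for its question):

```python
questions = [
    'Question: What is the name of this place? Answer:',
    'Question: What time of day is it? Answer:',
]

# ask each question about the same image
for question in questions:
    answer = model.vqa(image=image, question=question)
    print(question, answer)
```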

## Advanced Usage

In addition to the basic usage, Inference-Client also supports advanced features such as handling DocumentArray inputs, customizing task parameters, and more.
Please refer to the [official documentation](https://jina.readme.io/docs/inference) for more details.
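As one illustration of the DocumentArray input mentioned above, here is a hedged sketch using the docarray package; the `docs` keyword below is an assumption, so check the official documentation for the actual signature:

```python
from docarray import Document, DocumentArray

# wrap a batch of inputs in a DocumentArray (docarray < 0.30 style API)
docs = DocumentArray(
    [
        Document(text='a sentence to embed'),
        Document(uri='path/to/image.jpg'),
    ]
)

result = model.encode(docs=docs)  # hypothetical keyword argument; see the official docs
```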

## Support

- Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.
- Watch our [Engineering All Hands](https://youtube.com/playlist?list=PL3UBBWOUVhFYRUa_gpYYKBqEAkO4sxmne) to learn Jina's new features and stay up-to-date with the latest AI techniques.
- Subscribe to the latest video tutorials on our [YouTube channel](https://youtube.com/c/jina-ai).

## License

Inference-Client is backed by [Jina AI](https://jina.ai) and licensed under [Apache-2.0](./LICENSE).
