feat: use llama3.2 vision for image to text task
LeafYeeXYZ committed Dec 4, 2024
1 parent 9f1c228 commit a9520cf
Showing 3 changed files with 7 additions and 5 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -18,6 +18,8 @@ A image creator based on **free** `Cloudflare AI` and `HuggingFace` APIs. Featur

You can use either `Fullstack` or `Client-Server` mode.

+ > You may need to initialize `Cloudflare AI` `llama3.2 11B vision` model before using `Image-to-Prompt` feature. See [here](https://developers.cloudflare.com/workers-ai/models/llama-3.2-11b-vision-instruct/#Input) for more information.
#### 1.1.1 Fullstack

Set following environment variables in `.env` file or `Vercel`.
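
The "initialize" step in the note added above is not spelled out in this commit. As far as the linked Cloudflare model page describes it, the model has to be unlocked once per account by sending a request whose prompt is `agree` (accepting the Meta license); treat that as an assumption and check the linked page before relying on it. A minimal sketch of building such a request follows. `buildAgreeRequest` and the `CF_API_TOKEN` variable name are hypothetical and do not appear in this commit; only `CF_USER_ID` does.

```typescript
// Model id taken from the new URL in app/api/prompt/route.ts.
const MODEL = '@cf/meta/llama-3.2-11b-vision-instruct'

interface AgreeRequest {
  url: string
  init: { method: string; headers: Record<string, string>; body: string }
}

// Hypothetical helper: builds (but does not send) the one-time "agree" request.
// Assumption: the account is unlocked by a request with prompt "agree".
function buildAgreeRequest(accountId: string, apiToken: string): AgreeRequest {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${MODEL}`,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiToken}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ prompt: 'agree' }),
    },
  }
}

// Usage (run once, outside the app; CF_API_TOKEN is an assumed variable name):
// const { url, init } = buildAgreeRequest(process.env.CF_USER_ID!, process.env.CF_API_TOKEN!)
// const res = await fetch(url, init)
// console.log(res.status, await res.text())
```

Keeping the request construction separate from the `fetch` call makes it easy to inspect the URL and payload before anything is sent.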
4 changes: 2 additions & 2 deletions app/api/prompt/route.ts
@@ -1,11 +1,11 @@
export async function POST(req: Request): Promise<Response> {
try {
const { image } = await req.json()
- const url = `https://api.cloudflare.com/client/v4/accounts/${process.env.CF_USER_ID}/ai/run/@cf/unum/uform-gen2-qwen-500m`
+ const url = `https://api.cloudflare.com/client/v4/accounts/${process.env.CF_USER_ID}/ai/run/@cf/meta/llama-3.2-11b-vision-instruct`
const body = {
image: image as number[],
max_tokens: 4096,
- prompt: 'Generate a detailed description in a single paragraph for this image',
+ prompt: 'Analyze the given image and provide a detailed description. Include details about the main subject/people, background, colors, composition, and mood. Ensure the description is vivid and suitable for input into a text-to-image generation model.',
}
const response = await fetch(url, {
method: 'POST',
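
The handler above casts `image` to `number[]` without checking what the client actually sent. A minimal sketch of a runtime guard that could sit before the body is built is shown here; `isByteArray` and `buildVisionBody` are hypothetical names, not part of this commit, and the 0..255 range check assumes the array carries raw bytes as produced by `Array.from(uint8array)` in `Prompt.tsx`.

```typescript
// Hypothetical type guard: true only for arrays of integers in the byte range.
function isByteArray(value: unknown): value is number[] {
  return (
    Array.isArray(value) &&
    value.every((n) => Number.isInteger(n) && n >= 0 && n <= 255)
  )
}

interface VisionBody {
  image: number[]
  max_tokens: number
  prompt: string
}

// Hypothetical helper: validates the image, then builds the same body shape
// used in route.ts (image, max_tokens, prompt).
function buildVisionBody(image: unknown, prompt: string, maxTokens = 4096): VisionBody {
  if (!isByteArray(image)) {
    throw new TypeError('image must be an array of integers in 0..255')
  }
  return { image, max_tokens: maxTokens, prompt }
}
```

In the route, a failed check could be turned into a 400 response instead of forwarding a malformed payload to Cloudflare.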
6 changes: 3 additions & 3 deletions app/components/Prompt.tsx
@@ -86,7 +86,7 @@ export default function Prompt() {
showUploadList={false}
accept='.jpg,.jpeg,.png'
beforeUpload={async (file) => {
- const MAX_SIZE_MB = 2
+ const MAX_SIZE_MB = 5
try {
flushSync(() => setDisabled(true))
if (file.size > MAX_SIZE_MB * 1024 * 1024) {
@@ -96,7 +96,7 @@ export default function Prompt() {
const uint8array = new Uint8Array(await file.arrayBuffer())
let res: Response | undefined
if (process.env.NEXT_PUBLIC_WORKERS_SERVER) {
- res = await fetch(`${process.env.NEXT_PUBLIC_WORKERS_SERVER}/painter/genprompt`, {
+ res = await fetch(`${process.env.NEXT_PUBLIC_WORKERS_SERVER}/painter/genprompt/v4`, {
method: 'POST',
body: JSON.stringify({ image: Array.from(uint8array) })
})
@@ -111,7 +111,7 @@ export default function Prompt() {
return false
}
const data = await res.json()
- const prompt = data.result.description as string
+ const prompt = data.result.response as string
form.setFieldsValue({ prompt })
return false
} finally {
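
Two client-side details change in this file: the upload limit (2 MB to 5 MB) and the response field (`result.description` to `result.response`). A small sketch of both checks as standalone functions is given here. `exceedsLimit` and `extractPrompt` are hypothetical helpers, and falling back to the old `description` field is an assumption made for illustration (for example, when an older Workers server is still deployed), not something the commit does.

```typescript
const MAX_SIZE_MB = 5

// Mirrors the check in beforeUpload: true when the file is larger than the limit.
function exceedsLimit(sizeBytes: number, maxMb: number = MAX_SIZE_MB): boolean {
  return sizeBytes > maxMb * 1024 * 1024
}

// Hypothetical helper: reads the generated prompt without an unchecked cast.
// Prefers `result.response` (llama 3.2 vision) and falls back to
// `result.description` (the previous uform model); returns undefined otherwise.
function extractPrompt(data: unknown): string | undefined {
  const result = (
    data as { result?: { response?: unknown; description?: unknown } } | null
  )?.result
  if (typeof result?.response === 'string') return result.response
  if (typeof result?.description === 'string') return result.description
  return undefined
}
```

Returning `undefined` rather than casting lets the caller show an error message instead of writing `undefined` into the form field.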