Workers AI
Use AI Gateway for analytics, caching, and security on requests to Workers AI.
REST API
To interact with the REST API, update the URL used for your request:
- Previous: `https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model_id}`
- New: `https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/workers-ai/{model_id}`
For these parameters:
- `{account_id}` is your Cloudflare account ID.
- `{gateway_id}` refers to the name of your existing AI Gateway.
- `{model_id}` refers to the model ID of the Workers AI model.
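As a sketch of how these three values combine, the helper below assembles the gateway URL; `gatewayUrl` is a hypothetical function for illustration, not part of any Cloudflare SDK.

```typescript
// Hypothetical helper: assembles the AI Gateway URL for a Workers AI request
// from the account ID, gateway name, and model ID described above.
function gatewayUrl(accountId: string, gatewayId: string, modelId: string): string {
  return `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}/workers-ai/${modelId}`;
}

// For example (placeholder account and gateway names):
// gatewayUrl("abc123", "my-gateway", "@cf/meta/llama-3-8b-instruct")
// → "https://gateway.ai.cloudflare.com/v1/abc123/my-gateway/workers-ai/@cf/meta/llama-3-8b-instruct"
```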
Examples
First, generate an API token with Workers AI Read access and use it in your request.
Request to Workers AI llama model

```shell
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/workers-ai/@cf/meta/llama-3-8b-instruct \
  --header 'Authorization: Bearer {cf_api_token}' \
  --header 'Content-Type: application/json' \
  --data '{"prompt": "What is Cloudflare?"}'
```
Request to Workers AI text classification model

```shell
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/workers-ai/@cf/huggingface/distilbert-sst-2-int8 \
  --header 'Authorization: Bearer {cf_api_token}' \
  --header 'Content-Type: application/json' \
  --data '{ "text": "Cloudflare docs are amazing!" }'
```
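The same requests can be made with `fetch` in JavaScript or TypeScript. The sketch below is an assumption-laden illustration: `ACCOUNT_ID`, `GATEWAY_ID`, and `CF_API_TOKEN` are placeholders you must replace, and `buildWorkersAiRequest` is a hypothetical helper that only builds the request, so the actual network call is left in the commented usage.

```typescript
// Placeholders — substitute your own values.
const ACCOUNT_ID = "{account_id}";
const GATEWAY_ID = "{gateway_id}";
const CF_API_TOKEN = "{cf_api_token}";

// Builds the URL and fetch options for a Workers AI request,
// equivalent to the curl examples above.
function buildWorkersAiRequest(modelId: string, body: unknown) {
  return {
    url: `https://gateway.ai.cloudflare.com/v1/${ACCOUNT_ID}/${GATEWAY_ID}/workers-ai/${modelId}`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${CF_API_TOKEN}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}

// Usage (performs a real network call, so it is left commented out):
// const { url, init } = buildWorkersAiRequest(
//   "@cf/meta/llama-3-8b-instruct",
//   { prompt: "What is Cloudflare?" },
// );
// const response = await fetch(url, init);
// console.log(await response.json());
```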
OpenAI compatible endpoints
Workers AI supports OpenAI-compatible endpoints for text generation (`/v1/chat/completions`) and text embedding models (`/v1/embeddings`). This allows you to use the same code as you would for your OpenAI commands, but swap in Workers AI easily.
Request to OpenAI compatible endpoint

```shell
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/workers-ai/v1/chat/completions \
  --header 'Authorization: Bearer {cf_api_token}' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "@cf/meta/llama-3-8b-instruct",
    "messages": [
      { "role": "user", "content": "What is Cloudflare?" }
    ]
  }'
```
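To illustrate the drop-in compatibility, the sketch below builds the same chat-completions payload you would send to OpenAI; only the endpoint URL and bearer token change, and the model name travels in the request body rather than the URL. `buildChatCompletionsBody` and `ChatMessage` are illustrative names, not part of any SDK.

```typescript
// Sketch: a chat-completions payload in the OpenAI-compatible shape.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

function buildChatCompletionsBody(model: string, messages: ChatMessage[]): string {
  return JSON.stringify({ model, messages });
}

const chatBody = buildChatCompletionsBody("@cf/meta/llama-3-8b-instruct", [
  { role: "user", content: "What is Cloudflare?" },
]);

// POST `chatBody` to:
// https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/workers-ai/v1/chat/completions
```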
Worker
To include an AI Gateway within your Worker, add the gateway as an object in your Workers AI request.
```typescript
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const response = await env.AI.run(
      "@cf/meta/llama-3-8b-instruct",
      { prompt: "Why should you use Cloudflare for your AI inference?" },
      { gateway: { id: "{gateway_id}", skipCache: false, cacheTtl: 3360 } },
    );
    return new Response(JSON.stringify(response));
  },
} satisfies ExportedHandler<Env>;
```
Workers AI supports the following parameters for AI gateways:
- `id` (string) - Name of your existing AI Gateway. Must be in the same account as your Worker.
- `skipCache` (boolean) - Controls whether the request should skip the cache.
- `cacheTtl` (number) - Controls the Cache TTL.
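As a sketch of how these parameters combine, the object below mirrors the `gateway` argument from the Worker example above; the specific values are illustrative assumptions (for instance, skipping the cache for a prompt you do not expect to repeat).

```typescript
// Illustrative gateway options, matching the parameters listed above.
// The values here are assumptions for the example, not recommendations.
const gatewayOptions = {
  id: "{gateway_id}", // placeholder — the name of your AI Gateway
  skipCache: true,    // bypass the cache for this request
  cacheTtl: 3360,     // same TTL value as the Worker example above
};

// Passed as part of the third argument to env.AI.run:
// env.AI.run(model, inputs, { gateway: gatewayOptions })
```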