TokenFlow - AI API Gateway & Cost Optimizer
AI API Gateway: semantic caching, smart routing, prompt compression. 8+ providers, 40-70% cost savings. Next.js 14 + Stripe.
Overview
### Cut Your AI API Costs by 40-70%
TokenFlow is a self-hosted reverse proxy that sits between your applications and AI providers, automatically optimizing token costs through **semantic caching, smart model routing, and prompt compression**.
### The Problem
AI API costs are spiraling out of control. Companies spend thousands monthly on OpenAI, Anthropic, and other providers — often paying full price for duplicate or similar requests.
### The Solution
TokenFlow acts as an intelligent middleware. Change one URL in your code, and every request is automatically:
- **Cached semantically**: similar prompts return cached responses at no provider cost
- **Compressed**: redundant tokens are removed before the request is sent
- **Routed optimally**: simple tasks are sent to cheaper models
- **Tracked**: full cost analytics with budget alerts
Features
**8 AI Provider Support**
OpenAI, Anthropic Claude, Google Gemini, Mistral, DeepSeek, Groq, Cohere, and Ollama (local models).
**OpenAI-Compatible API**
Drop-in replacement. Just change your `base_url` — existing code works unchanged with Python, JavaScript, cURL, PHP, LangChain, Vercel AI SDK.
**Semantic Caching**
Not just exact matches — finds similar prompts using cosine similarity. Configurable threshold and TTL.
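As a rough illustration of how threshold-and-TTL lookup by cosine similarity could work, here is a minimal sketch. It is not TokenFlow's actual implementation: the `SemanticCache` class, its method names, and the assumption that prompts have already been turned into embedding vectors are all hypothetical.

```typescript
// Hypothetical sketch of a semantic cache; names and structure are assumptions.
type CacheEntry = { embedding: number[]; response: string; expiresAt: number };

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector has zero norm.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

class SemanticCache {
  private entries: CacheEntry[] = [];
  private readonly threshold: number;
  private readonly ttlMs: number;

  constructor(threshold: number, ttlMs: number) {
    this.threshold = threshold;
    this.ttlMs = ttlMs;
  }

  // Stores a response together with the embedding of the prompt that produced it.
  set(embedding: number[], response: string, now: number = Date.now()): void {
    this.entries.push({ embedding, response, expiresAt: now + this.ttlMs });
  }

  // Returns the best unexpired match at or above the threshold, else undefined.
  get(embedding: number[], now: number = Date.now()): string | undefined {
    this.entries = this.entries.filter((entry) => entry.expiresAt > now);
    let best: CacheEntry | undefined;
    let bestScore = this.threshold;
    for (const entry of this.entries) {
      const score = cosineSimilarity(embedding, entry.embedding);
      if (score >= bestScore) {
        best = entry;
        bestScore = score;
      }
    }
    return best?.response;
  }
}
```

A higher threshold (closer to 1) trades hit rate for safety: fewer prompts match, but a cached answer is less likely to be returned for a question that only looks similar.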
**Smart Model Routing**
Define rules: "If prompt < 500 tokens, use gpt-4o-mini" or "If prompt contains 'code', use deepseek-coder."
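The two example rules above could be expressed roughly as follows. This is a sketch under stated assumptions, not the product's rule format: the `RoutingRule` type, the first-match-wins ordering, and the four-characters-per-token estimate are all illustrative choices.

```typescript
// Hypothetical rule shape; TokenFlow's real configuration format may differ.
type RoutingRule = {
  description: string;
  matches: (prompt: string) => boolean;
  model: string;
};

// Rough heuristic (assumption): about 4 characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Rules are checked in order and the first match wins (an assumption).
const rules: RoutingRule[] = [
  {
    description: "prompt contains 'code'",
    matches: (prompt) => /\bcode\b/i.test(prompt),
    model: "deepseek-coder",
  },
  {
    description: "prompt shorter than 500 tokens",
    matches: (prompt) => estimateTokens(prompt) < 500,
    model: "gpt-4o-mini",
  },
];

// Returns the model of the first matching rule, or the requested model if none match.
function routeModel(
  prompt: string,
  requestedModel: string,
  ruleSet: RoutingRule[] = rules,
): string {
  for (const rule of ruleSet) {
    if (rule.matches(prompt)) return rule.model;
  }
  return requestedModel;
}
```

Because the first match wins in this sketch, rule order matters: a short prompt that mentions code goes to `deepseek-coder` here only because that rule is listed first.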
**Prompt Compression**
Automatically removes redundant whitespace, deduplicates content, and summarizes long conversation histories.
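The first two steps, whitespace cleanup and deduplication, might look something like the sketch below. The function names and the `ChatMessage` type are hypothetical, and summarizing long histories is left out because it would need a model call.

```typescript
// Hypothetical message shape, modeled on OpenAI-style chat messages.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Collapses runs of spaces/tabs, trims spaces around newlines,
// and limits consecutive blank lines to one.
function normalizeWhitespace(text: string): string {
  return text
    .replace(/[ \t]+/g, " ")
    .replace(/ *\n */g, "\n")
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}

// Normalizes each message and drops empty messages and exact repeats
// (same role and same normalized content), keeping the first occurrence.
function compressMessages(messages: ChatMessage[]): ChatMessage[] {
  const seen = new Set<string>();
  const result: ChatMessage[] = [];
  for (const message of messages) {
    const content = normalizeWhitespace(message.content);
    const key = `${message.role}\u0000${content}`;
    if (content === "" || seen.has(key)) continue;
    seen.add(key);
    result.push({ role: message.role, content });
  }
  return result;
}
```

Dropping repeated messages can change meaning in conversations where repetition is intentional, so a real implementation would likely make this step configurable.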
**Budget Protection**
Set daily/weekly/monthly spending limits with automatic model downgrade when exceeded.
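One way to picture the downgrade logic is the sketch below. It assumes spend is tracked in USD per period and that downgrades are a simple model-to-model map; the type and function names are hypothetical and not taken from TokenFlow.

```typescript
type BudgetPeriod = "daily" | "weekly" | "monthly";
type BudgetLimits = Partial<Record<BudgetPeriod, number>>; // USD; unset means no limit
type Spend = Record<BudgetPeriod, number>; // USD spent so far in each period

// Lists every period whose spend has gone over its configured limit.
function exceededPeriods(spend: Spend, limits: BudgetLimits): BudgetPeriod[] {
  const periods: BudgetPeriod[] = ["daily", "weekly", "monthly"];
  return periods.filter((period) => {
    const limit = limits[period];
    return limit !== undefined && spend[period] > limit;
  });
}

// Returns the cheaper fallback model when any limit is exceeded,
// otherwise the model that was originally requested.
function applyBudget(
  requestedModel: string,
  spend: Spend,
  limits: BudgetLimits,
  downgrades: Map<string, string>,
): string {
  if (exceededPeriods(spend, limits).length === 0) return requestedModel;
  return downgrades.get(requestedModel) ?? requestedModel;
}
```

In this sketch a model with no downgrade entry is passed through unchanged; a stricter policy could reject the request instead.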
**Real-Time Analytics Dashboard**
Live request feed, cost charts, savings breakdown, provider distribution, latency tracking.
**API Playground**
Test requests through the proxy and see caching, routing, and compression in action.
**Admin Panel**
User management, API key management, model configuration, system settings.
Requirements
- Node.js 18 or higher
- npm or yarn
- No database server needed (SQLite included)
- Optional: PostgreSQL for production
- Optional: AI provider API keys (OpenAI, Anthropic, etc.)
Instructions
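```bash
npm install               # install dependencies
npx prisma db push        # create the database schema (SQLite by default)
npx tsx prisma/seed.ts    # run the seed script
npm run dev               # start the development server
```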
Open http://localhost:3000 to confirm the gateway is running, then change your AI SDK's `base_url` to `http://localhost:3000/api/v1`.
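For code that calls the API without an SDK, a request through the gateway might be built as in this sketch. The `/chat/completions` path and the `Bearer` authorization header are assumptions based on the OpenAI-compatible claim, `buildChatRequest` is a hypothetical helper, and `YOUR_API_KEY` is a placeholder.

```typescript
// Base URL from the instructions above.
const TOKENFLOW_BASE_URL = "http://localhost:3000/api/v1";

// Builds an OpenAI-style chat completion request aimed at the gateway.
// Assumption: the gateway mirrors OpenAI's /chat/completions route and auth header.
function buildChatRequest(baseUrl: string, apiKey: string, model: string, prompt: string) {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Usage (needs a running gateway; Node.js 18+ provides a global fetch):
// const { url, init } = buildChatRequest(TOKENFLOW_BASE_URL, "YOUR_API_KEY", "gpt-4o-mini", "Hello");
// const response = await fetch(url, init);
// console.log(await response.json());
```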
| Field | Value |
| --- | --- |
| Category | Scripts & Code / NodeJS |
| First release | 24 March 2026 |
| Last update | 24 March 2026 |
| Files included | .css, .html, JavaScript (.js) |
| Tags | SaaS, nextjs, self-hosted, ai gateway, api proxy, token optimizer, cost reduction, llm proxy, openai proxy, semantic cache, model routing, prompt compression, cost dashboard, budget alerts, api management |








