An OpenAI-compatible proxy that optimizes every request. Smart routing, prompt compression, and semantic caching - automatic.
# Just change the base URL
client = OpenAI(
api_key="sk-promptly-...",
base_url="https://api.getpromptly.in/v1/"
)
# Same code. 60% lower costs.
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Hello!"}]
)
Works with OpenAI, Anthropic, and Gemini. More providers coming soon.
60%
Cost reduction (up to)
<5ms
Added latency
99.9%
Uptime target
0
Code changes
Integration
Install our SDK or just swap the base URL. Your choice.
Step 01
Add your provider keys and get a Promptly key in seconds.
Step 02
pip install promptly-sdk - or just change base_url. One line either way.
Step 03
Every request is optimized automatically. View savings in your dashboard.
Optimization engine
Each strategy saves independently. Together they compound.
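As a back-of-envelope illustration of how independent savings compound, each reduction applies to the cost left over by the previous one, so the combined savings are multiplicative. The per-strategy rates below are assumptions for the arithmetic, not Promptly's measured numbers.

```python
# Illustrative rates only -- not measured figures.
routing, compression, caching = 0.40, 0.30, 0.20

# Each strategy cuts a share of whatever cost remains after the last one.
remaining = (1 - routing) * (1 - compression) * (1 - caching)
total_savings = 1 - remaining

print(f"{total_savings:.1%}")  # combined savings from the three rates
```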
POST /v1/chat/completions
{
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "What is 2+2?"}
]
}
# Served by: gpt-4o-mini
# Cost: $0.002 instead of $0.03
Not every prompt needs GPT-4o. Promptly classifies by complexity and routes simple queries to cheaper, faster models.
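The routing idea can be sketched as a classifier in front of the model choice. The heuristic below is a toy stand-in, not Promptly's actual classifier; the marker list and word-count threshold are assumptions.

```python
# Toy complexity-based router -- illustrative only.
CHEAP_MODEL = "gpt-4o-mini"
STRONG_MODEL = "gpt-4o"
REASONING_MARKERS = ("explain", "analyze", "compare", "step by step", "prove")

def route(prompt: str) -> str:
    """Send long or reasoning-heavy prompts to the strong model."""
    text = prompt.lower()
    if len(prompt.split()) > 50 or any(m in text for m in REASONING_MARKERS):
        return STRONG_MODEL
    return CHEAP_MODEL

print(route("What is 2+2?"))                            # gpt-4o-mini
print(route("Explain quantum tunneling step by step"))  # gpt-4o
```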
# Before: 847 tokens
"You are a helpful assistant that helps
users with questions. Please try to be
as helpful as possible and provide
detailed, comprehensive answers..."
# After: 312 tokens
"Helpful assistant. Detailed answers."
# Same meaning. 63% fewer tokens.
Most prompts contain 40-60% redundant tokens. Promptly compresses without changing meaning.
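A toy compression pass looks something like the sketch below. The real engine is presumably model-based rather than a fixed phrase list; the filler patterns here are assumptions chosen to match the example prompt above.

```python
# Toy prompt compression: strip filler phrases, collapse whitespace.
import re

FILLER_PATTERNS = [
    r"\bplease try to be as helpful as possible and\b",
    r"\bthat helps users with questions\b",
    r"\byou are a\b",
]

def compress(prompt: str) -> str:
    out = prompt
    for pattern in FILLER_PATTERNS:
        out = re.sub(pattern, "", out, flags=re.IGNORECASE)
    out = re.sub(r"\s+([.,])", r"\1", out)  # no space before punctuation
    return re.sub(r"\s+", " ", out).strip()

verbose = ("You are a helpful assistant that helps users with questions. "
           "Please try to be as helpful as possible and provide detailed, "
           "comprehensive answers.")
print(compress(verbose))
```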
# Request 1 (miss)
"What's the capital of France?"
→ "Paris" (cost: $0.003)
# Request 2 (cache HIT)
"capital of france?"
→ "Paris" (cost: $0.000)
# 1 of 2 requests = FREE
When users ask similar questions, why pay twice? Intelligent caching detects similar requests and returns instant responses.
# 48-message conversation
# Conservative: keep last 20 messages
# Moderate: keep last 10 + dedup system
# Aggressive: keep last 6 + summarize
# Before: 4,200 tokens (full history)
# After: 980 tokens (pruned)
# Same answers. 77% fewer tokens.
Long conversations waste tokens on stale context. Promptly trims old turns automatically.
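The three strategies above can be sketched as a single pruning function. The keep-counts come from the example; the exact semantics (how the system prompt is deduplicated, how aggressive mode summarizes) are assumptions, and this sketch drops old turns rather than summarizing them.

```python
# Sketch of keep-last-N context pruning -- semantics are assumed.
KEEP = {"conservative": 20, "moderate": 10, "aggressive": 6}

def prune(messages: list[dict], strategy: str = "moderate") -> list[dict]:
    """Keep one system prompt plus the most recent turns."""
    system = [m for m in messages if m["role"] == "system"][:1]  # dedup system
    turns = [m for m in messages if m["role"] != "system"]
    return system + turns[-KEEP[strategy]:]

history = [{"role": "system", "content": "You are helpful."}] + [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"msg {i}"}
    for i in range(47)
]
pruned = prune(history, "aggressive")
print(len(history), "->", len(pruned))  # 48 -> 7
```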
Platform
See exactly where every dollar goes. Track savings across all your projects.
Costs, savings, latency, and volume over time.
Every request with model, tokens, cost, and latency.
Manage provider keys. Regenerate instantly.
Custom rules by model, prompt length, or keyword.
Toggle compression, context pruning, caching.
Hit rate, entries, memory usage, manual clearing.
Invite members, assign roles, manage access.
Spend, latency, error rate alerts via email.
OpenAI, Anthropic, Gemini. More coming soon.
You could build it yourself. Here's what that looks like.
| Capability | DIY | Promptly |
|---|---|---|
| Smart model routing | Weeks of eng | Automatic |
| Prompt compression | Build NLP pipeline | Toggle on/off |
| Semantic caching | Complex to build | Built-in |
| Multi-provider | N integrations | Add key, done |
| Analytics | Build from scratch | Included |
| Setup time | 2-6 months | 2 minutes |
| Cost | $50-200k+ eng time | Free to start |
Pricing
Free tier with no credit card. Upgrade only when you need more.
For individuals & side projects
No credit card required · we keep 50% of savings
For power users & builders
per month · we keep 40% of savings
Promptly saves ~$30. You keep $18 (60%). Promptly earns $12.
$30
Total saved
$18
You keep
$12
Promptly fee
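The split above is just the tier's 60/40 share applied to realized savings:

```python
# Fee split on this tier: you keep 60% of savings, Promptly keeps 40%.
total_saved = 30.00
you_keep = round(total_saved * 0.60, 2)      # $18
promptly_fee = round(total_saved * 0.40, 2)  # $12
assert you_keep + promptly_fee == total_saved
```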
Transparent proxy. No training. Encrypted in transit.
TLS 1.3. Encrypted at rest.
We never train on your data.
Full logs. Export anytime.
Configure TTL. Auto-purge.
"I was spending way too much on side-project API calls. Dropped Promptly in and the savings started immediately - didn't have to change a single line of my app code."
Ashutosh Adukia
Software Engineer, Microsoft
"I prototype a lot of AI ideas and the costs add up fast. Promptly just quietly optimizes everything in the background. It's the kind of tool that should have existed from day one."
Vijay Raisinghani
Product, Meta
"The semantic caching is what sold me. My chatbot was making the same LLM calls over and over - Promptly eliminated most of them without any code changes."
Aryan Pawar
AI Engineer, eBiz
Infrastructure
FastAPI + asyncio. Non-blocking.
Official SDK or drop-in base URL.
TTL, thresholds, and similarity controls.
Users won't notice the proxy.
3 providers. Add more in seconds.
Full SSE pass-through.
Join teams saving thousands every month. Setup in 2 minutes.