LlamaIndex
Optimize your RAG pipeline costs with Promptly
Integrate Promptly with LlamaIndex to reduce LLM costs in RAG applications. Especially effective for long-context queries where context pruning and caching deliver maximum savings.
Setup Guide
Install LlamaIndex
Install the LlamaIndex OpenAI LLM integration.
pip install llama-index-llms-openai
Configure the LLM
Point LlamaIndex's OpenAI LLM to Promptly. All index queries, summarization, and agent calls are automatically optimized.
from llama_index.llms.openai import OpenAI
llm = OpenAI(
    model="gpt-4o",
    api_key="sk-promptly-...",
    api_base="https://api.getpromptly.in/v1",
)
Use in a RAG Pipeline
RAG queries with large retrieved contexts benefit the most from Promptly's optimization - context pruning trims irrelevant chunks, and semantic caching catches repeated queries.
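To see why pruning pays off, here is a toy sketch of the idea: score each retrieved chunk by word overlap with the query and keep only the top scorers, so fewer tokens reach the LLM. The overlap scoring is a stand-in for illustration only, not Promptly's actual pruning algorithm.

```python
# Toy sketch of context pruning: rank retrieved chunks by word overlap
# with the query and keep only the best ones, so fewer tokens reach the
# LLM. Illustration only; not Promptly's actual pruning algorithm.
import re

STOPWORDS = {"what", "are", "the", "is", "a", "of", "in"}

def words(text: str) -> set[str]:
    """Lowercase word set, minus common stopwords."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS

def prune_chunks(query: str, chunks: list[str], keep: int = 2) -> list[str]:
    """Keep the `keep` chunks that share the most words with the query."""
    query_words = words(query)
    ranked = sorted(chunks, key=lambda c: len(query_words & words(c)), reverse=True)
    return ranked[:keep]

chunks = [
    "Key findings: revenue grew 40% year over year.",
    "The office relocated to a new building in March.",
    "Findings also show churn dropped below 2%.",
]
# The off-topic office chunk is dropped; the two findings chunks survive.
print(prune_chunks("What are the key findings?", chunks))
```

The LlamaIndex pipeline itself needs no changes: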
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("What are the key findings?")
Why Use Promptly with LlamaIndex?
- Massive savings on RAG queries (context pruning reduces large contexts)
- Semantic caching catches repeated knowledge base queries
- Smart routing sends simple extraction tasks to cheaper models
- Works with all LlamaIndex index types and query engines
- Per-query cost tracking in Promptly analytics
Start optimizing LlamaIndex costs
Sign up, grab your API key, and change your base URL. Under 2 minutes.
Other Integrations
Promptly Python SDK
Official Python SDK - the fastest way to get started
Promptly Node.js SDK
Official Node.js SDK - TypeScript-first, zero config
LangChain
Use Promptly as your LLM backend in LangChain
Vercel AI SDK
Optimize AI SDK streaming responses with Promptly
OpenAI Python SDK
Drop-in optimization for the official OpenAI Python library