🦙 LlamaIndex

Optimize your RAG pipeline costs with Promptly

Integrate Promptly with LlamaIndex to reduce LLM costs in RAG applications. It is especially effective for long-context queries, where context pruning and caching deliver the biggest savings.


Setup Guide

1. Install LlamaIndex

Install the LlamaIndex OpenAI LLM integration.

```bash
pip install llama-index-llms-openai
```
2. Configure the LLM

Point LlamaIndex's OpenAI LLM to Promptly. All index queries, summarization, and agent calls are automatically optimized.

```python
from llama_index.llms.openai import OpenAI

llm = OpenAI(
    model="gpt-4o",
    api_key="sk-promptly-...",
    api_base="https://api.getpromptly.in/v1",
)
```
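If you would rather not pass the LLM to each component explicitly, LlamaIndex's global `Settings` object can hold it, so index queries, summarization, and agent calls all route through Promptly by default. This is a configuration sketch; `Settings` is the modern (llama-index-core 0.10+) replacement for the older `ServiceContext`:

```python
from llama_index.core import Settings

# Make the Promptly-backed LLM the default for every LlamaIndex component
Settings.llm = llm
```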
3. Use in a RAG Pipeline

RAG queries with large retrieved contexts benefit the most from Promptly's optimization: context pruning trims irrelevant chunks, and semantic caching catches repeated queries.

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Pass the Promptly-backed LLM to the query engine
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("What are the key findings?")
```
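To see why pruning cuts costs, here is a deliberately simplified, keyword-overlap sketch of the idea. Promptly's actual pruning is semantic and happens transparently at the proxy, so none of this code belongs in your pipeline; it only illustrates the token savings:

```python
import re

def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def prune_context(query, chunks, keep=2):
    """Toy context pruning: rank retrieved chunks by keyword overlap
    with the query and keep only the top scorers. Fewer chunks means
    fewer input tokens per LLM call, which is where the savings come from."""
    q = tokens(query)
    scored = sorted(chunks, key=lambda c: len(q & tokens(c)), reverse=True)
    return scored[:keep]

chunks = [
    "Key findings: revenue grew 40% year over year.",
    "Appendix B: office seating chart.",
    "Findings also show churn dropped to 2%.",
]
# The seating-chart chunk scores lowest and is dropped
pruned = prune_context("What are the key findings?", chunks)
```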

Why Use Promptly with LlamaIndex?

  • Massive savings on RAG queries (context pruning reduces large contexts)
  • Semantic caching catches repeated knowledge base queries
  • Smart routing sends simple extraction tasks to cheaper models
  • Works with all LlamaIndex index types and query engines
  • Per-query cost tracking in Promptly analytics
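The semantic-caching point can be illustrated with a toy: treat two queries as "the same" when their normalized token sets match. Promptly's real cache matches on embedding similarity rather than token sets, so this is only a mental model, and names like `ToyQueryCache` are illustrative:

```python
import re

class ToyQueryCache:
    """Toy stand-in for semantic caching: queries with the same
    normalized token set hit the same cache entry, so a reworded
    repeat never reaches the LLM."""
    def __init__(self):
        self._store = {}

    def _key(self, query):
        return frozenset(re.findall(r"[a-z0-9]+", query.lower()))

    def get(self, query):
        return self._store.get(self._key(query))

    def put(self, query, response):
        self._store[self._key(query)] = response

cache = ToyQueryCache()
cache.put("What are the key findings?", "Revenue grew 40%.")
# A reworded repeat hits the cache instead of the LLM
hit = cache.get("what are the key findings")
```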

Start optimizing LlamaIndex costs

Sign up, grab your API key, and change your base URL. It takes under two minutes.